Do first encounters make or break new users?
Using text features in the first comment to predict new user return on Reddit
Keywords: new user attrition, churn, Reddit, text analysis, sentiment analysis, online comments, social media, social feedback
Many new users quit a site after only one interaction. Existing studies of user return consider user characteristics and simple feedback such as upvotes, but leave potentially useful text data unstudied. Here, we analyze 700,000 first post/sole comment pairs on Reddit to determine whether the content of the comment is related to the probability that the new user returns. Using two complementary text analysis techniques, text regression (CCS) and Linguistic Inquiry and Word Count (LIWC), we demonstrate that information from the first comment a new user receives improves predictions of new user return. Our work shows that useful predictive features can be extracted from very short text comments, and it illustrates the importance of social feedback to the experiences of new users.