On the Feasibility of Predicting Questions being Forgotten in Stack Overflow
Thi Huyen Nguyen, Tu Nguyen, Tuan-Anh Hoang, Claudia Niederée

TL;DR
This paper investigates whether it is feasible to predict which Stack Overflow questions will become irrelevant over time. Using more than a decade of data, it finds meta-information more useful than text features for this prediction.
Contribution
The paper studies the task of predicting which Stack Overflow questions will be forgotten, and highlights that meta-information matters more than text features for this task.
Findings
Meta-information is more predictive than text features.
Certain question categories are more predictable.
Analysis covers 18.1 million questions from 2008 to 2019.
Abstract
For their attractiveness, comprehensiveness and dynamic coverage of relevant topics, community-based question answering sites such as Stack Overflow heavily rely on the engagement of their communities: Questions on new technologies, technology features as well as technology versions come up and have to be answered as technology evolves (and as community members gather experience with it). At the same time, other questions cease in importance over time, finally becoming irrelevant to users. Beyond filtering low-quality questions, "forgetting" questions, which have become redundant, is an important step for keeping the Stack Overflow content concise and useful. In this work, we study this managed forgetting task for Stack Overflow. Our work is based on data from more than a decade (2008-2019), covering 18.1M questions, that are made publicly available by the site itself. For…
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Mobile Crowdsensing and Crowdsourcing
