A Simple NLP-based Approach to Support Onboarding and Retention in Open Source Communities
Christoph Stanik, Lloyd Montgomery, Daniel Martens, Davide Fucci, and Walid Maalej

TL;DR
This paper presents a simple NLP-based method to automatically identify issues suitable for newcomers and those likely to be resolved by future active developers in open source projects, enhancing onboarding and retention.
Contribution
It introduces a straightforward NLP approach that uses supervised classifiers on issue titles and descriptions to identify beginner-friendly issues and issues resolved by newcomers who later become active developers.
Findings
Random Forest achieved 91% precision for identifying beginner issues.
Decision Tree achieved 92% precision for issues resolved by future active developers.
The approach enables automatic issue labeling for onboarding support.
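The findings above are reported as precision, that is, the share of issues a classifier flags as positive that really are positive. A minimal sketch of that metric follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper's data.

```python
def precision(y_true, y_pred, positive=1):
    """Return the fraction of predicted positives that are truly positive."""
    # Keep the true labels of every item the classifier flagged as positive.
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0  # nothing was flagged, so report 0.0 by convention
    true_positives = sum(1 for t in flagged if t == positive)
    return true_positives / len(flagged)


# Hypothetical labels: 1 = newcomer-friendly issue, 0 = any other issue.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]
print(precision(y_true, y_pred))
```

High precision matters for automatic labeling because a newcomer who picks up a wrongly labeled issue pays the cost of the false positive.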
Abstract
Successful open source communities are constantly looking for new members and helping them become active developers. A common approach for developer onboarding in open source projects is to let newcomers focus on relevant yet easy-to-solve issues to familiarize themselves with the code and the community. The goal of this research is twofold. First, we aim at automatically identifying issues that newcomers can resolve by analyzing the history of resolved issues by simply using the title and description of issues. Second, we aim at automatically identifying issues that can be resolved by newcomers who later become active developers. We mined the issue trackers of three large open source projects and extracted natural language features from the title and description of resolved issues. In a series of experiments, we optimized and compared the accuracy of four supervised classifiers to…
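The abstract mentions extracting natural language features from the title and description of resolved issues. As a rough illustration only, the sketch below builds a bag-of-words count from those two fields. The tokenizer, the function names, and the sample issue are assumptions made for this example; the text shown here does not specify the paper's actual feature set.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def issue_features(title, description):
    """Count token occurrences over the title and description combined."""
    return Counter(tokenize(title + " " + description))


# Hypothetical issue, not taken from any of the mined projects.
issue = {
    "title": "Fix typo in README",
    "description": "The README has a typo in the install section.",
}
features = issue_features(issue["title"], issue["description"])
print(features.most_common(3))
```

Counts like these could then be turned into vectors and passed to a supervised classifier trained on historical issues labeled by who resolved them.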
