Topic Recommendation for Software Repositories using Multi-label Classification Algorithms
Maliheh Izadi, Abbas Heydarnoori, Georgios Gousios

TL;DR
This paper applies multi-label classification algorithms to recommend relevant topics for GitHub repositories from their textual data, reaching a Recall@5 of 0.890 and an LRAP of 0.805, and releases an online tool for automatic annotation.
Contribution
The study introduces a multi-label classification approach for automatic repository topic prediction, leveraging textual information and mapping user-defined topics to curated GitHub featured topics.
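To make the mapping step concrete, here is a minimal sketch of how free-form user topics might be folded into featured topics and then encoded as a multi-label target. It assumes a plain alias lookup, which is a simplification and not necessarily how the paper derives its mapping. The alias table, `build_lookup`, `normalize`, `map_topics`, and `to_multi_hot` are all hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table: featured topic -> known aliases.
# The entries are illustrative and not taken from the paper's dataset.
FEATURED_ALIASES: Dict[str, List[str]] = {
    "javascript": ["js"],
    "machine-learning": ["ml", "machinelearning"],
    "react": ["reactjs", "react-js"],
}


def build_lookup(featured_aliases: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the table so every alias (and the topic itself) points to its featured topic."""
    lookup: Dict[str, str] = {}
    for featured, aliases in featured_aliases.items():
        lookup[featured] = featured
        for alias in aliases:
            lookup[alias] = featured
    return lookup


def normalize(topic: str) -> str:
    """Lowercase and hyphenate a raw topic string (an assumed normalization)."""
    return topic.strip().lower().replace(" ", "-").replace("_", "-")


def map_topics(user_topics: Iterable[str], lookup: Dict[str, str]) -> List[str]:
    """Map user-defined topics to featured ones, dropping unknowns and duplicates."""
    mapped: List[str] = []
    for raw in user_topics:
        featured = lookup.get(normalize(raw))
        if featured is not None and featured not in mapped:
            mapped.append(featured)
    return mapped


def to_multi_hot(topics: Iterable[str], label_order: List[str]) -> List[int]:
    """Encode a topic set as a 0/1 vector, one position per featured topic."""
    present = set(topics)
    return [1 if label in present else 0 for label in label_order]


lookup = build_lookup(FEATURED_ALIASES)
labels = sorted(FEATURED_ALIASES)
mapped = map_topics(["JS", "ReactJS", "my-cool-tool", "ml", "javascript"], lookup)
target = to_multi_hot(mapped, labels)
```

Each repository then carries one 0/1 vector over the featured topics, which is the target shape a multi-label classifier trains against.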
Findings
Achieved Recall@5 of 0.890 and LRAP of 0.805 in topic prediction
User assessments indicate that the model recommends complete and accurate topic sets
Developed and released an online tool for automatic repository topic annotation
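For readers unfamiliar with the two reported metrics, the sketch below computes them for a single repository, assuming the common definitions: Recall@k as the fraction of true topics found among the k highest-scored topics, and LRAP (label ranking average precision) as the average, over true topics, of the share of true topics ranked at or above each one. The paper's exact evaluation code may differ. The function names and the example scores are hypothetical.

```python
from typing import Dict, Set


def recall_at_k(scores: Dict[str, float], true_topics: Set[str], k: int = 5) -> float:
    """Fraction of the true topics that appear among the k highest-scored topics."""
    ranked = sorted(scores, key=lambda topic: scores[topic], reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & true_topics) / len(true_topics)


def lrap(scores: Dict[str, float], true_topics: Set[str]) -> float:
    """Label ranking average precision for one sample.

    For each true topic, take the share of true topics among all topics
    scored at least as high, then average over the true topics.
    """
    total = 0.0
    for topic in true_topics:
        threshold = scores[topic]
        rank = sum(1 for s in scores.values() if s >= threshold)
        hits = sum(1 for other in true_topics if scores[other] >= threshold)
        total += hits / rank
    return total / len(true_topics)


# Hypothetical model scores for one repository (not data from the paper).
example_scores = {
    "python": 0.9,
    "machine-learning": 0.7,
    "docker": 0.4,
    "api": 0.2,
    "cli": 0.1,
    "database": 0.05,
}
example_truth = {"python", "docker", "database"}

r5 = recall_at_k(example_scores, example_truth, k=5)
ap = lrap(example_scores, example_truth)
```

In this example two of the three true topics land in the top five, so Recall@5 is 2/3, while LRAP is 13/18 because "database" is ranked last. Dataset-level figures such as those reported above would be the mean of these per-repository values.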
Abstract
Many platforms exploit collaborative tagging to provide their users with faster and more accurate results while searching or navigating. Tags can communicate different concepts such as the main features, technologies, functionality, and the goal of a software repository. Recently, GitHub has enabled users to annotate repositories with topic tags. It has also provided a set of featured topics and their possible aliases, carefully curated with the help of the community. This creates the opportunity to use this initial seed of topics to automatically annotate all remaining repositories by training models that recommend high-quality topic tags to developers. In this work, we study the application of multi-label classification techniques to predict software repositories' topics. First, we map the large space of user-defined topics to those featured by GitHub. The core idea is to derive…
