Topic Recommendation for Software Repositories using Multi-label Classification Algorithms

Maliheh Izadi; Abbas Heydarnoori; Georgios Gousios

arXiv:2010.09116·cs.SE·June 15, 2021

TL;DR

This paper applies multi-label classification algorithms to predict and recommend relevant GitHub repository topics from textual data, reaching a Recall@5 of 0.890, and releases an online tool for automatic topic annotation.

Contribution

The study introduces a multi-label classification approach for automatic repository topic prediction, leveraging textual information and mapping user-defined topics to curated GitHub featured topics.
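The mapping step can be pictured as an alias lookup. The following is a minimal sketch under stated assumptions: the alias table `FEATURED_ALIASES` and the helpers `normalize` and `map_topics` are hypothetical, the entries are invented for illustration rather than taken from GitHub's curated list, and user topics with no featured counterpart are simply dropped.

```python
# Hypothetical alias table: each featured topic lists spellings users might
# type instead. These entries are illustrative, not GitHub's real data.
FEATURED_ALIASES = {
    "javascript": {"js", "javascript", "ecmascript"},
    "machine-learning": {"machine-learning", "ml", "machinelearning"},
    "python": {"python", "python3", "py"},
}


def normalize(tag: str) -> str:
    """Lowercase, trim, and unify separators so 'Machine_Learning' matches 'machine-learning'."""
    return tag.strip().lower().replace("_", "-").replace(" ", "-")


# Invert the table once: normalized alias -> featured topic.
ALIAS_TO_FEATURED = {
    normalize(alias): featured
    for featured, aliases in FEATURED_ALIASES.items()
    for alias in aliases
}


def map_topics(user_topics: list[str]) -> list[str]:
    """Map user-defined topics to featured topics; unmapped topics are dropped."""
    mapped = {ALIAS_TO_FEATURED.get(normalize(t)) for t in user_topics}
    mapped.discard(None)  # topics with no featured counterpart
    return sorted(mapped)
```

For instance, `map_topics(["JS", "Machine_Learning", "python3", "my-side-project"])` returns `["javascript", "machine-learning", "python"]`: the first three resolve through their aliases, while the last has no featured counterpart and is discarded.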

Findings

01 Achieved a Recall@5 of 0.890 and an LRAP of 0.805 in topic prediction.

02 User assessments indicate that the model recommends complete and accurate topic sets.

03 Developed and released an online tool for automatic repository topic annotation.
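The two reported metrics can be computed as follows. This sketch assumes Recall@k is the fraction of a repository's true topics found among its k highest-scored topics, averaged over repositories, and that LRAP follows the usual label ranking average precision definition (tied scores count as ranked at or above); the paper's exact variants may differ. The function names and toy data are illustrative, and every repository is assumed to have at least one true topic.

```python
def ranked_labels(labels: list[str], row: list[float]) -> list[str]:
    """Labels sorted by descending score (stable sort, so ties keep label order)."""
    return [lab for lab, _ in sorted(zip(labels, row), key=lambda p: -p[1])]


def recall_at_k(true_sets, scores, labels, k):
    """Mean fraction of each repository's true topics that appear in its top-k."""
    total = 0.0
    for truth, row in zip(true_sets, scores):
        top_k = set(ranked_labels(labels, row)[:k])
        total += len(truth & top_k) / len(truth)
    return total / len(true_sets)


def lrap(true_sets, scores, labels):
    """Label ranking average precision (assumes every truth set is non-empty)."""
    total = 0.0
    for truth, row in zip(true_sets, scores):
        score_of = dict(zip(labels, row))
        per_repo = 0.0
        for lab in truth:
            s = score_of[lab]
            rank = sum(1 for v in row if v >= s)  # labels scored at least as high
            hits = sum(1 for t in truth if score_of[t] >= s)  # true topics among them
            per_repo += hits / rank
        total += per_repo / len(truth)
    return total / len(true_sets)


labels = ["python", "web", "ml", "cli"]
true_sets = [{"python", "ml"}, {"web"}]
scores = [[0.9, 0.1, 0.4, 0.2], [0.3, 0.2, 0.8, 0.1]]
print(recall_at_k(true_sets, scores, labels, k=2))  # 0.5
print(lrap(true_sets, scores, labels))
```

On this toy data, the first repository's two true topics occupy ranks 1 and 2, while the second repository's only topic sits at rank 3. Recall@2 is therefore (1 + 0) / 2 = 0.5, and LRAP is (1 + 1/3) / 2 ≈ 0.667.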

Abstract

Many platforms exploit collaborative tagging to provide their users with faster and more accurate results while searching or navigating. Tags can communicate different concepts, such as the main features, technologies, functionality, and goal of a software repository. Recently, GitHub has enabled users to annotate repositories with topic tags. It has also provided a set of featured topics and their possible aliases, carefully curated with the help of the community. This creates the opportunity to use this initial seed of topics to automatically annotate all remaining repositories by training models that recommend high-quality topic tags to developers. In this work, we study the application of multi-label classification techniques to predict software repositories' topics. First, we map the large space of user-defined topics to those featured by GitHub. The core idea is to derive…
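To make the multi-label setting concrete, here is a minimal sketch. It assumes binary relevance (one independent binary classifier per topic) with bag-of-words logistic regression trained by plain stochastic gradient descent. This is an illustrative stand-in, not the paper's model or feature pipeline. The class name `BinaryRelevanceLR`, its hyperparameters, and the toy repository descriptions are all hypothetical. Ranking topics by predicted probability and keeping the top k produces the kind of list that Recall@5 evaluates.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer; real README text would need more cleaning."""
    return text.lower().split()


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryRelevanceLR:
    """Hypothetical multi-label classifier: one logistic regression per topic."""

    def __init__(self, epochs: int = 200, lr: float = 0.5):
        self.epochs = epochs
        self.lr = lr
        self.weights: dict[str, dict[str, float]] = {}
        self.bias: dict[str, float] = {}

    def _logit(self, topic: str, doc: Counter) -> float:
        w = self.weights[topic]
        return self.bias[topic] + sum(w.get(tok, 0.0) * c for tok, c in doc.items())

    def fit(self, texts: list[str], topic_sets: list[set[str]]) -> "BinaryRelevanceLR":
        docs = [Counter(tokenize(t)) for t in texts]
        for topic in sorted(set().union(*topic_sets)):
            self.weights[topic], self.bias[topic] = {}, 0.0
            targets = [1.0 if topic in ts else 0.0 for ts in topic_sets]
            # Plain SGD on the log loss for this topic's binary problem.
            for _ in range(self.epochs):
                for doc, y in zip(docs, targets):
                    grad = sigmoid(self._logit(topic, doc)) - y
                    self.bias[topic] -= self.lr * grad
                    w = self.weights[topic]
                    for tok, c in doc.items():
                        w[tok] = w.get(tok, 0.0) - self.lr * grad * c
        return self

    def scores(self, text: str) -> dict[str, float]:
        doc = Counter(tokenize(text))
        return {topic: sigmoid(self._logit(topic, doc)) for topic in self.weights}

    def recommend(self, text: str, k: int = 5) -> list[str]:
        """Top-k topics by probability (ties broken alphabetically)."""
        s = self.scores(text)
        return sorted(s, key=lambda t: (-s[t], t))[:k]


texts = [
    "neural network training in python",
    "deep learning models for image classification",
    "react components for a web frontend",
    "javascript library for browser animations",
    "command line tool written in python",
]
topics = [
    {"machine-learning", "python"},
    {"machine-learning"},
    {"javascript", "react"},
    {"javascript"},
    {"python", "cli"},
]
model = BinaryRelevanceLR().fit(texts, topics)
print(model.recommend("neural network for image classification", k=3))
```

Binary relevance treats each topic independently, so it ignores correlations such as `react` almost always co-occurring with `javascript`. It trades that information for simplicity and easy parallel training.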
