GitRanking: A Ranking of GitHub Topics for Software Classification using Active Sampling
Cezar Sas, Andrea Capiluppi, Claudio Di Sipio, Juri Di Rocco, Davide Di Ruscio

TL;DR
GitRanking introduces a hierarchical, knowledge-based ranking system for GitHub topics, utilizing active sampling and Wikidata links to improve software classification and aid developers in better annotating projects.
Contribution
It presents a novel, extensible framework for ranking software topics by generality, integrating active sampling and Wikidata to create a hierarchical taxonomy grounded in a knowledge base.
Findings
Developed a ranked taxonomy of GitHub topics based on their generality.
Demonstrated that developers tend to avoid highly specific terms in annotations.
Showed GitRanking's effectiveness and extensibility with minimal annotations.
Abstract
GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are unlabeled or inadequately labeled, making it harder for users to find relevant projects. There have been various proposals for software application domain classification over the past years. However, these approaches lack a well-defined taxonomy that is hierarchical, grounded in a knowledge base, and free of irrelevant terms. This work proposes GitRanking, a framework for creating a classification ranked into discrete levels based on how general or specific their meaning is. We collected 121K topics from GitHub and considered the most frequent ones for the ranking. GitRanking 1) uses active sampling to ensure a minimal number of required annotations; and 2) links each topic to Wikidata, reducing ambiguities and improving the reusability of the taxonomy. Our…
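To make the annotation-saving idea concrete, here is a minimal Python sketch of ranking topics by generality while asking for as few pairwise judgments as possible. It is not the authors' algorithm: it swaps in plain binary insertion as a simple stand-in for active sampling, and it assumes a perfectly consistent annotator. The names `rank_by_generality`, `assign_levels`, `annotator`, and the `hidden` generality scores are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List


def rank_by_generality(
    topics: List[str], more_general: Callable[[str, str], bool]
) -> List[str]:
    """Order topics from most general to most specific.

    Each topic is placed by binary search, so a judgment is requested
    only where the current ranking cannot already decide the position.
    """
    ranked: List[str] = []
    for topic in topics:
        lo, hi = 0, len(ranked)
        while lo < hi:
            mid = (lo + hi) // 2
            if more_general(topic, ranked[mid]):
                hi = mid  # topic belongs before ranked[mid]
            else:
                lo = mid + 1  # topic belongs after ranked[mid]
        ranked.insert(lo, topic)
    return ranked


def assign_levels(ranked: List[str], n_levels: int) -> Dict[str, int]:
    """Split a ranked list into n_levels roughly equal levels (0 = most general)."""
    return {t: i * n_levels // len(ranked) for i, t in enumerate(ranked)}


# Hypothetical ground truth standing in for a human annotator's judgment.
hidden = {"software": 5, "database": 4, "web-framework": 3, "orm": 2, "sqlalchemy": 1}
queries: List[tuple] = []  # every pair the annotator was asked about


def annotator(a: str, b: str) -> bool:
    """Return True if topic a is judged more general than topic b."""
    queries.append((a, b))
    return hidden[a] > hidden[b]


ranking = rank_by_generality(
    ["orm", "software", "sqlalchemy", "database", "web-framework"], annotator
)
levels = assign_levels(ranking, n_levels=3)
```

In this toy run the annotator is queried for fewer pairs than the 10 that exhaustive pairwise comparison of five topics would need. Real annotators disagree and make mistakes, which is why a probabilistic active-sampling scheme is a better fit in practice than this deterministic stand-in.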
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Wikis in Education and Collaboration
