Prioritising GitHub Priority Labels
James Caddy, Christoph Treude

TL;DR
This paper introduces a dataset of 812 GitHub issue labels, manually categorized by priority, and provides a tool that helps contributors identify high-priority issues across repositories, aiding open source issue triaging.
Contribution
The paper presents a novel dataset of priority labels and a tool to assist in identifying high-priority issues, addressing the lack of standardization in GitHub issue labeling.
Findings
Created a dataset of 812 priority labels categorized as low, medium, or high.
Developed a tool to identify high-priority issues across repositories.
Released the dataset and tool publicly to support open source community efforts.
Abstract
Communities on GitHub often use issue labels to triage issues, assigning them priority ratings based on how urgently they should be addressed. The labels used are chosen by each repository's contributors and are not standardised by GitHub, which makes priority-related reasoning across repositories difficult for both researchers and contributors. Previous work shows interest in how issues are labelled and what consequences those labels have. For instance, some previous work has used clustering models and natural language processing to categorise labels, without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority, normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors…
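To make the idea of normalised priority labels concrete, here is a minimal Python sketch of how a label-to-priority mapping might be used to surface high-priority issues across repositories. It is not the authors' tool. The mapping, label names, repository names, and the helpers `Issue`, `normalised_priority`, and `high_priority` are all hypothetical. Keying labels per repository is also our assumption, motivated by the fact that each repository defines its own labels.

```python
from dataclasses import dataclass, field

# Hypothetical excerpt of a (repository, label) -> priority mapping.
# These label and repository names are illustrative only and are NOT
# taken from the published dataset.
LABEL_PRIORITY = {
    ("example/repo-a", "p0"): "high",
    ("example/repo-a", "p2"): "low",
    ("example/repo-b", "priority: urgent"): "high",
    ("example/repo-b", "priority: normal"): "medium",
}

# Ordering of the three normalised levels, used to pick the strongest one.
RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Issue:
    repo: str
    number: int
    labels: list = field(default_factory=list)


def normalised_priority(issue):
    """Return the highest normalised priority among the issue's labels, or None."""
    levels = [
        LABEL_PRIORITY[(issue.repo, label.lower())]
        for label in issue.labels
        if (issue.repo, label.lower()) in LABEL_PRIORITY
    ]
    return max(levels, key=RANK.__getitem__) if levels else None


def high_priority(issues):
    """Keep only issues whose labels normalise to 'high'."""
    return [issue for issue in issues if normalised_priority(issue) == "high"]


issues = [
    Issue("example/repo-a", 1, ["P0", "bug"]),
    Issue("example/repo-b", 7, ["priority: normal"]),
    Issue("example/repo-a", 3, ["docs"]),
]
print([issue.number for issue in high_priority(issues)])  # prints [1]
```

Matching on the lower-cased label name is a simplifying choice. Labels without a known mapping, such as `docs` here, are treated as having no priority rather than being guessed.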
