Classifying Issues in Open-source GitHub Repositories
Amir Hossain Raaj, Fairuz Nawer Meem, Sadia Afrin Mim

TL;DR
This paper explores the use of machine learning and deep neural network models to classify issues in open-source GitHub repositories, aiming to improve issue management and labeling consistency.
Contribution
It introduces a novel application of ML and DNN models for automated issue classification in open-source projects, addressing inconsistent labeling practices.
Findings
DNN models outperform traditional ML methods in accuracy.
Certain labels like Bug and Documentation are classified with high precision.
Automated classification can significantly expedite issue triaging.
Abstract
GitHub is the most widely used platform for software maintenance in the open-source community. Developers report issues on GitHub when they run into difficulties. Labels on those issues give developers prior knowledge of what an issue is about and help them address it more easily. However, most GitHub repositories do not label their issues regularly. The goal of this work is to classify issues in the open-source community using ML and DNN models. There are thousands of open-source repositories on GitHub; some label their issues properly, whereas others do not. When issues are pre-labeled, the team can solve problems and assign the right personnel more quickly, which expedites the development process. In this work, we analyzed prominent GitHub open-source repositories. We classified…
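To make the idea of label classification concrete, here is a minimal sketch of a traditional ML baseline: a multinomial Naive Bayes classifier over issue titles, written with the Python standard library only. The visible part of the abstract does not say which models, features, or datasets the authors used, so this is a stand-in rather than their method. The training titles, the `enhancement` label, and the helper names `tokenize`, `train`, and `predict` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count labels and per-label word frequencies from (title, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data; a real study would use thousands of labeled issues.
TRAIN = [
    ("App crashes with null pointer on startup", "bug"),
    ("Crash when saving file throws exception", "bug"),
    ("Fix typo in README installation guide", "documentation"),
    ("Update docs for the configuration guide", "documentation"),
    ("Add dark mode support", "enhancement"),
    ("Support exporting reports as CSV", "enhancement"),
]
model = train(TRAIN)
print(predict(model, "Exception and crash on startup"))
```

A DNN-based classifier would replace the word-count model with learned text representations, but the input (issue text) and output (a label) stay the same.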
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Open Source Software Innovations
