Can GitHub Issues Help in App Review Classifications?
Yasaman Abedini, Abbas Heydarnoori

TL;DR
This paper proposes a novel method to enhance app review classification by augmenting datasets with information extracted from GitHub issues, leading to improved machine learning model performance.
Contribution
It introduces an approach that augments labeled app-review datasets with text extracted from GitHub issues, raising F1-scores for both bug-report and feature-request classification.
Findings
F1-score increased by 6.3 for bug reports
F1-score increased by 7.2 for feature requests
Optimal auxiliary volume range identified as 0.3 to 0.7
Abstract
App reviews reflect various user requirements that can aid in planning maintenance tasks. Recently proposed approaches for automatically classifying user reviews rely on machine learning algorithms. A previous study demonstrated that models trained on existing labeled datasets perform poorly when predicting on new datasets. Therefore, a comprehensive labeled dataset is essential to train a more precise model. In this paper, we propose a novel approach that helps augment labeled datasets using information extracted from an additional source, GitHub issues, which contain valuable information about user requirements. First, we identify issues that correspond to review intentions (bug reports, feature requests, and others) by examining the issue labels. Then, we analyze issue bodies and define 19 language patterns for extracting targeted information. Finally, we augment the…
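The pipeline described in the abstract (map issue labels to review intentions, extract matching sentences from issue bodies, then add them to the labeled review set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the label mapping, the two regular expressions, and the names `Issue`, `intent_from_labels`, `extract_sentences`, and `augment` are all hypothetical, and the regexes merely stand in for the paper's 19 language patterns, which are not listed here. The "others" intention is omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical mapping from GitHub issue labels to review intentions.
LABEL_TO_INTENT = {
    "bug": "bug_report",
    "enhancement": "feature_request",
    "feature request": "feature_request",
}

# Illustrative patterns only; they are NOT the 19 patterns defined in the paper.
PATTERNS = {
    "bug_report": [
        re.compile(r"\b(crash(es|ed)?|does(n't| not) work|fails? to)\b", re.I),
    ],
    "feature_request": [
        re.compile(r"\b(it would be (nice|great|helpful)|please add|add support for)\b", re.I),
    ],
}


@dataclass
class Issue:
    labels: List[str]
    body: str


def intent_from_labels(labels: List[str]) -> Optional[str]:
    """Return the first intent whose label appears on the issue, else None."""
    for label in labels:
        intent = LABEL_TO_INTENT.get(label.strip().lower())
        if intent is not None:
            return intent
    return None


def extract_sentences(issue: Issue) -> List[Tuple[str, str]]:
    """Keep only the sentences of the issue body that match a pattern for its intent."""
    intent = intent_from_labels(issue.labels)
    if intent is None:
        return []
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", issue.body.strip())
    return [
        (sentence, intent)
        for sentence in sentences
        if any(pattern.search(sentence) for pattern in PATTERNS[intent])
    ]


def augment(reviews: List[Tuple[str, str]], issues: List[Issue]) -> List[Tuple[str, str]]:
    """Append sentences extracted from issues to the labeled review set."""
    extra = [pair for issue in issues for pair in extract_sentences(issue)]
    return reviews + extra
```

For example, an issue labeled `Bug` with the body "The app crashes when I open settings. Thanks for the great work!" would contribute only its first sentence, labeled `bug_report`, while an issue carrying no mapped label contributes nothing.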
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Open Source Software Innovations
