Bug or Not? Bug Report Classification Using N-Gram IDF
Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta, Kenichi Matsumoto

TL;DR
This paper proposes a bug report classification approach that uses N-gram IDF for feature extraction, and reports that logistic regression and random forest models built on these features outperform topic-based models.
Contribution
The paper applies N-gram IDF, an extension of IDF that extracts key terms of any length, to bug report classification and compares the resulting features against topic-model features.
Findings
N-gram IDF-based models outperform topic-based models in bug report classification.
The advantage holds in every evaluated case on a publicly available dataset.
N-gram IDF may extend to other software engineering tasks.
Abstract
Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts, and these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our results show that our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising…
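To make the idea of variable-length key terms concrete, here is a minimal sketch in Python. It is not the paper's method: it applies plain document-frequency IDF, log(N / df), to word n-grams of one to three words, as a simplified stand-in for N-gram IDF, whose actual weighting is not given in this summary. The toy reports, the function names, and the "appears in at least two reports" selection rule are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def word_ngrams(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) that occur in text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def document_frequencies(docs, max_n=3):
    """Count, for each n-gram, the number of documents that contain it."""
    df = defaultdict(int)
    for doc in docs:
        for gram in word_ngrams(doc, max_n):
            df[gram] += 1
    return dict(df)


def to_feature_vector(text, vocabulary, max_n=3):
    """Binary presence vector over a fixed, ordered list of key terms."""
    grams = word_ngrams(text, max_n)
    return [1 if term in grams else 0 for term in vocabulary]


# Toy reports, invented for illustration only (not from the paper's dataset).
reports = [
    "null pointer exception when saving file",
    "null pointer exception on startup",
    "please add dark mode option",
    "add option to export file",
]

df = document_frequencies(reports)

# Plain IDF per n-gram; a simplification, NOT the paper's N-gram IDF weight.
idf = {gram: math.log(len(reports) / count) for gram, count in df.items()}

# Hypothetical selection rule: keep terms seen in at least two reports.
vocabulary = sorted(gram for gram, count in df.items() if count >= 2)

# One feature row per report, ready to pass to any classifier.
features = [to_feature_vector(r, vocabulary) for r in reports]
```

Each row of `features` could then be passed to a classifier such as logistic regression or random forest, the two learners named in the abstract. Note that a phrase like "null pointer exception" becomes a single feature alongside its component words, which is the kind of variable-length term the abstract describes.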
Taxonomy
Topics: Software Engineering Research · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques
