Auto-labelling of Bug Report using Natural Language Processing
Avinash Patil, Aryan Jadon

TL;DR
This paper presents an NLP-based method for automatically labeling bug reports that combines report attributes with deep learning to retrieve duplicate reports in bug tracking systems, reaching 70% recall@5.
Contribution
It introduces a novel combination of NLP techniques, a custom data transformer, and a deep neural network for more accurate duplicate bug report retrieval.
Findings
Achieves 70% recall@5 in duplicate bug report retrieval
Effectively utilizes structured and unstructured bug report attributes
Demonstrates high accuracy on large bug report datasets
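To make the recall@5 figure concrete, here is a minimal sketch of how recall@k is often computed for retrieval tasks. It assumes a common definition: a query counts as a hit if at least one known duplicate appears among its top k retrieved reports. This page does not state the paper's exact definition, and the function name and data layout below are illustrative only.

```python
def recall_at_k(ranked_ids, true_duplicate_ids, k=5):
    """Fraction of queries with at least one known duplicate in the top k.

    ranked_ids: one ranked list of retrieved report IDs per query.
    true_duplicate_ids: one collection of known duplicate IDs per query.
    Assumed definition; the paper may compute recall@k differently.
    """
    if not ranked_ids:
        return 0.0
    hits = 0
    for ranked, truth in zip(ranked_ids, true_duplicate_ids):
        # A hit means the top-k slice shares at least one ID with the truth set.
        if set(ranked[:k]) & set(truth):
            hits += 1
    return hits / len(ranked_ids)
```

Under this definition, 70% recall@5 would mean that for 7 out of 10 query reports, a true duplicate shows up in the first five suggestions.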
Abstract
Duplicate bug report detection is the task of finding similar bug reports in a bug tracking system. Knowing that a bug report already exists reduces the effort spent debugging problems and identifying root causes. Rule- and query-based solutions recommend a long list of potentially similar bug reports with no clear ranking, and triage engineers are less motivated to spend time going through an extensive list. Consequently, this deters the use of duplicate bug report retrieval solutions. In this paper, we propose a solution that combines several NLP techniques. Our approach considers both unstructured and structured attributes of a bug report, such as summary, description, severity, impacted products, platforms, and categories. It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing…
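The retrieval idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: a plain bag-of-words count stands in for the deep neural network, and a brute-force nearest-neighbour search stands in for the "non-generalizing" method (a term commonly used for neighbour-based approaches that memorise the data instead of fitting a model). The field names (`summary`, `description`, `severity`, `product`, `platform`) and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def featurize(report):
    """Merge unstructured text and structured attributes into one sparse vector."""
    features = Counter()
    text = f"{report['summary']} {report['description']}".lower()
    for token in text.split():
        features[f"tok={token}"] += 1.0          # bag-of-words stand-in for a learned encoder
    for field in ("severity", "product", "platform"):
        features[f"{field}={report[field]}"] += 1.0  # one-hot style structured attribute
    return features

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_similar(query, corpus, k=5):
    """Return the IDs of the k stored reports most similar to the query, best first."""
    q = featurize(query)
    scored = [(cosine(q, featurize(r)), r["id"]) for r in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report_id for _, report_id in scored[:k]]
```

Returning a short ranked list addresses the problem the abstract raises with rule- and query-based tools, whose long unranked lists discourage triage engineers from using them.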
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Data Mining and Analysis
