Automated Root-Cause Subclassification and No-Code Fix Generation for Invalid Bug Reports
Mahmut Furkan Gon, Emre Dinc, Tevfik Emre Sungur, Eray Tuzun

TL;DR
This paper introduces a standardized taxonomy for classifying invalid bug reports and evaluates various AI approaches for subclassification and no-code fix generation using a curated benchmark.
Contribution
It proposes a new taxonomy for invalid bug report subclasses and systematically compares AI methods for subclassification and fix suggestion accuracy.
Findings
Retrieval Augmented Generation achieves the highest subclassification F1 score, 0.66.
Agentic web search attains the highest no-code fix success rate of 68.9%.
Performance varies across subclasses; Non-reproducibility is the best-detected subclass.
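For readers unfamiliar with the metric, the sketch below shows one common way to compute per-class and macro-averaged F1 for a multi-class labeling task such as subclassification. It is a minimal illustration, not the paper's evaluation code: the summary does not say which averaging scheme the authors used, and the function names and any label strings passed in are hypothetical.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for single-label multi-class predictions.

    Assumption: each report has exactly one gold subclass and one
    predicted subclass; the paper's actual setup may differ.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)
```

Per-class scores like these are what make a statement such as "performance varies across subclasses" measurable, since a single averaged number can hide weak classes.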
Abstract
Issues encountered when using software are reported as bug reports. However, many bug reports are invalid: they do not require code changes and are resolved with a no-code fix. Having customer support manually determine the root cause of invalid bug reports and provide actionable resolutions wastes considerable resources. Our goal is to introduce a standardized taxonomy for root-cause-oriented subclassification of invalid bug reports, and to run experiments measuring the accuracy of various approaches to invalid subclassification and no-code fix generation. We study how different configurations perform on a gold-standard benchmark that we created. Using this manually curated benchmark for higher-quality analysis, we experimented with vanilla LLMs, Retrieval Augmented Generation, and agentic web search to identify invalid subclasses and generate no-code fixes. We…
