TL;DR
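BugScope is an LLM-based bug detection framework that mirrors how human auditors learn bug patterns from representative examples. It structures auditing into three steps (seed identification, context retrieval, and bug detection), aligns the LLM to each step, and outperforms existing industrial tools in F1 score.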
Contribution
Introduces BugScope, a three-step bug detection framework that aligns large language models with the way human auditors learn bug patterns, achieving 86.05% precision and 87.88% recall on a curated dataset of 33 real-world bugs.
Findings
Achieves 86.05% precision and 87.88% recall (F1 score of 0.87) on 33 real-world bugs from 21 widely used open-source projects.
Outperforms industrial tools like Claude Code and Cursor BugBot in F1 score.
Discovered 184 previously unknown bugs in the Linux kernel, with 78 fixed and 7 confirmed.
Abstract
Software auditing is an increasingly critical task in the era of rapid code generation. While LLM-based auditors have demonstrated strong potential, their effectiveness remains limited by misalignment with the highly complex, domain-specific nature of bug detection. In this work, we introduce BugScope, a framework that mirrors how human auditors learn specific bug patterns from representative examples and apply this knowledge during code auditing. BugScope structures auditing into three steps: seed identification, context retrieval, and bug detection, and aligns LLMs to each step by analyzing real bug reports and mutated examples, and distilling concise, reusable guidelines. On a curated dataset of 33 real-world bugs from 21 widely used open-source projects, BugScope achieves 86.05% precision and 87.88% recall, corresponding to an F1 score of 0.87. By comparison, leading industrial…
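The reported F1 score can be checked from the stated precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sketch of that arithmetic only; the function name `f1_score` is an illustrative choice and is not taken from the paper or its code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
precision = 0.8605
recall = 0.8788

f1 = f1_score(precision, recall)
print(round(f1, 2))  # 0.87
```

Rounded to two decimals, the result matches the 0.87 stated in the abstract.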
