PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection
He Ye, Zimin Chen, Claire Le Goues

TL;DR
PreciseBugCollector is a novel multi-language bug collection framework that combines bug tracking and injection techniques to create a large, precise, and executable bug dataset for software maintenance and deep learning applications.
Contribution
PreciseBugCollector introduces a dual-component approach, pairing a bug tracker with a bug injector, that enables the collection of large-scale, precise, and executable bug datasets across multiple languages.
Findings
Collected over 1 million bugs from open-source projects
Integrated bugs from NVD, OSS-Fuzz, and project-specific injections
Demonstrated the approach's industrial applicability and precision
Abstract
Bug datasets are vital for enabling deep learning techniques to address software maintenance tasks related to bugs. However, existing bug datasets suffer from precision and scale limitations: they are either small-scale but precise, thanks to manual validation, or large-scale but imprecise, because they rely on simple commit message processing. In this paper, we introduce PreciseBugCollector, a precise, multi-language bug collection approach that overcomes these two limitations. PreciseBugCollector is based on two novel components: a) a bug tracker that maps codebase repositories to external bug repositories to trace bug type information, and b) a bug injector that generates project-specific bugs by injecting noise into correct codebases and then executing them against their test suites to obtain test failure messages. We implement PreciseBugCollector against three sources: 1) A bug tracker that links to…
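The bug-injector idea in the abstract (perturb correct code, run the tests, keep the failure messages) can be illustrated with a minimal sketch. This is not the paper's implementation: the `clamp` function, the `MUTATIONS` list, and the helpers `run_tests` and `inject_bugs` are all hypothetical names chosen for illustration, and simple string replacement stands in for whatever injection strategy the authors actually use.

```python
# Hypothetical "correct" code under test (assumption, not from the paper).
CORRECT_SOURCE = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""

# Hypothetical mutation rules: (original text, replacement text).
MUTATIONS = [("<", "<="), (">", ">="), ("return lo", "return hi")]


def run_tests(namespace):
    """Run a tiny test suite and return a list of failure messages."""
    clamp = namespace["clamp"]
    cases = [((5, 0, 10), 5), ((-1, 0, 10), 0), ((11, 0, 10), 10), ((0, 0, 10), 0)]
    failures = []
    for args, expected in cases:
        actual = clamp(*args)
        if actual != expected:
            failures.append(f"clamp{args}: expected {expected}, got {actual}")
    return failures


def inject_bugs(source, mutations):
    """Apply each mutation once; keep only mutants that fail the tests."""
    records = []
    for old, new in mutations:
        if old not in source:
            continue
        mutant = source.replace(old, new, 1)  # mutate the first occurrence only
        namespace = {}
        exec(mutant, namespace)  # load the mutated function
        failures = run_tests(namespace)
        if failures:  # a mutant that passes every test is not recorded as a bug
            records.append(
                {"buggy": mutant, "fixed": source,
                 "mutation": (old, new), "failures": failures}
            )
    return records


if __name__ == "__main__":
    for record in inject_bugs(CORRECT_SOURCE, MUTATIONS):
        print(record["mutation"], record["failures"])
```

In this sketch, each kept record pairs a buggy version with its original fix and the test failure messages it produced, which is what makes an injected bug both executable and checkable. Mutants that the test suite cannot distinguish from the original are dropped rather than labeled as bugs.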
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
