RegMiner: Towards Constructing a Large Regression Dataset from Code Evolution History
Xuezhi Song, Yun Lin, Siang Hwee Ng, Yijian Wu, Xin Peng, Jin Song Dong, Hong Mei

TL;DR
This paper introduces RegMiner, an automated approach to harvest large-scale, replicable regression bugs from code evolution history, enabling more scalable and representative datasets for software engineering research.
Contribution
The paper presents RegMiner, a tool that automates the collection of regression bugs from version histories; it harvested 537 regressions from 66 projects in 3 weeks.
Findings
Harvested 537 regressions from 66 projects in 3 weeks
Identified gaps between fault localization techniques and actual fixes
Demonstrated the scalability and effectiveness of automated regression bug collection
Abstract
Bug datasets consisting of real-world bugs are important artifacts for researchers and programmers, laying the empirical and experimental foundation for various SE/PL research such as fault localization, software testing, and program repair. All known state-of-the-art datasets are constructed manually, which inevitably limits their scalability, representativeness, and support for emerging data-driven research. In this work, we propose an approach to automate the process of harvesting replicable regression bugs from code evolution history. We focus on regression bugs, as they (1) manifest how a bug is introduced and fixed (as normal bugs do), (2) support regression bug analysis, and (3) incorporate a much stronger specification (i.e., the original passing version) for general bug analysis. Technically, we address an information retrieval problem on code evolution…
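To make the notion of a regression concrete, the following is a minimal sketch, not RegMiner's actual algorithm. It assumes a linear commit history and a hypothetical `test_passes` callback that reports whether a given test passes on a commit. A regression is then a triple of commits: a working version (the "original passing version"), the commit that introduced the bug, and the commit that fixed it. The function name, parameters, and toy commit IDs are all illustrative assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple


def find_regression(
    history: Sequence[str],
    fix_index: int,
    test_passes: Callable[[str], bool],
) -> Optional[Tuple[str, str, str]]:
    """Return (working, inducing, fixing) commits, or None if no regression is found.

    `history` is ordered oldest to newest; `fix_index` points at a candidate
    bug-fixing commit. Both are hypothetical inputs for this sketch.
    """
    # The test must pass on the fixing commit...
    if fix_index < 1 or not test_passes(history[fix_index]):
        return None
    # ...and fail on the commit right before it, otherwise nothing was fixed.
    if test_passes(history[fix_index - 1]):
        return None
    # Walk backwards until the test passes again: that commit is the working
    # version, and its successor is the commit that introduced the bug.
    for i in range(fix_index - 2, -1, -1):
        if test_passes(history[i]):
            return history[i], history[i + 1], history[fix_index]
    # The test never passed before the fix: a bug, but not a regression.
    return None


# Toy history: the test passes on c0 and c1, breaks at c2, and is fixed at c4.
outcomes = {"c0": True, "c1": True, "c2": False, "c3": False, "c4": True}
history = ["c0", "c1", "c2", "c3", "c4"]
result = find_regression(history, 4, outcomes.__getitem__)
print(result)
```

The linear backward scan keeps the sketch short. On real histories, running the test on older commits is the hard part, since the test and its dependencies may not exist or compile there.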
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
