# BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

**Authors:** David A. Tomassi, Naji Dmeiri, Yichen Wang, Antara Bhowmick, Yen-Chuan Liu, Premkumar Devanbu, Bogdan Vasilescu, Cindy Rubio-González

arXiv: 1903.06725 · 2019-07-24

## TL;DR

BugSwarm is a toolset that automatically collects and maintains a large, diverse dataset of real-world software failures and fixes, enabling better evaluation of software quality methods.

## Contribution

It introduces a scalable approach to continuously gather and archive reproducible fail-pass pairs from CI environments, yielding a dataset for evaluating fault-detection, localization, and repair methods.

## Key findings

- Collected 3,091 fail-pass pairs in Java and Python
- Successfully automated the detection and archiving of fail-pass activities
- Dataset is continuously growing and fully reproducible

## Abstract

Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like Travis-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide a substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BugSwarm, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.
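The core mining step described above, identifying a failing run followed by a passing one, can be illustrated with a minimal sketch. This is not BugSwarm's actual code. It assumes, hypothetically, that CI history arrives as a chronologically ordered list of builds, that each build maps a job key (for example, a job's configuration) to an outcome, and that a fail-pass pair is a failed job followed by the matching job passing in the next build on the same branch. The `Build` and `FailPassPair` records and the `after_build_id` watermark, which mimics running the miner periodically over only new builds, are illustrative names. The later steps that make each pair durably reproducible inside containers are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Build:
    """Hypothetical record of one CI build on a single branch."""
    build_id: int
    branch: str
    commit: str
    # Maps a job key (e.g. a job's configuration) to "passed" or "failed".
    jobs: Dict[str, str]


@dataclass(frozen=True)
class FailPassPair:
    """Hypothetical job-level fail-pass pair between two consecutive builds."""
    branch: str
    failed_build: int
    passed_build: int
    job_key: str


def find_fail_pass_pairs(builds: List[Build], after_build_id: int = 0) -> List[FailPassPair]:
    """Return fail-pass pairs between consecutive builds on the same branch.

    Assumes `builds` is in chronological order. Only pairs whose passing
    build is newer than `after_build_id` are emitted, so a periodic run can
    skip pairs that an earlier run already reported.
    """
    pairs: List[FailPassPair] = []
    last_on_branch: Dict[str, Build] = {}
    for build in builds:
        prev = last_on_branch.get(build.branch)
        if prev is not None and build.build_id > after_build_id:
            # A job key that failed previously and passes now forms a pair.
            for key, outcome in prev.jobs.items():
                if outcome == "failed" and build.jobs.get(key) == "passed":
                    pairs.append(FailPassPair(build.branch, prev.build_id, build.build_id, key))
        # Remember the latest build per branch as context for the next one.
        last_on_branch[build.branch] = build
    return pairs
```

For example, on a branch where build 2's `py3` job fails and build 4's `py3` job passes, the function reports one pair `(main, 2, 4, "py3")`. A job that fails in both builds is not reported. Tracking the previous build per branch, rather than comparing globally adjacent builds, keeps interleaved branches from being paired with each other.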

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.06725/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1903.06725/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1903.06725/full.md

---
Source: https://tomesphere.com/paper/1903.06725