# Continuous Defect Prediction: The Idea and a Related Dataset

**Authors:** Lech Madeyski, Marcin Kawalerowicz

arXiv: 1703.04142 · 2017-06-23

## TL;DR

This paper introduces the concept of Continuous Defect Prediction (CDP), presents a large dataset derived from CI build data and software repositories, and discusses its potential for predicting risky software changes.

## Contribution

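The paper provides a new dataset of more than 11 million data rows, each representing a file involved in a CI build, that combines CI build results with data mined from software repositories. The dataset is intended to enable research in continuous defect prediction.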

## Key findings

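- The dataset covers 1265 software projects and 30,022 distinct commit authors.
- It combines CI build results, taken from TravisTorrent, with information mined from software repositories.
- Software process metrics are calculated at the file change level as candidate features for predicting risky changes that could break the build.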

## Abstract

We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset is currently a set of more than 11 million data rows, representing files involved in Continuous Integration (CI) builds, that synthesize the results of CI builds with data we mine from software repositories. Our dataset embraces 1265 software projects, 30,022 distinct commit authors and several software process metrics that in earlier research appeared to be useful in software defect prediction. In this particular dataset we use TravisTorrent as the source of CI data. TravisTorrent synthesizes commit level information from the Travis CI server and GitHub open-source projects repositories. We extend this data to a file change level and calculate the software process metrics that may be used, for example, as features to predict risky software changes that could break the build if committed to a repository with CI enabled.
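The step described in the abstract, extending commit-level build data to the file change level and attaching process metrics, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the field names (`commit`, `author`, `files`, `build_passed`), the function `expand_to_file_level`, and the two metrics (prior modifications and prior distinct committers per file) are hypothetical and are not taken from the paper or from the TravisTorrent schema.

```python
from collections import defaultdict


def expand_to_file_level(builds):
    """Turn commit-level build records into one row per changed file.

    `builds` is assumed to be in chronological order; each record is a dict
    with hypothetical keys 'commit', 'author', 'files', and 'build_passed'.
    """
    modifications = defaultdict(int)  # file path -> number of earlier changes
    committers = defaultdict(set)     # file path -> authors seen so far
    rows = []
    for build in builds:
        for path in build["files"]:
            rows.append({
                "commit": build["commit"],
                "file": path,
                "author": build["author"],
                # Illustrative process metrics computed from history only.
                "prior_modifications": modifications[path],
                "prior_distinct_committers": len(committers[path]),
                # Label: outcome of the CI build this file change was part of.
                "build_passed": build["build_passed"],
            })
        # Update the history after emitting rows, so that a row's features
        # never include information from its own commit.
        for path in build["files"]:
            modifications[path] += 1
            committers[path].add(build["author"])
    return rows


# Made-up example data, not drawn from the dataset.
builds = [
    {"commit": "c1", "author": "alice", "files": ["a.py", "b.py"], "build_passed": True},
    {"commit": "c2", "author": "bob", "files": ["a.py"], "build_passed": False},
    {"commit": "c3", "author": "alice", "files": ["a.py", "b.py"], "build_passed": True},
]
rows = expand_to_file_level(builds)
for row in rows:
    print(row)
```

Each resulting row pairs one file change with history-based features and the build outcome, which is the general shape a classifier for risky changes would need. Updating the counters only after a commit's rows are emitted keeps the features free of information from the change being predicted.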

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.04142/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1703.04142/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/1703.04142/full.md

---
Source: https://tomesphere.com/paper/1703.04142