# An Approach and Benchmark to Detect Behavioral Changes of Commits in Continuous Integration

**Authors:** Benjamin Danglot, Martin Monperrus, Walter Rudametkin, Benoit Baudry

arXiv: 1902.08482 · 2023-05-05

## TL;DR

This paper introduces DCI, an automated approach that detects behavioral changes in code commits by generating test methods, and provides a new benchmark dataset of real-world behavioral changes in open source Java projects.

## Contribution

The paper presents DCI, an automated method that detects behavioral changes in commits by amplifying the assertions of existing test cases and exploring their input space with a search-based technique. It also contributes a curated benchmark of 60 commits from 6 open source Java projects.

## Key findings

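- Evaluated on a curated set of 60 commits from 6 open source Java projects, DCI generates test methods that detect behavioral changes.
- The approach is fully automated and can be integrated into existing development workflows.
- DCI targets unit tests and applies only to commits for which an existing unit test already executes the modified code; 15.29% of commits in the benchmark projects meet this condition.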
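The applicability condition in the last finding can be illustrated with a minimal sketch. It assumes per-test line coverage is already available as a map from test name to executed `File:line` entries; the class `ApplicabilitySketch`, its method names, and the coverage format are hypothetical, and the summary does not say how DCI itself computes coverage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of DCI: decides whether a commit meets the
// "an existing unit test executes the modified code" condition.
final class ApplicabilitySketch {

    // Returns the names of the tests whose covered lines overlap the lines
    // modified by the commit. Lines are encoded as "File.java:lineNumber"
    // (an assumed format chosen for this sketch).
    static List<String> testsExecutingChange(Map<String, Set<String>> coverageByTest,
                                             Set<String> modifiedLines) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : coverageByTest.entrySet()) {
            // disjoint == false means the test runs at least one modified line
            if (!Collections.disjoint(entry.getValue(), modifiedLines)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    // A commit is a candidate only if at least one existing test reaches the change.
    static boolean isApplicable(Map<String, Set<String>> coverageByTest,
                                Set<String> modifiedLines) {
        return !testsExecutingChange(coverageByTest, modifiedLines).isEmpty();
    }
}
```

Under these assumptions, a commit whose modified lines are reached by no existing test is simply skipped, which is consistent with only a fraction of commits being usable.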

## Abstract

When a developer pushes a change to an application's codebase, a good practice is to have a test case specifying this behavioral change. Thanks to continuous integration (CI), the test is run on subsequent commits to check that they do not introduce a regression for that behavior. In this paper, we propose an approach that detects behavioral changes in commits. As input, it takes a program, its test suite, and a commit. Its output is a set of test methods that capture the behavioral difference between the pre-commit and post-commit versions of the program. We call our approach DCI (Detecting behavioral changes in CI). It works by generating variations of the existing test cases through (i) assertion amplification and (ii) a search-based exploration of the input space. We evaluate our approach on a curated set of 60 commits from 6 open source Java projects. To our knowledge, this is the first ever curated dataset of real-world behavioral changes. Our evaluation shows that DCI is able to generate test methods that detect behavioral changes. Our approach is fully automated and can be integrated into current development processes. The main limitations are that it targets unit tests and works on a relatively small fraction of commits. More specifically, DCI works on commits that have a unit test that already executes the modified code. In practice, from our benchmark projects, we found 15.29% of commits to meet the conditions required by DCI.
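The idea of capturing a behavioral difference with generated assertions can be sketched as follows. This is a simplified illustration under stated assumptions, not DCI's implementation: `normalizePre` and `normalizePost` are hypothetical stand-ins for the pre-commit and post-commit versions of a method, and the input list stands in for an existing test input plus one variation of it. Values observed on the pre-commit version become the expected values, and any input whose expectation fails on the post-commit version exposes a behavioral change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only; all names here are hypothetical.
final class DciSketch {

    // Hypothetical pre-commit behavior: trim whitespace.
    static String normalizePre(String s) {
        return s.trim();
    }

    // Hypothetical post-commit behavior: trim whitespace and lowercase.
    static String normalizePost(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Assertion amplification, simplified: run each input on the pre-commit
    // version and record the observed output as the expected value.
    static Map<String, String> amplifyAssertions(Function<String, String> preCommit,
                                                 List<String> inputs) {
        Map<String, String> expected = new LinkedHashMap<>();
        for (String input : inputs) {
            expected.put(input, preCommit.apply(input));
        }
        return expected;
    }

    // Replay the recorded expectations on the post-commit version and
    // return the inputs whose expectation no longer holds.
    static List<String> detectChanges(Map<String, String> expected,
                                      Function<String, String> postCommit) {
        List<String> failing = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            if (!entry.getValue().equals(postCommit.apply(entry.getKey()))) {
                failing.add(entry.getKey());
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // " abc " plays the role of the existing test input, " ABC " of a variation.
        List<String> inputs = List.of(" abc ", " ABC ");
        Map<String, String> expected = amplifyAssertions(DciSketch::normalizePre, inputs);
        System.out.println(detectChanges(expected, DciSketch::normalizePost));
    }
}
```

In this toy case the original input alone does not reveal the change, while the varied input does, which is why the abstract pairs assertion amplification with exploration of the input space.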

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08482/full.md

---
Source: https://tomesphere.com/paper/1902.08482