# How Different Are Different diff Algorithms in Git?

**Authors:** Yusuf Sulistyo Nugroho, Hideaki Hata, Kenichi Matsumoto

arXiv: 1902.02467 · 2019-10-18

## TL;DR

This paper systematically compares diff algorithms in Git, revealing significant differences in code change detection and recommending the Histogram algorithm for more accurate source code difference analysis.

## Contribution

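It provides an empirical evaluation of Git's diff algorithms across three applications identified through a systematic mapping of recent studies: code churn metrics, bug-introducing change identification, and patch application. It shows that the choice of algorithm affects the results of each.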

## Key findings

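- Across 14 Java projects, code churn metrics take different values in 1.7% to 8.2% of commits depending on the diff algorithm.
- Across 10 Java projects, 6.0% and 13.3% of the identified bug-fix commits produce different bug-introducing changes depending on the algorithm.
- In the authors' manual analysis of patch application, Histogram is more suitable than Myers for presenting code changes.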
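The percentages above count commits whose results change with the algorithm. A minimal sketch of that calculation follows, assuming per-commit churn has already been collected once per algorithm. The function name `share_of_differing_commits` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Churn = Tuple[int, int]  # (lines added, lines deleted) for one commit


def share_of_differing_commits(a: Dict[str, Churn], b: Dict[str, Churn]) -> float:
    """Return the percentage of commits, among those present in both maps,
    whose churn values differ between the two algorithms."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    differing = sum(1 for commit in common if a[commit] != b[commit])
    return 100.0 * differing / len(common)


# Made-up illustrative data, NOT results from the paper.
myers = {"c1": (10, 4), "c2": (3, 3), "c3": (7, 0), "c4": (1, 1)}
histogram = {"c1": (10, 4), "c2": (2, 2), "c3": (7, 0), "c4": (1, 1)}

print(share_of_differing_commits(myers, histogram))
```

Here one of the four sample commits differs, so the share is 25%.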

## Abstract

Automatic identification of the differences between two versions of a file is a common and basic task in several applications of mining code repositories. Git, a version control system, has a diff utility and users can select algorithms of diff from the default algorithm Myers to the advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. On the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% commits based on the different diff algorithms. Regarding bug-introducing change identification, we found 6.0% and 13.3% in the identified bug-fix commits had different results of bug-introducing changes from 10 Java projects. For patch application, we found that the Histogram is more suitable than Myers for providing the changes of code, from our manual analysis. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to consider differences in source code.
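To make the algorithm choice concrete, here is a hedged sketch of how added and deleted line counts for one commit could be collected under a given algorithm, using Git's `--diff-algorithm` and `--numstat` options. The helper names `parse_numstat` and `commit_churn` are hypothetical, and this is not the authors' tooling.

```python
import subprocess
from typing import Tuple


def parse_numstat(output: str) -> Tuple[int, int]:
    """Sum added and deleted lines from `--numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>"; binary files
    are reported as "-\t-\t<path>" and are skipped here.
    """
    added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank or unexpected line
        if parts[0] == "-" or parts[1] == "-":
            continue  # binary file, no line counts
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def commit_churn(repo: str, commit: str, algorithm: str) -> Tuple[int, int]:
    """Return (added, deleted) for `commit` using the given diff algorithm
    (one of "myers", "minimal", "patience", "histogram").

    Assumption: merge commits are not handled specially in this sketch.
    """
    result = subprocess.run(
        ["git", "-C", repo, "show", "--numstat", "--format=",
         f"--diff-algorithm={algorithm}", commit],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Calling `commit_churn(repo, commit, "myers")` and `commit_churn(repo, commit, "histogram")` on the same commit and comparing the two tuples shows whether that commit's churn depends on the algorithm.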

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.02467/full.md

## Figures

36 figures with captions in the complete paper: https://tomesphere.com/paper/1902.02467/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1902.02467/full.md

---
Source: https://tomesphere.com/paper/1902.02467