Analyzing and Evaluating the Behavior of Git Diff and Merge
Niels Glodny

TL;DR
This thesis documents the algorithms behind Git's diff and merge functionality and shows several unexpected behaviors: pathological diffs, merges that take exponential time, and merges and rebases that are not commutative.
Contribution
It provides a detailed analysis of Git's diff and merge algorithms, highlighting their pathological cases and non-intuitive behaviors, which were previously not well documented.
Findings
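The histogram diff algorithm has pathological cases in which a single-line change causes the entire rest of the file to be marked as changed.
The default merge strategy (ort) can require time exponential in the number of commits in the history.
Merges and rebases are not commutative, and even when a merge produces no conflict, its result is not specified but depends on the diff algorithm used.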
Abstract
Despite being widely used, the algorithms that enable collaboration with Git are not well understood. The diff and merge algorithms are particularly interesting, as they could be applied in other contexts. In this thesis, I document the main functionalities of Git: how diffs are computed, how they are used to run merges, and how merges enable more complex operations. In the process, I show multiple unexpected behaviors in Git, including the following: The histogram diff algorithm has pathological cases where a single-line change can cause the entire rest of the file to be marked as changed. The default merge strategy (ort) can result in merges requiring exponential time in the number of commits in the history. Merges and rebases are not commutative, and even when merges do not result in a conflict, the result is not specified but depends on the diff algorithm used. And finally,…
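One way to see why a merge result can depend on the diff algorithm is that a minimal line diff need not be unique. The sketch below is not Git's implementation and is not taken from the thesis. It is a plain longest-common-subsequence enumeration, with the hypothetical helper name `all_lcs`, that lists every longest set of lines two file versions can keep in common. When more than one exists, different diff algorithms are free to report different, equally small edits.

```python
from functools import lru_cache


def all_lcs(a, b):
    """Return every longest common subsequence of two sequences of lines.

    Illustrative only: a textbook recursion, not the Myers or histogram
    algorithm that Git uses.
    """
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def go(i, j):
        # Past the end of either file: only the empty subsequence remains.
        if i == len(a) or j == len(b):
            return frozenset([()])
        # Equal leading lines: every longest subsequence starts with that line.
        if a[i] == b[j]:
            return frozenset((a[i],) + rest for rest in go(i + 1, j + 1))
        # Otherwise skip a line on one side and keep the longer results.
        left, right = go(i + 1, j), go(i, j + 1)
        len_left = len(next(iter(left)))
        len_right = len(next(iter(right)))
        if len_left > len_right:
            return left
        if len_right > len_left:
            return right
        return left | right  # tie: both alignments are equally good

    return set(go(0, 0))


if __name__ == "__main__":
    # Swapping two lines: keep "a" and move "b", or keep "b" and move "a".
    print(sorted(all_lcs(["a", "b"], ["b", "a"])))  # [('a',), ('b',)]
```

Both alignments describe the same change with the same number of edited lines, so neither is more correct. Which one a tool reports is a property of its algorithm, not of the input.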
