A Universal Textual Merge Strategy Based on Tokens for Version Control Systems
Qiqi Jason Gu, Mikoláš Janota

TL;DR
Summer is a novel token-based merge algorithm for version control that reduces conflicts and improves merge accuracy across diverse file types without relying on language-specific parsers.
Contribution
It introduces a language-agnostic, token-level merging approach that models code refactorings and parallel edits more effectively than traditional line-based methods.
Findings
Summer achieved 36% accuracy in reproducing developer merges verbatim.
It ranked second in semantic accuracy among five merge tools.
The approach is effective across Java and non-Java files.
Abstract
Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm independent of document formats. Dividing text into tokens, our approach formulates token-level changes in one branch into string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Despite being independent of programming languages, our move rules model extracting and inlining functions. We evaluated Summer on ConflictBench, a…
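The token-level idea in the abstract can be illustrated with a minimal sketch. This is not Summer's implementation: the tokenizer regex, the use of Python's `difflib.SequenceMatcher` to find token-level changes, the two-token left context, and the helper names `tokenize`, `derive_rules`, `apply_rules`, and `merge` are all assumptions made for illustration. Move rules are not modeled at all. The sketch only shows how changes in one branch can be expressed as rewrite rules over tokens and then applied to the other branch.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Hypothetical tokenizer: words, whitespace runs, and single punctuation
    # characters. Joining the tokens reproduces the input, so formatting is kept.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def derive_rules(base, branch, context=2):
    # Turn each token-level difference between base and branch into an
    # (old, new) rewrite rule, prefixed with up to `context` unchanged tokens.
    rules = []
    matcher = SequenceMatcher(a=base, b=branch, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        left = base[max(0, i1 - context):i1]
        rules.append((left + base[i1:i2], left + branch[j1:j2]))
    return rules


def apply_rules(tokens, rules):
    # Apply a rule only when its left-hand side occurs exactly once in the
    # target; anything else is reported back as unapplied (a potential conflict).
    tokens = list(tokens)
    unapplied = []
    for old, new in rules:
        n = len(old)
        hits = [k for k in range(len(tokens) - n + 1) if tokens[k:k + n] == old]
        if len(hits) == 1:
            tokens[hits[0]:hits[0] + n] = new
        else:
            unapplied.append((old, new))
    return tokens, unapplied


def merge(base, ours, theirs):
    # Rules come from base -> theirs and are replayed on ours.
    rules = derive_rules(tokenize(base), tokenize(theirs))
    merged, unapplied = apply_rules(tokenize(ours), rules)
    return "".join(merged), unapplied


base = "int total = count + 1;"
ours = "int total = count + 2;"    # one branch changes the constant
theirs = "int sum = count + 1;"    # the other branch renames the variable
merged_text, unapplied_rules = merge(base, ours, theirs)
```

Both branches edit the same line here, which a line-based merge would typically report as a conflict. In this sketch the rename becomes a single rewrite rule that still matches the other branch's tokens, so `merged_text` contains both edits and `unapplied_rules` is empty.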
