BDiff: Block-aware and Accurate Text-based Code Differencing
Yao Lu, Wanwei Liu, Tanghaoran Zhang, Kang Yang, Yang Zhang, Wenyu Xu, Longfei Sun, Xinjun Mao, Shuzheng Gao, Michael R. Lyu

TL;DR
BDiff is a block-aware, text-based differencing algorithm that accurately identifies edit actions spanning multiple lines, making code changes easier to comprehend.
Contribution
The paper presents a block-aware differencing method that captures multi-line edit actions, outperforming existing tools in result quality while maintaining an efficient runtime.
Findings
BDiff produces higher-quality differencing results than baseline tools.
BDiff effectively identifies block-level edit actions in code.
Large language models are unreliable for code differencing, in both result quality and runtime.
Abstract
Code differencing is a fundamental technique in software engineering practice and research. While researchers have proposed text-based differencing techniques capable of identifying line changes over the past decade, existing methods exhibit a notable limitation in identifying edit actions (EAs) that operate on text blocks spanning multiple lines. Such EAs are common in development practice: for example, moving a code block for conditional branching or duplicating a method definition block for overloading. Existing tools represent such block-level operations as discrete sequences of line-level EAs, forcing developers to correlate them manually and thereby substantially impeding change comprehension. To address this issue, we propose BDiff, a text-based differencing algorithm capable of identifying two types of block-level EAs and five types of line-level EAs. Building on…
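The limitation described in the abstract can be illustrated with a small sketch. It is not BDiff's algorithm. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for a line-level differencing tool, and the helper names `line_level_ops` and `pair_block_moves` are hypothetical, introduced here only for illustration. The sketch shows a moved two-line block being reported as an unrelated delete and insert, followed by a naive post-processing step that pairs the two into one block move.

```python
import difflib


def line_level_ops(old, new):
    """Return the non-equal opcodes of a line-level diff as (tag, old_lines, new_lines)."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def pair_block_moves(ops, min_lines=2):
    """Naively pair each deleted run with an identical inserted run.

    Illustrative assumption: lines are compared with surrounding
    whitespace stripped, so a re-indented block still counts as the same block.
    """
    def norm(lines):
        return tuple(line.strip() for line in lines)

    inserts = [o for o in ops if o[0] == "insert" and len(o[2]) >= min_lines]
    moves = []
    for tag, deleted, _ in ops:
        if tag != "delete" or len(deleted) < min_lines:
            continue
        for ins in inserts:
            if norm(ins[2]) == norm(deleted):
                moves.append(deleted)
                inserts.remove(ins)  # each inserted run is paired at most once
                break
    return moves


old = [
    "import os",
    "def helper():",
    "    return 1",
    "x = 1",
    "y = 2",
    "z = 3",
]
new = [
    "import os",
    "x = 1",
    "y = 2",
    "z = 3",
    "def helper():",
    "    return 1",
]

ops = line_level_ops(old, new)
for tag, old_lines, new_lines in ops:
    print(tag, old_lines or new_lines)

moves = pair_block_moves(ops)
print("block moves:", moves)
```

The line-level result contains one `delete` and one `insert` with nothing linking them, which is the manual correlation burden the abstract describes. The pairing step here handles only exact matches after whitespace stripping; a block that is both moved and edited would go unpaired.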
