Consistent or Sensitive? Automated Code Revision Tools Against Semantics-Preserving Perturbations

Shirin Pirouzkhah; Souhaila Serbout; Alberto Bacchelli

arXiv:2602.14595 · cs.SE · February 17, 2026

TL;DR

This study evaluates the consistency of automated code revision tools when faced with semantically equivalent code variants, revealing significant performance drops and highlighting the challenge of maintaining semantic stability in automated code editing.

Contribution

The paper presents a systematic evaluation of the consistency of automated code revision (ACR) tools under semantics-preserving perturbations, a previously underexplored area, and shows that these tools are vulnerable to such perturbations.

Findings

01

ACR tools' revision accuracy drops by up to 45.3% on semantically equivalent code variants.

02

Perturbations located closer to the targeted code region are more likely to cause revision failures.

03

Attention-guiding heuristics offer only marginal improvements.

Abstract

Automated Code Revision (ACR) tools aim to reduce manual effort by automatically generating code revisions based on reviewer feedback. While ACR tools have shown promising performance on historical data, their real-world utility depends on their ability to handle similar code variants expressing the same issue, a property we define as consistency. However, the probabilistic nature of ACR tools often compromises consistency and may lead to divergent revisions even for semantically equivalent code variants. In this paper, we investigate the extent to which ACR tools maintain consistency when presented with semantically equivalent code variants. To do so, we first designed nine types of semantics-preserving perturbations (SPP) and applied them to 2,032 Java methods from real-world GitHub projects, generating over 10K perturbed variants for evaluation. Then we used these perturbations to…
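To make the idea of a semantics-preserving perturbation concrete, the Java sketch below shows one hypothetical example: a method whose for loop is rewritten as a while loop, with its local variables renamed. This is an illustrative assumption, not necessarily one of the paper's nine SPP types (the abstract does not enumerate them), and the class and method names (`SppExample`, `countNegatives`, `countNegativesPerturbed`) are invented for illustration. A consistent ACR tool given the same reviewer comment for both variants should produce equivalent revisions.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration of a semantics-preserving perturbation (SPP):
 * loop rewriting plus identifier renaming. Not taken from the paper.
 */
public class SppExample {

    // Original method: counts negative values using a for loop.
    public static int countNegatives(int[] values) {
        int count = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                count++;
            }
        }
        return count;
    }

    // Perturbed variant: identical behavior, but the for loop becomes a
    // while loop and the locals are renamed (count -> total, i -> idx).
    public static int countNegativesPerturbed(int[] values) {
        int total = 0;
        int idx = 0;
        while (idx < values.length) {
            if (values[idx] < 0) {
                total++;
            }
            idx++;
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] inputs = { {}, {1, -2, 3, -4}, {-1, -1, -1}, {0, 5, 7} };
        for (int[] input : inputs) {
            int original = countNegatives(input);
            int perturbed = countNegativesPerturbed(input);
            // Agreement on sample inputs spot-checks equivalence; it does not prove it.
            String verdict = (original == perturbed) ? "equal" : "DIFFERENT";
            System.out.println(Arrays.toString(input) + " -> "
                    + original + " / " + perturbed + " (" + verdict + ")");
        }
    }
}
```

Running the variants on sample inputs only spot-checks equivalence; here the two are equivalent by construction, since the while loop performs the same initialization, condition check, and increment as the for loop.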

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Scientific Computing and Data Management