Auto-SPT: Automating Semantic Preserving Transformations for Code

Ashish Hooda; Mihai Christodorescu; Chuangang Ren; Aaron Wilson; Kassem Fawaz; Somesh Jha

arXiv:2512.06042 · cs.SE · December 9, 2025

TL;DR

Auto-SPT is a framework that automatically generates diverse, semantic-preserving code transformations to improve the robustness of code clone detection models against real-world code variations.

Contribution

The paper introduces Auto-SPT, a novel LLM-based framework that creates diverse semantic-preserving transformations to improve the robustness of code clone detection.

Findings

1. Auto-SPT produces more diverse transformations than existing methods.
2. Transformations generated by Auto-SPT significantly reduce clone detector performance.
3. Auto-SPT can improve training datasets for more robust code clone detection.
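The third finding suggests using generated SPTs to enrich clone-detection training data. Below is a minimal sketch of one way to do that, built on our own assumptions rather than the paper's pipeline. Each function is paired with an SPT-transformed copy as a positive (clone) example, and with a different function as a negative example that is presumed non-equivalent. The SPT is a hand-written dead-code insertion standing in for an Auto-SPT-generated transformation. `insert_dead_code` and `make_clone_pairs` are hypothetical names, and `ast.unparse` requires Python 3.9+.

```python
import ast
import random


def insert_dead_code(source: str) -> str:
    """Illustrative SPT (hypothetical): append an unreachable `if False: pass`
    to every function body, which cannot change input/output behavior."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            dead = ast.If(test=ast.Constant(False), body=[ast.Pass()], orelse=[])
            node.body.append(dead)
    return ast.unparse(ast.fix_missing_locations(tree))


def make_clone_pairs(functions, spt, seed=0):
    """Build (code_a, code_b, label) pairs; label 1 = clone, 0 = non-clone.

    Positives: each function with its SPT-transformed variant.
    Negatives: each function with a different function from the pool
    (assumed, not checked, to be semantically different).
    """
    if len(functions) < 2:
        raise ValueError("need at least two functions to form negative pairs")
    rng = random.Random(seed)
    pairs = []
    for i, src in enumerate(functions):
        pairs.append((src, spt(src), 1))
        j = rng.choice([k for k in range(len(functions)) if k != i])
        pairs.append((src, functions[j], 0))
    return pairs


funcs = [
    "def add(a, b):\n    return a + b\n",
    "def mul(a, b):\n    return a * b\n",
    "def neg(a):\n    return -a\n",
]
pairs = make_clone_pairs(funcs, insert_dead_code)
print(len(pairs), sum(label for _, _, label in pairs))  # 6 3
```

Sampling negatives from other functions in the pool is the simplest choice. A real dataset would also need to rule out pool members that happen to be equivalent.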

Abstract

Machine learning (ML) models for code clone detection determine whether two pieces of code are semantically equivalent, which in turn is a key building block for software-engineering tasks like refactoring and security tasks like vulnerability and malware detection. While these models are predominantly trained on clean, structured code datasets, real-world code often undergoes a variety of semantic-preserving transformations, including refactoring, minification, automated formatting, and compiler optimizations. To address this critical gap between training and test data, we propose Auto-SPT, a novel framework to automatically construct synthetic-data generators for code. Auto-SPT is designed to produce Semantic Preserving Transformations (SPTs) that alter a program's syntactic structure while preserving its functionality and is instantiated on top of Large Language Models (LLMs). In…
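To make the notion of an SPT concrete, the sketch below hand-codes one classic transformation with Python's standard `ast` module: consistent renaming of local variables. This is not a transformation generated by Auto-SPT. The class name `RenameLocals` and the `_spt_v` prefix are hypothetical. The sketch assumes a single function with no nested functions, no `global`/`nonlocal` declarations, and no comprehension variable that shadows another name. Parameters are deliberately left untouched, because renaming them would break callers that pass keyword arguments. `ast.unparse` requires Python 3.9+.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Illustrative SPT (hypothetical): consistently rename non-parameter locals.

    Assumes no nested functions/classes, no global/nonlocal declarations,
    and no comprehension variable shadowing another name.
    """

    def __init__(self, prefix: str = "_spt_v") -> None:
        self.prefix = prefix
        self.mapping: dict[str, str] = {}

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        # Parameters keep their names so keyword-argument callers still work.
        params = {a.arg for a in ast.walk(node.args) if isinstance(a, ast.arg)}
        # A name bound (Store context) anywhere in a function is local to it.
        assigned: list[str] = []
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store)
                    and sub.id not in params and sub.id not in assigned):
                assigned.append(sub.id)
        self.mapping = {name: f"{self.prefix}{i}" for i, name in enumerate(assigned)}
        self.generic_visit(node)  # rewrites every Name node via visit_Name
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node


source = '''
def total(xs):
    acc = 0
    for item in xs:
        acc += item
    return acc
'''

tree = RenameLocals().visit(ast.parse(source))
transformed = ast.unparse(ast.fix_missing_locations(tree))
print(transformed)
```

In the printed version, `acc` and `item` become `_spt_v0` and `_spt_v1`. The syntax changes, but `total` returns the same value for every input, which is exactly the property an SPT must preserve.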

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

- The correctness of the generated transformations is validated through corresponding unit tests, which enhances the reliability of the proposed SPTs.
- The framework is largely automated and demonstrates good scalability for producing diverse code transformations.
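Reviewer 01 credits the paper with validating transformations through unit tests. The sketch below shows one generic way such a check could look: a differential test that runs the original and the transformed function on the same inputs and compares their return values or raised exception types. It illustrates the idea only and is not the paper's harness. `behaves_identically` and the `clamp` example are hypothetical. Because `exec` runs arbitrary code, it should only be used on trusted sources.

```python
import copy
from typing import Any, Callable, Iterable


def load_function(source: str, name: str) -> Callable[..., Any]:
    """Execute `source` in a fresh namespace and return the function `name`."""
    namespace: dict = {}
    exec(source, namespace)  # trusted code only: exec runs arbitrary Python
    return namespace[name]


def outcome(fn: Callable[..., Any], args: tuple) -> tuple:
    """Run fn on a deep copy of args; record the value or the exception type."""
    try:
        return ("ok", fn(*copy.deepcopy(args)))
    except Exception as exc:
        return ("raised", type(exc))


def behaves_identically(original: str, transformed: str, name: str,
                        test_inputs: Iterable[tuple]) -> bool:
    """True if both versions agree on every test input (evidence, not proof)."""
    f = load_function(original, name)
    g = load_function(transformed, name)
    return all(outcome(f, args) == outcome(g, args) for args in test_inputs)


original = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
candidate = ("def clamp(x, lo, hi):\n"
             "    if x < lo:\n        return lo\n"
             "    if x > hi:\n        return hi\n"
             "    return x\n")
broken = "def clamp(x, lo, hi):\n    return min(lo, max(x, hi))\n"

inputs = [(5, 0, 10), (-3, 0, 10), (42, 0, 10), (0, 0, 0)]
print(behaves_identically(original, candidate, "clamp", inputs))  # True
print(behaves_identically(original, broken, "clamp", inputs))     # False
```

Passing such tests is evidence rather than proof. For example, `candidate` still differs from `original` when `lo > hi`: `clamp(42, 10, 0)` returns 10 in the original but 0 in the candidate. None of the inputs above exercise that case. Deep-copying the arguments keeps a function that mutates its inputs from influencing the second run.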

Weaknesses

- The automation in Auto-SPT primarily relies on prompting LLMs to design and implement SPTs. While practical, this approach offers limited methodological novelty given the growing application of LLM-based automation.
- The framework currently supports only Python and focuses on function-level transformations, while file- or project-level perturbations would better reflect real-world code evolution.
- The paper does not evaluate whether the transformed programs remain natural and consistent w…

Reviewer 02 · Rating 2 · Confidence 3

Strengths

1. Clone detection is an important problem in software engineering, and exploring distribution shift after model deployment is valuable.
2. The paper achieves automatic discovery of new code transformations using LLMs.
3. The results show that LLM-based data augmentation can generate realistic cases that exist in practice but are missing in training data.

Weaknesses

Method:
1. Lines 207–209 mention that naively prompting the LLMs does not work due to a lack of randomness and to hallucinations. The authors propose a new prompt design, but it is not clearly explained why the proposed prompt can address these two issues, nor is there empirical evidence demonstrating that the prompt resolves them.
2. Verifying whether the transformations can be correctly applied is critical (one of the limitations of prior work mentioned in the introduction). In Section 4.2, the au…

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The paper provides a formal analysis linking SPT diversity to transformation strength, with empirical validation on real datasets.

Weaknesses

The novelty is limited. Prior work, such as (Zhang et al., 2023), already described semantic-preserving code transformations for evaluating machine-learning-based code clone detection models. That paper also presented many semantic-preserving transformations as well as their combinations. This paper only briefly mentions (Zhang et al., 2023) and does not compare against it. There is also recent work on improving the robustness of clone detection models, which can be compared with t…
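Reviewer 03 notes that prior work already combined multiple semantic-preserving transformations. Since each SPT preserves input/output behavior, any sequence of SPTs does too. Composing randomly sampled SPTs is therefore a cheap way to get more varied variants from a small pool. The sketch below illustrates this with two hand-written transformations. `negate_if_else`, `insert_dead_code`, and `compose_random` are hypothetical names, not APIs from the paper or from Zhang et al. (2023). `ast.unparse` requires Python 3.9+.

```python
import ast
import random
from typing import Callable, List

SPT = Callable[[str], str]  # an SPT maps source code to equivalent source code


class _NegateIfElse(ast.NodeTransformer):
    def visit_If(self, node: ast.If) -> ast.If:
        self.generic_visit(node)  # transform nested/elif branches first
        if node.orelse:  # only if/else statements can be swapped this way
            node.test = ast.UnaryOp(op=ast.Not(), operand=node.test)
            node.body, node.orelse = node.orelse, node.body
        return node


def negate_if_else(source: str) -> str:
    """SPT: `if c: A else: B` -> `if not c: B else: A` (bool(c) still evaluated once)."""
    tree = _NegateIfElse().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


def insert_dead_code(source: str) -> str:
    """SPT: append an unreachable `if False: pass` to every function body."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body.append(ast.If(test=ast.Constant(False), body=[ast.Pass()], orelse=[]))
    return ast.unparse(ast.fix_missing_locations(tree))


def compose_random(pool: List[SPT], k: int, seed: int = 0) -> SPT:
    """Return an SPT that applies k transformations sampled (with replacement) from pool."""
    rng = random.Random(seed)
    chain = [rng.choice(pool) for _ in range(k)]

    def apply(source: str) -> str:
        for spt in chain:
            source = spt(source)
        return source

    return apply


source = '''
def sign(x):
    if x >= 0:
        return 1
    else:
        return -1
'''

spt = compose_random([negate_if_else, insert_dead_code], k=3, seed=1)
variant = spt(source)
print(variant)
```

Different seeds yield different chains, and therefore syntactically different but behaviorally identical versions of `sign`. Sampling with replacement is a deliberate choice: applying `negate_if_else` twice produces `not not ...`, which is still equivalent.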

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques