Syntax Is Not Enough: An Empirical Study of Small Transformer Models for Neural Code Repair

Shaunak Samant

arXiv:2512.22216 · cs.SE · December 30, 2025

TL;DR

This study evaluates whether a small transformer model (CodeT5-small) can repair real-world Java bugs. It finds high syntactic validity but no exact-match repairs, with most outputs unchanged from the buggy input.

Contribution

The study demonstrates that syntactic correctness does not guarantee semantic correctness in neural program repair, and it highlights the limitations of small models such as CodeT5-small.

Findings

01 Approximately 94% of generated outputs are syntactically valid Java

02 0% exact-match repairs

03 Approximately 80% of outputs reproduce the buggy input verbatim

Abstract

Automated program repair using neural models has shown promising results on benchmark datasets, yet practical deployment remains limited. In this study, we examine whether a small transformer model can meaningfully repair real-world Java bugs and whether syntactic correctness is a reliable proxy for semantic correctness. We fine-tune CodeT5-small (60.5M parameters) on 52,364 Java bug-fix pairs from CodeXGLUE and evaluate both token-level performance and syntactic validity using AST parsing. While the model converges cleanly and achieves high grammatical correctness, producing syntactically valid Java code in approximately ninety-four percent of cases, it fails to generate correct repairs under exact-match evaluation, achieving zero exact matches. In approximately eighty percent of cases, the model reproduces the buggy input verbatim.
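
The two token-level figures in the abstract, the exact-match rate and the share of outputs that copy the buggy input, can be illustrated with a small sketch. This is not the paper's evaluation code. The function names, the `RepairMetrics` container, and the whitespace normalization step are assumptions made here for illustration; the abstract does not say how strings were compared. Syntactic validity is left out because it needs a Java parser.

```python
from dataclasses import dataclass


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting-only differences are ignored.

    Assumption: the paper may compare raw strings or token sequences instead.
    """
    return " ".join(code.split())


@dataclass
class RepairMetrics:
    exact_match: float  # fraction of predictions equal to the reference fix
    copy_rate: float    # fraction of predictions equal to the buggy input


def score_repairs(buggy, fixed, predicted):
    """Score model outputs against buggy inputs and reference fixes.

    All three arguments are equal-length lists of Java source strings,
    aligned by index (hypothetical data layout).
    """
    if not (len(buggy) == len(fixed) == len(predicted)):
        raise ValueError("buggy, fixed, and predicted must have equal length")
    if not predicted:
        raise ValueError("at least one example is required")

    n = len(predicted)
    exact = sum(normalize(p) == normalize(f) for p, f in zip(predicted, fixed))
    copied = sum(normalize(p) == normalize(b) for p, b in zip(predicted, buggy))
    return RepairMetrics(exact_match=exact / n, copy_rate=copied / n)


# Toy example: the first prediction copies the bug, the second matches the fix.
toy_buggy = ["int x = a - b;", "return y ;"]
toy_fixed = ["int x = a + b;", "return z;"]
toy_predicted = ["int x = a - b;", "return  z;"]
toy_metrics = score_repairs(toy_buggy, toy_fixed, toy_predicted)
```

A prediction that copies the buggy input can never be an exact match when the bug and the fix differ, so a high copy rate directly caps the exact-match rate. That is the pattern the abstract reports.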

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning