Syntax Is Not Enough: An Empirical Study of Small Transformer Models for Neural Code Repair
Shaunak Samant

TL;DR
This study evaluates a small transformer model's ability to repair real-world Java bugs. The model produces syntactically valid code in most cases but no exact-match repairs, and about 80% of its outputs reproduce the buggy input unchanged.
Contribution
It demonstrates that syntactic correctness does not guarantee semantic correctness in neural program repair, highlighting limitations of small models like CodeT5-small.
Findings
Approximately 94% of generated outputs are syntactically valid Java
0% exact-match accuracy: no generated repair matches the reference fix
Approximately 80% of outputs reproduce the buggy input verbatim
Abstract
Automated program repair using neural models has shown promising results on benchmark datasets, yet practical deployment remains limited. In this study, we examine whether a small transformer model can meaningfully repair real-world Java bugs and whether syntactic correctness is a reliable proxy for semantic correctness. We fine-tune CodeT5-small (60.5M parameters) on 52,364 Java bug-fix pairs from CodeXGLUE and evaluate both token-level performance and syntactic validity using AST parsing. While the model converges cleanly and achieves high grammatical correctness, producing syntactically valid Java code in approximately ninety-four percent of cases, it fails to generate correct repairs under exact-match evaluation, achieving zero exact matches. In approximately eighty percent of cases, the model reproduces the buggy input verbatim.
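The two headline numbers in the abstract, exact-match accuracy and the rate at which outputs copy the buggy input, can be illustrated with a minimal Python sketch. This is not the paper's evaluation code: the function names, the `RepairMetrics` container, and the whitespace-only normalization are assumptions made for illustration, and the AST-based syntactic validity check is omitted because it needs a Java parser.

```python
from dataclasses import dataclass
from typing import Sequence


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting-only differences are ignored.

    Assumption: the study may compare raw token sequences instead.
    """
    return " ".join(code.split())


@dataclass
class RepairMetrics:
    exact_match: float  # fraction of predictions equal to the reference fix
    copy_rate: float    # fraction of predictions equal to the buggy input


def score_repairs(
    buggy: Sequence[str], fixed: Sequence[str], predicted: Sequence[str]
) -> RepairMetrics:
    """Score predictions against reference fixes and against the buggy inputs."""
    if not (len(buggy) == len(fixed) == len(predicted)):
        raise ValueError("buggy, fixed, and predicted must have equal length")
    n = len(predicted)
    if n == 0:
        raise ValueError("at least one example is required")

    exact = sum(normalize(p) == normalize(f) for p, f in zip(predicted, fixed))
    copied = sum(normalize(p) == normalize(b) for p, b in zip(predicted, buggy))
    return RepairMetrics(exact_match=exact / n, copy_rate=copied / n)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    buggy = ["int a = b ;", "return x ;"]
    fixed = ["int a = c ;", "return y ;"]
    predicted = ["int a = b ;", "return  y ;"]  # first copies the bug, second fixes it
    print(score_repairs(buggy, fixed, predicted))
```

Because the buggy and fixed versions in a bug-fix pair differ, an output that copies the buggy input can never count as an exact match, while it typically still parses as valid Java. That is why a high copy rate can coexist with high syntactic validity and zero exact matches.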
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning
