Diffusion is a code repair operator and generator

Mukul Singh; Gust Verbruggen; Vu Le; Sumit Gulwani

arXiv:2508.11110·cs.SE·August 18, 2025

TL;DR

This paper explores how code diffusion models can be used as a last-mile repair operator and generator, leveraging their iterative denoising process to fix broken code and generate training data across multiple domains.

Contribution

It introduces the novel idea of using diffusion models for last-mile code repair and data generation, demonstrating practical applications and analyzing their properties.

Findings

01

Diffusion models can effectively perform last-mile code repair.

02

Adding noise and resuming diffusion improves repair quality.

03

Sampling from diffusion models generates useful training data.
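
The control flow behind the second finding (add noise to a broken snippet, then resume denoising) can be sketched in a few lines. This is only an illustrative toy under loud assumptions: `CLEAN_SNIPPETS`, `embed`, `decode`, `denoise_step`, and `repair` are hypothetical names, the latent is just padded character codes, and a nearest-neighbour pull stands in for the pre-trained code diffusion model. It is not the authors' implementation.

```python
import random

# Hypothetical stand-in for the clean-code manifold a trained model would learn.
CLEAN_SNIPPETS = ["print(x)", "return x + 1", "for i in range(n):"]
WIDTH = 24  # fixed latent width for this toy

def embed(snippet):
    """Toy latent: character codes padded with spaces (not a learned encoder)."""
    codes = [float(ord(c)) for c in snippet[:WIDTH]]
    return codes + [32.0] * (WIDTH - len(codes))

def decode(latent):
    """Round each coordinate to a printable character and drop the padding."""
    return "".join(chr(max(32, min(126, round(v)))) for v in latent).rstrip()

def denoise_step(latent, strength):
    """Toy denoiser: move a fraction `strength` toward the nearest clean latent."""
    target = min(
        (embed(s) for s in CLEAN_SNIPPETS),
        key=lambda t: sum((a - b) ** 2 for a, b in zip(latent, t)),
    )
    return [a + strength * (b - a) for a, b in zip(latent, target)]

def repair(broken, noise_scale=0.5, steps=10, seed=0):
    """Add Gaussian noise to the broken snippet's latent, then resume denoising."""
    rng = random.Random(seed)
    latent = [v + rng.gauss(0.0, noise_scale) for v in embed(broken)]
    for step in range(steps):
        # Step sizes grow so that the final step lands on the chosen target.
        latent = denoise_step(latent, strength=1.0 / (steps - step))
    return decode(latent)

print(repair("print(x"))
```

In this toy setup the snippet with the missing parenthesis is pulled back to `print(x)`. With a real model, `noise_scale` would plausibly control how far the repair may stray from the input, though that trade-off is an assumption here rather than a result reported on this page.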

Abstract

Code diffusion models generate code by iteratively removing noise from the latent representation of a code snippet. During later steps of the diffusion process, when the code snippet has almost converged, differences between discrete representations of these snippets look like last-mile repairs applied to broken or incomplete code. We evaluate the extent to which this resemblance can be exploited to leverage pre-trained code diffusion models for the problem of last-mile repair by considering two applications with significant potential. First, we can leverage the diffusion model for last-mile repair by adding noise to a broken code snippet and resuming the diffusion process. Second, we can leverage the diffusion model to generate an arbitrary amount of training data for last-mile repair tasks (that are computationally more efficient) by sampling an intermediate program (input) and the final…
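
The second application, sampling training pairs from a denoising trajectory, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the abstract is cut off before it names the target side of each pair, so pairing a late intermediate decode with the final decode is a guess; `sample_pair` and its helpers are hypothetical; and a nearest-neighbour pull toward a tiny snippet list again stands in for the pre-trained diffusion model.

```python
import random

# Hypothetical stand-in for the clean-code manifold a trained model would learn.
CLEAN_SNIPPETS = ["print(x)", "return x + 1", "for i in range(n):"]
WIDTH = 24  # fixed latent width for this toy

def embed(snippet):
    """Toy latent: character codes padded with spaces (not a learned encoder)."""
    codes = [float(ord(c)) for c in snippet[:WIDTH]]
    return codes + [32.0] * (WIDTH - len(codes))

def decode(latent):
    """Round each coordinate to a printable character and drop the padding."""
    return "".join(chr(max(32, min(126, round(v)))) for v in latent).rstrip()

def denoise_step(latent, strength):
    """Toy denoiser: move a fraction `strength` toward the nearest clean latent."""
    target = min(
        (embed(s) for s in CLEAN_SNIPPETS),
        key=lambda t: sum((a - b) ** 2 for a, b in zip(latent, t)),
    )
    return [a + strength * (b - a) for a, b in zip(latent, target)]

def sample_pair(rng, steps=50, capture_at=49):
    """Denoise from pure noise; return (intermediate decode, final decode)."""
    latent = [rng.uniform(32.0, 126.0) for _ in range(WIDTH)]
    intermediate = None
    for step in range(steps):
        if step == capture_at:
            # Snapshot the almost-converged program before the last step(s).
            intermediate = decode(latent)
        latent = denoise_step(latent, strength=1.0 / (steps - step))
    return intermediate, decode(latent)

rng = random.Random(0)
for broken, fixed in (sample_pair(rng) for _ in range(3)):
    print(repr(broken), "->", repr(fixed))
```

Capturing late in the trajectory (`capture_at` close to `steps`) keeps the intermediate program near the final one, which is what makes the pair resemble a last-mile repair rather than a full rewrite.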
