Diffusion is a code repair operator and generator
Mukul Singh, Gust Verbruggen, Vu Le, Sumit Gulwani

TL;DR
This paper explores how code diffusion models can be used as a last-mile repair operator and generator, leveraging their iterative denoising process to fix broken code and generate training data across multiple domains.
Contribution
It introduces the novel idea of using diffusion models for last-mile code repair and data generation, demonstrating practical applications and analyzing their properties.
Findings
Diffusion models can effectively perform last-mile code repair.
Adding noise and resuming diffusion improves repair quality.
Sampling from diffusion models generates useful training data.
Abstract
Code diffusion models generate code by iteratively removing noise from the latent representation of a code snippet. During later steps of the diffusion process, when the code snippet has almost converged, differences between discrete representations of these snippets look like last-mile repairs applied to broken or incomplete code. We evaluate the extent to which this resemblance can be exploited to leverage pre-trained code diffusion models for the problem of last-mile repair by considering two applications with significant potential. First, we can leverage the diffusion model for last-mile repair by adding noise to a broken code snippet and resuming the diffusion process. Second, we can leverage the diffusion model to generate an arbitrary amount of training data for last-mile repair tasks (that are computationally more efficient) by sampling an intermediate program (input) and the final…
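The two applications described in the abstract can be illustrated with a toy sketch. This is not the authors' model or code. It assumes a 2-D latent space, three hand-picked "valid program" latents, a linear noise schedule, and a denoiser that simply moves halfway toward the nearest valid latent. All names (`VALID_LATENTS`, `add_noise`, `denoise_step`, `repair`, `sample_training_pair`) are hypothetical and exist only to show the control flow: noise then resume for repair, and intermediate plus final samples for training pairs.

```python
import math
import random

# Hypothetical stand-ins: three "valid program" latents in a 2-D toy space.
VALID_LATENTS = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
TOTAL_STEPS = 10  # assumed length of the toy diffusion process


def nearest_valid(latent):
    """Return the valid latent closest to `latent` (a toy 'decoder')."""
    return min(VALID_LATENTS, key=lambda p: math.dist(p, latent))


def add_noise(latent, step, rng):
    """Perturb a latent with Gaussian noise whose scale grows with `step`."""
    sigma = 0.1 * step  # assumed noise schedule, chosen for illustration
    return tuple(x + rng.gauss(0.0, sigma) for x in latent)


def denoise_step(latent):
    """One toy denoising step: move halfway toward the nearest valid latent."""
    target = nearest_valid(latent)
    return tuple(x + 0.5 * (t - x) for x, t in zip(latent, target))


def repair(broken_latent, start_step, rng):
    """Add noise at `start_step`, then resume denoising down to step 0."""
    latent = add_noise(broken_latent, start_step, rng)
    for _ in range(start_step):
        latent = denoise_step(latent)
    return latent


def sample_training_pair(mid_step, rng):
    """Run the toy diffusion from pure noise; return (intermediate, final).

    `mid_step` must lie in 1..TOTAL_STEPS, otherwise no intermediate is kept.
    """
    latent = (rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0))
    intermediate = None
    for step in range(TOTAL_STEPS, 0, -1):
        if step == mid_step:
            intermediate = latent  # plays the role of the "broken" input
        latent = denoise_step(latent)
    return intermediate, latent


rng = random.Random(0)
# Application 1: repair a latent that sits slightly off a valid one.
repaired = repair((9.0, 11.2), start_step=5, rng=rng)
# Application 2: harvest an (input, output) pair for a repair dataset.
pair_input, pair_output = sample_training_pair(mid_step=3, rng=rng)
```

In this sketch, `repaired` ends up close to the valid latent `(10.0, 10.0)`, and `pair_output` lies nearer to a valid latent than `pair_input`, mirroring the "almost converged" versus "converged" distinction in the abstract. A real system would decode both latents into discrete programs and use a learned denoiser in place of `denoise_step`.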
