Directional Diffusion-Style Code Editing Pre-training

Qingyuan Liang; Zeyu Sun; Qihao Zhu; Junhao Hu; Yifan Zhao; Yizhou Chen; Mingxuan Zhu; Guoqing Wang; Lu Zhang

arXiv:2501.12079·cs.SE·December 11, 2025

Open Access 2 Models

TL;DR

DivoT5 introduces a diffusion-based pre-training approach that models the step-by-step code editing process, leading to state-of-the-art performance in various code editing tasks.

Contribution

The paper proposes DivoT5, a novel diffusion-style pre-training method that incorporates code evolution dynamics into model training for improved code editing capabilities.

Findings

01. DivoT5 achieves SOTA results on multiple code editing tasks.

02. Pre-training with diffusion direction enhances model understanding of code evolution.

03. DivoT5 outperforms larger models in few-shot and fine-tuning scenarios.

Abstract

Code pre-trained models have shown promising effectiveness in various software engineering tasks. Many of these tasks involve software evolution, code editing, or both. However, existing code pre-trained models often overlook real-world code editing data and the evolutionary nature of the editing process. In this paper, to simulate the step-by-step code editing process of human developers, we propose DivoT5, a pre-trained model based on directional diffusion at the data level. In DivoT5, we adopt two categories of pre-training tasks. The first category consists of mask and denoising tasks augmented with a diffusion direction representing code evolution. That is, we first apply a noising process to the code snippets before evolution, and then ask the pre-training process to restore the noised snippets into the code snippets after evolution. The second category is tasks…
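The first task category can be pictured with a small sketch. This is not the authors' implementation: the function name `make_directional_pair`, the mask ratio, the whitespace tokenization, and the T5-style `<extra_id_N>` sentinel format are all assumptions made for illustration. The only idea taken from the abstract is the direction of the pair: the noised input is built from the code before the edit, while the target is the code after the edit.

```python
import random


def make_directional_pair(old_tokens, new_tokens, mask_ratio=0.15, seed=0):
    """Build one (noisy source, target) training pair.

    Hypothetical helper: noise is applied to the PRE-edit tokens, and the
    target is the full POST-edit token sequence, so denoising also moves the
    code in the direction of its evolution.
    """
    rng = random.Random(seed)
    # Number of positions to mask, never more than the sequence length.
    n_mask = min(len(old_tokens), max(1, round(len(old_tokens) * mask_ratio)))
    masked_positions = set(rng.sample(range(len(old_tokens)), n_mask))

    noisy = []
    sentinel = 0
    prev_masked = False
    for i, tok in enumerate(old_tokens):
        if i in masked_positions:
            # Collapse each run of masked tokens into one sentinel
            # (assumed T5-style span format).
            if not prev_masked:
                noisy.append(f"<extra_id_{sentinel}>")
                sentinel += 1
            prev_masked = True
        else:
            noisy.append(tok)
            prev_masked = False
    return noisy, list(new_tokens)


# Toy example: the edit fixes a wrong operator.
old = "def add ( a , b ) : return a - b".split()
new = "def add ( a , b ) : return a + b".split()
source, target = make_directional_pair(old, new, mask_ratio=0.2, seed=1)
print(" ".join(source))
print(" ".join(target))
```

In a conventional denoising objective the target would be the original `old` sequence; here the only change is that the target is `new`, which is what gives the noising and restoring process a direction.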

Taxonomy

Topics: Model-Driven Software Engineering Techniques