CoditT5: Pretraining for Source Code and Natural Language Editing
Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

TL;DR
CoditT5 introduces a pretraining objective that explicitly models edits. This improves performance on software editing tasks such as comment updating, bug fixing, and code review. CoditT5 outperforms standard generation-based models and, when combined with them through reranking, achieves state-of-the-art results.
Contribution
The paper presents CoditT5, a large language model for software editing tasks that is pretrained with an edit-specific objective. Traditional generation models, by contrast, are not designed to reason about edits.
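This page does not spell out what the edit-specific objective looks like. As a rough illustration only, assume edits are expressed as token-level insert, delete, and replace actions derived from a diff between an input and its edited version. One could then build an edit-aware training target as a sequence of edit tags, a separator, and the full edited output. The tag names and the `edit_plan` and `edit_target` helpers are hypothetical. They are not CoditT5's actual vocabulary or code.

```python
from difflib import SequenceMatcher


def edit_plan(source: list[str], target: list[str]) -> list[str]:
    """Encode the token-level edits that turn `source` into `target`.

    Only changed spans appear in the plan; unchanged tokens are skipped.
    The tag names below are hypothetical placeholders, not CoditT5's vocabulary.
    """
    plan: list[str] = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # unchanged span: nothing to record
        if op == "delete":
            plan += ["<DELETE>", *source[i1:i2], "<DELETE_END>"]
        elif op == "insert":
            plan += ["<INSERT>", *target[j1:j2], "<INSERT_END>"]
        else:  # op == "replace"
            plan += ["<REPLACE_OLD>", *source[i1:i2],
                     "<REPLACE_NEW>", *target[j1:j2], "<REPLACE_END>"]
    return plan


def edit_target(source: list[str], target: list[str]) -> list[str]:
    """Hypothetical training target: edit plan, a separator, then the edited sequence."""
    return edit_plan(source, target) + ["<SEP>"] + target


if __name__ == "__main__":
    before = "return a + b ;".split()
    after = "return a - b ;".split()
    print(edit_target(before, after))
```

For the one-token bug fix above, the plan contains a single replace action (`+` becomes `-`) followed by `<SEP>` and the corrected statement, so the model must state what changes before producing the result.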
Findings
Outperforms standard generation models on editing tasks
Achieves state-of-the-art results with reranking strategies
Demonstrates the effectiveness of edit-focused pretraining for code and language editing
Abstract
Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art…
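The abstract does not say which reranking strategies are used. As one plausible illustration, assume each model can assign a log-likelihood to any candidate output. Candidates from both models could then be pooled and reordered by a weighted combination of the two scores. The `rerank` function, its `weight` parameter, the linear combination, and the toy scores are all hypothetical.

```python
from typing import Callable, Sequence

# A scorer maps a candidate output to a log-likelihood under one model.
Scorer = Callable[[str], float]


def rerank(candidates: Sequence[str], score_gen: Scorer, score_edit: Scorer,
           weight: float = 0.5) -> list[str]:
    """Order candidates by a weighted sum of two models' log-likelihoods, best first.

    `weight` is the share given to the generation model; the edit-based model
    gets `1 - weight`. The linear combination and the default are assumptions.
    """
    def combined(candidate: str) -> float:
        return weight * score_gen(candidate) + (1.0 - weight) * score_edit(candidate)

    # sorted() is stable, so ties keep their original candidate order.
    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    # Toy log-likelihoods standing in for real model scores (made up for illustration).
    gen_scores = {"fix_a": -1.0, "fix_b": -2.0}
    edit_scores = {"fix_a": -3.0, "fix_b": -0.5}
    print(rerank(["fix_a", "fix_b"], gen_scores.__getitem__, edit_scores.__getitem__))
```

With the default weight of 0.5, `fix_b` ranks first because the edit-based model is much more confident in it. Setting `weight=1.0` falls back to the generation model's own ordering.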
Taxonomy
Topics: Software Engineering Research · Topic Modeling
