Generating Bug-Fixes Using Pretrained Transformers
Dawn Drain, Chen Wu, Alexey Svyatkovskiy, Neel Sundaresan

TL;DR
This paper introduces DeepDebug, a data-driven program repair approach that uses pretrained transformers to detect and fix bugs in Java methods mined from real-world GitHub repositories. Pretraining and domain adaptation each improve on supervised training from scratch, and the best model outperforms previous methods.
Contribution
The work presents a novel sequence-to-sequence model for bug fixing that incorporates pretraining, domain adaptation, and syntax embeddings, outperforming prior state-of-the-art techniques.
Findings
Pretraining on source code increases the number of patches found by 33% compared with supervised training from scratch.
Domain-adaptive pretraining improves accuracy by a further 32%.
The best model generates 75% more non-deletion fixes than previous methods.
Abstract
Detecting and fixing bugs are two of the most important yet frustrating parts of the software development cycle. Existing bug detection tools are based mainly on static analyzers, which rely on mathematical logic and symbolic reasoning about the program execution to detect common types of bugs. Fixing bugs is typically left to the developer. In this work we introduce DeepDebug: a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories. We frame bug-patching as a sequence-to-sequence learning task consisting of two steps: (i) denoising pretraining, and (ii) supervised finetuning on the target translation task. We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch, while domain-adaptive pretraining from natural language to…
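The two training steps named in the abstract differ mainly in how the (input, target) pairs are built. The sketch below illustrates that difference only. It assumes span masking as the denoising objective, which is one common choice and is not confirmed by this summary, and it uses whitespace tokenization for brevity. The names `MASK`, `mask_span`, and `make_finetuning_pair` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Tuple

MASK = "<mask>"  # hypothetical sentinel token, not taken from the paper


def mask_span(tokens: List[str], span_len: int,
              rng: random.Random) -> Tuple[List[str], List[str]]:
    """Build one denoising-pretraining example (assumed span-masking noise).

    A contiguous span of `span_len` tokens is replaced by a single MASK
    token. The model input is the corrupted sequence and the target is
    the original, uncorrupted sequence.
    """
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    start = rng.randrange(len(tokens) - span_len + 1)
    noisy = tokens[:start] + [MASK] + tokens[start + span_len:]
    return noisy, list(tokens)


def make_finetuning_pair(buggy: str, fixed: str) -> Tuple[List[str], List[str]]:
    """Build one supervised finetuning example.

    The input is the buggy method and the target is the fixed method,
    so bug fixing is treated as translation from buggy to fixed code.
    """
    return buggy.split(), fixed.split()


if __name__ == "__main__":
    rng = random.Random(0)
    method = "int div ( int a , int b ) { return a / b ; }".split()
    noisy, target = mask_span(method, span_len=3, rng=rng)
    print("pretraining input: ", " ".join(noisy))
    print("pretraining target:", " ".join(target))

    src, tgt = make_finetuning_pair(
        "return a / b ;",
        "return b == 0 ? 0 : a / b ;",
    )
    print("finetuning input:  ", " ".join(src))
    print("finetuning target: ", " ".join(tgt))
```

In pretraining the target is always recoverable from unlabeled code alone, whereas finetuning needs paired buggy and fixed methods, which is why the abstract treats them as separate steps.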
Taxonomy
Methods, Repair
