Break-It-Fix-It: Unsupervised Learning for Program Repair
Michihiro Yasunaga, Percy Liang

TL;DR
This paper introduces Break-It-Fix-It (BIFI), an unsupervised learning approach for program repair that iteratively updates a fixer and a breaker so that they generate realistic (bad, good) training pairs. It outperforms existing methods without requiring labeled data.
Contribution
The paper proposes a novel unsupervised training framework, BIFI, that jointly trains a breaker and a fixer to enhance program repair accuracy without labeled datasets.
Findings
Achieved 90.5% repair accuracy on GitHub-Python (+28.5 percentage points over existing methods)
Achieved 71.7% repair accuracy on DeepFix (+5.6 percentage points over existing methods)
Obtained both results without any labeled (bad, good) training pairs
Abstract
We consider repair tasks: given a critic (e.g., compiler) that assesses the quality of an input, the goal is to train a fixer that converts a bad example (e.g., code with syntax errors) into a good one (e.g., code with no syntax errors). Existing works create training data consisting of (bad, good) pairs by corrupting good examples using heuristics (e.g., dropping tokens). However, fixers trained on this synthetically-generated data do not extrapolate well to the real distribution of bad inputs. To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code. Based on these ideas, we iteratively update the breaker and the fixer while using them in…
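The data flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ast.parse` stands in for the critic, `drop_token` illustrates the heuristic corruption used by prior work, and `toy_train_breaker` and `toy_train_fixer` are hypothetical placeholders for training real sequence-to-sequence models. Keeping only breaker outputs that the critic rejects is likewise an assumption of this sketch.

```python
import ast
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source, target) strings
Model = Callable[[str], str]


def critic(code: str) -> bool:
    """Stand-in critic: True if the code parses as Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def drop_token(code: str, rng: random.Random) -> str:
    """Heuristic corruption (the baseline idea): delete one space-separated token."""
    tokens = code.split(" ")
    if len(tokens) < 2:
        return code
    i = rng.randrange(len(tokens))
    return " ".join(tokens[:i] + tokens[i + 1:])


def bifi_round(
    fixer: Model,
    real_bad: List[str],
    real_good: List[str],
    train_breaker: Callable[[List[Pair]], Model],
    train_fixer: Callable[[List[Pair]], Model],
) -> Tuple[Model, List[Pair], List[Pair]]:
    """One round of the loop; all trainer arguments are hypothetical."""
    # (i) Run the fixer on real bad inputs; keep outputs the critic accepts.
    fixed_pairs: List[Pair] = []
    for bad in real_bad:
        out = fixer(bad)
        if critic(out):
            fixed_pairs.append((bad, out))

    # (ii) Train a breaker on the reverse direction (good -> bad).
    breaker = train_breaker([(good, bad) for bad, good in fixed_pairs])

    # Break real good code; keep only outputs the critic rejects (assumption).
    broken_pairs: List[Pair] = []
    for good in real_good:
        out = breaker(good)
        if not critic(out):
            broken_pairs.append((out, good))

    # Retrain the fixer on both kinds of (bad, good) pairs.
    new_fixer = train_fixer(fixed_pairs + broken_pairs)
    return new_fixer, fixed_pairs, broken_pairs


def toy_train_breaker(pairs: List[Pair]) -> Model:
    """Toy 'training': learn how many trailing characters real errors drop."""
    cuts = [len(g) - len(b) for g, b in pairs if g.startswith(b) and len(g) > len(b)]
    n = max(set(cuts), key=cuts.count) if cuts else 1
    return lambda good: good[:-n]


def toy_train_fixer(pairs: List[Pair]) -> Model:
    """Toy 'training': memorize the (bad -> good) pairs."""
    table = dict(pairs)
    return lambda bad: table.get(bad, bad)


def initial_fixer(code: str) -> str:
    """Toy starting fixer: append any missing closing parentheses."""
    return code + ")" * max(0, code.count("(") - code.count(")"))


new_fixer, fixed, broken = bifi_round(
    initial_fixer,
    real_bad=["print(1", "x = (2 + 3"],
    real_good=["print(2)", "y = (1 + 1)"],
    train_breaker=toy_train_breaker,
    train_fixer=toy_train_fixer,
)
```

In a real setting the round would be repeated, with each new fixer feeding the next round. The toy trainers exist only to make the data flow executable.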
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Machine Learning and Data Classification
Methods: Repair · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections
