ZeroCoder: Can LLMs Improve Code Generation Without Ground-Truth Supervision?
Lishui Fan, Mouxiang Chen, Tingwei Zhu, Kui Liu, Xin Xia, Shanping Li, Zhongxin Liu

TL;DR
ZeroCoder is a novel co-evolutionary framework that enhances code and test generation without ground-truth supervision by leveraging execution feedback and dynamic calibration.
Contribution
ZeroCoder introduces a fully label-free co-evolutionary approach, with a Bayesian selector, to improve both code and test generation.
Findings
ZeroCoder improves code generation by up to 14.5% over the base model.
With DyB4, code generation gains reach 21.6%.
Test generation improves by 24.3%, nearing supervised performance.
Abstract
Code generation is important in software engineering, and Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for improving it through execution-based feedback. However, most RLVR pipelines rely on human-curated tests, so progress is bottlenecked by scarce and costly supervision. Prior work has tried to ground rewards in self-generated tests, but models are weak at test generation, so the resulting tests lack discriminative power and the gains are limited. We aim to improve code generation without ground-truth supervision by co-evolving code and test generation, so that their interactions yield progressively more informative supervision. To this end, we present ZeroCoder, a fully label-free co-evolutionary framework that jointly trains a Coder and a Tester using execution feedback from self-generated code-test interactions. For each problem,…
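To make "execution feedback from self-generated code-test interactions" concrete, here is a minimal sketch of the raw signal such a framework could start from: a pass/fail matrix obtained by running every candidate solution against every generated test. This is an illustration only, not the paper's method. The abstract is truncated before it describes how rewards are computed, so the helper names (`run_test`, `pass_matrix`, `pass_rates`, `discriminative_tests`) and the toy `add` problem are all hypothetical, and the sketch stops short of any reward or selection rule.

```python
from typing import List


def run_test(code: str, test: str) -> bool:
    """Run one candidate solution against one assert-style test.

    Hypothetical helper. exec() is NOT a sandbox; a real pipeline would
    isolate execution and enforce time and memory limits.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate's functions
        exec(test, namespace)   # run the assertion against them
        return True
    except Exception:           # AssertionError, NameError, etc. all count as a fail
        return False


def pass_matrix(codes: List[str], tests: List[str]) -> List[List[bool]]:
    """matrix[i][j] is True when candidate i passes test j."""
    return [[run_test(c, t) for t in tests] for c in codes]


def pass_rates(matrix: List[List[bool]]) -> List[float]:
    """Fraction of tests each candidate passes."""
    return [sum(row) / len(row) if row else 0.0 for row in matrix]


def discriminative_tests(matrix: List[List[bool]]) -> List[bool]:
    """A test separates candidates if some pass it and some fail it."""
    n_tests = len(matrix[0]) if matrix else 0
    flags = []
    for j in range(n_tests):
        column = [row[j] for row in matrix]
        flags.append(any(column) and not all(column))
    return flags


# Toy problem (assumed for illustration): one correct and one buggy candidate.
codes = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
tests = [
    "assert add(1, 2) == 3",   # separates the two candidates
    "assert add(0, 0) == 0",   # passed by both, so uninformative
]

matrix = pass_matrix(codes, tests)
rates = pass_rates(matrix)
flags = discriminative_tests(matrix)
```

In this toy case the second test is passed by both candidates and so cannot tell them apart, which is the "lack of discriminative tests" problem the abstract attributes to self-generated tests.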
