DOCE: Finding the Sweet Spot for Execution-Based Code Generation
Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

TL;DR
This paper introduces DOCE, a comprehensive framework for execution-based code generation that compares various decoding and reranking methods, emphasizing execution-based evaluation and self-debugging to improve performance.
Contribution
The paper proposes a unified framework for execution-based code generation, systematically evaluating different components and introducing self-debugging for state-of-the-art reranking results.
Findings
Execution-based methods outperform execution-free approaches.
Filtering with trial unit tests significantly improves code quality.
Self-debugging on multiple candidates achieves state-of-the-art reranking performance.
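To make the second finding concrete, here is a minimal sketch of filtering candidates with trial unit tests. It is not the paper's implementation: the function names (`load_candidate`, `passes_trial_tests`, `filter_candidates`), the `(args, expected)` test format, and the fallback of keeping all candidates when none pass are assumptions made for illustration. The sketch runs code with `exec` and no sandbox, so it is only suitable for trusted toy inputs.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical test format: (positional arguments, expected return value).
TrialTest = Tuple[Tuple[Any, ...], Any]


def load_candidate(source: str, entry_point: str) -> Optional[Callable[..., Any]]:
    """Run candidate source in a fresh namespace and return its entry function."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # unsandboxed: trusted toy inputs only
    except Exception:
        return None  # syntax errors and import-time failures reject the candidate
    fn = namespace.get(entry_point)
    return fn if callable(fn) else None


def passes_trial_tests(source: str, entry_point: str,
                       trial_tests: Sequence[TrialTest]) -> bool:
    """Return True only if the candidate passes every trial unit test."""
    fn = load_candidate(source, entry_point)
    if fn is None:
        return False
    for args, expected in trial_tests:
        try:
            if fn(*args) != expected:
                return False
        except Exception:
            return False  # a runtime error counts as a failed test
    return True


def filter_candidates(candidates: Sequence[str], entry_point: str,
                      trial_tests: Sequence[TrialTest]) -> List[str]:
    """Keep candidates that pass all trial tests.

    Assumption: if nothing passes, fall back to the unfiltered list so that
    a later reranking step still has something to choose from.
    """
    kept = [c for c in candidates
            if passes_trial_tests(c, entry_point, trial_tests)]
    return kept or list(candidates)
```

In this sketch the surviving candidates would then be handed to a reranker; a real system would also need timeouts and process isolation around execution.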
Abstract
Recently, a diverse set of decoding and reranking procedures has been shown to be effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Decoding Objectives for Code Execution (DOCE), a comprehensive framework that includes candidate generation, n-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components. We then study the contributions of these components through execution-based evaluation metrics. Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods. Furthermore, we assess the impact of filtering based on trial unit tests, a simple and effective strategy that has often been overlooked in prior work. We also propose self-debugging on multiple candidates,…
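As an illustration of MBR decoding with an execution-based utility, the sketch below picks the candidate whose execution outputs agree with the most other candidates. The 0/1 exact-match utility over all test inputs, the names `execution_outputs` and `mbr_select`, and the use of plain Python callables as candidates are assumptions for this example, not details confirmed by the abstract.

```python
from typing import Any, Callable, List, Sequence, Tuple


def execution_outputs(fn: Callable[..., Any],
                      inputs: Sequence[Tuple[Any, ...]]) -> Tuple[str, ...]:
    """Run a candidate on every input and record a comparable output signature."""
    outs: List[str] = []
    for args in inputs:
        try:
            outs.append(repr(fn(*args)))
        except Exception as exc:
            # Errors become part of the signature instead of aborting selection.
            outs.append(f"<error: {type(exc).__name__}>")
    return tuple(outs)


def mbr_select(candidates: Sequence[Callable[..., Any]],
               inputs: Sequence[Tuple[Any, ...]]) -> int:
    """Return the index of the candidate with the highest expected utility.

    Assumed utility: 1 if two candidates produce identical outputs on all
    inputs, else 0. The other sampled candidates act as pseudo-references.
    """
    signatures = [execution_outputs(fn, inputs) for fn in candidates]

    def expected_utility(i: int) -> int:
        return sum(1 for j, sig in enumerate(signatures)
                   if j != i and sig == signatures[i])

    # max() keeps the first index on ties, so earlier samples win ties.
    return max(range(len(candidates)), key=expected_utility)


if __name__ == "__main__":
    # Three toy candidates for "double the input"; the third is wrong.
    pool = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
    print(mbr_select(pool, [(1,), (3,)]))
```

The design choice here is that no ground-truth outputs are needed: agreement among sampled candidates stands in for correctness, which is why the inputs alone are enough.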
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Testing and Debugging Techniques · Security and Verification in Computing
Methods: Sparse Evolutionary Training
