Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement
Xiaoqing Zhang, Yuhan Liu, Flood Sung, Xiuying Chen, Shuo Shang, and Rui Yan

TL;DR
ThinkCoder introduces a two-phase code generation framework combining exploration and refinement, enhanced by preference-driven optimization, achieving high accuracy with reduced computational costs on benchmarks like HumanEval and MBPP.
Contribution
The paper presents a novel framework that integrates thorough exploration, optimal refinement, and preference-driven learning to improve code generation efficiency and accuracy.
Findings
Improves Pass@1 by 3.0% over MapCoder while using only 6.4% of the computation cost.
Achieves higher Pass@1 than AgentCoder after fewer rounds.
Enables LLaMA2-7B to perform competitively using only 20% of the computational resources.
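For readers unfamiliar with the metric, Pass@1 is the k = 1 case of Pass@k. The sketch below uses the commonly cited unbiased Pass@k estimator; this summary does not state which evaluation protocol the paper follows, so treat the estimator choice and the function names `pass_at_k` and `mean_pass_at_k` as assumptions for illustration only.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: sample budget being evaluated (k <= n)
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average Pass@k over a benchmark given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate for a single problem reduces to c / n, the fraction of samples that pass.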
Abstract
Code generation is crucial in software engineering for automating the coding process efficiently. While test-time computation methods show promise, they suffer from high latency due to multiple computation rounds. To overcome this, we introduce ThinkCoder, a framework that combines thorough exploration with optimal refinement. The exploration phase diversifies the solution space by searching for potential solutions, followed by a refinement phase that enhances precision. This approach allows us to select the best solution through careful consideration before taking action, avoiding excessive trial and error. To further minimize test-time computation overhead, we introduce preference-driven optimization with Reinforced Self-Training (ReST), which uses exploration trajectories from ThinkCoder to guide the LLM's evolution. This approach enhances the LLM's exploration efficiency via…
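The explore-then-refine idea described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the `propose` and `refine` callables stand in for LLM calls, candidates are plain Python functions rather than generated source code, and the selection rule (highest number of passed tests) is a hypothetical choice made for the example.

```python
from typing import Callable, List, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def score(candidate: Callable, tests: List[TestCase]) -> int:
    """Count the test cases a candidate passes; exceptions count as failures."""
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def explore_then_refine(
    propose: Callable[[int], List[Callable]],           # hypothetical stand-in for LLM sampling
    refine: Callable[[Callable, List[TestCase]], Callable],  # hypothetical stand-in for LLM repair
    tests: List[TestCase],
    k: int = 3,
    max_rounds: int = 2,
) -> Tuple[Callable, int]:
    # Exploration: sample k diverse candidates and keep the best-scoring one.
    candidates = propose(k)
    best = max(candidates, key=lambda c: score(c, tests))
    best_score = score(best, tests)

    # Refinement: improve only the selected candidate, stopping early on success.
    for _ in range(max_rounds):
        if best_score == len(tests):
            break
        improved = refine(best, tests)
        improved_score = score(improved, tests)
        if improved_score > best_score:
            best, best_score = improved, improved_score
    return best, best_score


# Toy task: implement addition of two integers.
TESTS: List[TestCase] = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]


def toy_propose(k: int) -> List[Callable]:
    pool = [lambda a, b: a - b, lambda a, b: a * b, lambda a, b: a + b + 1]
    return pool[:k]


def toy_refine(candidate: Callable, tests: List[TestCase]) -> Callable:
    return lambda a, b: a + b
```

Calling `explore_then_refine(toy_propose, toy_refine, TESTS)` selects the multiplication candidate during exploration (it passes two of three tests) and replaces it with the addition candidate after one refinement round. Selecting a single candidate before refining is what keeps the number of refinement calls small in this sketch.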
Taxonomy
Topics: Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques · Real-time simulation and control systems
Methods: Balanced Selection
