Review Beats Planning: Dual-Model Interaction Patterns for Code Synthesis
Jan Miller

TL;DR
This paper shows that reversing the traditional interaction pattern between two language models, so that the code generator produces code freely and the reasoning model reviews it instead of planning up front, improves code synthesis performance, especially on richly specified problems.
Contribution
It introduces a dual-model interaction pattern in which the code generator produces code freely and the reasoning model reviews the result, and reports better results than the traditional plan-then-code approach.
Findings
Reversing model interaction improves code synthesis performance.
Review effectiveness scales with specification richness.
Achieves 90.2% pass@1 on HumanEval+, surpassing GPT-4o (87.2%) and O1 Preview (89.0%).
Abstract
How should two language models interact to produce better code than either can alone? The conventional approach -- a reasoning model plans, a code specialist implements -- seems natural but fails: on HumanEval+, plan-then-code degrades performance by 2.4 percentage points versus the code specialist alone. We show that reversing the interaction changes everything. When the code specialist generates freely and the reasoning model reviews instead of plans, the same two models on the same hardware achieve 90.2% pass@1 -- exceeding GPT-4o (87.2%) and O1 Preview (89.0%) -- on ~$2/hr of commodity GPU. Cross-benchmark validation across 542 problems (HumanEval+ and MBPP+) reveals a moderating variable: review effectiveness scales with specification richness, yielding 4x more improvement on richly-specified problems (+9.8pp) than on lean ones (+2.3pp), while remaining net-positive in both cases.…
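The two interaction patterns contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Model` type, the prompt wording, the `APPROVE` convention, and the choice to have the code model revise its own output after review are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical model interface: a function from prompt text to response text.
Model = Callable[[str], str]


def plan_then_code(reasoner: Model, coder: Model, spec: str) -> str:
    """Conventional pattern: the reasoning model plans, the code model implements."""
    plan = reasoner(f"Write a step-by-step plan for:\n{spec}")
    return coder(f"Specification:\n{spec}\n\nPlan:\n{plan}\n\nImplement it.")


def code_then_review(
    reasoner: Model, coder: Model, spec: str, max_rounds: int = 1
) -> str:
    """Reversed pattern: the code model generates freely, the reasoning model reviews."""
    code = coder(f"Implement:\n{spec}")
    for _ in range(max_rounds):
        review = reasoner(
            f"Specification:\n{spec}\n\nCode:\n{code}\n\n"
            "Reply APPROVE or list the defects."
        )
        # Assumed convention: the reviewer signals acceptance with APPROVE.
        if review.strip().upper().startswith("APPROVE"):
            break
        # Assumed design: the code model revises using the reviewer's feedback.
        code = coder(
            f"Specification:\n{spec}\n\nPrevious attempt:\n{code}\n\n"
            f"Reviewer feedback:\n{review}\n\nRevise the code."
        )
    return code
```

Both functions take the same two models and the same specification; only the order of the calls and the role of the reasoning model differ, which is the variable the abstract isolates. Any callable that maps a prompt string to a response string can be passed in, including simple stubs for testing.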
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Parallel Computing and Optimization Techniques
