SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution
Woojin Lee, Jin-Xia Huang

TL;DR
SolidCoder replaces mental simulation, in which the LLM traces execution internally, with concrete sandboxed execution and explicit edge-case planning, improving pass@1 on HumanEval, CodeContests, and APPS with GPT-4o.
Contribution
The paper proposes the S.O.L.I.D. architecture, which combines edge-case awareness before algorithm design with sandboxed execution using property-based oracles to bridge the Mental-Reality Gap in LLM code generation.
Findings
With GPT-4o, SolidCoder achieves state-of-the-art pass@1 performance on HumanEval (95.7%)
Edge-case awareness provides the largest individual performance gain
Execution grounding catches errors that specification improvements miss
Abstract
State-of-the-art code generation frameworks rely on mental simulation, where LLMs internally trace execution to verify correctness. We expose a fundamental limitation: the Mental-Reality Gap -- where models hallucinate execution traces and confidently validate buggy code. This gap manifests along two orthogonal dimensions: the Specification Gap (overlooking edge cases during planning) and the Verification Gap (hallucinating correct behavior for flawed code). We propose SolidCoder with a simple principle: don't imagine -- execute. The S.O.L.I.D. architecture addresses both dimensions by forcing edge-case awareness before algorithm design and replacing imagined traces with sandboxed execution using property-based oracles. With GPT-4o, SolidCoder achieves state-of-the-art pass@1 performance: 95.7% on HumanEval (+0.6%p), 77.0% on CodeContests (+4.3%p), and 26.7% on APPS (+3.4%p). Ablation…
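To make "don't imagine -- execute" concrete, here is a minimal sketch of execution-grounded verification with a property-based oracle. It is not the authors' implementation. The names `run_in_sandbox`, `sorted_properties`, `verify`, and `EDGE_CASES`, and the toy sorting task, are all hypothetical. A plain subprocess with a timeout stands in for a real sandbox and gives only process isolation, not security isolation.

```python
import json
import subprocess
import sys
from collections import Counter


def run_in_sandbox(source, func_name, args, timeout=5.0):
    """Run func_name(*args) from `source` in a separate Python process.

    Assumption: a subprocess with a timeout is only a stand-in for a sandbox.
    Returns (ok, result_or_error_message).
    """
    harness = (
        source
        + "\nimport json, sys\n"
        + f"print(json.dumps({func_name}(*json.loads(sys.argv[1]))))\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", harness, json.dumps(args)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return False, (lines[-1] if lines else "nonzero exit")
    # The harness prints the JSON result last, so read the final stdout line.
    return True, json.loads(proc.stdout.strip().splitlines()[-1])


def sorted_properties(args, out):
    """Property oracle: output is ordered and is a permutation of the input."""
    xs = args[0]
    return (
        isinstance(out, list)
        and all(a <= b for a, b in zip(out, out[1:]))
        and Counter(out) == Counter(xs)
    )


# Hypothetical edge cases listed before any algorithm is written.
EDGE_CASES = [[[]], [[1]], [[2, 2, 1]], [[-1, 0, -5]]]


def verify(source, func_name, cases, prop):
    """Execute the candidate on each case; collect real, observed failures."""
    failures = []
    for args in cases:
        ok, result = run_in_sandbox(source, func_name, args)
        if not ok or not prop(args, result):
            failures.append((args, result))
    return failures


# A plausible-looking candidate that silently drops duplicates.
BUGGY = "def my_sort(xs):\n    return sorted(set(xs))\n"
GOOD = "def my_sort(xs):\n    return sorted(xs)\n"

if __name__ == "__main__":
    print(verify(BUGGY, "my_sort", EDGE_CASES, sorted_properties))
    print(verify(GOOD, "my_sort", EDGE_CASES, sorted_properties))
```

The oracle checks properties of the output rather than comparing against known expected values, so it can flag the duplicate-dropping bug without a reference solution. The failure comes from an actual run, not from a trace the model imagined.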
