SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution

Woojin Lee; Jin-Xia Huang

arXiv:2604.19825 · cs.SE · April 23, 2026

TL;DR

SolidCoder replaces mental simulation, in which an LLM traces execution internally to verify its own code, with concrete sandboxed execution. With GPT-4o it reports state-of-the-art pass@1 on HumanEval, CodeContests, and APPS.

Contribution

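The paper proposes the S.O.L.I.D. architecture, which forces edge-case awareness before algorithm design and replaces imagined execution traces with sandboxed execution checked by property-based oracles, to bridge the Mental-Reality Gap in LLM code generation.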

Findings

01. SolidCoder achieves state-of-the-art pass@1 performance on HumanEval (95.7%)

02. Edge-case awareness provides the largest individual performance gain

03. Execution grounding catches errors that specification improvements miss
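For readers unfamiliar with the metric in the first finding, the sketch below shows the commonly used unbiased pass@k estimator and its pass@1 special case. It is an assumption that the paper computes pass@1 this way; this page does not describe the evaluation protocol. The function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that pass all tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(solved: list[bool]) -> float:
    """pass@1 with a single sample per problem: the fraction solved."""
    return sum(solved) / len(solved)
```

With one sample per problem, pass@1 reduces to plain accuracy over the benchmark, which is how a single percentage such as the one above is usually read.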

Abstract

State-of-the-art code generation frameworks rely on mental simulation, where LLMs internally trace execution to verify correctness. We expose a fundamental limitation: the Mental-Reality Gap -- where models hallucinate execution traces and confidently validate buggy code. This gap manifests along two orthogonal dimensions: the Specification Gap (overlooking edge cases during planning) and the Verification Gap (hallucinating correct behavior for flawed code). We propose SolidCoder with a simple principle: don't imagine -- execute. The S.O.L.I.D. architecture addresses both dimensions by forcing edge-case awareness before algorithm design and replacing imagined traces with sandboxed execution using property-based oracles. With GPT-4o, SolidCoder achieves state-of-the-art pass@1 performance: 95.7% on HumanEval (+0.6%p), 77.0% on CodeContests (+4.3%p), and 26.7% on APPS (+3.4%p). Ablation…
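The "don't imagine -- execute" principle can be illustrated with a minimal sketch: run a candidate solution in a separate process on edge-case inputs and judge each output with a property-based oracle instead of an LLM-traced result. This is not the authors' implementation. Every name here (`run_in_subprocess`, `sorted_oracle`, `find_failures`, `EDGE_CASES`, `GOOD`, `BUGGY`) is hypothetical, the sorting task is an invented example, and a subprocess with a timeout is only a stand-in for a real sandbox.

```python
import json
import subprocess
import sys
from collections import Counter


def run_in_subprocess(source: str, func_name: str, arg, timeout: float = 5.0):
    """Run func_name(arg) from `source` in a fresh interpreter.

    Assumption: process isolation plus a timeout approximates a sandbox.
    """
    harness = (
        source
        + "\nimport json, sys\n"
        + f"print(json.dumps({func_name}(json.loads(sys.argv[1]))))\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", harness, json.dumps(arg)],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    # The harness prints the JSON result last, after any candidate output.
    return json.loads(proc.stdout.strip().splitlines()[-1])


def sorted_oracle(inp, out) -> bool:
    """Property oracle: output is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(out) == Counter(inp)


# Edge cases listed up front: empty, singleton, duplicates, negatives, reversed.
EDGE_CASES = [[], [1], [2, 2, 1], [3, -1, 0], [5, 4, 3, 2, 1]]


def find_failures(source, func_name, cases, oracle):
    """Return (input, observed) pairs for which execution or the oracle fails."""
    failures = []
    for case in cases:
        try:
            out = run_in_subprocess(source, func_name, case)
            ok = oracle(case, out)
        except Exception as exc:  # crash, timeout, or malformed output
            ok, out = False, repr(exc)
        if not ok:
            failures.append((case, out))
    return failures


GOOD = "def solve(xs):\n    return sorted(xs)\n"
BUGGY = "def solve(xs):\n    return sorted(set(xs))\n"  # silently drops duplicates

if __name__ == "__main__":
    print(find_failures(GOOD, "solve", EDGE_CASES, sorted_oracle))
    print(find_failures(BUGGY, "solve", EDGE_CASES, sorted_oracle))
```

The buggy candidate looks plausible when read and passes four of the five inputs, but the duplicate case exposes it once the code actually runs. The oracle needs no reference solution, only properties that any correct output must satisfy.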
