Inference-Time Code Selection via Symbolic Equivalence Partitioning
David Cho, Yifan Wang, Fanping Sui, Ananth Grama

TL;DR
This paper introduces Symbolic Equivalence Partitioning (SEP), an inference-time code selection method that uses symbolic execution to group candidate programs by functional equivalence, improving accuracy on code generation tasks.
Contribution
SEP is a framework that filters candidate programs with problem-provided public examples and then uses symbolic execution to partition the survivors by functional behavior. It is reported to outperform existing inference-time selection methods.
Findings
SEP improves accuracy from 0.754 to 0.826 on HumanEval+ (an absolute gain of 0.072).
SEP improves accuracy from 0.565 to 0.647 on LiveCodeBench (an absolute gain of 0.082).
SEP does not require auxiliary test generation or learned verifiers.
Abstract
Sampling multiple candidate programs at inference time is an effective way to improve LLM code generation. However, its benefit depends on reliably selecting a correct solution from the generated pool. We observe that this selection problem has a distinctive semantic structure: correct solutions, despite differences in syntax, implementation, or algorithmic strategy, often converge to the same functional behavior over valid inputs. At the same time, consensus alone is not sufficient for correctness, because models can also produce correlated wrong solutions that implement the same mistaken behavior. We propose Symbolic Equivalence Partitioning (SEP), an inference-time selection framework that first uses problem-provided public examples as lightweight validity signals. SEP then uses symbolic execution to partition the remaining candidate programs into bounded functional equivalence…
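The two stages described in the abstract (filter by public examples, then partition by functional behavior) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: symbolic execution is replaced here by running each candidate on a small, fixed set of probe inputs, and the final rule (return a member of the largest equivalence class) is an assumption, since the abstract is truncated before it states how a class is chosen. All names (`passes_public_examples`, `behavior_signature`, `select_candidate`, `probe_inputs`) are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Optional, Sequence, Tuple

Candidate = Callable[..., Any]
Example = Tuple[tuple, Any]  # (positional arguments, expected output)


def passes_public_examples(fn: Candidate, examples: Sequence[Example]) -> bool:
    """Stage 1: keep a candidate only if it reproduces every public example."""
    for args, expected in examples:
        try:
            if fn(*args) != expected:
                return False
        except Exception:
            return False
    return True


def behavior_signature(fn: Candidate, probe_inputs: Sequence[tuple]) -> tuple:
    """Stand-in for symbolic execution (assumption): record the observable
    behavior on a bounded set of concrete probe inputs."""
    signature = []
    for args in probe_inputs:
        try:
            signature.append(("ok", repr(fn(*args))))
        except Exception as exc:
            signature.append(("err", type(exc).__name__))
    return tuple(signature)


def select_candidate(
    candidates: Sequence[Candidate],
    examples: Sequence[Example],
    probe_inputs: Sequence[tuple],
) -> Optional[Candidate]:
    """Filter by public examples, partition survivors by behavior, and
    return a member of the largest class (assumed selection rule)."""
    survivors = [c for c in candidates if passes_public_examples(c, examples)]
    if not survivors:
        return None
    classes = defaultdict(list)
    for candidate in survivors:
        classes[behavior_signature(candidate, probe_inputs)].append(candidate)
    largest_class = max(classes.values(), key=len)
    return largest_class[0]
```

With candidates for an absolute-value task, a single public example such as `((3,), 3)` removes a candidate that negates its input but not one that returns its input unchanged. The partition over probe inputs that include a negative number then separates that remaining wrong candidate from the two correct ones, which agree with each other despite different implementations.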
