Ensemble-Based Uncertainty Estimation for Code Correctness Estimation
Yunxiang Wei, Tianlin Li, Yuwei Zheng, Yanni Dong, Aishan Liu, Qiang Hu, Xiaoyu Zhang, Mingfei Cheng, Jian Yang

TL;DR
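This paper introduces Ensemble Semantic Entropy (ESE), an uncertainty estimate for code correctness that measures the consistency of samples pooled across an ensemble of models. ESE correlates more strongly with program correctness than single-model semantic entropy, and the accompanying cascading framework, Cas, reduces the compute needed for program generation.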
Contribution
The paper proposes ESE, an ensemble-based uncertainty measure, and Cas, a cascading framework that improves accuracy and reduces computational cost in code correctness estimation.
Findings
ESE correlates more strongly with program correctness than single-model entropy.
ESE improves prediction accuracy by 53.4% under false-positive constraints.
Cas reduces FLOPs by 64.9% while maintaining performance.
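This summary does not describe how Cas works internally, so the following is only a generic sketch of an uncertainty-gated cascade, not the paper's framework. It assumes tiers are ordered from cheapest to most expensive and that each tier returns an answer together with an uncertainty score (for example, an entropy value). The names `cascade`, `small_tier`, `large_tier`, and the threshold value are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a tier maps a task description to (answer, uncertainty score).
Tier = Callable[[str], Tuple[str, float]]


def cascade(task: str, tiers: Sequence[Tier], threshold: float) -> Tuple[str, int]:
    """Try tiers cheapest-first and stop at the first confident answer.

    Returns the accepted answer and the index of the tier that produced it.
    The last tier's answer is always accepted as a fallback.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    last = len(tiers) - 1
    for index, tier in enumerate(tiers):
        answer, uncertainty = tier(task)
        # Accept when the tier is confident enough, or when nothing is left to try.
        if uncertainty <= threshold or index == last:
            return answer, index
    raise AssertionError("unreachable: the last tier always returns")


# Hypothetical stand-ins for a cheap and an expensive generator.
def small_tier(task: str) -> Tuple[str, float]:
    return f"small:{task}", 0.1 if task == "easy" else 0.9


def large_tier(task: str) -> Tuple[str, float]:
    return f"large:{task}", 0.2


easy_result = cascade("easy", [small_tier, large_tier], threshold=0.5)
hard_result = cascade("hard", [small_tier, large_tier], threshold=0.5)
```

In this toy setup the easy task is answered by the cheap tier and only the hard task reaches the expensive one. Compute savings of this kind depend on how many tasks the cheap tier can resolve confidently.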
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in generating programs from natural language descriptions, yet ensuring their correctness without an external oracle remains a critical challenge. To solve the challenge, existing methods often rely on uncertainty estimation, measuring the consistency of semantics or execution behaviors across multiple samples generated by a single model. However, we observe that a single model can often converge to a consistent but incorrect solution, rendering such consistency-based proxies ineffective. To address this, we propose Ensemble Semantic Entropy (ESE), which estimates uncertainty by evaluating the consistency of samples aggregated across an ensemble of models. Experiments on LiveCodeBench demonstrate that ESE correlates more strongly with program correctness than single-model semantic entropy. Notably, in selective…
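The idea of pooling samples across models can be illustrated with a minimal sketch. It assumes that semantic equivalence is approximated by grouping programs that produce identical outputs on a shared set of test inputs, and that uncertainty is the Shannon entropy of the resulting cluster distribution. The paper's exact clustering and weighting may differ, and every name below (`behavior_signature`, `semantic_entropy`, `ensemble_semantic_entropy`, the toy models) is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Assumption: a generated program is modelled as a Python callable
# whose outputs are hashable.
Program = Callable[[int], Hashable]


def behavior_signature(program: Program, test_inputs: Sequence[int]) -> Tuple:
    """Summarize a program by its outputs (or exception types) on the test inputs."""
    signature = []
    for x in test_inputs:
        try:
            signature.append(("ok", program(x)))
        except Exception as exc:  # a crash is also an observable behavior
            signature.append(("err", type(exc).__name__))
    return tuple(signature)


def semantic_entropy(programs: Sequence[Program], test_inputs: Sequence[int]) -> float:
    """Shannon entropy (in nats) over clusters of behaviorally identical programs."""
    counts = Counter(behavior_signature(p, test_inputs) for p in programs)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def ensemble_semantic_entropy(
    samples_by_model: Dict[str, List[Program]], test_inputs: Sequence[int]
) -> float:
    """Pool the samples of every model, then compute entropy over the pooled set."""
    pooled = [p for programs in samples_by_model.values() for p in programs]
    return semantic_entropy(pooled, test_inputs)


# Toy example: each model is perfectly self-consistent, but the two disagree.
inputs = [1, 2, 3]
samples = {
    "model_a": [lambda x: x * x for _ in range(4)],
    "model_b": [lambda x: x * 2 for _ in range(4)],
}
single_a = semantic_entropy(samples["model_a"], inputs)
single_b = semantic_entropy(samples["model_b"], inputs)
pooled = ensemble_semantic_entropy(samples, inputs)
```

Here each model on its own has zero entropy, which a single-model measure would read as high confidence, while the pooled entropy is ln 2 because the two models settle on different behaviors. This mirrors the failure case the abstract describes, where a single model converges on a consistent but incorrect solution.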
