Ensemble-Based Uncertainty Estimation for Code Correctness Estimation

Yunxiang Wei; Tianlin Li; Yuwei Zheng; Yanni Dong; Aishan Liu; Qiang Hu; Xiaoyu Zhang; Mingfei Cheng; Jian Yang

arXiv:2603.27098·cs.SE·April 7, 2026

TL;DR

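This paper introduces Ensemble Semantic Entropy (ESE), an uncertainty estimation method for code correctness that pools samples from an ensemble of models. ESE correlates more strongly with program correctness than single-model semantic entropy, and a cascading framework built on it reduces the compute cost of program generation.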

Contribution

The paper proposes ESE, an ensemble-based uncertainty measure, and a cascading framework, Cas, that improves accuracy and reduces computational cost in code correctness estimation.

Findings

01

ESE correlates more strongly with program correctness than single-model entropy.

02

ESE improves prediction accuracy by 53.4% under false-positive constraints.

03

Cas reduces FLOPs by 64.9% while maintaining performance.
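
The summary above does not describe how Cas is built, so the following is only a generic sketch of an uncertainty-gated cascade, not the paper's framework. It assumes that cheaper stages are tried first and that a stage's samples are accepted once their uncertainty falls below a threshold. The names `Stage`, `label_entropy`, and `run_cascade`, the relative costs, and the threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage description: (name, relative cost, sampler).
# The sampler returns one behavior label per generated program.
Stage = Tuple[str, float, Callable[[], List[str]]]


def label_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return sum((n / total) * math.log(total / n) for n in counts.values())


def run_cascade(
    stages: Sequence[Stage],
    uncertainty: Callable[[Sequence[str]], float],
    threshold: float,
) -> Tuple[str, List[str], float]:
    """Try stages in order and stop at the first one whose samples look consistent.

    Returns (accepted stage name, its samples, total cost spent).
    The last stage is always accepted because there is nothing left to escalate to.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    spent = 0.0
    for index, (name, cost, sample) in enumerate(stages):
        labels = sample()
        spent += cost
        is_last = index == len(stages) - 1
        if is_last or uncertainty(labels) <= threshold:
            return name, labels, spent
    raise AssertionError("unreachable")


# Assumed toy stages: on the easy task the small stage already agrees with itself,
# on the hard task it disagrees and the cascade escalates.
easy = [("small", 1.0, lambda: ["a", "a", "a"]), ("large", 10.0, lambda: ["a", "a", "a"])]
hard = [("small", 1.0, lambda: ["a", "b", "c"]), ("large", 10.0, lambda: ["b", "b", "b"])]

easy_result = run_cascade(easy, label_entropy, threshold=0.5)
hard_result = run_cascade(hard, label_entropy, threshold=0.5)
```

In this toy setup the easy task stops at the small stage, while the hard task pays for both stages. Any compute saving in such a design comes from how often the cheap stage is accepted.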

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in generating programs from natural language descriptions, yet ensuring their correctness without an external oracle remains a critical challenge. To address this challenge, existing methods often rely on uncertainty estimation, measuring the consistency of semantics or execution behaviors across multiple samples generated by a single model. However, we observe that a single model can often converge to a consistent but incorrect solution, rendering such consistency-based proxies ineffective. To overcome this limitation, we propose Ensemble Semantic Entropy (ESE), which estimates uncertainty by evaluating the consistency of samples aggregated across an ensemble of models. Experiments on LiveCodeBench demonstrate that ESE correlates more strongly with program correctness than single-model semantic entropy. Notably, in selective…
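
The idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: programs are grouped into semantic clusters by their outputs on a few test inputs, each sample is weighted equally, and the entropy is computed over the pooled samples of all models. The function names and the toy "programs" (plain Python callables) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for a generated program: a callable on one integer input.
Program = Callable[[int], Hashable]


def behavior_signature(program: Program, test_inputs: Sequence[int]) -> Tuple:
    """Run the program on every input; an exception counts as part of its behavior."""
    signature = []
    for x in test_inputs:
        try:
            signature.append(("ok", program(x)))
        except Exception as exc:
            signature.append(("error", type(exc).__name__))
    return tuple(signature)


def semantic_entropy(programs: Sequence[Program], test_inputs: Sequence[int]) -> float:
    """Entropy (in nats) over clusters of programs with identical behavior."""
    if not programs:
        raise ValueError("at least one program is required")
    clusters = Counter(behavior_signature(p, test_inputs) for p in programs)
    total = len(programs)
    return sum((n / total) * math.log(total / n) for n in clusters.values())


def ensemble_semantic_entropy(
    samples_by_model: Dict[str, List[Program]], test_inputs: Sequence[int]
) -> float:
    """Pool the samples of every model, then compute entropy over the pooled set."""
    pooled = [p for samples in samples_by_model.values() for p in samples]
    return semantic_entropy(pooled, test_inputs)


# Toy task "square the input": model_a is consistently wrong, model_b is right.
inputs = [1, 2, 3]
model_a = [lambda x: x * 2, lambda x: x + x, lambda x: 2 * x]
model_b = [lambda x: x * x, lambda x: x ** 2, lambda x: x * x]

single_model = semantic_entropy(model_a, inputs)
ensemble = ensemble_semantic_entropy({"a": model_a, "b": model_b}, inputs)
```

Here `single_model` is zero because model A agrees with itself even though it is wrong, while `ensemble` equals ln 2 because the pooled samples split evenly into two behavior clusters. This mirrors the failure mode the abstract describes, where a single model converges to a consistent but incorrect solution.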
