Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation

Arindam Sharma; Cristina David

arXiv:2502.11620·cs.SE·July 2, 2025

TL;DR

This paper investigates using uncertainty estimation methods, adapted from natural language processing, to assess and improve the correctness of code generated by large language models, incorporating semantic checks for better accuracy.

Contribution

It introduces adapted uncertainty estimation techniques for code generation, including a semantic equivalence check, and demonstrates their effectiveness in reducing errors through an abstention policy.

Findings

01 Strong correlation between uncertainty and correctness
02 Simplified entropy method performs comparably to complex methods
03 Abstention policy significantly reduces incorrect outputs

Abstract

In this work, we explore uncertainty estimation as a proxy for correctness in LLM-generated code. To this end, we adapt two state-of-the-art techniques from natural language generation -- one based on entropy and another on mutual information -- to the domain of code generation. Given the distinct semantic properties of code, we introduce modifications, including a semantic equivalence check based on symbolic execution. Our findings indicate a strong correlation between the uncertainty computed through these techniques and correctness, highlighting the potential of uncertainty estimation for quality assessment. Additionally, we propose a simplified version of the entropy-based method that assumes a uniform distribution over the LLM's responses, demonstrating comparable effectiveness. Using these techniques, we develop an abstention policy that prevents the model from making predictions…
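The simplified entropy method and the abstention policy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: candidate programs are represented as Python callables; the equivalence check `behaves_the_same` compares outputs on a handful of probe inputs as a stand-in for the paper's symbolic-execution check; the "uniform distribution over the LLM's responses" is read as giving each sampled response probability 1/n; and the function names and the threshold value are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Program = Callable[[int], int]


def behaves_the_same(f: Program, g: Program,
                     probes: Sequence[int] = (0, 1, 2, -3, 10)) -> bool:
    """Stand-in equivalence check (assumption): agree on a few probe inputs.

    The paper uses symbolic execution; this is only a cheap approximation.
    """
    return all(f(x) == g(x) for x in probes)


def cluster_by_equivalence(samples: Sequence[Program],
                           equivalent: Callable[[Program, Program], bool]
                           ) -> List[List[Program]]:
    """Greedily group samples, comparing each to a cluster representative."""
    clusters: List[List[Program]] = []
    for sample in samples:
        for cluster in clusters:
            if equivalent(cluster[0], sample):
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return clusters


def uniform_cluster_entropy(samples: Sequence[Program],
                            equivalent: Callable[[Program, Program], bool]
                            ) -> float:
    """Entropy over equivalence clusters, each sample weighted 1/n."""
    if not samples:
        raise ValueError("need at least one sampled program")
    n = len(samples)
    clusters = cluster_by_equivalence(samples, equivalent)
    return sum(-(len(c) / n) * math.log(len(c) / n) for c in clusters)


def should_abstain(samples: Sequence[Program],
                   equivalent: Callable[[Program, Program], bool],
                   threshold: float) -> bool:
    """Abstain when uncertainty exceeds a (hypothetical) threshold."""
    return uniform_cluster_entropy(samples, equivalent) > threshold


# Four sampled "solutions" to a doubling task; the last one is wrong.
samples: List[Program] = [
    lambda x: x * 2,
    lambda x: x + x,
    lambda x: 2 * x,
    lambda x: x ** 2,
]
entropy = uniform_cluster_entropy(samples, behaves_the_same)
abstain = should_abstain(samples, behaves_the_same, threshold=0.5)
```

Here three of the four samples fall into one cluster and the fourth into its own, so the entropy is greater than zero; when all samples agree, the entropy is zero and the policy does not abstain. In practice the threshold would have to be calibrated on held-out data.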

Taxonomy

Topics: Natural Language Processing Techniques · Software Engineering Research · Software Testing and Debugging Techniques