Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation
Arindam Sharma, Cristina David

TL;DR
This paper investigates using uncertainty estimation methods, adapted from natural language processing, to assess and improve the correctness of code generated by large language models, incorporating semantic checks for better accuracy.
Contribution
It introduces adapted uncertainty estimation techniques for code generation, including a semantic equivalence check, and demonstrates their effectiveness in reducing errors through an abstention policy.
Findings
Strong correlation between uncertainty and correctness
Simplified entropy method performs comparably to complex methods
Abstention policy significantly reduces incorrect outputs
Abstract
In this work, we explore uncertainty estimation as a proxy for correctness in LLM-generated code. To this end, we adapt two state-of-the-art techniques from natural language generation -- one based on entropy and another on mutual information -- to the domain of code generation. Given the distinct semantic properties of code, we introduce modifications, including a semantic equivalence check based on symbolic execution. Our findings indicate a strong correlation between the uncertainty computed through these techniques and correctness, highlighting the potential of uncertainty estimation for quality assessment. Additionally, we propose a simplified version of the entropy-based method that assumes a uniform distribution over the LLM's responses, demonstrating comparable effectiveness. Using these techniques, we develop an abstention policy that prevents the model from making predictions…
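The pipeline described in the abstract (group sampled programs by semantic equivalence, compute an entropy over the groups, abstain when uncertainty is high) can be illustrated with a minimal Python sketch. It is not the authors' implementation. In particular, the paper's equivalence check uses symbolic execution, whereas this sketch substitutes a much weaker stand-in: two programs are treated as equivalent if they agree on a handful of probe inputs. The function names, the probe inputs, and the abstention threshold are all hypothetical, and reading "uniform distribution over the LLM's responses" as "each sampled response carries equal weight" is an assumption.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Program = Callable[[Any], Any]


def _observe(prog: Program, x: Any) -> Tuple[str, str]:
    """Run one program on one input, recording either its result or its error type."""
    try:
        return ("ok", repr(prog(x)))
    except Exception as exc:
        return ("error", type(exc).__name__)


def cluster_by_behavior(programs: Sequence[Program],
                        probe_inputs: Sequence[Any]) -> List[List[Program]]:
    """Group programs that behave identically on every probe input.

    Assumption: this testing-based check only approximates semantic
    equivalence; it stands in for the symbolic-execution check in the paper.
    """
    clusters: dict = {}
    for prog in programs:
        signature = tuple(_observe(prog, x) for x in probe_inputs)
        clusters.setdefault(signature, []).append(prog)
    return list(clusters.values())


def uniform_cluster_entropy(clusters: Sequence[Sequence[Program]]) -> float:
    """Entropy (in nats) over clusters, weighting every sampled response equally."""
    total = sum(len(c) for c in clusters)
    probs = [len(c) / total for c in clusters]
    return sum(p * math.log(1.0 / p) for p in probs)


def should_abstain(clusters: Sequence[Sequence[Program]], threshold: float) -> bool:
    """Hypothetical abstention rule: decline to answer when entropy exceeds the threshold."""
    return uniform_cluster_entropy(clusters) > threshold
```

Under these assumptions, samples that all land in one cluster give an entropy of zero, and the entropy grows as the samples split into more behaviorally distinct groups. A threshold would in practice have to be tuned on held-out data.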
Taxonomy
Topics: Natural Language Processing Techniques · Software Engineering Research · Software Testing and Debugging Techniques
