Task Abstention for Large Language Models in Code Generation
Yanke Zhou, Yuhao Tan, Senrong Xu, Zenan Li, Yuan Yao, Taolue Chen, Xiaoxing Ma

TL;DR
This paper introduces a calibrated abstention method for large language models in code generation, enabling models to abstain from tasks likely to produce hallucinated, incorrect code, thereby improving safety and reliability.
Contribution
It presents a novel abstention rule based on multiple hypothesis testing that offers theoretical guarantees and improves detection of hallucinations without relying on oracle test cases or external databases.
Findings
The method improves hallucination detection accuracy in code generation.
It provides distribution-free theoretical guarantees on abstention decisions.
In experiments, the method outperforms existing techniques at abstaining from tasks that would yield incorrect outputs.
Abstract
Large language models (LLMs) have revolutionized automated code generation. One serious concern, however, is the so-called "hallucination", i.e., LLMs may generate seemingly plausible but functionally incorrect code. In this paper, we study the task abstention problem, i.e., determining whether a given LLM should abstain from performing a specific code generation task to avoid likely hallucination. Our approach features a calibrated abstention rule, grounded in the principles of multiple hypothesis testing. The rule assesses generation consistency through code execution outcomes, allowing it to handle syntactic diversity of semantically equivalent code without reliance on oracle test cases or external databases. We prove that our approach provides a rigorous, distribution-free theoretical guarantee on its abstention decisions. We evaluate our method on benchmark datasets using several…
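The idea of judging consistency by execution outcomes rather than by source text can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`outcome_signature`, `consistency_score`, `should_abstain`), the use of plain Python callables as stand-ins for sampled programs, and the fixed `threshold` are all assumptions made for illustration. In particular, the paper calibrates its rule via multiple hypothesis testing, whereas here the threshold is simply a free parameter. The probe inputs carry no expected outputs, so no oracle is involved.

```python
from collections import Counter
from typing import Any, Callable, Sequence


def outcome_signature(program: Callable[[Any], Any], inputs: Sequence[Any]) -> tuple:
    """Run one candidate on every probe input and record what happened."""
    outcomes = []
    for x in inputs:
        try:
            outcomes.append(("ok", repr(program(x))))
        except Exception as exc:  # a crash is also an observable outcome
            outcomes.append(("error", type(exc).__name__))
    return tuple(outcomes)


def consistency_score(candidates: Sequence[Callable[[Any], Any]],
                      inputs: Sequence[Any]) -> float:
    """Fraction of candidates sharing the most common execution signature."""
    signatures = Counter(outcome_signature(c, inputs) for c in candidates)
    return signatures.most_common(1)[0][1] / len(candidates)


def should_abstain(candidates: Sequence[Callable[[Any], Any]],
                   inputs: Sequence[Any],
                   threshold: float) -> bool:
    """Abstain when the sampled programs disagree too much.

    `threshold` is a hypothetical, uncalibrated placeholder.
    """
    return consistency_score(candidates, inputs) < threshold


if __name__ == "__main__":
    # Three syntactically different but semantically equal samples, one outlier.
    samples = [lambda x: x * 2, lambda x: x + x, lambda x: 2 * x, lambda x: x ** 2]
    probes = [0, 1, 2, 3]
    print(consistency_score(samples, probes))    # 0.75
    print(should_abstain(samples, probes, 0.8))  # True
```

Because the three doubling functions produce identical outputs on every probe, they fall into one group despite differing in syntax, which is the property the abstract attributes to execution-based consistency.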
