TL;DR
This paper investigates how the confidence biases of large language models resemble, and subtly differ from, those of humans, and introduces a self-assessment method, AFCE, to improve confidence calibration and interpretability.
Contribution
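The paper shows that LLM confidence estimates are less sensitive to task difficulty than human ones and shift with stereotyped personas even when answer accuracy is unchanged. It proposes AFCE (Answer-Free Confidence Estimation), a two-stage prompting method, to improve calibration and reduce overconfidence.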
Findings
Models exhibit less sensitivity to task difficulty than humans.
AFCE reduces overconfidence in LLMs.
AFCE improves alignment of model confidence with actual accuracy.
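The findings above rest on comparing stated confidence with actual accuracy. A minimal sketch of two common ways to quantify that gap follows. This summary does not say which metrics the paper uses, so the function names `overconfidence` and `expected_calibration_error`, the 10-bin default, and the sample data are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def overconfidence(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean stated confidence minus accuracy; a positive value means overconfident.

    Hypothetical helper: confidences are assumed to lie in [0, 1].
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_conf - accuracy


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Standard binned ECE: weighted mean of |confidence - accuracy| per bin."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(ok for _, ok in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - acc)
    return ece


# Made-up example: a model that always reports 90% confidence
# but answers only half of the questions correctly.
sample_conf = [0.9, 0.9, 0.9, 0.9]
sample_correct = [True, True, False, False]
gap = overconfidence(sample_conf, sample_correct)
ece = expected_calibration_error(sample_conf, sample_correct)
```

Under these definitions, a method that reduces overconfidence pushes `gap` toward zero, and better alignment between confidence and accuracy corresponds to a lower `ece`.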
Abstract
Psychology research has shown that humans are poor at estimating their performance on tasks, tending towards underconfidence on easy tasks and overconfidence on difficult tasks. We examine three LLMs, Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o, on a range of QA tasks of varying difficulty, and show that models exhibit subtle differences from human patterns of overconfidence. They are less sensitive to task difficulty, and when prompted to answer as different personas (e.g., expert vs. layman, or different races, genders, and ages), they respond with stereotypically biased confidence estimates even though their underlying answer accuracy remains the same. Based on these observations, we propose Answer-Free Confidence Estimation (AFCE) to improve confidence calibration and LLM interpretability in these settings. AFCE is a self-assessment method that employs two stages of…
