Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
Sree Bhattacharyya, Samarth Khanna, Leona Chen, Lucas Craig, Tharun Dilliraj, James Z. Wang

TL;DR
This paper introduces a multidimensional self-assessment framework for LLMs and shows that competence-related appraisals such as effort and ability match or outperform confidence in predicting model failure across most settings.
Contribution
It proposes a multidimensional approach to model self-assessment, grounded in cognitive appraisal theory, as a more reliable basis for performance prediction in LLMs than verbalized confidence alone.
Findings
Competence-related appraisals, particularly effort and ability, match or outperform confidence in failure prediction across most settings.
Effort provides stable, less overoptimistic estimates across model sizes.
Task characteristics influence which appraisal dimension is most predictive.
Abstract
Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence. Confidence, however, has been shown to be an inconsistent and overoptimistic predictor of model correctness. Drawing on cognitive appraisal theory, a framework from human psychology that decomposes self-evaluation into multiple components, we propose a multidimensional perspective on model self-assessment. We elicit six appraisal-based dimensions of self-assessment, alongside confidence, and evaluate their utility for predicting model failure across 12 LLMs and 38 tasks spanning eight domains. We find that competence-related appraisal dimensions, particularly effort and ability, consistently match or outperform confidence across most settings.…
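As a rough illustration of what "utility for predicting model failure" could mean in practice, the sketch below scores each self-assessment dimension by how well it ranks failed items above successful ones. Everything in it is an assumption made for illustration: the abstract does not name an evaluation metric (AUROC is used here as one common choice), the ratings are invented toy values, and the direction of each dimension (high confidence implying low risk, high effort implying high risk) is a guess rather than the paper's protocol.

```python
from typing import Dict, Sequence


def auroc(risk: Sequence[float], failed: Sequence[bool]) -> float:
    """Probability that a random failed item has a higher risk score than
    a random successful item, with ties counted as one half."""
    pos = [r for r, f in zip(risk, failed) if f]
    neg = [r for r, f in zip(risk, failed) if not f]
    if not pos or not neg:
        raise ValueError("need at least one failure and one success")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_dimensions(
    ratings: Dict[str, Sequence[float]],
    correct: Sequence[bool],
    high_means_risk: Dict[str, bool],
) -> Dict[str, float]:
    """Failure-prediction AUROC per dimension.

    high_means_risk is a hypothetical orientation flag: True if a higher
    rating should indicate a higher chance of failure, False otherwise.
    """
    failed = [not c for c in correct]
    result = {}
    for name, values in ratings.items():
        sign = 1.0 if high_means_risk[name] else -1.0
        result[name] = auroc([sign * v for v in values], failed)
    return result


# Invented toy data: six items, ratings on an arbitrary 1-10 scale.
correct = [True, True, False, True, False, False]
ratings = {
    "confidence": [9, 8, 8, 7, 9, 6],  # assumed: high confidence = low risk
    "effort": [2, 3, 7, 4, 8, 6],      # assumed: high effort = high risk
}
orientation = {"confidence": False, "effort": True}

scores = compare_dimensions(ratings, correct, orientation)
for name, value in scores.items():
    print(f"{name}: AUROC = {value:.3f}")
```

A value near 0.5 means the dimension ranks failures no better than chance, and a value near 1.0 means it separates failures from successes almost perfectly. In this toy data the confidence ratings stay high even on failed items, which is the kind of overoptimism the abstract describes.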
