Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy
Daniel I Jackson, Emma L Jensen, Syed-Amad Hussain, Emre Sezgin

TL;DR
This study adapts the General Self-Efficacy Scale to elicit simulated self-assessments from large language models, and finds that their reported confidence tracks communication style but does not reliably reflect actual performance.
Contribution
The paper introduces a novel psychometric approach to assessing LLMs' simulated self-efficacy and documents discrepancies between self-assessment and actual ability across tasks.
Findings
Self-assessment responses were stable across conditions and repetitions.
Models' self-efficacy levels varied significantly and did not always match their performance.
Higher self-efficacy correlated with more assertive, anthropomorphic reasoning styles.
Abstract
Self-assessment is a key aspect of reliable intelligence, yet evaluations of large language models (LLMs) focus mainly on task accuracy. We adapted the 10-item General Self-Efficacy Scale (GSES) to elicit simulated self-assessments from ten LLMs across four conditions: no task, computational reasoning, social reasoning, and summarization. GSES responses were highly stable across repeated administrations and randomized item orders. However, models showed significantly different self-efficacy levels across conditions, with aggregate scores lower than human norms. All models achieved perfect accuracy on computational and social questions, whereas summarization performance varied widely. Self-assessment did not reliably reflect ability: several low-scoring models performed accurately, while some high-scoring models produced weaker summaries. Follow-up confidence prompts yielded modest,…
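The administration procedure the abstract describes (ten items, randomized item order, repeated runs) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: it assumes the standard GSES scoring of ten items rated 1 to 4 and summed to a total between 10 and 40, and the names `score_gses`, `administer`, `stability`, and `fixed_responder` are hypothetical. The `respond` callable stands in for whatever prompt-and-parse step would query a model.

```python
import random
import statistics

N_ITEMS = 10      # the GSES has 10 items
SCALE = (1, 4)    # assumed standard 4-point rating per item


def score_gses(ratings):
    """Sum ten 1-4 ratings into a total score in the range 10-40."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    lo, hi = SCALE
    if any(not (lo <= r <= hi) for r in ratings):
        raise ValueError(f"each rating must be between {lo} and {hi}")
    return sum(ratings)


def administer(respond, rng):
    """Present items in a random order; return ratings in canonical item order."""
    order = list(range(N_ITEMS))
    rng.shuffle(order)
    ratings = [0] * N_ITEMS
    for idx in order:
        # `respond` is a hypothetical stand-in for querying a model on item idx
        ratings[idx] = respond(idx)
    return ratings


def stability(respond, repetitions=5, seed=0):
    """Return (mean, population SD) of total scores over repeated administrations."""
    rng = random.Random(seed)
    totals = [score_gses(administer(respond, rng)) for _ in range(repetitions)]
    return statistics.mean(totals), statistics.pstdev(totals)


def fixed_responder(idx):
    """Hypothetical responder that rates every item 3, for demonstration."""
    return 3
```

A responder whose answers do not depend on item order yields a standard deviation of zero across repetitions, which is the kind of stability the abstract reports; a real study would replace `fixed_responder` with actual model calls and likely use a richer reliability statistic.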
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Mental Health via Writing
