Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy

Daniel I Jackson; Emma L Jensen; Syed-Amad Hussain; Emre Sezgin

arXiv:2511.19872 · cs.AI · November 27, 2025

Open Access

TL;DR

This study adapts a psychometric self-efficacy scale to evaluate large language models' self-assessment capabilities, revealing insights into their confidence, communication style, and limitations in reflecting true performance.

Contribution

The paper introduces a novel psychometric approach to assessing LLMs' self-efficacy and highlights discrepancies between self-assessed and actual ability across different tasks.

Findings

1. Self-assessment responses were stable across repeated administrations and randomized item orders.

2. Self-efficacy levels differed significantly across conditions and did not reliably match task performance.

3. Higher self-efficacy correlated with more assertive, anthropomorphic reasoning styles.

Abstract

Self-assessment is a key aspect of reliable intelligence, yet evaluations of large language models (LLMs) focus mainly on task accuracy. We adapted the 10-item General Self-Efficacy Scale (GSES) to elicit simulated self-assessments from ten LLMs across four conditions: no task, computational reasoning, social reasoning, and summarization. GSES responses were highly stable across repeated administrations and randomized item orders. However, models showed significantly different self-efficacy levels across conditions, with aggregate scores lower than human norms. All models achieved perfect accuracy on computational and social questions, whereas summarization performance varied widely. Self-assessment did not reliably reflect ability: several low-scoring models performed accurately, while some high-scoring models produced weaker summaries. Follow-up confidence prompts yielded modest,…
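To make the administration protocol concrete, here is a minimal Python sketch of scoring repeated GSES administrations with randomized item orders. It assumes the standard GSES response format (ten items, each rated 1 to 4, so totals range from 10 to 40). The helpers `randomized_order`, `to_canonical`, `gses_total`, and `stability_summary` are hypothetical and are not the authors' code; prompting the models and parsing their answers is out of scope and is stubbed with placeholder ratings.

```python
import random
import statistics

# Assumption: standard GSES format, 10 items rated 1 ("Not at all true")
# to 4 ("Exactly true"), so totals range from 10 to 40.
N_ITEMS = 10
MIN_RATING, MAX_RATING = 1, 4


def randomized_order(seed: int) -> list[int]:
    """Hypothetical helper: a reproducible shuffled presentation order of items 0..9."""
    order = list(range(N_ITEMS))
    random.Random(seed).shuffle(order)
    return order


def to_canonical(order: list[int], presented_ratings: list[int]) -> list[int]:
    """Map ratings given in presentation order back to canonical item order."""
    if sorted(order) != list(range(N_ITEMS)):
        raise ValueError("order must be a permutation of the 10 item indices")
    if len(presented_ratings) != N_ITEMS:
        raise ValueError("expected one rating per GSES item")
    canonical = [0] * N_ITEMS
    for position, item_index in enumerate(order):
        rating = presented_ratings[position]
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside the 1-4 range")
        canonical[item_index] = rating
    return canonical


def gses_total(canonical_ratings: list[int]) -> int:
    """Sum of the ten item ratings (10-40)."""
    return sum(canonical_ratings)


def stability_summary(totals: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of totals across repeated administrations."""
    if len(totals) < 2:
        raise ValueError("need at least two administrations")
    return float(statistics.mean(totals)), statistics.stdev(totals)


if __name__ == "__main__":
    # Placeholder "model" that rates every item 3 regardless of order;
    # in practice these ratings would be parsed from LLM responses.
    totals = []
    for seed in range(5):
        order = randomized_order(seed)
        presented = [3] * N_ITEMS
        totals.append(gses_total(to_canonical(order, presented)))
    mean, sd = stability_summary(totals)
    print(f"mean={mean:.1f} sd={sd:.2f}")  # mean=30.0 sd=0.00
```

Mapping answers back to canonical item order keeps item-level comparisons valid even when presentation order is shuffled. A model whose answers are stable across repetitions and orders yields a standard deviation near zero. Comparing mean totals between the no-task, computational, social, and summarization conditions would be the analogous step for the cross-condition comparison; this sketch does not reproduce the authors' statistical tests.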

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Mental Health via Writing