Measuring the metacognition of AI
Richard Servajean, Philippe Servajean

TL;DR
This paper advocates for using the meta-d' framework and signal detection theory to measure AI metacognition, demonstrating their application on large language models to assess confidence and decision regulation.
Contribution
It argues for adopting the meta-d' framework as the standard measure of AI metacognitive sensitivity, and applies signal detection theory to evaluate how LLMs regulate their decisions under risk.
Findings
Meta-d' enables comparison of LLMs' metacognitive sensitivity.
SDT reveals LLMs' increased conservatism under high risk.
Experiments on GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508 demonstrate practical utility.
Abstract
A robust decision-making process must take uncertainty into account, especially when the choice involves inherent risks. Because artificial intelligence (AI) systems are increasingly integrated into decision-making workflows, managing uncertainty relies more and more on the metacognitive capabilities of these systems, i.e., their ability to assess the reliability of, and regulate, their own decisions. Hence, it is crucial to employ robust methods to measure the metacognitive abilities of AI. This paper is primarily a methodological contribution arguing for the adoption of the meta-d' framework as the gold standard for assessing the metacognitive sensitivity of AIs, that is, the ability to generate confidence ratings that distinguish correct from incorrect responses. Moreover, we propose to leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions…
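To make the SDT side of the abstract concrete, here is a minimal sketch of the two standard equal-variance SDT quantities: sensitivity d' = z(H) − z(F) and criterion c = −½ (z(H) + z(F)), where H is the hit rate and F the false-alarm rate. A larger positive c indicates a more conservative decision policy. The function name, the log-linear correction, and the trial counts are illustrative assumptions, not taken from the paper, and the sketch does not implement the meta-d' fit itself, which requires a maximum-likelihood procedure over confidence ratings.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under equal-variance Gaussian SDT.

    Hypothetical helper for illustration; not the paper's code.
    """
    # Log-linear correction (an assumption here): add 0.5 to each count so
    # that rates of exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Made-up counts for two conditions of the same hypothetical task.
low_risk = sdt_measures(hits=80, misses=20, false_alarms=20, correct_rejections=80)
high_risk = sdt_measures(hits=60, misses=40, false_alarms=5, correct_rejections=95)
print("low risk:  d'=%.2f, c=%.2f" % low_risk)
print("high risk: d'=%.2f, c=%.2f" % high_risk)
```

With these made-up counts the high-risk condition yields a higher criterion than the symmetric low-risk one, which is the kind of shift toward conservatism that the summary's findings describe.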
