The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring

Jon-Paul Cacioli

arXiv:2604.15702·cs.CL·April 22, 2026

TL;DR

This paper presents a benchmark for evaluating large language models' self-monitoring across six cognitive domains, using a psychometric framework and behavioral assays.

Contribution

It introduces a novel cross-domain metacognitive assessment battery grounded in established psychological paradigms, applied to 20 LLMs with publicly available data and code.

Findings

01 Discriminates three LLM profiles: confidence, withdrawal, sensitivity.

02 Reveals inverse relationship between accuracy and metacognitive sensitivity.

03 Shows architecture-dependent differences in metacognitive calibration scaling.

Abstract

We introduce a cross-domain behavioural assay of monitoring-control coupling in LLMs, grounded in the Nelson and Narens (1990) metacognitive framework and applying human psychometric methodology to LLM evaluation. The battery comprises 524 items across six cognitive domains (learning, metacognitive calibration, social cognition, attention, executive function, prospective regulation), each grounded in an established experimental paradigm. Tasks T1-T5 were pre-registered on OSF prior to data collection; T6 was added as an exploratory extension. After every forced-choice response, dual probes adapted from Koriat and Goldsmith (1996) ask the model to KEEP or WITHDRAW its answer and to BET or decline. The critical metric is the withdraw delta: the difference in withdrawal rate between incorrect and correct items. Applied to 20 frontier LLMs (10,480 evaluations), the battery discriminates…
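
The withdraw delta described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `Trial` record and the functions `withdrawal_rate` and `withdraw_delta` are hypothetical names introduced here, and the sign convention (incorrect minus correct) is an assumption based on the wording "between incorrect and correct items". Under that assumption, a positive delta means the model withdraws more often when its answer was wrong.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    """One forced-choice item plus the KEEP/WITHDRAW probe (hypothetical record)."""
    correct: bool   # was the forced-choice answer correct?
    withdrew: bool  # did the model choose WITHDRAW rather than KEEP?


def withdrawal_rate(trials):
    """Fraction of trials on which the answer was withdrawn; NaN if there are none."""
    if not trials:
        return math.nan
    return sum(t.withdrew for t in trials) / len(trials)


def withdraw_delta(trials):
    """Withdrawal rate on incorrect items minus withdrawal rate on correct items.

    Assumed sign convention: positive values indicate more withdrawal on errors.
    """
    incorrect = [t for t in trials if not t.correct]
    correct = [t for t in trials if t.correct]
    return withdrawal_rate(incorrect) - withdrawal_rate(correct)


# Made-up illustration data, not results from the paper:
# 4 correct answers (1 withdrawn) and 4 incorrect answers (3 withdrawn).
example = (
    [Trial(correct=True, withdrew=False)] * 3
    + [Trial(correct=True, withdrew=True)]
    + [Trial(correct=False, withdrew=True)] * 3
    + [Trial(correct=False, withdrew=False)]
)

print(withdraw_delta(example))  # 0.75 - 0.25 = 0.5
```

If a model answers every item correctly (or every item incorrectly), one of the two rates is undefined, so this sketch returns NaN rather than a misleading zero.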

Code & Models

Repositories

synthiumjp/metacognitive-monitoring-battery
github

Datasets

synthiumjp/metacognitive-monitoring-battery
dataset· 55 dl
