LLMs as Signal Detectors: Sensitivity, Bias, and the Temperature-Criterion Analogy
Jon-Paul Cacioli

TL;DR
This study applies Signal Detection Theory (SDT) to large language models, treating them as signal detectors. The analysis separates sensitivity from bias, two components that traditional calibration metrics conflate.
Contribution
It applies the full parametric SDT framework (unequal-variance model fitting, criterion estimation, z-ROC analysis) to LLMs, shows where the temperature-criterion analogy breaks down, and provides a more detailed diagnostic framework than calibration metrics alone.
Findings
Temperature increases sensitivity and shifts criterion simultaneously.
Models show unequal-variance distributions with varying asymmetry.
SDT decomposition distinguishes models beyond calibration metrics.
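The unequal-variance finding is usually read off a z-ROC: hit and false-alarm rates at several confidence thresholds are z-transformed and fitted with a line, and under a Gaussian model the slope estimates the ratio of the noise to the signal standard deviation (a slope of 1 means equal variance). The sketch below is illustrative only and is not the paper's fitting code. The function name `zroc_slope`, the binned-confidence input format, the 0.5 count correction, and the use of ordinary least squares (rather than maximum-likelihood fitting) are all assumptions made here for brevity.

```python
from statistics import NormalDist


def zroc_slope(signal_counts, noise_counts):
    """Fit a line to the z-ROC built from binned confidence ratings.

    signal_counts: trials per confidence bin when the answer was correct
    noise_counts:  trials per confidence bin when the answer was incorrect
    Bins are ordered from lowest to highest confidence (assumed format).
    Returns (slope, intercept) of z(hit rate) regressed on z(false-alarm rate).
    """
    if len(signal_counts) != len(noise_counts) or len(signal_counts) < 3:
        raise ValueError("need matching bin lists with at least 3 bins")

    z = NormalDist().inv_cdf
    n_signal = sum(signal_counts)
    n_noise = sum(noise_counts)

    z_hits, z_fas = [], []
    # Each threshold k treats ratings in bin k or higher as a "confident" response.
    for k in range(1, len(signal_counts)):
        # Adding 0.5 / 1.0 keeps rates strictly inside (0, 1) so z() is finite.
        hit_rate = (sum(signal_counts[k:]) + 0.5) / (n_signal + 1.0)
        fa_rate = (sum(noise_counts[k:]) + 0.5) / (n_noise + 1.0)
        z_hits.append(z(hit_rate))
        z_fas.append(z(fa_rate))

    # Ordinary least squares: a simplification, since both axes carry error.
    mean_x = sum(z_fas) / len(z_fas)
    mean_y = sum(z_hits) / len(z_hits)
    sxx = sum((x - mean_x) ** 2 for x in z_fas)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(z_fas, z_hits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope below 1 would suggest that the confidence distribution for correct answers is wider than the one for incorrect answers; a slope above 1 would suggest the reverse.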
Abstract
Large language models (LLMs) are evaluated for calibration using metrics such as Expected Calibration Error that conflate two distinct components: the model's ability to discriminate correct from incorrect answers (sensitivity) and its tendency toward confident or cautious responding (bias). Signal Detection Theory (SDT) decomposes these components. While SDT-derived metrics such as AUROC are increasingly used, the full parametric framework - unequal-variance model fitting, criterion estimation, z-ROC analysis - has not been applied to LLMs as signal detectors. In this pre-registered study, we treat three LLMs as observers performing factual discrimination across 168,000 trials and test whether temperature functions as a criterion shift analogous to payoff manipulations in human psychophysics. Critically, this analogy may break down because temperature changes the generated answer…
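The sensitivity/bias decomposition described in the abstract can be illustrated with the textbook equal-variance SDT indices. In this minimal sketch, a "signal" trial is one where the model's answer is correct and a "yes" response is the model reporting high confidence; that mapping, the function name `sdt_indices`, and the log-linear correction are assumptions for illustration, not the paper's pre-registered procedure (which fits an unequal-variance model).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-transform


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    hits:               correct answer, high confidence
    misses:             correct answer, low confidence
    false_alarms:       incorrect answer, high confidence
    correct_rejections: incorrect answer, low confidence
    """
    # Log-linear correction keeps both rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)

    d_prime = z_hit - z_fa              # sensitivity: separation of the two distributions
    criterion = -0.5 * (z_hit + z_fa)   # bias: negative = liberal (confident), positive = cautious
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical counts, not data from the paper.
    d, c = sdt_indices(hits=80, misses=20, false_alarms=20, correct_rejections=80)
    print(f"d' = {d:.2f}, c = {c:.2f}")
```

Two models can share the same d' while differing in c, which is the kind of difference a single calibration score cannot separate.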
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Explainable Artificial Intelligence (XAI) · Psychometric Methodologies and Testing
