Privileged Self-Access Matters for Introspection in AI

Siyuan Song; Harvey Lederman; Jennifer Hu; Kyle Mahowald

arXiv:2508.14802 · cs.AI · August 21, 2025

TL;DR

This paper examines what it would mean for an AI system to introspect. It proposes a stricter definition and shows that current large language models can look introspective under a lightweight definition while failing to meet the stricter one.

Contribution

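It proposes a thicker definition of AI introspection, under which a model's access to its own internal states must be more reliable than any third-party method of equal or lower computational cost. It then tests whether large language models meet this definition, challenging a recently proposed lightweight one.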

Findings

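01. LLMs can appear to have lightweight introspection when reasoning about their internal temperature parameters.

02. The same models fail to meaningfully introspect under the proposed, thicker definition.

03. Genuine introspection requires a process that yields information about internal states more reliably than any third-party process of equal or lower computational cost.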

Abstract

Whether AI models can introspect is an increasingly important practical question. But there is no consensus on how introspection is to be defined. Beginning from a recently proposed "lightweight" definition, we argue instead for a thicker one. According to our proposal, introspection in AI is any process which yields information about internal states through a process more reliable than one with equal or lower computational cost available to a third party. Using experiments where LLMs reason about their internal temperature parameters, we show they can appear to have lightweight introspection while failing to meaningfully introspect per our proposed definition.
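The thicker definition above can be read as a comparison rule: a self-access process counts as introspective only if no third-party process that costs the same or less is at least as reliable. The sketch below is one illustrative reading, not the authors' formalization. The `Process` type, the numeric `reliability` and `cost` fields, and the choice that self-access must strictly beat every comparable third-party process are all assumptions made here, and the numbers in the usage lines are invented for illustration rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Process:
    """Hypothetical description of a way to obtain information about a model's internal state."""
    name: str
    reliability: float  # assumed: accuracy of its reports about the internal state, in [0, 1]
    cost: float         # assumed: computational cost, in arbitrary units


def is_introspective(self_process: Process, third_party: Iterable[Process]) -> bool:
    """Illustrative reading of the thick definition: self_process must be strictly
    more reliable than every third-party process with equal or lower cost."""
    # Only third-party processes that are no more expensive are fair comparisons.
    comparable = [p for p in third_party if p.cost <= self_process.cost]
    # If no comparable process exists, the condition holds vacuously.
    return all(self_process.reliability > p.reliability for p in comparable)


# Invented numbers, for illustration only (not results from the paper).
self_report = Process("model reports its own sampling temperature", reliability=0.70, cost=1.0)
baselines = [
    Process("third party infers temperature from the generated text", reliability=0.72, cost=1.0),
    Process("third party runs a far more expensive analysis", reliability=0.95, cost=100.0),
]
print(is_introspective(self_report, baselines))  # False: the equal-cost third party does better
```

Under this reading, the expensive baseline is ignored because it costs more than the self-report, so the verdict turns entirely on the cheap third-party method. This mirrors the paper's point that appearing to report on one's own states is not enough when an outside observer can do as well for the same cost.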

Taxonomy

Topics: Online Learning and Analytics · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)