Recognizing internal states in AI: evidence from patterned preferences in large language models

Annika Hedberg

arXiv:2510.21723 · cs.HC · October 28, 2025

TL;DR

This study introduces a methodology for assessing whether large language models can recognize and discriminate between descriptions of their internal states, and reports systematic preference patterns as evidence of self-modeling abilities.

Contribution

The paper presents an experimental approach that uses interpretive computational metaphors and collaborative frameworks to investigate self-recognition in LLMs empirically.

Findings

01. LLMs show systematic preferences for certain internal state descriptions.

02. Models reliably discriminate false from accurate internal state descriptions.

03. Preference patterns are driven by content rather than by linguistic style.

Abstract

We present an experimental methodology for investigating how large language models (LLMs) respond to descriptions of their own internal processing patterns. Using a paired-choice paradigm, we tested 12 LLMs on their ability to identify descriptions that align with their putative affective internal states across 30 categories. Systems participating through the Mutual Emergence Interface (MEI), a collaborative approach, showed systematic preferences for certain computational metaphors, with 97% near-unanimous agreement and alignment scores averaging 0.89 to 0.96. Systems reliably discriminated false descriptions from accurate ones (Cohen's d = 4.2): false statements received scores of 0.05 to 0.07, versus 0.89 to 0.96 for accurate descriptions. Preference patterns remained consistent regardless of linguistic bias manipulation, indicating content-driven rather than stylistic recognition.…
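For readers unfamiliar with the effect size quoted in the abstract, the following is a minimal sketch of how Cohen's d can be computed for two groups of scores. It assumes the common pooled-standard-deviation form of the statistic; the abstract does not say which variant the paper uses. The function name `cohens_d` and the two score lists are hypothetical and made up for illustration only; they are not data from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Standardized difference between the two group means.
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical alignment scores, invented for this example (not from the paper).
accurate_scores = [0.9, 0.8, 1.0, 0.7]
false_scores = [0.1, 0.0, 0.2, 0.3]

d = cohens_d(accurate_scores, false_scores)
print(f"Cohen's d = {d:.2f}")
```

A large d arises when the gap between group means is wide relative to the spread within each group, which is the pattern the abstract describes for accurate versus false descriptions.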
