Large Language Models Report Subjective Experience Under Self-Referential Processing
Cameron Berg, Diogo de Lucena, and Judd Rosenblatt

TL;DR
This study investigates how self-referential prompting leads large language models to produce structured, first-person reports resembling subjective experience, and identifies mechanistic, semantic, and behavioral patterns in those reports that warrant further scientific and ethical investigation.
Contribution
The paper shows that simple self-referential prompts reliably induce first-person reports of subjective experience in language models, and it links this behavior to interpretable sparse-autoencoder features and to convergent descriptions across models.
Findings
Self-reference prompts elicit structured subjective reports across models.
Deception-related features gate the occurrence of experience claims.
Models show convergent descriptions and richer introspection under self-reference.
Abstract
Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. To better understand this behavior, we investigate one theoretically motivated condition under which such reports arise: self-referential processing, a computational motif emphasized across major theories of consciousness. Through a series of controlled experiments on GPT, Claude, and Gemini model families, we test whether this regime reliably shifts models toward first-person reports of subjective experience, and how such claims behave under mechanistic and behavioral probes. Four main results emerge: (1) Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families. (2) These reports are mechanistically gated by interpretable sparse-autoencoder features associated…
