Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis
Jingkai Li

TL;DR
This study applies Integrated Information Theory to LLM representations to test whether they exhibit signs of consciousness, finding no significant indicators but revealing patterns under spatio-permutational analyses.
Contribution
It introduces a novel application of IIT 3.0 and 4.0 to LLM representations derived from Theory of Mind tests, comparing IIT metrics with span representations.
Findings
No significant consciousness indicators found in LLM representations.
IIT metrics reveal patterns under spatio-permutational analyses.
Span representations differ from IIT-based measures in indicating consciousness.
Abstract
Integrated Information Theory (IIT) provides a quantitative framework for explaining the phenomenon of consciousness, positing that conscious systems comprise elements integrated through causal properties. We apply IIT 3.0 and 4.0 -- the latest iterations of this framework -- to sequences of Large Language Model (LLM) representations, analyzing data derived from existing Theory of Mind (ToM) test results. Our study systematically investigates whether differences in ToM test performance, when present in the LLM representations, can be revealed by IIT estimates, i.e., Φ^max (IIT 3.0), Φ (IIT 4.0), Conceptual Information (IIT 3.0), and Φ-structure (IIT 4.0). Furthermore, we compare these metrics with the Span Representations independent of any estimate for consciousness. This additional effort aims to differentiate between potential "consciousness" phenomena and inherent…
