Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain
Mohsen Jamali, Ziv M. Williams, Jing Cai

TL;DR
This study demonstrates that large language models exhibit Theory of Mind capabilities, with their internal embeddings responding to beliefs and perspectives similarly to neurons in the human brain, revealing an emergent cognitive property.
Contribution
The paper provides the first evidence of ToM-like responses in LLM embeddings, paralleling neural activity in the human dmPFC, and shows these responses depend on model size.
Findings
LLMs' embeddings respond to true- and false-belief trials
Embedding responses correlate with ToM task performance
Beliefs can be decoded from embeddings
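This page does not say how belief-responsive embeddings were identified or how beliefs were decoded. Below is a minimal sketch of one generic way to run both analyses. It assumes that each trial is reduced to a single hidden-embedding vector (a list of floats) from some LLM layer, and that each trial is labelled as a true- or false-belief trial. The Welch t-statistic threshold, the nearest-centroid decoder, and every function name are hypothetical illustrative choices, not the authors' method.

```python
"""Hypothetical sketch: belief-selective embedding units and leave-one-out belief decoding.

Assumption (not from the paper): each trial is one embedding vector; labels are
"true" or "false" belief. Each class needs at least 3 trials, so that every
leave-one-out fold keeps 2 or more per class for the variance estimate.
"""
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for one unit's activations on two trial types.

    Returns 0.0 when both samples are constant (zero pooled standard error).
    """
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def selective_units(true_trials, false_trials, t_threshold=3.0):
    """Indices of embedding dimensions whose |t| exceeds an arbitrary threshold."""
    selected = []
    for u in range(len(true_trials[0])):
        a = [trial[u] for trial in true_trials]
        b = [trial[u] for trial in false_trials]
        if abs(welch_t(a, b)) > t_threshold:
            selected.append(u)
    return selected


def centroid(vectors, units):
    """Mean activation of each chosen unit across a set of trials."""
    return [mean(v[u] for v in vectors) for u in units]


def loo_decode_accuracy(true_trials, false_trials, t_threshold=3.0):
    """Leave-one-out nearest-centroid decoding of belief type."""
    labelled = [(v, "true") for v in true_trials] + [(v, "false") for v in false_trials]
    correct = 0
    for i, (vec, label) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        rest_true = [v for v, lab in rest if lab == "true"]
        rest_false = [v for v, lab in rest if lab == "false"]
        # Select units on the training trials only, so the held-out trial
        # cannot leak into unit selection; fall back to all units if none pass.
        units = selective_units(rest_true, rest_false, t_threshold) or list(range(len(vec)))
        c_true = centroid(rest_true, units)
        c_false = centroid(rest_false, units)
        d_true = sum((vec[u] - c) ** 2 for u, c in zip(units, c_true))
        d_false = sum((vec[u] - c) ** 2 for u, c in zip(units, c_false))
        predicted = "true" if d_true <= d_false else "false"
        correct += predicted == label
    return correct / len(labelled)


if __name__ == "__main__":
    # Toy, made-up embeddings: unit 0 separates the two belief conditions.
    true_trials = [[1.0, 0.1, 0.5], [1.2, -0.1, 0.4], [0.9, 0.0, 0.6], [1.1, 0.2, 0.5]]
    false_trials = [[-1.0, 0.1, 0.5], [-1.1, 0.0, 0.4], [-0.9, -0.1, 0.6], [-1.2, 0.2, 0.5]]
    print(selective_units(true_trials, false_trials))      # [0]
    print(loo_decode_accuracy(true_trials, false_trials))  # 1.0
```

Unit selection runs inside each leave-one-out fold because selecting units on all trials first and then decoding would inflate accuracy, since the held-out trial would already have influenced which units were kept. A real analysis would also need a principled significance threshold, such as a permutation test or a multiple-comparison correction, rather than the fixed `t_threshold` used here.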
Abstract
With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLMs' capacity for ToM, and their similarities to those of humans, remain largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two, as hidden embeddings (artificial neurons) within LLMs started to exhibit…
Taxonomy
Topics: Language and cultural evolution · Topic Modeling · Child and Animal Learning Development
