Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs
Ivan Chulo, Ananya Joshi

TL;DR
This paper investigates how emotional processing influences Theory of Mind abilities in large language models, revealing that emotional understanding, rather than analytical reasoning, mediates improved ToM performance after activation steering.
Contribution
It introduces a method to decompose ToM in LLMs by analyzing activation changes, highlighting emotional content as key to ToM improvements.
Findings
Activation steering improves ToM accuracy from 32.5% to 46.7%.
Emotional content processing increases significantly in steered models.
Analytical processes are suppressed during improved ToM performance.
Abstract
Recent work shows that activation steering substantially improves language models' Theory of Mind (ToM) (Bortoletto et al. 2024), yet it remains unclear which internal changes lead to the different outputs. We propose decomposing ToM in LLMs by comparing the activations of steered and baseline models using linear probes trained on 45 cognitive actions. We apply Contrastive Activation Addition (CAA) steering to Gemma-3-4B and evaluate it on 1,000 BigToM forward belief scenarios (Gandhi et al. 2023). We find that improved performance on belief attribution tasks (32.5% to 46.7% accuracy) is mediated by activations that process emotional content, namely emotion perception (+2.23) and emotion valuing (+2.20), while analytical processes are suppressed: questioning (-0.78) and convergent thinking (-1.59). This suggests that successful ToM abilities in LLMs are mediated by emotional understanding,…
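The steering-and-probing comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses random NumPy arrays in place of Gemma-3-4B activations, and the function names, the steering strength `alpha`, and the choice to measure probe change in logits are all assumptions made for illustration. The abstract does not state the unit of the reported shifts such as +2.23.

```python
import numpy as np

def caa_steering_vector(pos_acts, neg_acts):
    """Mean-difference steering vector from contrastive activation sets.

    pos_acts, neg_acts: arrays of shape (n_examples, d_model), holding
    activations for the desired and undesired behaviour respectively.
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to every activation row."""
    return hidden + alpha * vector

def probe_logit_shift(probe_w, probe_b, baseline, steered):
    """Mean change in a linear probe's logit, steered minus baseline."""
    base_logits = baseline @ probe_w + probe_b
    steered_logits = steered @ probe_w + probe_b
    return float((steered_logits - base_logits).mean())

# Synthetic stand-ins (assumption): real use would read activations
# from a chosen layer of the model on contrastive prompts.
rng = np.random.default_rng(0)
d_model = 16
pos = rng.normal(size=(32, d_model)) + 0.5
neg = rng.normal(size=(32, d_model))
vector = caa_steering_vector(pos, neg)

hidden = rng.normal(size=(100, d_model))
steered = steer(hidden, vector, alpha=2.0)

# Hypothetical probe for one cognitive action, e.g. "emotion perception".
probe_w = rng.normal(size=d_model)
probe_b = 0.1
shift = probe_logit_shift(probe_w, probe_b, hidden, steered)
```

Because both the steering and the probe are linear in this sketch, the shift reduces to `alpha * (vector @ probe_w)`: a probe's reading rises or falls according to how its weight vector aligns with the steering direction. In an actual model the steered activations also propagate through later nonlinear layers, so shifts measured downstream need not follow this closed form.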
Taxonomy
Topics: Topic Modeling · Child and Animal Learning Development · Language and cultural evolution
