Traces of Social Competence in Large Language Models
Tom Kouwenhoven, Michiel van der Meer, Max van Duijn

TL;DR
This paper evaluates the social reasoning capabilities of large language models using the False Belief Test, revealing how model size, training, and fine-tuning influence socio-cognitive responses.
Contribution
It introduces a comprehensive analysis of LLMs' Theory of Mind abilities, highlighting the effects of model scaling, instruction tuning, and vector steering on social reasoning.
Findings
Scaling model size generally improves FBT performance, though not strictly.
Explicating propositional attitudes (e.g., "X thinks") changes response patterns, producing a cross-over effect.
Vector steering can identify causal vectors influencing FBT behavior.
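The summary does not say how the steering vectors are built. One widely used recipe takes the difference between mean hidden-state activations for two contrastive prompt sets at one layer and adds a scaled copy of that difference to the hidden states during a forward pass. That recipe is assumed here purely for illustration and is not necessarily the authors' method. The sketch uses random NumPy arrays in place of real model activations. `steering_vector`, `steer`, the contrast sets, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64      # hypothetical hidden size
n_prompts = 50    # hypothetical number of prompts per contrast set

# Hypothetical layer activations for two contrastive prompt sets, e.g. prompts whose
# correct answer tracks a character's (false) belief vs. prompts tracking reality.
concept = rng.standard_normal(d_model)
acts_belief = rng.standard_normal((n_prompts, d_model)) + concept
acts_reality = rng.standard_normal((n_prompts, d_model)) - concept


def steering_vector(acts_pos, acts_neg):
    """Difference of mean activations: (n, d), (n, d) -> (d,)."""
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)


def steer(hidden, vector, alpha):
    """Add alpha * vector to the hidden state at every token position: (seq, d) -> (seq, d)."""
    return hidden + alpha * vector


v = steering_vector(acts_belief, acts_reality)
hidden = rng.standard_normal((10, d_model))  # hypothetical hidden states for a new prompt

# Steering shifts each position's projection onto the unit direction by alpha * ||v||
unit = v / np.linalg.norm(v)
for alpha in (-4.0, 0.0, 4.0):
    shift = (steer(hidden, v, alpha) @ unit - hidden @ unit).mean()
    print(f"alpha={alpha:+.1f}  mean projection shift={shift:+.2f}")
```

To test whether such a vector is causal for FBT behaviour, one would compare the model's answers on held-out test items with and without the added vector (alpha = 0 is the unsteered baseline).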
Abstract
The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited due to issues like data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott et al., 2023) using Bayesian logistic regression to identify how model size and post-training affect socio-cognitive competence. We find that scaling model size benefits performance, but not strictly. A cross-over effect reveals that explicating propositional attitudes (X thinks) fundamentally alters response patterns. Instruction tuning partially mitigates this effect, but further reasoning-oriented fine-tuning amplifies it. In a case study analysing social…
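The abstract analyses per-item FBT correctness with Bayesian logistic regression on model size and post-training. A minimal sketch of that kind of analysis is given here. It assumes a plain fixed-effects model with a standardized log parameter count and a binary instruction-tuning flag as predictors, Normal(0, 2.5) priors, and a hand-written random-walk Metropolis sampler. The data are simulated (17 hypothetical models × 192 items, arbitrary true coefficients) and are not the paper's results. The sketch also omits any item- or model-level structure and the exact predictors a real analysis may include.

```python
import numpy as np


def log_posterior(beta, X, y, prior_sd=2.5):
    """Unnormalised log posterior: Bernoulli-logit likelihood plus Normal(0, prior_sd) priors."""
    logits = X @ beta
    # sum of y*eta - log(1 + exp(eta)), using logaddexp for numerical stability
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_lik + log_prior


def metropolis(X, y, n_iter=20_000, burn_in=5_000, step=0.05, seed=0):
    """Random-walk Metropolis; returns post-burn-in draws of shape (n_iter - burn_in, k)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current_lp = log_posterior(beta, X, y)
    draws = []
    for i in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.shape)
        proposal_lp = log_posterior(proposal, X, y)
        # Accept with probability min(1, exp(proposal_lp - current_lp))
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            beta, current_lp = proposal, proposal_lp
        if i >= burn_in:
            draws.append(beta.copy())
    return np.array(draws)


# --- Simulated data (hypothetical; NOT the paper's data) ---
rng = np.random.default_rng(42)
n_models, n_items = 17, 192
log_params = rng.uniform(np.log(1e9), np.log(7e10), n_models)  # hypothetical model sizes
size_z = (log_params - log_params.mean()) / log_params.std()
instruct = np.arange(n_models) % 2                             # hypothetical base/instruct flag

# One row per (model, item) response: intercept, standardized log size, instruct flag
X = np.column_stack([
    np.ones(n_models * n_items),
    np.repeat(size_z, n_items),
    np.repeat(instruct, n_items),
])
true_beta = np.array([0.2, 0.8, 0.5])                          # arbitrary simulation values
p_correct = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, p_correct)

draws = metropolis(X, y)
post_mean = draws.mean(axis=0)
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
for name, m, lo, hi in zip(["intercept", "log_size_z", "instruct"], post_mean, lower, upper):
    print(f"{name:>10}: mean={m:+.2f}  95% CrI=[{lo:+.2f}, {hi:+.2f}]")
```

In practice a library such as PyMC or Stan would replace the hand-written sampler. The hand-rolled version only makes the likelihood and priors explicit. A positive coefficient whose credible interval excludes zero is the kind of evidence that "model size benefits performance" summarises.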
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Child and Animal Learning Development · Embodied and Extended Cognition
