# LLMs achieve adult human performance on higher-order theory of mind tasks

**Authors:** Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Agüera y Arcas, Robin I. M. Dunbar

PMC · DOI: 10.3389/fnhum.2025.1633272 · Frontiers in Human Neuroscience · 2026-01-02

## TL;DR

This paper finds that GPT-4 matches adult human performance overall on higher-order theory of mind tasks, which require reasoning about nested mental states such as beliefs and knowledge, and exceeds adults on 6th-order inferences.

## Contribution

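The study introduces a handwritten test suite for higher-order theory of mind, Multi-Order Theory of Mind Q&A, and a newly gathered adult human benchmark. It uses them to compare five LLMs of varying sizes and training paradigms, and shows that GPT-4 outperforms adults on 6th-order inferences.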

## Key findings

- GPT-4 reaches adult-level, and Flan-PaLM near adult-level, performance on higher-order theory of mind tasks overall.
- GPT-4 exceeds adult performance on 6th-order theory of mind inferences.
- The results suggest an interplay between model size and fine-tuning in higher-order theory of mind performance in LLMs.

## Abstract

This paper examines the extent to which large language models (LLMs) are able to perform tasks which require higher-order theory of mind (ToM)—the human ability to reason about multiple mental and emotional states in a recursive manner (e.g., I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite—Multi-Order Theory of Mind Q&A—and using it to compare the performance of five LLMs of varying sizes and training paradigms to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on our ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for higher-order ToM performance, and that the linguistic abilities of large models may support more complex ToM inferences. Given the important role that higher-order ToM plays in group social interaction and relationships, these findings have significant implications for the development of a broad range of social, educational and assistive LLM applications.
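To make "order" concrete, a statement can be modelled as a fact wrapped in successive mental-state attributions, so that the order is the nesting depth. The sketch that follows is not taken from the paper and is not its Multi-Order Theory of Mind Q&A test suite. It assumes that order equals the number of nested mental-state attributions, which may differ from the paper's exact counting convention. The names `Fact`, `MentalState`, `order`, `render` and `nest`, the completing fact, and the agents in the sixth-order example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Fact:
    """A plain proposition about the world, with no mental state attached."""
    text: str


@dataclass(frozen=True)
class MentalState:
    """An agent holding a mental state (think, believe, know, ...) about some content."""
    agent: str
    verb: str
    content: "Statement"  # may itself be another MentalState, which gives recursion


Statement = Union[Fact, MentalState]


def order(s: Statement) -> int:
    """Count nested mental-state attributions (assumed definition of ToM order)."""
    depth = 0
    while isinstance(s, MentalState):
        depth += 1
        s = s.content
    return depth


def render(s: Statement) -> str:
    """Turn a nested statement back into an English sentence."""
    if isinstance(s, Fact):
        return s.text
    return f"{s.agent} {s.verb} that {render(s.content)}"


def nest(attributions: List[Tuple[str, str]], fact: Fact) -> Statement:
    """Wrap a fact in (agent, verb) attributions, listed outermost first."""
    s: Statement = fact
    for agent, verb in reversed(attributions):
        s = MentalState(agent, verb, s)
    return s


# The abstract's example, completed with a hypothetical fact.
third = nest([("I", "think"), ("you", "believe"), ("she", "knows")],
             Fact("the keys are in the drawer"))
print(render(third))  # I think that you believe that she knows that the keys are in the drawer
print(order(third))   # 3

# A hypothetical 6th-order statement, the level at which GPT-4 exceeded adults.
sixth = nest([("Ann", "thinks"), ("Ben", "believes"), ("Cal", "knows"),
              ("Dee", "suspects"), ("Eve", "expects"), ("Fay", "realises")],
             Fact("it will rain"))
print(order(sixth))   # 6
```

Each additional attribution adds one level of recursion, which is why higher orders place growing demands on the reasoner.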

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12808479/full.md

## Figures

One figure, with its caption, is in the complete paper: https://tomesphere.com/paper/PMC12808479/full.md

## References

85 references; the full list is in the complete paper: https://tomesphere.com/paper/PMC12808479/full.md

---
Source: https://tomesphere.com/paper/PMC12808479