Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States

Yang Xiao; Jiashuo Wang; Qiancheng Xu; Changhe Song; Chunpu Xu; Yi Cheng; Wenjie Li; Pengfei Liu

arXiv:2505.17663 · cs.CL · June 10, 2025

TL;DR

This paper introduces DynToM, a new benchmark for evaluating LLMs' ability to understand and track evolving mental states in social interactions, revealing significant performance gaps compared to humans.

Contribution

We develop DynToM, a comprehensive benchmark with 1,100 social contexts and 78,100 questions to assess LLMs' dynamic Theory of Mind capabilities, highlighting current limitations.

Findings

01. LLMs underperform humans by 44.7% on average.

02. Performance drops significantly when reasoning about mental state shifts.

03. Current LLMs struggle with modeling the temporal evolution of mental states.

Abstract

As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities - particularly their ability to track dynamic mental states - becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world social interactions. We present DynToM, a novel benchmark specifically designed to evaluate LLMs' ability to understand and track the temporal progression of mental states across interconnected scenarios. Through a systematic four-step framework, we generate 1,100 social contexts encompassing 5,500 scenarios and 78,100 questions, each validated for realism and quality. Our comprehensive evaluation of ten state-of-the-art LLMs reveals that their average performance underperforms humans by…
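The dataset sizes quoted in the abstract imply some simple averages, which the short sketch below works out. It uses only the three counts stated above; the variable and function names are illustrative, and the results are averages only, since the abstract does not say how scenarios or questions are distributed across individual contexts.

```python
# Counts quoted in the abstract.
NUM_CONTEXTS = 1_100
NUM_SCENARIOS = 5_500
NUM_QUESTIONS = 78_100


def average_per_unit(total: int, units: int) -> float:
    """Return the average number of items per unit (hypothetical helper)."""
    return total / units


# Averages implied by the quoted totals; the real per-context
# distribution is not given in the abstract.
scenarios_per_context = average_per_unit(NUM_SCENARIOS, NUM_CONTEXTS)
questions_per_context = average_per_unit(NUM_QUESTIONS, NUM_CONTEXTS)
questions_per_scenario = average_per_unit(NUM_QUESTIONS, NUM_SCENARIOS)

print(f"scenarios per context:  {scenarios_per_context:.1f}")
print(f"questions per context:  {questions_per_context:.1f}")
print(f"questions per scenario: {questions_per_scenario:.1f}")
```

On average this gives five scenarios and 71 questions per social context, which is consistent with a benchmark built around sequences of interconnected scenarios rather than isolated snapshots.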

Code & Models

Repositories

GAIR-NLP/DynToM
Official

Datasets

YangXiao-nlp/DynToM
dataset · 166 downloads

Videos

Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States · Underline
