Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
Yang Xiao, Jiashuo Wang, Qiancheng Xu, Changhe Song, Chunpu Xu, Yi Cheng, Wenjie Li, Pengfei Liu

TL;DR
This paper introduces DynToM, a new benchmark for evaluating LLMs' ability to understand and track evolving mental states in social interactions, revealing significant performance gaps compared to humans.
Contribution
We develop DynToM, a comprehensive benchmark with 1,100 social contexts and 78,100 questions to assess LLMs' dynamic Theory of Mind capabilities, highlighting current limitations.
Findings
LLMs underperform humans by 44.7% on average.
Performance drops significantly when models must reason about shifts in mental states.
Current LLMs struggle with modeling the temporal evolution of mental states.
Abstract
As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities, particularly their ability to track dynamic mental states, becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world social interactions. We present DynToM, a novel benchmark specifically designed to evaluate LLMs' ability to understand and track the temporal progression of mental states across interconnected scenarios. Through a systematic four-step framework, we generate 1,100 social contexts encompassing 5,500 scenarios and 78,100 questions, each validated for realism and quality. Our comprehensive evaluation of ten state-of-the-art LLMs reveals that their average performance underperforms humans by…
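To put the benchmark sizes quoted in the abstract in proportion, the short sketch below derives average per-context and per-scenario counts from the three stated totals. The constant and function names are illustrative only, and the results are averages: the abstract does not say that every context has the same number of scenarios or questions.

```python
# Totals as stated in the abstract; the names are illustrative, not from the paper.
NUM_CONTEXTS = 1_100
NUM_SCENARIOS = 5_500
NUM_QUESTIONS = 78_100


def per_unit(total: int, units: int) -> float:
    """Return the average number of items per unit."""
    return total / units


# Averages only: the actual per-context distribution is not given in the abstract.
scenarios_per_context = per_unit(NUM_SCENARIOS, NUM_CONTEXTS)
questions_per_context = per_unit(NUM_QUESTIONS, NUM_CONTEXTS)
questions_per_scenario = per_unit(NUM_QUESTIONS, NUM_SCENARIOS)

print(f"Scenarios per context (avg): {scenarios_per_context:.1f}")
print(f"Questions per context (avg): {questions_per_context:.1f}")
print(f"Questions per scenario (avg): {questions_per_scenario:.1f}")
```

On these totals, each social context averages 5 scenarios and 71 questions, or roughly 14 questions per scenario.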
