DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

Neemesh Yadav; Palakorn Achananuparp; Jing Jiang; Ee-Peng Lim

arXiv:2604.20443 · cs.CL · April 23, 2026

TL;DR

DialToM is a benchmark that evaluates whether Large Language Models (LLMs) can forecast dialogue trajectories from mental states. Current models identify mental states well but struggle to forecast social outcomes.

Contribution

Introduces DialToM, a novel benchmark for assessing both mental state prediction and trajectory forecasting in dialogue, highlighting reasoning gaps in current LLMs.

Findings

1. LLMs excel at identifying mental states but struggle with trajectory forecasting.
2. Most LLMs, except Gemini 3 Pro, fail to leverage mental states for social predictions.
3. Semantic alignment between human and LLM inferences is weak.

Abstract

Large Language Models (LLMs) have been shown to possess Theory of Mind (ToM) abilities. However, it remains unclear whether this stems from robust reasoning or spurious correlations. We introduce DialToM, a human-verified benchmark built from natural human dialogue using a multiple-choice framework. We evaluate not only mental state prediction (Literal ToM) but also the functional utility of these states (Functional ToM) through Prospective Diagnostic Forecasting -- probing whether models can identify state-consistent dialogue trajectories solely from mental-state profiles. Our results reveal a significant reasoning asymmetry: while LLMs excel at identifying mental states, most (except for Gemini 3 Pro) fail to leverage this understanding to forecast social trajectories. Additionally, we find only weak semantic similarities between human and LLM-generated inferences. To facilitate…
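The abstract describes a multiple-choice evaluation with two tracks: Literal ToM (predicting mental states) and Functional ToM (picking the state-consistent dialogue trajectory). Here is a minimal sketch of how per-track accuracy and the gap between the two tracks might be computed. It assumes each item records its track, the gold option index, and the model's chosen option. The `Item` record and the `accuracy_by_track` and `reasoning_gap` functions are hypothetical names, not the API of the DialToM repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record for one scored multiple-choice question.
    track: str      # "literal" (mental-state prediction) or "functional" (trajectory forecasting)
    gold: int       # index of the correct option
    predicted: int  # index of the option the model chose


def accuracy_by_track(items):
    """Return {track: accuracy} computed over a list of scored items."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.track] += 1
        correct[item.track] += int(item.predicted == item.gold)
    return {track: correct[track] / total[track] for track in total}


def reasoning_gap(acc):
    """Literal minus Functional accuracy.

    A positive value means the model names mental states more accurately
    than it uses them to choose a trajectory.
    """
    return acc["literal"] - acc["functional"]


if __name__ == "__main__":
    # Toy items for illustration only; these are not benchmark results.
    items = [
        Item("literal", 2, 2),
        Item("literal", 0, 0),
        Item("literal", 1, 3),
        Item("functional", 3, 3),
        Item("functional", 1, 0),
        Item("functional", 2, 0),
    ]
    acc = accuracy_by_track(items)
    print(acc)
    print(round(reasoning_gap(acc), 3))
```

A positive gap corresponds to the reasoning asymmetry the abstract reports. Keeping the two tracks as separate accuracies, rather than pooling them, is what makes that asymmetry visible.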

Code & Models

Repositories

Stealth-py/DialToM (GitHub)