RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems

Mengfan Li; Xuanhua Shi; Yang Deng

arXiv:2511.22275·cs.AI·December 1, 2025

TL;DR

RecToM introduces a comprehensive benchmark for evaluating machine Theory of Mind in conversational recommender systems, emphasizing mental state inference and strategic dialogue planning in realistic settings.

Contribution

This paper presents RecToM, a novel benchmark that assesses both cognitive inference and behavioral prediction of LLMs in recommendation dialogues, addressing limitations of existing synthetic and perception-focused tests.

Findings

01 LLMs show partial understanding of mental states.

02 Models struggle to maintain coherent ToM reasoning.

03 Models have difficulty aligning dialogue strategies with inferred mental states.

Abstract

Large language models are revolutionizing conversational recommender systems through their impressive capabilities in instruction comprehension, reasoning, and human interaction. A core factor underlying effective recommendation dialogue is the ability to infer and reason about users' mental states (such as desire, intention, and belief), a cognitive capacity commonly referred to as Theory of Mind (ToM). Despite growing interest in evaluating ToM in LLMs, current benchmarks predominantly rely on synthetic narratives inspired by the Sally-Anne test, which emphasize physical perception and fail to capture the complexity of mental state inference in realistic conversational settings. Moreover, existing benchmarks often overlook a critical component of human ToM: behavioral prediction, the ability to use inferred mental states to guide strategic decision-making and select appropriate…

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Recommender Systems and Techniques