HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Yinghui He; Yufan Wu; Yilin Jia; Rada Mihalcea; Yulong Chen; Naihao Deng

arXiv:2310.16755 · cs.CL · October 26, 2023 · 1 citation

TL;DR

This paper introduces HI-TOM, a benchmark for higher-order Theory of Mind reasoning in large language models, revealing current models' limitations in recursive mental state reasoning.

Contribution

The paper presents HI-TOM, the first benchmark specifically designed to evaluate higher-order ToM in LLMs, and provides an analysis of their performance shortcomings.

Findings

01 LLMs show decreased performance on higher-order ToM tasks

02 Current LLMs struggle with recursive reasoning about beliefs

03 The study highlights the need for improved models for complex social reasoning

Abstract

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings on the future of NLP.
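
To make "recursive reasoning on others' beliefs" concrete, the sketch below builds a belief question of arbitrary order by nesting one "X thinks" clause per agent. It is only an illustration: the function name `build_question`, the agent names, and the question wording are assumptions made here and are not taken from the HI-TOM dataset or its templates.

```python
def build_question(agents, obj):
    """Build a belief question whose ToM order equals len(agents).

    `agents` is the chain of minds to nest, outermost first.
    The names and phrasing are hypothetical, not HI-TOM's format.
    """
    if not agents:
        # Order 0: a question about the actual state of the world.
        return f"Where is the {obj} really?"
    first, *rest = agents
    # Each further agent adds one more level of nested belief.
    clause = "".join(f" {name} thinks" for name in rest)
    return f"Where does {first} think{clause} the {obj} is?"


if __name__ == "__main__":
    chain = ["Anne", "Bob", "Carl", "Dana"]
    for order in range(len(chain) + 1):
        print(order, build_question(chain[:order], "ball"))
```

Under this framing, a first-order question asks what one agent believes, a second-order question asks what one agent believes about another agent's belief, and each additional agent in the chain raises the order by one.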

Code & Models

Repositories

ying-hui-he/hi-tom_dataset (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques