Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration

Fardin Saad; Pradeep K. Murukannaiah; Munindar P. Singh

arXiv:2507.02935·cs.CL·April 20, 2026



TL;DR

This paper introduces a novel instruction inference task for evaluating Theory of Mind in human-agent collaboration and shows that an LLM-based agent can perform comparably to human participants at interpreting incomplete or ambiguous instructions in a dynamic, goal-oriented setting.

Contribution

The paper presents Tomcat, an LLM-based agent with ToM reasoning capabilities, and evaluates it on a new collaborative task, comparing its performance against that of human participants.

Findings

1. Tomcat with few-shot chain-of-thought (Fs-CoT) prompting achieves performance comparable to that of human participants.
2. The GPT-4o and DeepSeek-R1 variants of Tomcat perform best among the tested models.
3. The study highlights the potential of LLMs for effective human-agent collaboration.

Abstract

Successful human-agent teaming relies on an agent being able to understand instructions given by a (human) principal. In many cases, an instruction may be incomplete or ambiguous. In such cases, the agent must infer the unspoken intentions from their shared context, that is, it must exercise Theory of Mind (ToM) and infer the mental states of its principal. We consider the prospects of effective human-agent collaboration using large language models (LLMs). To assess ToM in a dynamic, goal-oriented, and collaborative environment, we introduce a novel task, Instruction Inference, in which an agent assists a principal in reaching a goal by interpreting incomplete or ambiguous instructions. We present Tomcat, an LLM-based agent, designed to exhibit ToM reasoning in interpreting and responding to the principal's instructions. We implemented two variants of Tomcat. One, dubbed…
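To make the shape of the Instruction Inference task concrete, here is a toy sketch under loudly stated assumptions. It assumes a hypothetical setting where the principal asks for an item without fully specifying it, and the shared context is the set of colored doors currently blocking the principal. The names `Item`, `Episode`, `literal_candidates`, and `infer_intent` are illustrative and do not come from the paper, and the hand-written rule below stands in for the LLM-based ToM reasoning that Tomcat actually performs.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical toy world, not the paper's environment.
@dataclass(frozen=True)
class Item:
    name: str   # object type mentioned in instructions, e.g. "key"
    color: str  # attribute the principal may leave unstated


@dataclass(frozen=True)
class Episode:
    instruction: str               # possibly underspecified, e.g. "bring me a key"
    items: tuple[Item, ...]        # objects the agent could fetch
    locked_doors: tuple[str, ...]  # shared context: colors of doors blocking the principal


def literal_candidates(instruction: str, items: tuple[Item, ...]) -> list[Item]:
    """Return the items consistent with a literal reading of the instruction."""
    words = set(instruction.lower().replace(",", " ").split())
    named = [it for it in items if it.name in words]
    # A color stated explicitly by the principal narrows the choice;
    # if none is stated, every item with the mentioned name remains a candidate.
    colored = [it for it in named if it.color in words]
    return colored or named


def infer_intent(ep: Episode) -> Optional[Item]:
    """Resolve an ambiguous instruction using the shared context.

    Returns the single item the principal most plausibly wants, or None when
    neither the words nor the context single one out (a cue to ask for clarification).
    """
    options = literal_candidates(ep.instruction, ep.items)
    if len(options) <= 1:
        return options[0] if options else None
    # Toy stand-in for mental-state inference: assume the principal wants the
    # candidate whose color matches a door currently blocking them.
    useful = [it for it in options if it.color in ep.locked_doors]
    return useful[0] if len(useful) == 1 else None


if __name__ == "__main__":
    items = (Item("key", "red"), Item("key", "blue"), Item("gem", "red"))
    print(infer_intent(Episode("bring me a key", items, ("blue",))))
    # prints Item(name='key', color='blue')
```

In this sketch an explicit attribute in the instruction always takes precedence over context, so the agent fills only the gaps the principal left open; when context cannot disambiguate, it returns `None` rather than guessing.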
