Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction

Xianhao Wang, Xiaojian Ma, Haozhe Hu, Rongpeng Su, Yutian Cheng, Zhou Ziheng, Hangxin Liu, Lei Liu, Bin Li, and Qing Li

arXiv:2604.16886 · cs.RO · April 21, 2026

TL;DR

The paper introduces COIN, a comprehensive benchmark for evaluating interactive reasoning in embodied agents. It contributes new tasks, datasets, and evaluation metrics, and its evaluation reveals the limitations of current methods.

Contribution

It presents a new benchmark with tasks, datasets, and evaluation metrics for assessing interactive reasoning in embodied agents, addressing gaps in existing benchmarks.

Findings

01

Models struggle with interactive reasoning tasks due to gaps between visual understanding and motor execution.

02

COIN-50 includes 50 tasks in daily scenarios, with datasets and evaluation metrics.

03

Evaluation reveals critical limitations in current methods' ability to perform causal, interactive reasoning.

Abstract

Generalist embodied agents must perform interactive, causally-dependent reasoning, continually interacting with the environment, acquiring information, and updating their plans to solve long-horizon tasks, before they can be adopted in real-life scenarios. For instance, retrieving an apple from a cabinet may require opening multiple doors and drawers before the apple becomes visible and reachable, demanding sequential interaction under partial observability. However, existing benchmarks fail to systematically evaluate this essential capability. We introduce COIN, a benchmark designed to assess interactive reasoning in realistic robotic manipulation through three key contributions. First, we construct COIN-50, a set of 50 interactive tasks in daily scenarios, and create COIN-Primitive, covering the primitives required by causally-dependent tasks, and COIN-Composition, with mid-term complexity, for skill learning and…
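To make the apple-in-the-cabinet example concrete, here is a minimal Python sketch of a causally-dependent, partially observable task. Everything in it (`Container`, `ToyCabinetWorld`, `retrieve`, and the container names) is hypothetical and is not taken from COIN's tasks, code, or interface. The agent is a naive exploration loop used only for illustration, not a method from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    # Hypothetical container model (not COIN's): it can only be opened
    # once its parent container (if any) is open.
    name: str
    parent: Optional[str] = None
    contents: List[str] = field(default_factory=list)
    is_open: bool = False


class ToyCabinetWorld:
    """Toy illustration of causal dependencies under partial observability."""

    def __init__(self, containers: List[Container]):
        self.containers = {c.name: c for c in containers}

    def reachable(self, name: str) -> bool:
        # A container is reachable if every ancestor in its chain is open.
        parent = self.containers[name].parent
        return parent is None or (
            self.containers[parent].is_open and self.reachable(parent)
        )

    def open_container(self, name: str) -> None:
        if not self.reachable(name):
            raise RuntimeError(f"{name} is blocked by a closed parent")
        self.containers[name].is_open = True

    def observe(self) -> set:
        # Partial observability: only contents of open, reachable containers are visible.
        visible = set()
        for c in self.containers.values():
            if c.is_open and self.reachable(c.name):
                visible.update(c.contents)
        return visible

    def closed_reachable(self) -> List[str]:
        return [c.name for c in self.containers.values()
                if not c.is_open and self.reachable(c.name)]


def retrieve(world: ToyCabinetWorld, target: str, max_steps: int = 10) -> Optional[List[str]]:
    """Naive interact-observe-replan loop: open something, look again, repeat."""
    trace = []
    for _ in range(max_steps):
        if target in world.observe():
            trace.append(f"grasp {target}")
            return trace
        frontier = world.closed_reachable()
        if not frontier:
            break  # nothing left to open; the target is not in this world
        choice = frontier[0]
        world.open_container(choice)
        trace.append(f"open {choice}")
    return None


if __name__ == "__main__":
    world = ToyCabinetWorld([
        Container("left_door"),
        Container("right_door"),
        Container("top_drawer", parent="right_door"),
        Container("bottom_drawer", parent="right_door", contents=["apple"]),
    ])
    print(retrieve(world, "apple"))
```

The point of the toy is the dependency chain: the drawers become openable only after the right door is open, and the apple enters the observation only after the bottom drawer is opened, so the agent must interact before it can even plan the grasp.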