Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction
Xianhao Wang, Xiaojian Ma, Haozhe Hu, Rongpeng Su, Yutian Cheng, Zhou Ziheng, Hangxin Liu, Lei Liu, Bin Li, and Qing Li

TL;DR
The paper introduces COIN, a benchmark for evaluating interactive reasoning in embodied agents. It provides new tasks, datasets, and evaluation metrics, and its evaluation exposes the limitations of current methods.
Contribution
The paper presents a new benchmark, with tasks, datasets, and evaluation metrics, for assessing interactive reasoning in embodied agents, a capability that existing benchmarks do not systematically evaluate.
Findings
Models struggle with interactive reasoning tasks due to gaps between visual understanding and motor execution.
COIN-50 comprises 50 interactive tasks in daily scenarios and is accompanied by the COIN-Primitive and COIN-Composition datasets and by evaluation metrics.
Evaluation reveals critical limitations in current methods' ability to perform causal, interactive reasoning.
Abstract
Generalist embodied agents must perform interactive, causally dependent reasoning, continually interacting with the environment, acquiring information, and updating plans to solve long-horizon tasks, before they can be adopted in real-life scenarios. For instance, retrieving an apple from a cabinet may require opening multiple doors and drawers before the apple becomes visible and reachable, demanding sequential interaction under partial observability. However, existing benchmarks fail to systematically evaluate this essential capability. We introduce COIN, a benchmark designed to assess interactive reasoning in realistic robotic manipulation through three key contributions. First, we construct COIN-50: 50 interactive tasks in daily scenarios, and create COIN-Primitive required by causally dependent tasks, and COIN-Composition with mid-term complexity for skill learning and…
