The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents

Ziyu Wang; Chenyuan Liu; Yushun Xiang; Runhao Zhang; Qingbo Hao; Hongliang Lu; Houyu Chen; Zhizhong Feng; Kaiyue Zheng; Dehao Ye; Xianchao Zeng; Xinyu Zhou; Boran Wen; Jiaxin Li; Mingyu Zhang; Kecheng Zheng; Qian Zhu; Ran Cheng; Yong-Lu Li

arXiv:2601.11421·cs.RO·January 19, 2026

TL;DR

The paper introduces GM-100, a comprehensive set of 100 diverse and challenging tasks designed to systematically evaluate and advance the capabilities of embodied AI agents in robotics.

Contribution

The paper presents GM-100 as a systematically designed, diverse benchmark for evaluating robotic agents, addressing the lack of systematic principles in earlier datasets and task designs.

Findings

01. GM-100 tasks are feasible and challenging.
02. Baseline models show varied performance across tasks.
03. The dataset promotes comprehensive evaluation of embodied AI.

Abstract

Recently, with the rapid development of robot learning and imitation learning, numerous datasets and methods have emerged. However, these datasets and their task designs often lack systematic consideration and principles. This raises important questions: Do the current datasets and task designs truly advance the capabilities of robotic agents? Do evaluations on a few common tasks accurately reflect the differentiated performance of various methods proposed by different teams and evaluated on different tasks? To address these issues, we introduce the Great March 100 (GM-100) as the first step towards a robot learning Olympics. GM-100 consists of 100 carefully designed tasks that cover a wide range of interactions and long-tail behaviors, aiming to provide a diverse and challenging set of tasks to comprehensively evaluate the capabilities of robotic agents and promote diversity…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI