ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation

Yu Sun; Meng Cao; Ping Yang; Rongtao Xu; Yunxiao Yan; Runze Xu; Liang Ma; Roy Gan; Andy Zhai; Qingxuan Chen; Zunnan Xu; Hao Wang; Jincheng Yu; Lucy Liang; Qian Wang; Ivan Laptev; Ian D Reid; Xiaodan Liang

arXiv:2603.28545·cs.RO·March 31, 2026

TL;DR

ManipArena is a comprehensive evaluation framework that bridges simulation and real-world testing for reasoning-oriented robot manipulation, addressing current limitations in benchmarks and enabling fair comparison of models.

Contribution

ManipArena introduces a standardized, realistic, and scalable evaluation platform with diverse tasks, diagnostics, and real-to-sim environments for assessing VLA models and world models in robotics.

Findings

01 Supports 20 diverse tasks with 10,812 expert trajectories.

02 Enables evaluation of reasoning, generalization, and long-horizon manipulation.

03 Provides rich sensory diagnostics and real-to-sim environments.

Abstract

Vision-Language-Action (VLA) models and world models have recently emerged as promising paradigms for general-purpose robotic intelligence, yet their progress is hindered by the lack of reliable evaluation protocols that reflect real-world deployment. Existing benchmarks are largely simulator-centric: they provide controllability but fail to capture the reality gap caused by perception noise, complex contact dynamics, hardware constraints, and system latency. Moreover, fragmented real-world evaluations across different robot platforms prevent fair and reproducible comparison. To address these challenges, we introduce ManipArena, a standardized evaluation framework designed to bridge simulation and real-world execution. ManipArena comprises 20 diverse tasks with 10,812 expert trajectories, emphasizing reasoning-oriented manipulation tasks requiring semantic and spatial reasoning,…

Code & Models

Datasets

ManipArena/maniparena-dataset
dataset · 16k downloads
