MobileManiBench: Simplifying Model Verification for Mobile Manipulation
Wenbo Wang, Fangyun Wei, QiXiu Li, Xi Chen, Yaobo Liang, Chang Xu, Jiaolong Yang, Baining Guo

TL;DR
MobileManiBench is a comprehensive simulation-based benchmark designed to evaluate and improve vision-language-action models for mobile robotic manipulation, addressing dataset limitations and enabling scalable, controlled studies.
Contribution
The paper introduces MobileManiBench, a large-scale, diverse simulation benchmark for mobile manipulation that supports systematic evaluation of VLA models before real-world deployment.
Findings
Benchmarking reveals insights into perception, reasoning, and control in simulated environments.
The framework accelerates research on data efficiency and model generalization.
MobileManiBench supports diverse robots, sensors, and tasks for comprehensive evaluation.
Abstract
Vision-language-action models have advanced robotic manipulation but remain constrained by reliance on large, teleoperation-collected datasets dominated by static, tabletop scenes. We propose a simulation-first framework to verify VLA architectures before real-world deployment and introduce MobileManiBench, a large-scale benchmark for mobile-based robotic manipulation. Built on NVIDIA Isaac Sim and powered by reinforcement learning, our pipeline autonomously generates diverse manipulation trajectories with rich annotations (language instructions, multi-view RGB-depth-segmentation images, synchronized object/robot states and actions). MobileManiBench features 2 mobile platforms (parallel-gripper and dexterous-hand robots), 2 synchronized cameras (head and right wrist), 630 objects in 20 categories, 5 skills (open, close, pull, push, pick) with over 100 tasks performed in 100…
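To make the annotation list in the abstract concrete, the following is a minimal sketch of how one trajectory record could be organized and sanity-checked. It is not the benchmark's actual data format: the class names (Step, Trajectory), the field names, the camera keys, and the validate helper are all hypothetical, chosen only to mirror the annotations, cameras, and skills the abstract names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Camera and skill names taken from the abstract; the identifiers themselves are assumptions.
CAMERAS = ("head", "right_wrist")
SKILLS = ("open", "close", "pull", "push", "pick")


@dataclass
class Step:
    """One synchronized timestep (hypothetical layout)."""
    rgb: Dict[str, str]            # camera name -> image path
    depth: Dict[str, str]          # camera name -> depth map path
    segmentation: Dict[str, str]   # camera name -> segmentation mask path
    robot_state: List[float]
    object_state: List[float]
    action: List[float]


@dataclass
class Trajectory:
    """A language-annotated manipulation trajectory (hypothetical layout)."""
    instruction: str
    skill: str
    steps: List[Step] = field(default_factory=list)


def validate(traj: Trajectory) -> List[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = []
    if traj.skill not in SKILLS:
        problems.append(f"unknown skill: {traj.skill}")
    if not traj.instruction.strip():
        problems.append("empty instruction")
    for i, step in enumerate(traj.steps):
        modalities = (("rgb", step.rgb), ("depth", step.depth),
                      ("segmentation", step.segmentation))
        for name, views in modalities:
            # Every modality should have one entry per synchronized camera.
            missing = [c for c in CAMERAS if c not in views]
            if missing:
                problems.append(f"step {i}: {name} missing cameras {missing}")
    return problems
```

A check of this kind is useful when mixing simulated trajectories from several sources, since a missing camera view or an unlabeled skill is cheaper to catch before training than after.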
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
