REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation
Martin Sedlacek, Pavlo Yefanov, Georgy Ponimatkin, Jai Bardhan, Simon Pilc, Mederic Fourmy, Evangelos Kazakos, Cees G. M. Snoek, Josef Sivic, Vladimir Petrik

TL;DR
REALM introduces a high-fidelity simulation benchmark to evaluate and improve the generalization of vision-language-action models in robotic manipulation, bridging the gap between simulation and real-world performance.
Contribution
The paper presents REALM, a new simulation environment and benchmark with 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects, to systematically assess and enhance the generalization capabilities of VLA models.
Findings
Simulation correlates well with real-world performance.
Current models show significant robustness and generalization gaps.
Benchmark reveals specific failure modes of VLA models.
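One common way to quantify how well simulated results track real-world results is a Pearson correlation coefficient over paired success rates. The sketch below is illustrative only: the summary does not say which statistic the authors use, the `pearson` helper is a hypothetical name, and the success rates are placeholder values, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-task success rates (hypothetical, NOT from the paper).
sim_success = [0.80, 0.55, 0.30, 0.65]
real_success = [0.75, 0.50, 0.35, 0.60]

r = pearson(sim_success, real_success)
print(f"Pearson r between simulated and real success rates: {r:.3f}")
```

A value of r near 1 would indicate that tasks that are easy or hard in simulation are correspondingly easy or hard on the real robot, which is the property a real-to-sim validated benchmark aims for.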
Abstract
Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π₀, π₀-FAST, and GR00T N1.5 VLA models, showing that…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
