REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation

Martin Sedlacek; Pavlo Yefanov; Georgy Ponimatkin; Jai Bardhan; Simon Pilc; Mederic Fourmy; Evangelos Kazakos; Cees G. M. Snoek; Josef Sivic; Vladimir Petrik

arXiv:2512.19562·cs.RO·December 23, 2025

Open Access

TL;DR

REALM introduces a high-fidelity simulation benchmark to evaluate and improve the generalization of vision-language-action (VLA) models in robotic manipulation, with simulated performance validated against real-world results.

Contribution

The paper presents REALM, a new simulation environment and benchmark with diverse perturbations and tasks, for systematically assessing and improving the generalization capabilities of VLA models.

Findings

01. Simulated performance correlates well with real-world performance.
02. Current VLA models show significant gaps in robustness and generalization.
03. The benchmark reveals specific failure modes of VLA models.

Abstract

Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural-language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π0, π0-FAST, and GR00T N1.5 VLA models, showing that…
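The benchmark's central claim is that simulated results track real-world results. The abstract does not state which statistic REALM uses to measure this. The sketch below therefore assumes Pearson's r over paired success rates, where each pair is one hypothetical evaluation condition (for example, one model on one task) measured both in simulation and on the real robot. The function name `pearson_r` is illustrative only and is not part of any REALM code release.

```python
import math
from typing import Sequence


def pearson_r(sim: Sequence[float], real: Sequence[float]) -> float:
    """Pearson correlation between paired simulated and real success rates.

    Assumption (not from the paper): each index i is one evaluation
    condition, sim[i] is its success rate in simulation and real[i]
    is its success rate on the physical robot.
    """
    if len(sim) != len(real) or len(sim) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(sim)
    mean_s = sum(sim) / n
    mean_r = sum(real) / n
    # Covariance and variances (the common 1/n factors cancel in the ratio).
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sim, real))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_r = sum((r - mean_r) ** 2 for r in real)
    if var_s == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_s * var_r)
```

A value close to 1 would mean that conditions which are harder in simulation are also harder on the real robot. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient, and rank-based alternatives such as Spearman's ρ are equally plausible choices for this kind of validation.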

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning