Robot Policy Evaluation for Sim-to-Real Transfer: A Benchmarking Perspective
Xuning Yang, Clemens Eppner, Jonathan Tremblay, Dieter Fox, Stan Birchfield, Fabio Ramos

TL;DR
This paper discusses the challenges in benchmarking generalist robotic manipulation policies for sim-to-real transfer, emphasizing high-fidelity simulation, robustness evaluation, and performance alignment between simulation and real-world scenarios.
Contribution
It proposes a comprehensive framework for benchmarking robotic policies that includes high-fidelity simulation, robustness testing, and performance correlation measures.
Findings
High visual-fidelity simulation improves sim-to-real transfer.
Systematically increasing task complexity and scenario perturbation assesses policy robustness.
Quantifying performance alignment aids in transfer assessment.
Abstract
Current vision-based robotics simulation benchmarks have significantly advanced robotic manipulation research. However, robotics is fundamentally a real-world problem, and the evaluation of generalist policies for real-world applications has lagged behind. In this paper, we discuss challenges and desiderata in designing benchmarks for generalist robotic manipulation policies with the goal of sim-to-real policy transfer. We propose 1) utilizing high visual-fidelity simulation for improved sim-to-real transfer, 2) evaluating policies by systematically increasing task complexity and scenario perturbation to assess robustness, and 3) quantifying performance alignment between real-world performance and its simulation counterparts.
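As a rough illustration of the third proposal, performance alignment could be summarized by comparing per-task success rates measured in simulation against those measured on the real robot. The sketch below is not taken from the paper: the choice of Pearson correlation and mean absolute gap as alignment measures, the function names `pearson` and `mean_abs_gap`, and the success-rate values are all assumptions made for illustration, and the numbers are placeholders rather than reported results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_abs_gap(sim, real):
    """Average absolute difference between paired sim and real success rates."""
    if len(sim) != len(real) or not sim:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(s - r) for s, r in zip(sim, real)) / len(sim)


# Placeholder success rates for four hypothetical task variants of
# increasing difficulty. These are NOT results from the paper.
sim_success = [0.90, 0.75, 0.60, 0.40]
real_success = [0.80, 0.70, 0.45, 0.30]

r = pearson(sim_success, real_success)
gap = mean_abs_gap(sim_success, real_success)
print(f"correlation: {r:.3f}, mean absolute gap: {gap:.3f}")
```

A high correlation with a nonzero gap would suggest that simulation ranks task variants similarly to the real world while still overestimating absolute performance, which is why reporting both quantities can be more informative than either alone.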
