Robot Policy Evaluation for Sim-to-Real Transfer: A Benchmarking Perspective

Xuning Yang; Clemens Eppner; Jonathan Tremblay; Dieter Fox; Stan Birchfield; Fabio Ramos

arXiv:2508.11117 · cs.RO · August 18, 2025

TL;DR

This paper discusses the challenges in benchmarking generalist robotic manipulation policies for sim-to-real transfer, emphasizing high-fidelity simulation, robustness evaluation, and performance alignment between simulation and real-world scenarios.

Contribution

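It proposes three desiderata for benchmarking generalist robotic manipulation policies: high visual-fidelity simulation, robustness testing under systematically increasing task complexity and scenario perturbation, and measures of how well simulated performance correlates with real-world performance.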

Findings

1. High visual-fidelity simulation improves sim-to-real transfer.
2. Systematically increasing task complexity and scenario perturbation provides a way to evaluate robustness.
3. Quantifying the alignment between simulated and real-world performance aids in assessing transfer.

Abstract

Current vision-based robotics simulation benchmarks have significantly advanced robotic manipulation research. However, robotics is fundamentally a real-world problem, and evaluation for real-world applications has lagged behind in evaluating generalist policies. In this paper, we discuss challenges and desiderata in designing benchmarks for generalist robotic manipulation policies for the goal of sim-to-real policy transfer. We propose 1) utilizing high visual-fidelity simulation for improved sim-to-real transfer, 2) evaluating policies by systematically increasing task complexity and scenario perturbation to assess robustness, and 3) quantifying performance alignment between real-world performance and its simulation counterparts.
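The abstract's second proposal, evaluating policies under systematically increasing task complexity and scenario perturbation, can be summarized as a success-rate curve over perturbation levels. A minimal sketch, assuming binary per-trial outcomes grouped by level (level 0 being the unperturbed scene) and evenly spaced levels; the perturbation examples and the normalized area under the curve are illustrative choices, not the paper's protocol.

```python
def success_rates(outcomes_by_level):
    """Success rate at each perturbation level; index 0 is the nominal scene."""
    return [sum(trials) / len(trials) for trials in outcomes_by_level]


def normalized_auc(rates):
    """Trapezoidal area under the success-rate curve, assuming the levels are
    evenly spaced on [0, 1]. 1.0 means perfect success at every level."""
    if len(rates) == 1:
        return rates[0]
    step = 1.0 / (len(rates) - 1)
    return sum((a + b) / 2 * step for a, b in zip(rates, rates[1:]))


# Hypothetical outcomes: 10 trials at each of three perturbation levels.
outcomes = [
    [True] * 10,                 # level 0: nominal scene
    [True] * 8 + [False] * 2,    # level 1: e.g. added distractor objects
    [True] * 5 + [False] * 5,    # level 2: e.g. distractors plus lighting change
]
rates = success_rates(outcomes)
print("success rate per level:", rates)
print(f"normalized AUC: {normalized_auc(rates):.3f}")
```

Reporting the whole curve, rather than only the nominal success rate, separates a policy that degrades gracefully from one that succeeds only in the unperturbed scene.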
