RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

Xuning Yang; Rishit Dagli; Alex Zook; Hugo Hadfield; Ankit Goyal; Stan Birchfield; Fabio Ramos; Jonathan Tremblay

arXiv:2604.09860 · cs.RO · May 15, 2026

TL;DR

RoboLab is a high-fidelity simulation benchmark designed to evaluate the true generalization of task-generalist robotic policies through diverse tasks and systematic analysis of policy robustness.

Contribution

The paper introduces RoboLab, a scalable simulation framework with 120 tasks, together with a systematic analysis method for assessing policy performance and robustness.

Findings

01. Current state-of-the-art models show significant performance gaps in RoboLab.

02. RoboLab enables analysis of policy sensitivity to controlled perturbations.

03. The benchmark provides granular metrics for evaluating generalization.
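
As a rough illustration of what a sensitivity analysis over controlled perturbations might look like, the sketch below ranks perturbation factors by how much each one lowers success rate relative to an unperturbed baseline. It is not RoboLab's API or the authors' method: the function names, the factor labels (`none`, `lighting`, `camera_pose`), and the episode data are all hypothetical and made up for the example.

```python
from collections import defaultdict

def success_rate_by_factor(episodes):
    """Return {factor: success_rate} from (factor, succeeded) pairs."""
    counts = defaultdict(lambda: [0, 0])  # factor -> [successes, trials]
    for factor, succeeded in episodes:
        counts[factor][0] += int(succeeded)
        counts[factor][1] += 1
    return {f: s / n for f, (s, n) in counts.items()}

def sensitivity(episodes, baseline="none"):
    """Rank factors by drop in success rate relative to the baseline."""
    rates = success_rate_by_factor(episodes)
    base = rates[baseline]
    drops = {f: base - r for f, r in rates.items() if f != baseline}
    # Largest drop first: the factor the policy is most sensitive to.
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical episode log: (perturbed factor, did the policy succeed?)
episodes = [
    ("none", True), ("none", True), ("none", True), ("none", False),
    ("lighting", True), ("lighting", True),
    ("lighting", False), ("lighting", False),
    ("camera_pose", True), ("camera_pose", False),
    ("camera_pose", False), ("camera_pose", False),
]

ranking = sensitivity(episodes)
for factor, drop in ranking:
    print(f"{factor}: success rate drops by {drop:.2f}")
```

With this toy data the baseline succeeds in 3 of 4 episodes, so `camera_pose` (1 of 4) ranks above `lighting` (2 of 4). A real analysis would need many more episodes per factor and some measure of uncertainty before drawing conclusions.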

Abstract

The pursuit of general-purpose robotics has yielded impressive foundation models, yet simulation-based benchmarking remains a bottleneck due to rapid performance saturation and a lack of true generalization testing. Existing benchmarks often exhibit significant domain overlap between training and evaluation, trivializing success rates and obscuring insights into robustness. We introduce RoboLab, a simulation benchmarking framework designed to address these challenges. Concretely, our framework is designed to answer two questions: (1) to what extent can we understand the performance of a real-world policy by analyzing its behavior in simulation, and (2) which factors most strongly affect policy behavior. First, RoboLab enables human-authored and LLM-enabled generation of scenes and tasks in a robot- and policy-agnostic manner within a high-fidelity simulation environment. We introduce an…
