RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

Yi Ru Wang; Carter Ung; Evan Gubarev; Christopher Tan; Siddhartha Srinivasa; Dieter Fox

arXiv:2604.05226·cs.RO·April 8, 2026

TL;DR

RoboPlayground is a framework for evaluating robotic manipulation in which users author tasks in natural language within a structured physical domain. This supports user-authored task variations and exposes generalization failures in policies.

Contribution

The paper presents a language-driven evaluation framework that lets users author and extend manipulation tasks, broadening who can shape robotic evaluation.

Findings

01

A user study shows lower cognitive workload with the RoboPlayground interface.

02

Language-defined task families reveal generalization failures in policies.

03

Task diversity increases with contributor diversity, not just task count.

Abstract

Evaluation of robotic manipulation systems has largely relied on fixed benchmarks authored by a small number of experts, where task instances, constraints, and success criteria are predefined and difficult to extend. This paradigm limits who can shape evaluation and obscures how policies respond to user-authored variations in task intent, constraints, and notions of success. We argue that evaluating modern manipulation policies requires reframing evaluation as a language-driven process over structured physical domains. We present RoboPlayground, a framework that enables users to author executable manipulation tasks using natural language within a structured physical domain. Natural language instructions are compiled into reproducible task specifications with explicit asset definitions, initialization distributions, and success predicates. Each instruction defines a structured family of…
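To make the idea of a compiled task specification concrete, here is a minimal sketch of what such a specification might look like as a data structure, with the three parts the abstract names: asset definitions, an initialization distribution, and a success predicate. Everything in it is an illustrative assumption rather than RoboPlayground's actual API: the names `TaskSpec`, `init_ranges`, `sample_initial_state`, and `is_stacked`, the uniform sampling over position bounds, and the stacking tolerances are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Position = Tuple[float, float, float]
State = Dict[str, Position]


@dataclass
class TaskSpec:
    """Hypothetical compiled task specification (not RoboPlayground's real schema)."""
    instruction: str                                   # the natural language source
    assets: Tuple[str, ...]                            # explicit asset definitions
    init_ranges: Dict[str, Tuple[Position, Position]]  # per-asset (low, high) position bounds
    success: Callable[[State], bool]                   # success predicate over a state

    def sample_initial_state(self, seed: int) -> State:
        """Draw one task instance; the same seed always yields the same instance."""
        rng = random.Random(seed)
        state: State = {}
        for name in self.assets:
            low, high = self.init_ranges[name]
            state[name] = tuple(rng.uniform(lo, hi) for lo, hi in zip(low, high))
        return state


def is_stacked(top: str, bottom: str, xy_tol: float = 0.02, min_dz: float = 0.01):
    """Build a predicate: `top` is horizontally aligned with and above `bottom`.

    The tolerances are assumed values chosen only for illustration.
    """
    def predicate(state: State) -> bool:
        tx, ty, tz = state[top]
        bx, by, bz = state[bottom]
        return abs(tx - bx) <= xy_tol and abs(ty - by) <= xy_tol and (tz - bz) >= min_dz
    return predicate


# One instruction maps to one spec; varying the seed enumerates its task family.
spec = TaskSpec(
    instruction="Stack the red block on the blue block.",
    assets=("red_block", "blue_block"),
    init_ranges={
        "red_block": ((0.0, 0.0, 0.0), (0.2, 0.2, 0.0)),
        "blue_block": ((0.3, 0.0, 0.0), (0.5, 0.2, 0.0)),
    },
    success=is_stacked("red_block", "blue_block"),
)

initial = spec.sample_initial_state(seed=0)
goal = {"red_block": (0.1, 0.1, 0.05), "blue_block": (0.1, 0.1, 0.0)}
```

In this sketch, reproducibility comes from seeding the sampler, and keeping the success predicate as an explicit function is what would let a variation in the instruction change the notion of success without touching the rest of the specification.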

Code & Models

Repositories

https://roboplayground.github.io
