TL;DR
RoboPlayground introduces a framework for evaluating robotic manipulation through natural language-defined, structured physical domains, enabling flexible, user-authored task variations and revealing generalization challenges.
Contribution
It presents a novel language-driven evaluation framework that allows users to author and extend manipulation tasks, improving flexibility and inclusivity in robotic assessment.
Findings
A user study shows lower cognitive workload with the RoboPlayground interface.
Language-defined task families reveal generalization failures in policies.
Task diversity increases with contributor diversity, not just task count.
Abstract
Evaluation of robotic manipulation systems has largely relied on fixed benchmarks authored by a small number of experts, where task instances, constraints, and success criteria are predefined and difficult to extend. This paradigm limits who can shape evaluation and obscures how policies respond to user-authored variations in task intent, constraints, and notions of success. We argue that evaluating modern manipulation policies requires reframing evaluation as a language-driven process over structured physical domains. We present RoboPlayground, a framework that enables users to author executable manipulation tasks using natural language within a structured physical domain. Natural language instructions are compiled into reproducible task specifications with explicit asset definitions, initialization distributions, and success predicates. Each instruction defines a structured family of…
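To make the idea of a compiled task specification concrete, here is a minimal sketch of what one might look like as a data structure. It is not RoboPlayground's actual format or API: the `TaskSpec` class, its field names, the `is_on_top` predicate, and the example instruction and coordinate ranges are all hypothetical, chosen only to illustrate the three components the abstract names (asset definitions, an initialization distribution, and a success predicate).

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float, float]
State = Dict[str, Position]


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task specification; not the paper's actual schema."""
    instruction: str                                   # natural-language source
    assets: List[str]                                  # explicit asset definitions
    init_ranges: Dict[str, Tuple[Position, Position]]  # per-asset (low, high) bounds
    success: Callable[[State], bool]                   # success predicate

    def sample_initial_state(self, seed: int) -> State:
        """Draw one initial state; a fixed seed makes the draw reproducible."""
        rng = random.Random(seed)
        return {
            name: tuple(rng.uniform(lo, hi) for lo, hi in zip(low, high))
            for name, (low, high) in self.init_ranges.items()
        }


def is_on_top(state: State, upper: str, lower: str, xy_tol: float = 0.02) -> bool:
    """Assumed predicate: `upper` is horizontally aligned with and above `lower`."""
    ux, uy, uz = state[upper]
    lx, ly, lz = state[lower]
    return abs(ux - lx) <= xy_tol and abs(uy - ly) <= xy_tol and uz > lz


# Illustrative instance; the instruction and ranges are made up for this sketch.
stack_task = TaskSpec(
    instruction="Stack the red cube on the blue cube.",
    assets=["red_cube", "blue_cube"],
    init_ranges={
        "red_cube": ((0.0, 0.0, 0.0), (0.3, 0.3, 0.0)),
        "blue_cube": ((0.4, 0.0, 0.0), (0.7, 0.3, 0.0)),
    },
    success=lambda s: is_on_top(s, "red_cube", "blue_cube"),
)
```

In this sketch, sampling with different seeds yields different members of the same task family, while reusing a seed reproduces the same initial state. Keeping the success predicate as an explicit function of the state is what would let a variation in the instruction change the notion of success without changing the rest of the specification.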
