GRS: Generating Robotic Simulation Tasks from Real-World Images
Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay

TL;DR
GRS converts a single real-world RGB-D observation into a robotic simulation environment with solvable tasks, using vision-language models and a router that iteratively refines the generated simulation and test code.
Contribution
GRS introduces a real-to-sim pipeline that combines scene understanding, matching of objects to simulation-ready assets, and task generation, together with a router that iteratively refines both the simulation and its test code.
Findings
Effective object correspondence in simulations
Successful generation of task environments from real images
Demonstrated improvement with the iterative router mechanism
Abstract
We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects with simulation-ready assets, and 3) generating appropriate tasks. We ensure simulation-task alignment through generated test suites and introduce a router that iteratively refines both simulation and test code. Experiments demonstrate our system's effectiveness in object correspondence and task environment generation through our novel router mechanism.
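The abstract describes the router only at a high level, so the following is a minimal sketch of what such a refinement loop might look like, not the authors' implementation. It assumes the router inspects a failing test log and decides whether the simulation code or the test code should be revised. The names `Candidate`, `refine_with_router`, and the `toy_*` stubs are hypothetical. In a real system the routing and fixing steps would presumably be VLM or LLM calls, which are replaced here by trivial string functions so the example runs on its own.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one generated simulation and its test suite."""
    sim_code: str
    test_code: str


# Assumed signatures for the pluggable steps (all hypothetical).
TestRunner = Callable[[Candidate], Tuple[bool, str]]  # returns (passed, log)
Router = Callable[[Candidate, str], str]              # returns "sim" or "test"
Fixer = Callable[[Candidate, str], str]               # returns revised code


def refine_with_router(
    candidate: Candidate,
    run_tests: TestRunner,
    route: Router,
    fix_sim: Fixer,
    fix_tests: Fixer,
    max_iters: int = 5,
) -> Tuple[Candidate, bool]:
    """Revise sim or test code until the tests pass or the budget runs out."""
    for _ in range(max_iters):
        passed, log = run_tests(candidate)
        if passed:
            return candidate, True
        # The router picks which artifact to revise based on the failure log.
        if route(candidate, log) == "sim":
            candidate = replace(candidate, sim_code=fix_sim(candidate, log))
        else:
            candidate = replace(candidate, test_code=fix_tests(candidate, log))
    # Check the last revision, which the loop has not yet tested.
    passed, _ = run_tests(candidate)
    return candidate, passed


# Toy stand-ins so the sketch is runnable without any model or simulator.
def toy_run_tests(c: Candidate) -> Tuple[bool, str]:
    if "spawn(cube)" not in c.sim_code:
        return False, "sim: cube was never spawned"
    if "assert cube_on_table" not in c.test_code:
        return False, "test: no check that the cube is on the table"
    return True, "ok"


def toy_route(c: Candidate, log: str) -> str:
    return "sim" if log.startswith("sim") else "test"


def toy_fix_sim(c: Candidate, log: str) -> str:
    return c.sim_code + "\nspawn(cube)"


def toy_fix_tests(c: Candidate, log: str) -> str:
    return c.test_code + "\nassert cube_on_table"


start = Candidate(sim_code="spawn(table)", test_code="")
final, ok = refine_with_router(
    start, toy_run_tests, toy_route, toy_fix_sim, toy_fix_tests
)
print(ok)  # True
```

In this toy run the first failure is routed to the simulation code and the second to the test code, after which the tests pass. Keeping the router as a separate callable makes the "which artifact is wrong" decision explicit, which is one plausible reading of how the system keeps the simulation and its test suite aligned.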
Taxonomy
Topics: Robotics and Sensor-Based Localization · Manufacturing Process and Optimization · Robotic Path Planning Algorithms
Methods: ALIGN
