URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

Zoey Chen; Aaron Walsman; Marius Memmel; Kaichun Mo; Alex Fang; Karthikeya Vemuri; Alan Wu; Dieter Fox; Abhishek Gupta

arXiv:2405.11656·cs.RO·June 3, 2024

TL;DR

URDFormer introduces a scalable pipeline that infers and generates realistic articulated simulation environments from real-world images, enabling large-scale data-driven robotic training.

Contribution

The paper presents a novel end-to-end pipeline that synthesizes articulated simulation scenes from images using controllable text-to-image models, facilitating scalable data generation for robotics.

Findings

1. Generated large datasets of realistic scenes with semantic and physical properties.
2. Successfully trained robotic control policies in simulation and deployed them in real-world tasks.
3. Demonstrated the pipeline's effectiveness in creating diverse, articulated environments from web-scale datasets.

Abstract

Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by hand. A graphic designer and a simulation engineer work with predefined assets to construct rich scenes with realistic dynamic and kinematic properties. While this may scale to small numbers of scenes, to achieve the generalization properties that are required for data-driven robotic control, we require a pipeline that is able to synthesize large numbers of realistic scenes, complete with 'natural' kinematic and dynamic structures. To attack this problem, we develop models for inferring…

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis