TL;DR
This paper introduces a system of neural networks, trained solely on synthetic data, that infers a human-readable program from a real-world demonstration and executes it on a Baxter robot, demonstrated on stacking colored cubes.
Contribution
It presents a pipeline of neural networks, trained entirely in simulation, that perceives a human demonstration, generates a plan from it, and executes that plan in the real world.
Findings
Effective perception of occluded objects in real images
Successful real-world task execution with a Baxter robot
Robust plan generation from synthetic training data
Abstract
We present a system to infer and execute a human-readable program from a real-world demonstration. The system consists of a series of neural networks to perform perception, program generation, and program execution. Leveraging convolutional pose machines, the perception network reliably detects the bounding cuboids of objects in real images even when severely occluded, after training only on synthetic images using domain randomization. To increase the applicability of the perception network to new scenarios, the network is formulated to predict in image space rather than in world space. Additional networks detect relationships between objects, generate plans, and determine actions to reproduce a real-world demonstration. The networks are trained entirely in simulation, and the system is tested in the real world on the pick-and-place problem of stacking colored cubes using a Baxter robot.
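The step from detected object relationships to a human-readable program can be illustrated with a small sketch. This is not the paper's method: the paper uses trained networks for relationship detection and plan generation, whereas the code below is a hand-written stand-in. The function name `plan_from_relations`, the `(upper, lower)` pair format, and the "Place X on Y" wording are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def plan_from_relations(on_top_of: List[Tuple[str, str]]) -> List[str]:
    """Turn (upper, lower) pairs into ordered, readable stacking steps.

    Hypothetical helper: a rule-based stand-in for the learned
    plan-generation network described in the abstract.
    """
    # Map each supporting cube to the cubes resting directly on it.
    above: Dict[str, List[str]] = {}
    for upper, lower in on_top_of:
        above.setdefault(lower, []).append(upper)

    # A base cube supports something but rests on nothing itself.
    uppers = {upper for upper, _ in on_top_of}
    bases = sorted(set(above) - uppers)

    # Walk each stack bottom-up so a support is always placed first.
    steps: List[str] = []
    pending = list(reversed(bases))
    while pending:
        lower = pending.pop()
        for upper in sorted(above.get(lower, [])):
            steps.append(f"Place {upper} on {lower}")
            pending.append(upper)
    return steps


if __name__ == "__main__":
    # Relationships as they might be reported for a three-cube stack.
    relations = [("green", "red"), ("red", "blue")]
    for step in plan_from_relations(relations):
        print(step)
```

For the three-cube example, the steps come out bottom-up: red is placed on blue before green is placed on red, which is the order a pick-and-place executor would need.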
