Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks
Hongtao Wu, Jikai Ye, Xin Meng, Chris Paxton, Gregory Chirikjian

TL;DR
This paper introduces Transporters with Visual Foresight (TVF), a method that lets robots learn pick-and-place rearrangement from few demonstrations and generalize to unseen tasks by combining a visual foresight model with multi-modal action proposals. It raises the average success rate on unseen tasks from 55.4% to 78.5% in simulation.
Contribution
The paper presents a visual foresight model combined with a multi-modal action proposal module built on the Goal-Conditioned Transporter Network, enabling zero-shot generalization to unseen rearrangement tasks from only tens of demonstrations.
Findings
Success rate on unseen tasks improved from 55.4% to 78.5% in simulation.
Success rate on real robots increased from 30% to 63.3%.
Model learns effectively from only tens of demonstrations.
Abstract
Rearrangement tasks have been identified as a crucial challenge for intelligent robotic manipulation, but few methods allow for precise construction of unseen structures. We propose a visual foresight model for pick-and-place rearrangement manipulation which is able to learn efficiently. In addition, we develop a multi-modal action proposal module which builds on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method. Our image-based task planning method, Transporters with Visual Foresight (TVF), is able to learn from only a small amount of data and generalize to multiple unseen tasks in a zero-shot manner. TVF improves the performance of a state-of-the-art imitation learning method on unseen tasks in simulation and real robot experiments. In particular, the average success rate on unseen tasks improves from 55.4% to 78.5% in simulation experiments and…
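The abstract does not spell out how the foresight model and the action proposals interact, so the following is only a toy sketch of one plausible reading: propose several candidate pick-and-place actions, predict the resulting image for each, and keep the candidate whose prediction is closest to the goal image. Everything here is an assumption made for illustration. `toy_foresight`, `toy_proposals`, and `select_action` are hypothetical names, the hand-written functions stand in for what the paper describes as learned modules, and the squared pixel distance is an arbitrary choice of score.

```python
import numpy as np


def toy_foresight(obs, action):
    """Hypothetical stand-in for a learned visual foresight model.

    Predicts the next image by moving the pixel at `pick` to `place`.
    """
    (pr, pc), (qr, qc) = action
    nxt = obs.copy()
    nxt[qr, qc] = nxt[pr, pc]
    nxt[pr, pc] = 0.0
    return nxt


def toy_proposals(obs, goal, k=3):
    """Hypothetical stand-in for a multi-modal action proposal module.

    Returns up to k candidate (pick, place) pairs rather than a single best guess.
    """
    # Pick from occupied cells that should be empty in the goal.
    picks = [tuple(int(i) for i in p) for p in np.argwhere((obs > 0) & (goal == 0))]
    # Place into cells that should be occupied in the goal but are empty now.
    places = [tuple(int(i) for i in p) for p in np.argwhere((goal > 0) & (obs == 0))]
    return [(p, q) for p in picks for q in places][:k]


def select_action(obs, goal, foresight=toy_foresight, proposals=toy_proposals):
    """Score each candidate by the distance between its predicted image and the goal."""
    candidates = proposals(obs, goal)
    if not candidates:
        return None  # nothing left to rearrange
    costs = [float(np.sum((foresight(obs, a) - goal) ** 2)) for a in candidates]
    return candidates[int(np.argmin(costs))]
```

In this toy setting, calling `select_action` repeatedly and applying each chosen action with `toy_foresight` moves one object per step until the observation matches the goal image.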
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
