Transporter Networks: Rearranging the Visual World for Robotic Manipulation

Andy Zeng; Pete Florence; Jonathan Tompson; Stefan Welker; Jonathan Chien; Maria Attarian; Travis Armstrong; Ivan Krasin; Dan Duong; Ayzaan Wahid; Vikas Sindhwani; Johnny Lee

arXiv:2010.14406·cs.RO·January 7, 2022·41 cites

TL;DR

Transporter Networks are a novel architecture for robotic manipulation that rearranges visual features to infer spatial displacements, enabling efficient learning and generalization across diverse manipulation tasks without relying on object assumptions.

Contribution

The paper introduces the Transporter Network, a new model architecture that efficiently learns visual manipulation policies by rearranging deep features, outperforming existing methods in sample efficiency and generalization.

Findings

01 Learns faster than end-to-end baselines.
02 Generalizes to multi-step and 6DoF tasks.
03 Effective in real-world robotic experiments.

Abstract

Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks, to assembling kits with unseen objects; from manipulating deformable ropes, to pushing piles of small objects with closed-loop feedback. Our method can represent complex multi-modal policy distributions and generalizes to multi-step sequential…
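
The idea of rearranging deep features to infer a spatial displacement can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, for illustration only, that a patch of features is cropped around a pick pixel and cross-correlated against a second feature map, with the best-matching pixel taken as the place location. The function names (`cross_correlate`, `infer_place`), the hand-built feature maps, and the crop size are all hypothetical.

```python
import numpy as np

def cross_correlate(feature_map, kernel):
    """Slide `kernel` over a zero-padded `feature_map`; return an H x W score map."""
    h, w, _ = feature_map.shape
    kh, kw, _ = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feature_map, ((ph, ph), (pw, pw), (0, 0)))
    scores = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Window centered on pixel (i, j) of the unpadded map.
            scores[i, j] = np.sum(padded[i:i + kh, j:j + kw, :] * kernel)
    return scores

def infer_place(query_features, key_features, pick, crop=3):
    """Crop query features around `pick` and find where they match best in the key features.

    Assumption for this sketch: `crop` is odd, and both feature maps are
    stand-ins for the outputs of learned networks.
    """
    half = crop // 2
    padded = np.pad(query_features, ((half, half), (half, half), (0, 0)))
    r, c = pick
    kernel = padded[r:r + crop, c:c + crop, :]  # patch centered on the pick pixel
    scores = cross_correlate(key_features, kernel)
    best = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(v) for v in best)

# Toy data: the same feature pattern sits at the pick site in the query map
# and at the intended place site in the key map.
rng = np.random.default_rng(0)
pattern = rng.random((3, 3, 2)) + 0.1
query = np.zeros((8, 8, 2))
query[1:4, 1:4] = pattern   # centered at (2, 2)
key = np.zeros((8, 8, 2))
key[4:7, 5:8] = pattern     # centered at (5, 6)

pick = (2, 2)
place = infer_place(query, key, pick)
displacement = (place[0] - pick[0], place[1] - pick[1])
print(place, displacement)
```

In this toy setup the score map peaks where the cropped patch overlaps its copy exactly, so the inferred displacement is the offset between the two pattern centers. A learned model would produce the feature maps from images rather than from hand-placed patterns.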

Taxonomy

Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Advanced Vision and Imaging