Learning to See before Learning to Act: Visual Pre-training for Manipulation
Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

TL;DR
Pre-training on passive vision tasks significantly improves robotic manipulation, enabling zero-shot object pickup and better sample efficiency by transferring affordance-related features.
Contribution
The paper demonstrates that transferring model parameters from vision networks to affordance prediction networks enables zero-shot adaptation and improves sample efficiency in robotic object manipulation.
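The contribution depends on choosing which parts of a vision network to carry over. Below is a minimal sketch of selective parameter transfer. It assumes that parameters are stored as a name-to-(shape, values) mapping, similar to a framework state dict. The function `transfer_parameters` and the `backbone.` / `head.` prefixes are hypothetical illustrations and do not come from the paper's code.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter store: name -> (shape, flat values). Real frameworks use tensors.
Params = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def transfer_parameters(
    source: Params, target: Params, prefixes: Tuple[str, ...]
) -> Tuple[Params, List[str]]:
    """Copy source parameters into a copy of target when the name starts with a
    selected prefix and the shapes match; all other entries keep their initialization."""
    result = dict(target)
    copied = []
    for name, (shape, values) in source.items():
        if not name.startswith(prefixes):
            continue  # this module was not selected for transfer
        if name in target and target[name][0] == shape:
            result[name] = (shape, list(values))
            copied.append(name)
    return result, copied


# Hypothetical vision network and affordance network sharing some module names.
vision = {
    "backbone.conv1": ((2, 2), [0.1, 0.2, 0.3, 0.4]),
    "head.conv": ((1, 2), [0.5, 0.6]),
}
affordance = {
    "backbone.conv1": ((2, 2), [0.0] * 4),
    "head.conv": ((1, 2), [0.0] * 2),
    "head.extra": ((1, 1), [0.0]),  # no counterpart in the vision model
}

# Backbone only, versus backbone plus the matching head layers.
backbone_only, names_a = transfer_parameters(vision, affordance, ("backbone.",))
full, names_b = transfer_parameters(vision, affordance, ("backbone.", "head."))
print(names_a)  # ['backbone.conv1']
print(names_b)  # ['backbone.conv1', 'head.conv']
```

Comparing the two calls shows the design choice in question. Transferring only the backbone leaves the prediction head at its initialization. Transferring the head as well carries the vision model's output layers into the affordance network, wherever the names and shapes line up.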
Findings
Zero-shot object pickup with no robotic experience.
80% success rate after minimal robotic training.
Transfer of visual features enhances manipulation performance.
Abstract
Does having visual priors (e.g. the ability to detect objects) facilitate learning to perform vision-based manipulation (e.g. picking up objects)? We study this problem under the framework of transfer learning, where the model is first trained on a passive vision task, and adapted to perform an active manipulation task. We find that pre-training on vision tasks significantly improves generalization and sample efficiency for learning to manipulate objects. However, realizing these gains requires careful selection of which parts of the model to transfer. Our key insight is that outputs of standard vision models highly correlate with affordance maps commonly used in manipulation. Therefore, we explore directly transferring model parameters from vision networks to affordance prediction networks, and show that this can result in successful zero-shot adaptation, where a robot can pick up…
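The abstract links vision-model outputs to affordance maps. In affordance-based picking, each pixel's score is commonly read as the predicted success of a pick at that location, and the robot acts at the highest-scoring pixel. The sketch below illustrates that argmax step on a dense score map. This pattern is an assumption here and is not taken from the paper's implementation. The function `best_pick_point` and its `min_score` threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]


def best_pick_point(
    affordance: Heatmap, min_score: float = float("-inf")
) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, score) of the highest-scoring pixel, or None when even the
    best pixel falls below min_score. Ties resolve to the first pixel in row-major order."""
    best: Optional[Tuple[int, int, float]] = None
    for r, row in enumerate(affordance):
        for c, score in enumerate(row):
            if best is None or score > best[2]:
                best = (r, c, score)
    if best is None:
        raise ValueError("affordance map is empty")
    if best[2] < min_score:
        return None  # no pixel is confident enough; skip this pick attempt
    return best


# Hypothetical 3x4 score map, e.g. per-pixel scores from a pre-trained vision model.
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.3, 0.9, 0.4, 0.1],
    [0.2, 0.5, 0.3, 0.0],
]
print(best_pick_point(scores))  # (1, 1, 0.9)
```

In a zero-shot setting, the score map would come directly from a vision network with no robot training. Fine-tuning on a small amount of robotic experience would then adjust those scores.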
Taxonomy
Topics: Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications
