TL;DR
This paper introduces a self-supervised dense visual correspondence method that significantly enhances the generalization and sample efficiency of visuomotor policies, enabling robust real-world manipulation with minimal demonstrations.
Contribution
The paper proposes a novel self-supervised correspondence training approach that improves visuomotor policy generalization and reduces data requirements compared to prior autoencoding and end-to-end methods.
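As a rough illustration of what dense correspondence training optimizes, the sketch below implements a pixelwise contrastive loss over per-pixel descriptors. This is a common formulation for dense descriptor learning and is an assumption here: the summary does not state the exact loss the paper uses. The function names, the `margin` value, and the nested-list descriptor layout are all hypothetical. How matching pixel pairs are obtained without manual labels is not covered in this summary.

```python
import math


def descriptor_distance(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pixelwise_contrastive_loss(desc_a, desc_b, matches, non_matches, margin=0.5):
    """Hypothetical sketch of a pixelwise contrastive loss.

    desc_a, desc_b: descriptor images indexed as desc[row][col] -> list of floats.
    matches, non_matches: lists of ((row_a, col_a), (row_b, col_b)) pixel pairs.
    margin: assumed minimum descriptor distance for non-matching pixels.
    """

    def dist(pair):
        (ra, ca), (rb, cb) = pair
        return descriptor_distance(desc_a[ra][ca], desc_b[rb][cb])

    # Matching pixels: penalize any descriptor distance (pull together).
    match_loss = sum(dist(p) ** 2 for p in matches) / max(len(matches), 1)

    # Non-matching pixels: penalize only when closer than the margin (push apart).
    non_match_loss = sum(
        max(0.0, margin - dist(p)) ** 2 for p in non_matches
    ) / max(len(non_matches), 1)

    return match_loss + non_match_loss


# Toy 1x2 descriptor images with 2-D descriptors.
desc_a = [[[0.0, 0.0], [1.0, 0.0]]]
desc_b = [[[0.0, 0.0], [1.0, 0.0]]]

good = pixelwise_contrastive_loss(
    desc_a, desc_b,
    matches=[((0, 0), (0, 0))],
    non_matches=[((0, 0), (0, 1))],
)
bad = pixelwise_contrastive_loss(
    desc_a, desc_b,
    matches=[((0, 0), (0, 1))],
    non_matches=[((0, 0), (0, 0))],
)
```

In this toy setup `good` is lower than `bad`, because the first call pairs pixels whose descriptors already agree and separates pixels whose descriptors already differ by more than the margin.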
Findings
Generalization across object classes and deformable object configurations
Manipulation of textureless, symmetrical objects against a variety of backgrounds
Successful real-world validation with as few as 50 demonstrations
Abstract
In this paper we explore using self-supervised correspondence for improving the generalization performance and sample efficiency of visuomotor policy learning. Prior work has primarily used approaches such as autoencoding, pose-based losses, and end-to-end policy optimization in order to train the visual portion of visuomotor policies. We instead propose an approach using self-supervised dense visual correspondence training, and show this enables visuomotor policy learning with surprisingly high generalization performance with modest amounts of data: using imitation learning, we demonstrate extensive hardware validation on challenging manipulation tasks with as few as 50 demonstrations. Our learned policies can generalize across classes of objects, react to deformable object configurations, and manipulate textureless symmetrical objects in a variety of backgrounds, all with closed-loop,…
Taxonomy
Methods: 3D Convolution
