Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation
Jan Ole von Hartz, Eugenio Chisari, Tim Welschehold, Abhinav Valada

TL;DR
This paper introduces a method for learning multi-object keypoints from raw images using dense correspondence, improving sample efficiency and robustness for robotic manipulation tasks.
Contribution
It extends prior keypoint learning methods to multi-object scenes, addressing scale-invariance and occlusion, and demonstrates improved policy learning from raw camera data.
Findings
Effective keypoint learning in multi-object scenes
Enhanced robustness to scale and occlusion
Sample-efficient policy learning demonstrated
Abstract
In recent years, policy learning methods using either reinforcement or imitation have made significant progress. However, both techniques still suffer from being computationally expensive and requiring large amounts of training data. This problem is especially prevalent in real-world robotic manipulation tasks, where access to ground truth scene features is not available and policies are instead learned from raw camera observations. In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning. Extending prior work to challenging multi-object scenes, we show that our model can be trained to deal with important problems in representation learning, primarily scale-invariance and occlusion. We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning…
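As an illustration of how a keypoint can be read out of a dense descriptor image, here is a minimal sketch in Python with NumPy. It is not taken from the paper: it assumes a descriptor network has already produced a per-pixel descriptor map, and it localizes a keypoint as the spatial expectation of a softmax over negative descriptor distances. The function name `keypoint_from_descriptors` and the `temperature` parameter are hypothetical, chosen for this example only.

```python
import numpy as np


def keypoint_from_descriptors(descriptor_map, reference, temperature=1.0):
    """Return the expected (row, col) location of `reference` in `descriptor_map`.

    descriptor_map: (H, W, D) array of per-pixel descriptors (assumed given).
    reference: (D,) descriptor of a keypoint picked in a reference image.
    temperature: hypothetical sharpness parameter for the softmax.
    """
    h, w, _ = descriptor_map.shape
    # Squared Euclidean distance from every pixel descriptor to the reference.
    dist = np.sum((descriptor_map - reference) ** 2, axis=-1)
    # Softmax over negative distances gives a spatial probability map.
    logits = -dist / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    prob = np.exp(logits)
    prob = prob / prob.sum()
    # The keypoint is the probability-weighted mean pixel coordinate.
    rows, cols = np.mgrid[0:h, 0:w]
    return float((prob * rows).sum()), float((prob * cols).sum())
```

A downstream policy could then take a handful of such (row, col) pairs as a compact input instead of the full image, which is the kind of low-dimensional representation the abstract refers to.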
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
