OSSID: Online Self-Supervised Instance Detection by (and for) Pose Estimation
Qiao Gu, Brian Okorn, David Held

TL;DR
The paper introduces OSSID, a self-supervised framework that trains a fast object detector using a slow pose estimator's pseudo-labels, enabling real-time object pose estimation without human annotations.
Contribution
It presents a novel self-supervised online learning method in which a slow zero-shot pose estimator generates pseudo-labels to train a fast detector, improving both detection accuracy and pose estimation speed without human annotations.
Findings
Outperforms existing zero-shot detection methods on two widely used object pose estimation and detection datasets.
Achieves real-time pose estimation speeds.
Eliminates need for human annotations during training.
Abstract
Real-time object pose estimation is necessary for many robot manipulation algorithms. However, state-of-the-art methods for object pose estimation are trained for a specific set of objects; these methods thus need to be retrained to estimate the pose of each new object, often requiring tens of GPU-days of training for optimal performance. In this paper, we propose the OSSID framework, leveraging a slow zero-shot pose estimator to self-supervise the training of a fast detection algorithm. This fast detector can then be used to filter the input to the pose estimator, drastically improving its inference speed. We show that this self-supervised training exceeds the performance of existing zero-shot detection methods on two widely used object pose estimation and detection datasets, without requiring any human annotations. Further, we show that the resulting method for pose estimation has a…
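The abstract describes a two-part loop: the slow zero-shot pose estimator labels data that trains the fast detector, and the detector then filters what the pose estimator sees. The toy Python sketch below illustrates only that control flow, under loudly stated assumptions. Image regions are reduced to scalar features, and `slow_pose_estimator`, `FastDetector`, `CONF_THRESHOLD`, `make_frame`, and the running-mean update are all hypothetical stand-ins, not the paper's models, detector architecture, or training procedure.

```python
"""Toy sketch of an OSSID-style online self-supervision loop.

ASSUMPTION: every name here is a hypothetical stand-in, not the authors' code.
Regions are scalar "features"; the target object appears near TRUE_FEATURE.
"""
import random
from dataclasses import dataclass
from typing import Optional

TRUE_FEATURE = 0.8    # hypothetical feature value of the target object
CONF_THRESHOLD = 0.9  # hypothetical confidence needed to accept a pseudo-label


def slow_pose_estimator(regions, counter):
    """Stand-in for a slow zero-shot pose estimator that scores every region.

    Returns (best_region, confidence); counter["calls"] counts per-region
    evaluations so inference cost can be compared with and without filtering.
    """
    best, best_conf = None, 0.0
    for r in regions:
        counter["calls"] += 1
        conf = max(0.0, 1.0 - abs(r - TRUE_FEATURE) * 5)
        if conf > best_conf:
            best, best_conf = r, conf
    return best, best_conf


@dataclass
class FastDetector:
    """Stand-in fast detector: ranks regions by distance to a learned prototype."""
    prototype: Optional[float] = None
    n: int = 0

    def update(self, positive_feature):
        # Online update from one pseudo-label (running mean of positives).
        self.n += 1
        if self.prototype is None:
            self.prototype = positive_feature
        else:
            self.prototype += (positive_feature - self.prototype) / self.n

    def top_k(self, regions, k):
        if self.prototype is None:
            return list(regions)  # untrained: pass every region through
        return sorted(regions, key=lambda r: abs(r - self.prototype))[:k]


def make_frame(rng, n_regions=50):
    # One target region plus clutter whose features never reach the target's.
    frame = [rng.uniform(0.0, 0.6) for _ in range(n_regions - 1)]
    frame.append(TRUE_FEATURE + rng.uniform(-0.01, 0.01))
    rng.shuffle(frame)
    return frame


def run_online(n_frames=20, k=5, seed=0, use_detector=True):
    rng = random.Random(seed)
    detector = FastDetector()
    counter = {"calls": 0}
    found = 0
    for _ in range(n_frames):
        frame = make_frame(rng)
        # The fast detector filters the input to the slow pose estimator.
        candidates = detector.top_k(frame, k) if use_detector else frame
        best, conf = slow_pose_estimator(candidates, counter)
        if conf >= CONF_THRESHOLD:
            detector.update(best)  # confident output becomes a pseudo-label
            found += 1
    return counter["calls"], found


if __name__ == "__main__":
    print(run_online())                    # → (145, 20)
    print(run_online(use_detector=False))  # → (1000, 20)
```

With these toy settings, the first frame sends all 50 regions to the slow estimator because the detector is still untrained; afterward only the top 5 candidates are scored, so per-region estimator calls drop from 1000 to 145 over 20 frames while the object is still found in every frame. In this sketch the confidence threshold is the main design knob: only confident estimator outputs become pseudo-labels, which limits how many wrong labels the detector learns from.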
