TL;DR
This paper introduces a self-supervised monocular 6D object pose estimation method that eliminates the need for real annotations by leveraging synthetic data and differentiable rendering, improving robustness to occlusion.
Contribution
It proposes a novel self-supervised learning framework for 6D pose estimation that uses synthetic data and differentiable rendering, reducing annotation requirements and enhancing occlusion robustness.
Findings
Outperforms methods relying on synthetic data or domain adaptation techniques.
Improves over its synthetically trained baseline.
Nearly closes the gap to fully supervised methods.
Abstract
6D object pose estimation is a fundamental yet challenging problem in computer vision. Convolutional Neural Networks (CNNs) have recently proven capable of predicting reliable 6D pose estimates even in monocular settings. Nonetheless, CNNs are extremely data-driven, and acquiring adequate annotations is often very time-consuming and labor-intensive. To overcome this limitation, we propose a novel monocular 6D pose estimation approach based on self-supervised learning, removing the need for real annotations. After training our proposed network fully supervised on synthetic RGB data, we leverage current trends in noisy student training and differentiable rendering to further self-supervise the model on unannotated real RGB(-D) samples, seeking a visually and geometrically optimal alignment. Moreover, employing both visible and amodal mask…
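To make the idea of a "visually and geometrically optimal alignment" concrete, here is a minimal NumPy sketch of what such an objective could look like. It is an illustration under stated assumptions, not the paper's actual loss: the visual term is assumed to be a binary cross-entropy between a rendered soft mask and a pseudo-label mask, the geometric term is assumed to be a symmetric chamfer distance between rendered and observed 3D points, and all function names and weights (`mask_alignment_loss`, `chamfer_distance`, `self_supervised_loss`, `w_visual`, `w_geom`) are hypothetical. In a real pipeline the rendered quantities would come from a differentiable renderer so gradients can reach the pose network.

```python
import numpy as np


def mask_alignment_loss(rendered_mask, pseudo_mask, eps=1e-7):
    """Assumed visual term: binary cross-entropy between a rendered soft
    mask (values in [0, 1]) and a pseudo-label mask of the same shape."""
    p = np.clip(rendered_mask, eps, 1.0 - eps)  # avoid log(0)
    bce = pseudo_mask * np.log(p) + (1.0 - pseudo_mask) * np.log(1.0 - p)
    return float(-np.mean(bce))


def chamfer_distance(points_a, points_b):
    """Assumed geometric term: symmetric chamfer distance (squared
    Euclidean) between point sets of shape (N, 3) and (M, 3)."""
    diff = points_a[:, None, :] - points_b[None, :, :]  # (N, M, 3)
    d2 = np.sum(diff ** 2, axis=-1)                     # (N, M)
    return float(np.mean(np.min(d2, axis=1)) + np.mean(np.min(d2, axis=0)))


def self_supervised_loss(rendered_mask, pseudo_mask,
                         rendered_points, observed_points,
                         w_visual=1.0, w_geom=1.0):
    """Hypothetical combined objective: weighted sum of the visual and
    geometric alignment terms. The weights are illustrative only."""
    visual = mask_alignment_loss(rendered_mask, pseudo_mask)
    geometric = chamfer_distance(rendered_points, observed_points)
    return w_visual * visual + w_geom * geometric


if __name__ == "__main__":
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0
    points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

    aligned = self_supervised_loss(mask, mask, points, points)
    misaligned = self_supervised_loss(np.roll(mask, 1, axis=1), mask,
                                      points, points + 0.5)
    print("aligned loss:", aligned)
    print("misaligned loss:", misaligned)
```

A pose that renders exactly onto the pseudo-labels drives both terms toward zero, while any shift in the mask or the point cloud increases the loss, which is the signal a self-supervised refinement stage would minimize.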
Taxonomy
Methods: Dropout · Stochastic Depth · RandAugment · Noisy Student
