TL;DR
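This paper introduces a 6D object pose estimation method that can be trained on synthetic images alone, or with a few additional real ones. A first network produces a rough pose estimate and a second network refines it, which makes the method much less sensitive to the domain shift between synthetic and real images than state-of-the-art methods.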
Contribution
It presents an approach that can be trained solely on synthetic images, or optionally with a few additional real ones, which removes the need for the many real training images that most recent 6D pose estimation methods require.
Findings
Performs on par with methods trained on annotated real images, without using any real images itself
Outperforms existing methods when given only twenty real images
Is much less sensitive to the domain shift between synthetic and real images than state-of-the-art methods
Abstract
Most recent 6D object pose estimation methods, including unsupervised ones, require many real training images. Unfortunately, for some applications, such as those in space or deep under water, acquiring real images, even unannotated, is virtually impossible. In this paper, we propose a method that can be trained solely on synthetic images, or optionally using a few additional real ones. Given a rough pose estimate obtained from a first network, it uses a second network to predict a dense 2D correspondence field between the image rendered using the rough pose and the real image and infers the required pose correction. This approach is much less sensitive to the domain shift between synthetic and real images than state-of-the-art methods. It performs on par with methods that require annotated real images for training when not using any, and outperforms them considerably when using as few…
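The refinement step described in the abstract can be illustrated with a small sketch. Because the image rendered at the rough pose comes from a known 3D model, every rendered pixel has a known 3D point behind it; adding the predicted 2D correspondence field to those pixels gives 2D locations in the real image, and therefore 2D-3D pairs. The abstract does not say how the pose correction is inferred from the field, so feeding such pairs to a PnP solver is an assumption here, not the authors' stated procedure. All names below (`project`, `correspondences_from_flow`, the sparse point list, the camera intrinsics) are hypothetical, and the "flow" is an ideal one computed from a known target pose rather than a network output.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(points: Sequence[Vec3], R: Sequence[Sequence[float]], t: Vec3,
            fx: float, fy: float, cx: float, cy: float) -> List[Vec2]:
    """Pinhole projection of 3D model points under the pose (R, t)."""
    pixels = []
    for X in points:
        # Camera-frame coordinates: Xc = R @ X + t
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        pixels.append((fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy))
    return pixels


def correspondences_from_flow(points, R_rough, t_rough, flow, intrinsics):
    """Pair each 3D model point with its flow-displaced 2D location.

    `flow` holds one (du, dv) displacement per model point, sampled at the
    pixel where that point appears in the image rendered at the rough pose.
    """
    rendered = project(points, R_rough, t_rough, *intrinsics)
    return [(X, (u + du, v + dv))
            for X, (u, v), (du, dv) in zip(points, rendered, flow)]


# Toy setup (all values are illustrative assumptions).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = (600.0, 600.0, 320.0, 240.0)  # fx, fy, cx, cy
model_points = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
                (0.0, 0.0, 0.1), (-0.1, -0.1, 0.0)]
t_rough = (0.0, 0.0, 1.0)      # rough pose from the first stage
t_true = (0.02, -0.01, 1.05)   # pose of the object in the "real" image

rendered_pixels = project(model_points, identity, t_rough, *intrinsics)
target_pixels = project(model_points, identity, t_true, *intrinsics)

# Ideal correspondence field: where each rendered pixel should move to.
ideal_flow = [(ut - ur, vt - vr)
              for (ur, vr), (ut, vt) in zip(rendered_pixels, target_pixels)]

pairs = correspondences_from_flow(model_points, identity, t_rough,
                                  ideal_flow, intrinsics)
```

With an ideal field, the 2D half of every pair coincides with the projection of the model point under the target pose, so a PnP solver run on `pairs` would recover that pose. In practice the field is predicted and noisy, which is why a robust solver would typically be preferred.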
