DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation
Ivan Shugurov, Sergey Zakharov, Slobodan Ilic

TL;DR
DPODv2 introduces a multi-modal, dense correspondence-based 6 DoF pose estimation method with a novel differentiable rendering-based refinement, achieving state-of-the-art results across RGB, depth, and combined RGB-D data.
Contribution
It presents a unified deep learning framework for RGB and depth data, incorporating a novel differentiable rendering-based pose refinement technique.
Findings
RGB excels in correspondence estimation
Depth improves pose accuracy when good 3D-3D correspondences are available
Combining RGB and depth yields the best overall performance
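With depth input, predicted correspondences become 3D-3D pairs: a point on the object model matched to a point in the observed scene. This summary does not say which solver DPODv2 uses to turn those pairs into a pose. A standard choice is the Kabsch (SVD-based) rigid alignment, usually run inside RANSAC to reject bad matches. The sketch below assumes clean, outlier-free matches, and `kabsch_pose` is a hypothetical helper name. It only illustrates why accurate 3D-3D correspondences translate directly into an accurate pose.

```python
import numpy as np

def kabsch_pose(model_pts, scene_pts):
    """Hypothetical helper: rigid (R, t) minimizing sum ||R @ m + t - s||^2.

    Assumes model_pts[i] corresponds to scene_pts[i] and that there are no outliers
    (in practice this would sit inside a RANSAC loop).
    """
    model_pts = np.asarray(model_pts, dtype=float)
    scene_pts = np.asarray(scene_pts, dtype=float)
    mu_m = model_pts.mean(axis=0)
    mu_s = scene_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (model_pts - mu_m).T @ (scene_pts - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Translation aligns the centroids after rotation
    t = mu_s - R @ mu_m
    return R, t
```

Given model points and their scene counterparts generated as `scene = model @ R_true.T + t_true`, the function recovers `R_true` and `t_true` up to floating-point error. With noisy or partially wrong matches the estimate degrades, which matches the summary's caveat that depth helps only when the 3D-3D correspondences are good.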
Abstract
We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector) that relies on dense correspondences. We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose. Unlike other deep learning methods, which are typically restricted to monocular RGB images, we propose a unified deep learning network that allows different imaging modalities to be used (RGB or depth). Moreover, we propose a novel pose refinement method based on differentiable rendering. The main concept is to compare predicted and rendered correspondences in multiple views to obtain a pose that is consistent with the predicted correspondences in all views. Our proposed method is evaluated rigorously on different data modalities and types of training data in a controlled setup. The main conclusion is…
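The refinement idea of comparing predicted and rendered correspondences across views can be illustrated with a simplified objective. DPODv2 does this with differentiable rendering of dense correspondence maps; the sketch below deliberately swaps in a much simpler, sparse stand-in. It projects model points under a candidate object pose into each calibrated view and measures how far they land from the pixels where the network predicted those points. The names `project` and `multiview_consistency_loss`, the pinhole camera model, and the `(K, R_cam, t_cam, pred_uv)` view format are all assumptions for illustration, not the paper's interface.

```python
import numpy as np

def project(K, R, t, pts):
    """Pinhole projection of Nx3 points after applying the rigid transform (R, t)."""
    cam = pts @ R.T + t            # points in the camera frame
    uv = cam @ K.T                 # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

def multiview_consistency_loss(R_obj, t_obj, model_pts, views):
    """Hypothetical objective: mean squared pixel error over all views.

    R_obj, t_obj: candidate object pose (object -> world).
    views: list of (K, R_cam, t_cam, pred_uv), with (R_cam, t_cam) mapping
    world -> camera and pred_uv[i] the pixel predicted for model_pts[i].
    """
    total, count = 0.0, 0
    for K, R_cam, t_cam, pred_uv in views:
        # Compose object -> world with world -> camera
        R = R_cam @ R_obj
        t = R_cam @ t_obj + t_cam
        rendered_uv = project(K, R, t, model_pts)
        total += np.sum((rendered_uv - pred_uv) ** 2)
        count += len(model_pts)
    return total / count
```

A pose that agrees with the predictions in every view drives this loss to zero, while a pose that fits one view but not another is penalized. Minimizing it over `(R_obj, t_obj)` with any gradient-based or least-squares optimizer gives a rough analogue of the multi-view consistency the abstract describes; the differentiable-rendering version compares dense maps rather than a sparse point set.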
