Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation
Yang Hai, Rui Song, Jiaojiao Li, David Ferstl, Yinlin Hu

TL;DR
This paper introduces a self-supervised 6D object pose estimation method that is trained on RGB images alone. A geometry constraint based on pixel-level flow consistency removes the need for depth data or 2D annotations.
Contribution
The proposed approach enables accurate 6D pose estimation from RGB images alone by refining a rough initial pose with a geometry-based strategy driven by dynamically generated pseudo labels.
Findings
Outperforms state-of-the-art self-supervised methods on three datasets
Operates without 2D annotations or depth information
Achieves robust pose estimation from purely RGB data
Abstract
Most self-supervised 6D object pose estimation methods can only work with additional depth information or rely on the accurate annotation of 2D segmentation masks, limiting their application range. In this paper, we propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information. We first obtain a rough pose initialization from networks trained on synthetic images rendered from the target's 3D mesh. Then, we introduce a refinement strategy leveraging the geometry constraint in synthetic-to-real image pairs from multiple different views. We formulate this geometry constraint as pixel-level flow consistency between the training images with dynamically generated pseudo labels. We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly, with neither 2D…
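The flow-consistency idea in the abstract can be illustrated with a minimal geometric sketch. This is not the authors' implementation: the function names (`rot_z`, `project`, `pose_flow`, `flow_consistency_residual`), the pinhole camera model, and the chaining check itself are assumptions made here for illustration. The sketch only shows that when two synthetic views A and B are rendered at known poses, the flow between them follows from geometry, so flows predicted from each view to the same real image should agree once chained through that known flow.

```python
import numpy as np

def rot_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis (illustrative helper)."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project(points, R, t, K):
    """Project Nx3 object-frame points to Nx2 pixels (assumed pinhole model)."""
    cam = points @ R.T + t          # object frame -> camera frame
    uv = cam @ K.T                  # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective divide

def pose_flow(points, pose_a, pose_b, K):
    """2D displacement of each model point when the pose changes from pose_a to pose_b."""
    return project(points, *pose_b, K) - project(points, *pose_a, K)

def flow_consistency_residual(flow_a_to_real, flow_b_to_real, flow_a_to_b):
    """Mean pixel error of the check (A -> real) == (A -> B) + (B -> real)."""
    diff = flow_a_to_real - (flow_a_to_b + flow_b_to_real)
    return np.linalg.norm(diff, axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-0.05, 0.05, size=(50, 3))   # hypothetical model points (metres)
    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])                 # hypothetical intrinsics
    pose_a = (rot_z(0.0), np.array([0.0, 0.0, 0.5]))       # synthetic view A
    pose_b = (rot_z(20.0), np.array([0.01, 0.0, 0.5]))     # synthetic view B
    pose_real = (rot_z(10.0), np.array([0.0, 0.01, 0.55])) # unknown in practice

    f_ab = pose_flow(pts, pose_a, pose_b, K)        # known from the two rendering poses
    f_ar = pose_flow(pts, pose_a, pose_real, K)     # stands in for a predicted flow
    f_br = pose_flow(pts, pose_b, pose_real, K)     # stands in for a predicted flow
    print("consistent residual:", flow_consistency_residual(f_ar, f_br, f_ab))
    print("perturbed residual:", flow_consistency_residual(f_ar + 1.0, f_br, f_ab))
```

In this toy setting the real pose is known, so the flows are exact and the residual is numerically zero. In a self-supervised setting the two flows to the real image would come from a network, and a residual of this kind could serve as a training signal or as a filter for unreliable pseudo labels. How the paper actually builds its loss and pseudo labels is not stated in the text above.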
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
