Mask6D: Masked Pose Priors For 6D Object Pose Estimation
Yuechen Xie, Haobo Jiang, Jin Xie

TL;DR
Mask6D introduces a pre-training strategy for 6D object pose estimation that combines pose-aware 2D-3D correspondence maps and visible mask maps with RGB images to improve robustness in cluttered and occluded scenes, outperforming previous end-to-end pose estimation methods.
Contribution
The paper proposes Mask6D, a pose estimation-specific pre-training approach that uses 2D-3D correspondence maps and visible mask maps to enhance feature discrimination and robustness.
Findings
Outperforms previous end-to-end pose estimation methods.
Effective in cluttered and occluded scenes.
Utilizes pose-aware correspondence maps and mask guidance.
Abstract
Robust 6D object pose estimation in cluttered or occluded conditions using monocular RGB images remains a challenging task. One reason is that current pose estimation networks struggle to extract discriminative, pose-aware features using 2D feature backbones, especially when the available RGB information is limited due to target occlusion in cluttered scenes. To mitigate this, we propose a novel pose estimation-specific pre-training strategy named Mask6D. Our approach incorporates pose-aware 2D-3D correspondence maps and visible mask maps as additional modal information, which are combined with RGB images for reconstruction-based model pre-training. Essentially, this 2D-3D correspondence maps a transformed 3D object model to 2D pixels, reflecting the pose information of the target in the camera coordinate system. Meanwhile, the integrated visible mask map can effectively guide our model…
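To make the idea of a 2D-3D correspondence map concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: it assumes a standard pinhole camera with intrinsics K, a known pose (R, t), and a sparse set of model points that are splatted onto pixels, whereas a real pipeline would more likely render a dense map from a mesh. The function name correspondence_map and all of its parameters are hypothetical.

```python
import numpy as np

def correspondence_map(points, R, t, K, height, width):
    """Build a sparse 2D-3D correspondence map and a visible mask.

    points: (N, 3) model points in the object frame (assumed input).
    R, t:   rotation (3, 3) and translation (3,) from object to camera frame.
    K:      (3, 3) pinhole intrinsics.
    Returns coords (H, W, 3), the object-frame coordinates of the nearest
    point that projects to each pixel, and mask (H, W), True where a point landed.
    """
    cam = points @ R.T + t                 # object frame -> camera frame
    in_front = cam[:, 2] > 0               # drop points behind the camera
    cam, pts = cam[in_front], points[in_front]

    uvw = cam @ K.T                        # pinhole projection (homogeneous)
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, pts = u[inside], v[inside], cam[inside, 2], pts[inside]

    coords = np.zeros((height, width, 3))
    mask = np.zeros((height, width), dtype=bool)
    # Paint far points first so nearer points overwrite them (simple z-buffer).
    for i in np.argsort(-z):
        coords[v[i], u[i]] = pts[i]
        mask[v[i], u[i]] = True
    return coords, mask
```

Each marked pixel stores the object-frame 3D coordinate that projects to it, so the map changes whenever R or t changes. That is the sense in which it reflects the target's pose in the camera coordinate system. The mask here covers only self-occlusion among the model points; occlusion by other objects in the scene would need extra information that this sketch does not model.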
