Mask6D: Masked Pose Priors For 6D Object Pose Estimation

Yuechen Xie; Haobo Jiang; Jin Xie

arXiv:2507.06486·cs.CV·July 10, 2025

TL;DR

Mask6D introduces a pre-training strategy for 6D object pose estimation that uses pose-aware 2D-3D correspondence maps and visible mask maps to improve robustness in cluttered and occluded scenes, outperforming previous end-to-end pose estimation methods.

Contribution

The paper proposes Mask6D, a pose estimation-specific pre-training approach using 2D-3D correspondence and visible masks to enhance feature discrimination and robustness.

Findings

01 Outperforms previous end-to-end pose estimation methods.

02 Effective in cluttered and occluded scenes.

03 Utilizes pose-aware correspondence maps and mask guidance.

Abstract

Robust 6D object pose estimation in cluttered or occluded conditions using monocular RGB images remains a challenging task. One reason is that current pose estimation networks struggle to extract discriminative, pose-aware features using 2D feature backbones, especially when the available RGB information is limited due to target occlusion in cluttered scenes. To mitigate this, we propose a novel pose estimation-specific pre-training strategy named Mask6D. Our approach incorporates pose-aware 2D-3D correspondence maps and visible mask maps as additional modalities, which are combined with RGB images for reconstruction-based model pre-training. Essentially, the 2D-3D correspondence map relates a transformed 3D object model to 2D pixels, reflecting the pose of the target in the camera coordinate system. Meanwhile, the integrated visible mask map can effectively guide our model…
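To make the idea of a 2D-3D correspondence map concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a pose given as a rotation `R` and translation `t`, and a model represented as a list of 3D points. Each pixel hit by a projected point stores the object-frame coordinate it came from, and the nearest point wins where several project to the same pixel. The mask returned here is only the object's own projected footprint; it ignores occlusion by other objects, which a visible mask in a cluttered scene would presumably also need to account for.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def correspondence_map(
    points: List[Vec3],
    R: List[List[float]],
    t: Vec3,
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> Tuple[List[List[Optional[Vec3]]], List[List[bool]]]:
    """Project model points into the image under pose (R, t).

    Returns (coords, mask): coords[v][u] is the object-frame 3D point seen
    at pixel (u, v), or None; mask[v][u] is True where a point landed.
    All parameter names are illustrative assumptions.
    """
    coords: List[List[Optional[Vec3]]] = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]

    for p in points:
        # Transform into the camera frame: X_c = R @ p + t
        xc, yc, zc = (
            sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
        )
        if zc <= 0:
            continue  # behind the camera
        # Pinhole projection to the nearest pixel
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        # Simple z-buffer: keep the point closest to the camera
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc
            coords[v][u] = p

    mask = [[c is not None for c in row] for row in coords]
    return coords, mask
```

Because the stored values are object-frame coordinates while their pixel locations depend on `R` and `t`, the map changes whenever the pose changes, which is the sense in which it reflects the pose in the camera coordinate system.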
