6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
Li Xu, Haoxuan Qu, Yujun Cai, Jun Liu

TL;DR
This paper introduces 6D-Diff, a diffusion-based framework for 6D object pose estimation from a single RGB image. It models 2D keypoint detection as a reverse denoising process, which helps it cope with the noise and indeterminacy caused by occlusion and cluttered backgrounds.
Contribution
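The paper presents a novel diffusion-based approach to 6D pose estimation. It formulates 2D keypoint detection as a reverse diffusion (denoising) process conditioned on object features, paired with a Mixture-of-Cauchy-based forward diffusion process, in order to establish accurate 2D-3D correspondences.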
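For readers new to the setting, a 6D pose is a 3D rotation plus a 3D translation, and a 2D-3D correspondence pairs a keypoint on the object model with its location in the image. The sketch below shows only the forward direction of that relationship (projecting model keypoints into the image under a known pose) using a standard pinhole camera model. It is an illustration, not code from the paper: the function name `project_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions chosen for this example.

```python
def project_points(points_3d, rotation, translation, fx, fy, cx, cy):
    """Project 3D model keypoints to 2D pixels with a pinhole camera.

    rotation is a 3x3 nested list, translation a length-3 sequence.
    All names here are illustrative, not taken from the paper.
    """
    projected = []
    for point in points_3d:
        # Transform from the object frame to the camera frame: Xc = R @ X + t
        cam = [
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        ]
        if cam[2] <= 0:
            raise ValueError("point lies behind the camera")
        # Perspective division followed by the intrinsic mapping to pixels
        u = fx * cam[0] / cam[2] + cx
        v = fy * cam[1] / cam[2] + cy
        projected.append((u, v))
    return projected


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixels = project_points(
    [(0.1, -0.2, 0.0)], identity, (0.0, 0.0, 2.0),
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
```

Pose estimation runs this relationship in reverse: given detected 2D keypoints and their known 3D counterparts, a Perspective-n-Point solver searches for the rotation and translation that best explain them, which is why the accuracy of the 2D keypoints matters.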
Findings
Handles the noise and indeterminacy caused by occlusion and cluttered backgrounds.
Outperforms existing methods on LM-O and YCB-V datasets.
Demonstrates the potential of diffusion models in pose estimation.
Abstract
Estimating the 6D object pose from a single RGB image often involves noise and indeterminacy due to challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown appealing performance in generating high-quality images from random noise with high indeterminacy through step-by-step denoising. Inspired by their denoising capability, we propose a novel diffusion-based framework (6D-Diff) to handle the noise and indeterminacy in object pose estimation for better performance. In our framework, to establish accurate 2D-3D correspondence, we formulate 2D keypoints detection as a reverse diffusion (denoising) process. To facilitate such a denoising process, we design a Mixture-of-Cauchy-based forward diffusion process and condition the reverse process on the object features. Extensive experiments on the LM-O and YCB-V datasets demonstrate the effectiveness of…
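The abstract names a Mixture-of-Cauchy-based forward diffusion process but does not give its details here. As a rough aid to intuition only, the sketch below draws 2D points from a mixture of Cauchy components using inverse-CDF sampling. It makes several assumptions that are not from the paper: each axis is sampled independently (which is not the same as a true multivariate Cauchy), each component has a single scalar scale, and the names `sample_cauchy` and `sample_mixture_of_cauchy_2d` are hypothetical.

```python
import math
import random


def sample_cauchy(loc, scale, rng):
    """Draw one Cauchy(loc, scale) sample via the inverse CDF."""
    u = rng.random()
    return loc + scale * math.tan(math.pi * (u - 0.5))


def sample_mixture_of_cauchy_2d(weights, locs, scales, n, seed=0):
    """Draw n 2D points from a mixture of per-axis Cauchy components.

    weights: mixing weights, one per component
    locs:    (x, y) centre of each component
    scales:  scalar scale of each component (an assumption of this sketch)
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Pick a component in proportion to its mixing weight
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        x = sample_cauchy(locs[k][0], scales[k], rng)
        y = sample_cauchy(locs[k][1], scales[k], rng)
        points.append((x, y))
    return points


# Two hypothetical keypoint hypotheses in pixel coordinates
samples = sample_mixture_of_cauchy_2d(
    weights=[0.7, 0.3],
    locs=[(100.0, 50.0), (140.0, 80.0)],
    scales=[2.0, 5.0],
    n=1000,
)
```

A Cauchy distribution has much heavier tails than a Gaussian, so a mixture like this assigns non-negligible probability to points far from every component centre. How the paper fits the mixture and uses it as the target of the forward process is not covered by this sketch.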
Taxonomy
Topics: Human Pose and Action Recognition · Image and Object Detection Techniques · Advanced Vision and Imaging
Methods: Diffusion
