Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
Jian Liu, Wei Sun, Hui Yang, Pengchao Deng, Chongpei Liu, Nicu Sebe, Hossein Rahmani, Ajmal Mian

TL;DR
This paper introduces Diff9D, a diffusion-based approach for category-level 9-DoF object pose estimation that generalizes well to real-world scenes using synthetic training data, achieving near real-time performance and state-of-the-art results.
Contribution
The paper proposes a novel diffusion model framework for domain-generalized 9-DoF object pose estimation trained solely on synthetic data, eliminating the need for 3D shape priors.
Findings
Achieves state-of-the-art domain generalization performance on benchmark datasets.
Operates in as few as 3 reverse diffusion steps for near real-time inference.
Demonstrates effectiveness in a real-world robotic grasping system.
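To make the "few reverse diffusion steps" finding concrete, here is a minimal sketch of a deterministic DDIM-style sampler that denoises a 9-dimensional vector in 3 steps. This is not the authors' implementation: the summary does not state which sampler or noise schedule Diff9D uses, so the linear beta schedule, the function names (make_alpha_bars, ddim_sample), and the unconditioned oracle_denoiser (a stand-in for a trained, image-conditioned network) are all assumptions made for illustration.

```python
import numpy as np

def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear beta schedule; returns the cumulative product of (1 - beta_t).
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def ddim_sample(denoiser, alpha_bars, num_steps=3, dim=9, rng=None):
    """Deterministic DDIM-style sampling (eta = 0) over a short timestep subsequence."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Evenly spaced timesteps, ordered from most noisy to least noisy.
    timesteps = np.linspace(len(alpha_bars) - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal(dim)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # After the last step we land on the clean sample (alpha_bar = 1).
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = denoiser(x, t)  # predicted noise at this step
        # Estimate the clean sample, then re-noise it to the previous level.
        x0_pred = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x

# Hypothetical 9-DoF target: 3 rotation, 3 translation, 3 size values.
target = np.array([0.1, -0.2, 0.3, 0.5, 0.0, 1.2, 0.10, 0.20, 0.15])
alpha_bars = make_alpha_bars()

def oracle_denoiser(x, t):
    # Stand-in for a trained network: returns the exact noise relative to `target`.
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

pose = ddim_sample(oracle_denoiser, alpha_bars, num_steps=3)
```

With a perfect noise predictor the sampler recovers the target regardless of step count; with a real network, the number of steps trades accuracy against inference time, which is the trade-off the finding refers to.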
Abstract
Nine-degrees-of-freedom (9-DoF) object pose and size estimation is crucial for enabling augmented reality and robotic manipulation. Category-level methods have received extensive research attention due to their potential for generalization to intra-class unknown objects. However, these methods require manual collection and labeling of large-scale real-world training data. To address this problem, we introduce a diffusion-based paradigm for domain-generalized category-level 9-DoF object pose estimation. Our motivation is to leverage the latent generalization ability of the diffusion model to address the domain generalization challenge in object pose estimation. This entails training the model exclusively on rendered synthetic data to achieve generalization to real-world scenes. We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective.…
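As a brief illustration of what a 9-DoF estimate encodes, the sketch below turns a rotation (3 DoF), a translation (3 DoF), and a box size (3 DoF) into the eight corners of an oriented 3D bounding box. The axis-angle rotation representation and the helper names (rotation_from_axis_angle, box_corners) are assumptions chosen for the example; the summary does not specify how Diff9D parameterizes rotation.

```python
import itertools
import numpy as np

def rotation_from_axis_angle(rotvec):
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2 for a unit axis k.
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    kx, ky, kz = rotvec / angle
    K = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def box_corners(rotvec, translation, size):
    """Return the 8 corners (shape (8, 3)) of the box described by a 9-DoF pose."""
    # Corners of a unit cube centered at the origin, scaled by the box size.
    signs = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))
    local = signs * np.asarray(size, dtype=float)
    R = rotation_from_axis_angle(np.asarray(rotvec, dtype=float))
    # Rotate each corner, then shift to the object's position.
    return local @ R.T + np.asarray(translation, dtype=float)

# Example: 90-degree rotation about z, shifted 1 unit along x, box of size 2 x 4 x 6.
corners = box_corners([0.0, 0.0, np.pi / 2], [1.0, 0.0, 0.0], [2.0, 4.0, 6.0])
```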
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Image Processing Techniques and Applications
Methods: Softmax · Attention Is All You Need · Diffusion
