Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation

Jian Liu; Wei Sun; Hui Yang; Pengchao Deng; Chongpei Liu; Nicu Sebe; Hossein Rahmani; Ajmal Mian

arXiv:2502.02525·cs.CV·March 18, 2025

TL;DR

This paper introduces Diff9D, a diffusion-based approach for category-level 9-DoF object pose estimation that generalizes well to real-world scenes using synthetic training data, achieving near real-time performance and state-of-the-art results.

Contribution

The paper proposes a novel diffusion model framework for domain-generalized 9-DoF object pose estimation trained solely on synthetic data, eliminating the need for 3D shape priors.

Findings

1. Achieves state-of-the-art domain generalization performance on benchmark datasets.
2. Operates in as few as 3 reverse diffusion steps for near real-time inference.
3. Demonstrates effectiveness in a real-world robotic grasping system.

Abstract

Nine-degrees-of-freedom (9-DoF) object pose and size estimation is crucial for enabling augmented reality and robotic manipulation. Category-level methods have received extensive research attention due to their potential for generalization to intra-class unknown objects. However, these methods require manual collection and labeling of large-scale real-world training data. To address this problem, we introduce a diffusion-based paradigm for domain-generalized category-level 9-DoF object pose estimation. Our motivation is to leverage the latent generalization ability of the diffusion model to address the domain generalization challenge in object pose estimation. This entails training the model exclusively on rendered synthetic data to achieve generalization to real-world scenes. We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective.…
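This summary does not describe the sampler itself. As a rough illustration of what "as few as 3 reverse diffusion steps" can look like, the sketch below runs a generic deterministic DDIM-style update over a 9-dimensional vector. Everything in it is an assumption made for illustration: the linear noise schedule, the flat 9-value pose encoding (3 rotation, 3 translation, 3 size), the names `make_alpha_bars` and `ddim_sample`, and the `eps_model` callback that stands in for a trained, observation-conditioned denoising network. It is not the Diff9D implementation, and the paper's actual sampler may differ.

```python
import math
import random


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def ddim_sample(eps_model, alpha_bars, num_steps=3, dim=9, seed=0):
    """Deterministic DDIM-style reverse process using only `num_steps` updates.

    eps_model(x, t) is a hypothetical callback returning the predicted noise
    (a list of `dim` floats) for the noisy vector x at timestep t.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    rng = random.Random(seed)
    last = len(alpha_bars) - 1
    # Evenly spaced timesteps from the noisiest step down to step 0.
    timesteps = [
        round(last * (num_steps - 1 - k) / (num_steps - 1))
        for k in range(num_steps)
    ]
    # Start from pure Gaussian noise in the 9-dimensional pose space.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for k, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        eps = eps_model(x, t)
        # Estimate the clean vector implied by the current noise prediction.
        x0 = [
            (xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t)
            for xi, ei in zip(x, eps)
        ]
        if k + 1 == len(timesteps):
            return x0  # final step: return the clean estimate
        # Jump directly to the next (less noisy) selected timestep.
        a_prev = alpha_bars[timesteps[k + 1]]
        x = [
            math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
            for x0i, ei in zip(x0, eps)
        ]
```

With the default schedule and `num_steps=3`, the loop visits timesteps 999, 500, and 0 instead of all 1000, which is where the speed-up of few-step sampling comes from. In a real system the quality of the result depends entirely on the trained noise predictor passed as `eps_model`.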

Code & Models

Repositories

cnjianliu/diff9d (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Image Processing Techniques and Applications

Methods: Softmax · Attention Is All You Need · Diffusion