DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation
Yilin Wen, Xiangyu Li, Hao Pan, Lei Yang, Zheng Wang, Taku Komura, Wenping Wang

TL;DR
This paper introduces DISP6D, a scalable 6D pose estimation method that disentangles shape and pose representations in an auto-encoder framework, improving generalization to novel objects and performance on benchmarks of textureless CAD objects and categorized daily objects.
Contribution
It proposes a novel disentangled latent space for shape and pose, with shape-dependent pose codebooks, enabling scalable and accurate 6D pose estimation from RGB images.
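Rotation retrieval compares an object's latent pose code against a codebook of codes for canonical rotations, and DISP6D makes that codebook depend on the object's shape. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: it uses rotations about a single yaw axis instead of full 3D rotations, assumes cosine similarity for the comparison, and replaces the learned re-entangling network with a hypothetical `toy_entangle` function whose first shape coordinate encodes a symmetry order. All function names here are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def toy_entangle(shape_code: Sequence[float], yaw_deg: float) -> Vector:
    """Hypothetical stand-in for the learned shape/rotation re-entangling network.

    Treats shape_code[0] as a k-fold symmetry order about one yaw axis, so
    yaw and yaw + 360/k map to the same pose code (symmetric views collide).
    """
    k = max(1, round(shape_code[0]))
    t = math.radians(yaw_deg) * k
    return [math.cos(t), math.sin(t)]


def build_codebook(
    shape_code: Sequence[float],
    rotations: Sequence[float],
    entangle: Callable[[Sequence[float], float], Vector],
) -> List[Vector]:
    """Build a shape-dependent codebook: one pose code per canonical rotation."""
    return [entangle(shape_code, r) for r in rotations]


def retrieve_rotation(
    pose_code: Sequence[float], codebook: Sequence[Vector], rotations: Sequence[float]
) -> float:
    """Return the canonical rotation whose codebook entry is most similar."""
    best = max(range(len(codebook)), key=lambda i: cosine(pose_code, codebook[i]))
    return rotations[best]


if __name__ == "__main__":
    rotations = [45.0 * i for i in range(8)]  # canonical yaws: 0, 45, ..., 315 degrees

    # Asymmetric toy object (k = 1): a query near 92 degrees retrieves 90 degrees.
    asym_book = build_codebook([1.0], rotations, toy_entangle)
    print(retrieve_rotation(toy_entangle([1.0], 92.0), asym_book, rotations))  # 90.0

    # 2-fold symmetric toy object (k = 2): 90 and 270 degrees share one code.
    sym_book = build_codebook([2.0], rotations, toy_entangle)
    print(retrieve_rotation(toy_entangle([2.0], 270.0), sym_book, rotations))
```

For the 2-fold symmetric toy shape, the second query returns either 90.0 or 270.0; both are valid poses for that object. The same codebook entries mean different things for objects with different symmetries, which loosely mirrors the abstract's point that a single shared pose space would be inconsistent across objects.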
Findings
Achieves state-of-the-art results on two benchmarks: textureless CAD objects without categories and daily objects with categories.
Effectively handles object symmetry and category generalization.
Demonstrates improved scalability to daily objects across categories.
Abstract
Scalable 6D pose estimation for rigid objects from RGB images aims at handling multiple objects and generalizing to novel objects. Building on a well-known auto-encoding framework to cope with object symmetry and the lack of labeled training data, we achieve scalability by disentangling the latent representation of the auto-encoder into shape and pose sub-spaces. The latent shape space models the similarity of different objects through contrastive metric learning, and the latent pose code is compared with canonical rotations for rotation retrieval. Because different object symmetries induce inconsistent latent pose spaces, we re-entangle the shape representation with canonical rotations to generate shape-dependent pose codebooks for rotation retrieval. We show state-of-the-art performance on two benchmarks containing textureless CAD objects without categories and daily objects with categories…
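The abstract says the latent shape space models object similarity through contrastive metric learning but does not state the loss here. As a minimal sketch, assuming an InfoNCE-style objective (a common contrastive loss swapped in for illustration; the paper's exact formulation may differ), the snippet scores one anchor shape code against a positive (for example, another view of the same object) and negatives (other objects). The function names and the temperature value are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss for one anchor: low when the positive is the most similar.

    Computes -log(exp(s_pos / T) / sum_j exp(s_j / T)) over the positive and all
    negatives, where s is cosine similarity and T is the temperature.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Positive close to the anchor, negatives far away: near-zero loss.
    print(info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
    # Positive and negative swapped: much larger loss.
    print(info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]]))
```

Minimizing this kind of loss pulls codes of the same object together and pushes codes of different objects apart, which is one way a shape space could come to reflect similarity between objects.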
Taxonomy
Topics: Image Processing and 3D Reconstruction · Robot Manipulation and Learning · 3D Surveying and Cultural Heritage
