DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation

Yilin Wen; Xiangyu Li; Hao Pan; Lei Yang; Zheng Wang; Taku Komura; Wenping Wang

arXiv:2107.12549 · cs.CV · March 13, 2023

TL;DR

This paper introduces DISP6D, a scalable 6D pose estimation method that disentangles shape and pose representations in an auto-encoder framework, improving generalization and performance on diverse object datasets.

Contribution

It proposes a novel disentangled latent space for shape and pose, with shape-dependent pose codebooks, enabling scalable and accurate 6D pose estimation from RGB images.
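The page gives no implementation details, so the following is only a minimal NumPy sketch of rotation retrieval with a shape-dependent pose codebook, under stated assumptions. Canonical rotations are sampled randomly on SO(3) (`random_rotations`). The learned network that combines a shape code with a rotation is replaced by a hypothetical fixed random map (`make_pose_encoder`). Retrieval returns the codebook entry with the highest cosine similarity to the query pose code. All function names here are illustrative and are not taken from the fylwen/disp-6d repository.

```python
import numpy as np


def random_rotations(n, seed=0):
    """Sample n rotation matrices roughly uniformly on SO(3) from
    normalized Gaussian quaternions (an assumed way to build the set
    of canonical rotations)."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    # Standard unit-quaternion -> rotation-matrix formula, stacked row by row.
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)], axis=-1),
        np.stack([2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)], axis=-1),
        np.stack([2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=1)


def make_pose_encoder(shape_dim, code_dim, seed=1):
    """HYPOTHETICAL stand-in for the learned network that re-entangles a
    shape code with a rotation: a fixed random linear map followed by tanh."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(shape_dim + 9, code_dim)) / np.sqrt(shape_dim + 9)

    def encode(shape_code, rotations):
        # shape_code: (shape_dim,), rotations: (n, 3, 3) -> pose codes (n, code_dim)
        n = rotations.shape[0]
        feats = np.concatenate(
            [np.tile(shape_code, (n, 1)), rotations.reshape(n, 9)], axis=1)
        return np.tanh(feats @ W)

    return encode


def build_codebook(encode, shape_code, rotations):
    """One L2-normalized pose code per canonical rotation, for this shape."""
    codes = encode(shape_code, rotations)
    return codes / np.linalg.norm(codes, axis=1, keepdims=True)


def retrieve_rotation(query_code, codebook, rotations):
    """Return the canonical rotation whose code is most cosine-similar."""
    q = query_code / np.linalg.norm(query_code)
    sims = codebook @ q
    best = int(np.argmax(sims))
    return rotations[best], best


if __name__ == "__main__":
    rotations = random_rotations(2000)
    encode = make_pose_encoder(shape_dim=16, code_dim=64)
    shape_code = np.random.default_rng(2).normal(size=16)
    codebook = build_codebook(encode, shape_code, rotations)
    # Pretend the image encoder produced the pose code of canonical rotation 42.
    query = encode(shape_code, rotations[42:43])[0]
    R_hat, idx = retrieve_rotation(query, codebook, rotations)
    print(idx)
```

Because the codebook is regenerated from the shape code, two objects can receive different codebooks over the same set of canonical rotations; in this sketch that is the only mechanism by which retrieval becomes shape dependent.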

Findings

1. Achieves state-of-the-art results on CAD object benchmarks.
2. Effectively handles object symmetry and category generalization.
3. Demonstrates improved scalability to daily objects across categories.

Abstract

Scalable 6D pose estimation for rigid objects from RGB images aims to handle multiple objects and to generalize to novel objects. Building on a well-known auto-encoding framework to cope with object symmetry and the lack of labeled training data, we achieve scalability by disentangling the latent representation of the auto-encoder into shape and pose sub-spaces. The latent shape space models the similarity of different objects through contrastive metric learning, and the latent pose code is compared with canonical rotations for rotation retrieval. Because different object symmetries induce inconsistent latent pose spaces, we re-entangle the shape representation with canonical rotations to generate shape-dependent pose codebooks for rotation retrieval. We show state-of-the-art performance on two benchmarks containing textureless CAD objects without categories and daily objects with categories…
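The abstract says the latent shape space is trained with contrastive metric learning but does not state the loss. As an assumption, the sketch below uses an InfoNCE-style loss in NumPy. Shape codes of the same object (for example, from two rendered views) form a positive pair, and the other objects in the batch act as negatives. `info_nce_loss` and its `temperature` default are illustrative choices, not the paper's.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss on shape codes (an assumed choice).

    anchors, positives: (B, D) arrays; row i of both comes from the same
    object, and every other row in the batch is treated as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Maximize the log-probability of matching each anchor to its own positive.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codes = rng.normal(size=(8, 32))
    views = codes + 0.05 * rng.normal(size=(8, 32))  # second "view" of each object
    print(info_nce_loss(codes, views))                   # matched pairs
    print(info_nce_loss(codes, np.roll(views, 1, axis=0)))  # mismatched pairs
```

A lower loss means codes of the same object lie closer together than codes of different objects, which is the similarity structure the latent shape space is meant to capture.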

Code & Models

Repositories

fylwen/disp-6d (TensorFlow; official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Robot Manipulation and Learning · 3D Surveying and Cultural Heritage