IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation

Jianhui Liu; Yukang Chen; Xiaoqing Ye; Xiaojuan Qi

arXiv:2303.13479 · cs.CV · July 20, 2023 · 1 citation

TL;DR

IST-Net introduces a prior-free implicit space transformation approach for category-level 6D pose estimation, showing that the explicit deformation process matters more than the 3D prior itself, and achieves state-of-the-art results with faster inference.

Contribution

The paper proposes IST-Net, a novel prior-free implicit space transformation network that estimates object poses without relying on 3D priors, which simplifies the pipeline and improves inference speed.

Findings

01

Achieves state-of-the-art performance on the REAL275 benchmark.

02

Achieves higher inference speed than previous methods.

03

Shows that the explicit deformation process, rather than the 3D prior itself, is the key to the performance of prior-based methods.

Abstract

Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category. Thanks to prior deformation, which explicitly adapts a category-specific 3D prior (i.e., a 3D template) to a given object instance, prior-based methods have attained great success and become a major research stream. However, obtaining category-specific priors requires collecting a large number of 3D models, which is labor-intensive and often infeasible in practice. This motivates us to investigate whether priors are necessary to make prior-based methods effective. Our empirical study shows that the 3D prior itself does not account for the high performance. The key is actually the explicit deformation process, which aligns camera-space and world-space coordinates under the supervision of world-space 3D models (the world space is also called the canonical space). Inspired by these observations, we introduce a…
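
The abstract describes prior deformation and camera-to-world alignment only in words. As a rough illustration, the NumPy sketch below follows the generic correspondence-based pipeline of prior-based methods, modeled loosely on shape-prior deformation (SPD) approaches: deform a category template, softly assign observed points to it to get canonical (world-space) coordinates, then recover scale, rotation, and translation with Umeyama's least-squares similarity alignment. It is not IST-Net's code (IST-Net is prior-free and drops the explicit template). The helper names `deform_prior` and `umeyama`, the array shapes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def deform_prior(prior: np.ndarray, deformation: np.ndarray,
                 assignment_logits: np.ndarray) -> np.ndarray:
    """Hypothetical SPD-style prior deformation.

    prior:             (M, 3) category template in canonical (world) space
    deformation:       (M, 3) per-point offsets predicted for this instance
    assignment_logits: (N, M) scores matching N observed points to M model points
    Returns the (N, 3) canonical coordinates of the observed points.
    """
    instance_model = prior + deformation          # explicitly adapt the template
    weights = softmax(assignment_logits, axis=1)  # each row is a soft assignment
    return weights @ instance_model


def umeyama(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity transform so that dst ≈ s * R @ src + t.

    src: (N, 3) canonical coordinates; dst: (N, 3) camera-space points.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                     # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # keep R a proper rotation
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / ((src_c ** 2).sum() / n)
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Synthetic demo: a random "object" in canonical space and a known pose.
rng = np.random.default_rng(0)
canon = rng.uniform(-0.5, 0.5, size=(200, 3))
theta = 0.7
R_gt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
s_gt, t_gt = 0.25, np.array([0.1, -0.05, 0.8])
observed = s_gt * canon @ R_gt.T + t_gt       # camera-space points

# Trivial prior (all zeros) plus a deformation equal to the true shape, and a
# near one-hot assignment, so the predicted canonical coordinates match `canon`.
prior = np.zeros_like(canon)
nocs = deform_prior(prior, canon, 50.0 * np.eye(len(canon)))
s, R, t = umeyama(nocs, observed)             # recovers scale, rotation, translation
```

In a real prior-based model the deformation and assignment logits would come from a network; the closed-form alignment step stays the same, and practical pipelines often wrap it in RANSAC to tolerate outlier correspondences.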

Code & Models

Repositories

cvmi-lab/ist-net
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Anatomy and Medical Technology

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings