IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation
Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi

TL;DR
IST-Net is a prior-free network for category-level 6D pose estimation built on implicit space transformation. The authors find that the explicit deformation process, not the 3D prior itself, is what makes prior-based methods effective, and IST-Net achieves state-of-the-art results with higher inference speed.
Contribution
The paper proposes a novel prior-free implicit space transformation network, IST-Net, that effectively estimates poses without relying on 3D priors, simplifying the process and improving speed.
Findings
Achieves state-of-the-art performance on the REAL275 benchmark.
Runs at a higher inference speed than previous methods.
Shows that the explicit deformation process, rather than the 3D prior itself, accounts for the high performance of prior-based methods.
Abstract
Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category. Thanks to prior deformation, which explicitly adapts a category-specific 3D prior (i.e., a 3D template) to a given object instance, prior-based methods have attained great success and have become a major research stream. However, obtaining category-specific priors requires collecting a large number of 3D models, which is labor-intensive and often not feasible in practice. This motivates us to investigate whether priors are necessary to make prior-based methods effective. Our empirical study shows that the 3D prior itself is not what accounts for the high performance. The key factor is actually the explicit deformation process, which aligns camera and world coordinates under supervision from 3D models in world space (also called canonical space). Inspired by these observations, we introduce a…
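To make "aligning camera and world coordinates" concrete, the sketch below relates a point in world (canonical) space to its camera-space observation through a rotation, a translation, and a scale. This is not the IST-Net architecture. It assumes the similarity-transform convention p_cam = s · R · p_world + t, which the abstract does not state, and the helper names and pose values are hypothetical.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix about the z-axis as nested lists."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def world_to_camera(point, rotation, translation, scale):
    """Map a world-space (canonical) point to camera space.

    Assumed convention: p_cam = scale * R @ p_world + t.
    """
    return [
        scale * sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def camera_to_world(point, rotation, translation, scale):
    """Invert the mapping: p_world = R^T @ (p_cam - t) / scale."""
    centered = [point[i] - translation[i] for i in range(3)]
    return [
        sum(rotation[i][j] * centered[i] for i in range(3)) / scale
        for j in range(3)
    ]


# Hypothetical pose (R, t) and size (s) of one object instance.
R = rotation_z(math.pi / 2)
t = [0.1, -0.2, 0.8]
s = 0.25

canonical_point = [0.5, 0.0, 0.0]                    # point on the object in world space
observed = world_to_camera(canonical_point, R, t, s)  # where the camera would see it
recovered = camera_to_world(observed, R, t, s)        # mapped back to world space
```

Under this assumed convention, knowing the pose and size is equivalent to knowing how to move points between the two spaces, which is why supervision in world space constrains the pose estimate.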
