A Visual Navigation Perspective for Category-Level Object Pose Estimation
Jiaxin Guo, Fangxun Zhong, Rong Xiong, Yunhui Liu, Yue Wang, Yiyi Liao

TL;DR
This paper explores how visual navigation strategies can improve category-level object pose estimation using generative models, focusing on inference efficiency, robustness, and convergence.
Contribution
The paper investigates navigation policies for analysis-by-synthesis pose estimation and introduces a hybrid approach that outperforms existing strategies.
Findings
The hybrid navigation approach improves convergence and robustness.
The evaluation shows superior performance over state-of-the-art methods.
Comparing gradient descent, reinforcement learning, and imitation learning as navigation strategies informs better inference for pose estimation.
Abstract
This paper studies category-level object pose estimation based on a single monocular image. Recent advances in pose-aware generative models have paved the way for addressing this challenging task using analysis-by-synthesis. The idea is to sequentially update a set of latent variables, e.g., pose, shape, and appearance, of the generative model until the generated image best agrees with the observation. However, convergence and efficiency are two challenges of this inference procedure. In this paper, we take a deeper look at the inference of analysis-by-synthesis from the perspective of visual navigation, and investigate what is a good navigation policy for this specific task. We evaluate three different strategies, including gradient descent, reinforcement learning and imitation learning, via thorough comparisons in terms of convergence, robustness and efficiency. Moreover, we show that…
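The analysis-by-synthesis loop described in the abstract can be sketched on a toy problem. This is a minimal illustration, not the paper's method. The "generator" is a hypothetical 1D renderer that draws a Gaussian blob at a scalar pose, the only latent variable is that pose (shape and appearance are omitted), and the update rule is plain gradient descent with a finite-difference gradient. The names `render`, `loss`, and `estimate_pose`, and all parameter values, are assumptions made for this example.

```python
import numpy as np

def render(pose, size=64, width=6.0):
    """Hypothetical stand-in for a pose-aware generator:
    a 1D 'image' containing a Gaussian blob centred at `pose`."""
    x = np.arange(size, dtype=float)
    return np.exp(-0.5 * ((x - pose) / width) ** 2)

def loss(pose, observation):
    """Mean squared error between the rendering and the observation."""
    return float(np.mean((render(pose) - observation) ** 2))

def estimate_pose(observation, init_pose, lr=50.0, steps=200, eps=1e-3):
    """Analysis-by-synthesis by gradient descent: repeatedly render,
    compare with the observation, and nudge the pose downhill."""
    pose = float(init_pose)
    for _ in range(steps):
        # Central finite difference; a real generator would use autodiff.
        grad = (loss(pose + eps, observation)
                - loss(pose - eps, observation)) / (2 * eps)
        pose -= lr * grad
    return pose

# Observation rendered at an assumed ground-truth pose of 30.
observation = render(30.0)
estimated = estimate_pose(observation, init_pose=24.0)
far_start = estimate_pose(observation, init_pose=-20.0)
```

In this toy, a start near the true pose converges to it, while a start far away (here `init_pose=-20.0`) sees an almost flat loss and barely moves. That is one simple picture of the convergence challenge the abstract mentions, and of why policies other than pure gradient descent are worth comparing.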
Taxonomy
Topics: Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications · Human Pose and Action Recognition
