iSPA-Net: Iterative Semantic Pose Alignment Network
Jogendra Nath Kundu, Aditya Ganeshan, Rahul M. V., Aditya Prakash, R. Venkatesh Babu

TL;DR
iSPA-Net is an iterative deep learning framework that improves 3D object pose estimation from monocular images by leveraging semantic 3D structure, reducing data requirements, and refining predictions through iterative pose alignment.
Contribution
The paper introduces iSPA-Net, a novel iterative pose alignment network that exploits semantic 3D structure and correspondence for fine-grained pose estimation with minimal annotations.
Findings
Achieves state-of-the-art performance on real image viewpoint datasets.
Effectively refines pose estimates through iterative alignment.
Demonstrates applications in active viewpoint localization and unsupervised part segmentation.
Abstract
Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data-driven approaches based on deep neural networks suffer from a scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where annotators use a 3D CAD model as a structural reference to acquire ground-truth viewpoints for real images, we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach focuses on exploiting semantic 3D structural regularity to solve the task of fine-grained pose estimation by predicting the viewpoint difference between a given pair of images. Such an image-comparison-based approach also alleviates the problem of data scarcity and hence enhances the scalability of the proposed approach for novel object categories with minimal…
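The iterative alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real system predicts the viewpoint difference from a pair of images with a learned network, whereas the hypothetical `toy_difference_predictor` below stands in for that network by returning a damped fraction of the true azimuth difference. All function names, the `gain` value, and the stopping tolerance are assumptions made for illustration only.

```python
def wrap_angle(deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def toy_difference_predictor(current_deg, target_deg, gain=0.7):
    """Hypothetical stand-in for a learned viewpoint-difference network.

    A real predictor would compare two images; here the hidden target
    azimuth is used directly, and only a fraction (gain) of the true
    difference is returned to mimic an imperfect single-step estimate.
    """
    return gain * wrap_angle(target_deg - current_deg)


def iterative_alignment(initial_deg, target_deg, predictor, max_iters=10, tol=0.5):
    """Refine an azimuth estimate by repeatedly applying predicted differences.

    Stops when the predicted correction falls below tol degrees or after
    max_iters iterations. Returns the final estimate and the full history.
    """
    estimate = initial_deg % 360.0
    history = [estimate]
    for _ in range(max_iters):
        delta = predictor(estimate, target_deg)
        estimate = (estimate + delta) % 360.0
        history.append(estimate)
        if abs(delta) < tol:
            break
    return estimate, history


if __name__ == "__main__":
    final, steps = iterative_alignment(350.0, 20.0, toy_difference_predictor)
    for i, value in enumerate(steps):
        print(f"iteration {i}: estimate = {value:.2f} deg")
```

Because each step only needs a relative difference, an imperfect predictor can still converge over several iterations, which is the intuition behind refining a coarse pose estimate rather than regressing the absolute viewpoint in one shot.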
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
