iSPA-Net: Iterative Semantic Pose Alignment Network

Jogendra Nath Kundu; Aditya Ganeshan; Rahul M. V.; Aditya Prakash; R. Venkatesh Babu

arXiv:1808.01134·cs.CV·August 6, 2018·1 cites

TL;DR

iSPA-Net is an iterative deep learning framework that improves 3D object pose estimation from monocular images by leveraging semantic 3D structure, reducing data requirements, and refining predictions through iterative pose alignment.

Contribution

The paper introduces iSPA-Net, a novel iterative pose alignment network that exploits semantic 3D structure and correspondence for fine-grained pose estimation with minimal annotations.

Findings

01. Achieves state-of-the-art performance on real image viewpoint datasets.

02. Effectively refines pose estimates through iterative alignment.

03. Demonstrates applications in active viewpoint localization and unsupervised part segmentation.

Abstract

Understanding and extracting 3D information about objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data-driven approaches based on deep neural networks suffer from a scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where annotators use a 3D CAD model as a structural reference to acquire ground-truth viewpoints for real images, we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach exploits semantic 3D structural regularity to solve the task of fine-grained pose estimation by predicting the viewpoint difference between a given pair of images. Such an image-comparison-based approach also alleviates the problem of data scarcity and hence enhances the scalability of the proposed approach to novel object categories with minimal…
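
The idea of refining a pose by repeatedly predicting a viewpoint difference can be illustrated with a toy loop. This is only a sketch under strong assumptions, not the authors' network: pose is reduced to a single azimuth angle, and `toy_difference_predictor` is a hypothetical stand-in for a learned model that would compare a real image against a rendered reference view. Here it simply returns a damped fraction (`gain`, an assumed parameter) of the true angular difference, so the true azimuth is passed in place of the real image.

```python
def wrap_angle(deg):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def toy_difference_predictor(target_azimuth, reference_azimuth, gain=0.7):
    """Hypothetical stand-in for a learned viewpoint-difference predictor.

    A real model would see two images; this toy version sees the two
    angles directly and returns an imperfect (damped) difference.
    """
    return gain * wrap_angle(target_azimuth - reference_azimuth)


def iterative_alignment(target_azimuth, initial_azimuth,
                        predict_difference, num_iters=5):
    """Refine an azimuth estimate by repeatedly applying predicted differences."""
    estimate = initial_azimuth
    history = [estimate]
    for _ in range(num_iters):
        # Compare the target view with the reference at the current estimate.
        delta = predict_difference(target_azimuth, estimate)
        # Move the reference toward the target and keep the angle in range.
        estimate = wrap_angle(estimate + delta)
        history.append(estimate)
    return estimate, history


if __name__ == "__main__":
    final, steps = iterative_alignment(120.0, -150.0,
                                       toy_difference_predictor, num_iters=8)
    for i, angle in enumerate(steps):
        print(f"iteration {i}: estimate = {angle:.3f} deg")
```

Because each step removes only part of the remaining error, the estimate approaches the target over several iterations rather than in one shot, which is the intuition behind refining through repeated alignment.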

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Robotics and Sensor-Based Localization