Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation
Haopeng Zhang, Tarek El-Gaaly, Ahmed Elgammal, Zhiguo Jiang

TL;DR
This paper introduces a novel framework that models and factorizes view-object manifolds to jointly address object recognition, instance identification, and pose estimation, achieving state-of-the-art results on challenging datasets.
Contribution
It proposes a new method to explicitly model deformations of object manifolds and factorize them in a view-invariant space for improved recognition and pose estimation.
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively models manifold deformations for recognition.
Jointly solves recognition, instance, and pose estimation.
Abstract
Due to large variations in shape, appearance, and viewing conditions, object recognition is a key precursory challenge in the fields of object manipulation and robotic/AI visual reasoning in general. Recognizing object categories, particular instances of objects and viewpoints/poses of objects are three critical subproblems robots must solve in order to accurately grasp/manipulate objects and reason about their environments. Multi-view images of the same object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g. visual/depth descriptor spaces). These object manifolds share the same topology despite being geometrically different. Each object manifold can be represented as a deformed version of a unified manifold. Each object manifold can thus be parameterized by its homeomorphic mapping/reconstruction from the unified manifold. In this work, we develop a novel framework…
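The idea that each object manifold is a deformed copy of one unified manifold can be illustrated with a small sketch. It is not the authors' implementation: it assumes the unified manifold is a unit circle (views taken around a single rotation axis), uses a Gaussian RBF regression as a stand-in for the mapping, and runs on made-up 4-D toy descriptors. All function names here are hypothetical, and the factorization step named in the title is not covered because the truncated abstract does not describe it.

```python
import numpy as np

def circle_points(angles):
    # Assumed unified manifold: the unit circle, indexed by view angle in radians.
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def rbf_features(angles, centers, width=1.0):
    # Gaussian RBF features of circle points relative to fixed centers on the circle.
    diff = circle_points(angles)[:, None, :] - circle_points(centers)[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=2) / (2.0 * width ** 2))

def fit_object(angles, descriptors, centers, reg=1e-6):
    # Ridge regression for a coefficient matrix C with rbf_features(angles) @ C ≈ descriptors.
    # C is the per-object parameterization of the deformation (an assumption of this sketch).
    phi = rbf_features(angles, centers)
    return np.linalg.solve(phi.T @ phi + reg * np.eye(len(centers)), phi.T @ descriptors)

def recognize(models, descriptor, centers, n_candidates=360):
    # Try every object model and every candidate angle; keep the closest reconstruction.
    candidates = np.linspace(0.0, 2.0 * np.pi, n_candidates, endpoint=False)
    phi = rbf_features(candidates, centers)
    best = (None, None, np.inf)
    for label, coeffs in models.items():
        errors = np.linalg.norm(phi @ coeffs - descriptor, axis=1)
        i = int(np.argmin(errors))
        if errors[i] < best[2]:
            best = (label, float(candidates[i]), float(errors[i]))
    return best

# Toy 4-D "descriptors": two different deformations of the same circle (made up).
def toy_a(t):
    return np.stack([np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)], axis=1)

def toy_b(t):
    return np.stack([1.5 * np.cos(t), 0.5 * np.sin(t),
                     0.3 * np.cos(2 * t), -np.sin(2 * t)], axis=1)

centers = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)   # RBF centers on the circle
train = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)    # 16 training view angles
models = {"A": fit_object(train, toy_a(train), centers),
          "B": fit_object(train, toy_b(train), centers)}

query = toy_a(np.array([0.7]))[0]  # unseen view of toy object A at 0.7 rad
label, angle, error = recognize(models, query, centers)
print(label, round(angle, 3), error)
```

In this toy setting the object label and the view angle come out of the same search: the model with the smallest reconstruction error names the object, and the angle at which that error is smallest gives the pose. The grid search over candidate angles is a simplification chosen for readability, not something the source specifies.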
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
