Viewpoints and Keypoints
Shubham Tulsiani, Jitendra Malik

TL;DR
This paper introduces CNN architectures for viewpoint estimation and keypoint prediction on rigid objects, and shows that viewpoint estimates substantially improve local appearance based keypoint predictions.
Contribution
The paper presents CNN-based methods for viewpoint and keypoint estimation that use viewpoint information to improve performance in both the constrained setting (known bounding boxes) and the detection setting.
Findings
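Viewpoint estimates substantially improve local appearance based keypoint predictions.
The methods achieve significant improvements over the state of the art in pose and keypoint estimation.
The paper analyzes error modes and the effect of object characteristics on performance to guide future work.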
Abstract
We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details. We address both these tasks in two different settings - the constrained setting with known bounding boxes and the more challenging detection setting where the aim is to simultaneously detect and correctly estimate pose of objects. We present Convolutional Neural Network based architectures for these and demonstrate that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions. In addition to achieving significant improvements over state-of-the-art in the above tasks, we analyze the error modes and effect of object characteristics on performance to guide future efforts towards this goal.
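The abstract states that viewpoint estimates can improve local appearance based keypoint predictions but does not say how the two are combined. As a rough illustration only, the sketch below assumes one plausible scheme: a per-pixel appearance log-score map is summed with a viewpoint-conditioned spatial log prior, here a single isotropic Gaussian. The function names, the Gaussian prior, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_log_prior(shape, center, sigma):
    """Unnormalized log of an isotropic Gaussian over an (H, W) grid.

    Hypothetical stand-in for a prior over keypoint location given a viewpoint.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return -((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2)

def combine(appearance_log_scores, viewpoint_log_prior, weight=1.0):
    """Sum appearance log scores with a weighted viewpoint log prior (assumed scheme)."""
    return appearance_log_scores + weight * viewpoint_log_prior

def argmax_2d(scores):
    """Return the (row, col) of the highest score as plain ints."""
    idx = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in idx)

# Toy case: two equally strong appearance peaks, as might happen when local
# appearance cannot tell apart two similar-looking parts of an object.
appearance = np.full((6, 6), -5.0)
appearance[2, 1] = 0.0
appearance[2, 4] = 0.0

# Assume the estimated viewpoint suggests the keypoint lies near (2, 4).
prior = gaussian_log_prior(appearance.shape, center=(2, 4), sigma=1.0)

appearance_only = argmax_2d(appearance)        # ambiguous: picks the first peak
best = argmax_2d(combine(appearance, prior))   # prior resolves the ambiguity
print(appearance_only, best)
```

In this toy case the appearance map alone cannot separate the two peaks, while the added prior favors the peak consistent with the assumed viewpoint.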
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
