ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation
Catherine Capellen, Max Schwarz, Sven Behnke

TL;DR
ConvPoseCNN is a fully convolutional network for 6D object pose estimation that predicts dense pixel-wise translation and orientation, achieving faster training with fewer parameters while maintaining accuracy.
Contribution
It introduces a novel dense prediction approach for 6D pose estimation using a fully convolutional architecture without object cropping.
Findings
Fewer parameters and faster training compared to similar methods.
Achieves accurate pose estimation on the YCB-Video Dataset.
Implicitly learns to focus on reliable object regions.
Abstract
6D object pose estimation is a prerequisite for many applications. In recent years, monocular pose estimation has attracted much research interest because it does not need depth measurements. In this work, we introduce ConvPoseCNN, a fully convolutional architecture that avoids cutting out individual objects. Instead, we propose pixel-wise, dense prediction of both translation and orientation components of the object pose, where the dense orientation is represented in quaternion form. We present different approaches for aggregation of the dense orientation predictions, including averaging and clustering schemes. We evaluate ConvPoseCNN on the challenging YCB-Video Dataset, where we show that the approach has far fewer parameters and trains faster than comparable methods without sacrificing accuracy. Furthermore, our results indicate that the dense orientation prediction implicitly learns…
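The averaging-based aggregation mentioned in the abstract can be illustrated with a small sketch. It assumes one common technique, the eigenvector-based weighted quaternion average, which may differ from the exact scheme used in the paper. The function name `average_quaternions`, the (w, x, y, z) component order, the per-pixel weights, and the toy mask are all illustrative assumptions rather than details taken from the authors' code.

```python
import numpy as np


def average_quaternions(quats, weights=None):
    """Weighted average of unit quaternions (eigenvector method).

    quats:   (N, 4) array, one quaternion per row, assumed (w, x, y, z).
    weights: optional (N,) array of non-negative weights (hypothetical,
             e.g. a per-pixel confidence).
    """
    q = np.asarray(quats, dtype=np.float64)
    # Normalize each prediction so it is a valid unit quaternion.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    if weights is None:
        w = np.ones(len(q))
    else:
        w = np.asarray(weights, dtype=np.float64)
    # Accumulate sum_i w_i * q_i q_i^T. Because q and -q give the same
    # outer product, the sign ambiguity of quaternions does not matter.
    m = (q * w[:, None]).T @ q
    # eigh returns eigenvalues in ascending order; the average is the
    # eigenvector belonging to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(m)
    avg = eigvecs[:, -1]
    # Convention (assumed): return the representative with w >= 0.
    return avg if avg[0] >= 0 else -avg


# Toy dense prediction: a 4x4 map of per-pixel quaternions and an object mask.
angle = 0.1
quat_map = np.zeros((4, 4, 4))
quat_map[..., 0] = np.cos(angle)
quat_map[..., 3] = np.sin(angle)       # small rotation about the z axis
quat_map[:, 2:, 3] *= -1.0             # right half rotates the other way
mask = np.ones((4, 4), dtype=bool)     # hypothetical segmentation of one object

object_quat = average_quaternions(quat_map[mask])
```

In this toy case the two halves of the object predict opposite rotations of equal magnitude, so the aggregated orientation is the identity rotation. Passing non-uniform `weights` would let more reliable pixels dominate the result.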
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
