HybridPose: 6D Object Pose Estimation under Hybrid Representations
Chen Song, Jiaru Song, Qixing Huang

TL;DR
HybridPose estimates 6D object pose from a hybrid intermediate representation that combines keypoints, edge vectors, and symmetry correspondences. Mixing these geometric cues improves robustness and accuracy, especially under occlusion, while keeping inference real-time (30 fps).
Contribution
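It proposes a hybrid representation (keypoints, edge vectors, and symmetry correspondences, all predicted by the same network) together with a robust regression module that filters outlier predictions. The aim is higher accuracy and robustness than approaches that rely on a single (unitary) representation.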
Findings
Achieves 47.5% mean ADD(-S) accuracy on Occlusion Linemod.
Runs at 30 fps, comparable to state-of-the-art methods.
Effectively handles occlusion and outliers in intermediate representations.
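The ADD(-S) figure above refers to a standard pose-error metric: ADD averages the distance between model points transformed by the ground-truth and predicted poses, and ADD-S replaces that with a nearest-point distance for symmetric objects. The sketch below is a minimal pure-Python illustration, not the authors' evaluation code. The function names, the list-of-tuples point format, and the 10%-of-diameter threshold (a common convention) are assumptions made for this example.

```python
import math

def transform(points, R, t):
    """Apply a rigid transform (3x3 rotation R, translation t) to 3D points."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def adds_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its nearest
    predicted point, used for symmetric objects."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(gt)

def is_correct(error, diameter, threshold=0.1):
    """Assumed convention: a pose counts as correct when the error is
    below a fraction (here 10%) of the object diameter."""
    return error < threshold * diameter
```

For a point set that is symmetric under the pose error (for example, cube corners rotated 180 degrees about one axis), ADD-S is zero while ADD is large, which is why the symmetric variant is used for such objects.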
Abstract
We introduce HybridPose, a novel 6D object pose estimation approach. HybridPose utilizes a hybrid intermediate representation to express different geometric information in the input image, including keypoints, edge vectors, and symmetry correspondences. Compared to a unitary representation, our hybrid representation allows pose regression to exploit more and diverse features when one type of predicted representation is inaccurate (e.g., because of occlusion). Different intermediate representations used by HybridPose can all be predicted by the same simple neural network, and outliers in predicted intermediate representations are filtered by a robust regression module. Compared to state-of-the-art pose estimation approaches, HybridPose is comparable in running time and accuracy. For example, on the Occlusion Linemod dataset, our method achieves a prediction speed of 30 fps with a mean…
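To give a feel for how a robust estimator can down-weight outlier predictions, here is a minimal sketch that fuses several noisy 2D estimates of one point using iteratively reweighted averaging. It is only an illustration of the general idea, not HybridPose's regression module: the function name `robust_center`, the Geman-McClure-style weight, and the `sigma` and `iters` parameters are all assumptions chosen for this example.

```python
import statistics

def robust_center(points, sigma=1.0, iters=10):
    """Fuse noisy 2D estimates of a single point while suppressing outliers.

    Illustrative only: uses iteratively reweighted averaging with a
    Geman-McClure-style weight w = (sigma^2 / (sigma^2 + r^2))^2.
    """
    # Start from the coordinate-wise median, which is already outlier-tolerant.
    cx = statistics.median(p[0] for p in points)
    cy = statistics.median(p[1] for p in points)
    for _ in range(iters):
        # Estimates far from the current center receive weights close to zero.
        weights = [
            (sigma ** 2 / (sigma ** 2 + (p[0] - cx) ** 2 + (p[1] - cy) ** 2)) ** 2
            for p in points
        ]
        total = sum(weights)
        cx = sum(w * p[0] for w, p in zip(weights, points)) / total
        cy = sum(w * p[1] for w, p in zip(weights, points)) / total
    return cx, cy

# Five consistent estimates near (1, 2) plus one gross outlier.
votes = [(1.0, 2.0), (1.1, 1.9), (0.9, 2.1), (1.0, 2.1), (1.0, 1.9), (50.0, 50.0)]
center = robust_center(votes)
```

A plain average of `votes` would be pulled far toward the outlier, whereas the reweighted estimate stays close to the cluster of consistent predictions.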
Videos
HybridPose: 6D Object Pose Estimation Under Hybrid Representations · YouTube
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robot Manipulation and Learning · Image and Object Detection Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
