RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery
Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li, Pan Ji

TL;DR
This paper introduces an RGB-only category-level object pose estimation method that decouples 6D pose estimation from size estimation, mitigating the scale ambiguity of monocular input and improving on previous RGB-based approaches, particularly in rotation accuracy.
Contribution
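The paper proposes a decoupled pipeline: a pre-trained monocular estimator supplies local geometric information for finding inlier 2D-3D correspondences, and a separate branch recovers the object's metric scale from category-level statistics, so pose can be estimated without depth sensors.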
Findings
Outperforms previous RGB-based methods in rotation accuracy
Demonstrates robustness on synthetic and real datasets
Effectively mitigates scale ambiguity in monocular pose estimation
Abstract
While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to the heavy reliance on depth sensors. RGB-only methods provide an alternative to this problem yet suffer from inherent scale ambiguity stemming from monocular observations. In this paper, we propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, mainly facilitating the search for inlier 2D-3D correspondences. Meanwhile, a separate branch is designed to directly recover the metric scale of the object based on category-level statistics. Finally, we advocate using the RANSAC-PnP algorithm to robustly solve for 6D object pose. Extensive experiments have been…
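The scale ambiguity mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: the object points, the pose, the camera intrinsics, the scale value, and the helper names (`rot_z`, `project`, `to_metric_pose`) are all hypothetical. It only shows that, under perspective projection, a unit-size object and a metrically scaled copy placed proportionally farther away produce the same pixels, so rotation can be recovered independently of size while translation must be rescaled by a separately estimated metric scale.

```python
import math

def rot_z(theta):
    """Rotation about the camera z-axis as a row-major 3x3 list."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(points, R, t, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of object-frame 3D points to pixel coordinates."""
    pixels = []
    for X in points:
        # Camera-frame point: Xc = R @ X + t
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        pixels.append((f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy))
    return pixels

def to_metric_pose(R, t_unit, scale):
    """Rotation is scale-free; only the translation is rescaled."""
    return R, [scale * ti for ti in t_unit]

# Hypothetical unit-size (normalized) object points and pose.
unit_points = [(-0.5, -0.5, -0.5), (0.5, -0.5, 0.5),
               (0.5, 0.5, -0.5), (-0.5, 0.5, 0.5)]
R = rot_z(math.radians(30.0))
t_unit = [0.1, -0.2, 4.0]

# Assumed output of a separate scale branch (illustrative value only).
scale = 0.25
metric_points = [tuple(scale * c for c in p) for p in unit_points]
R_metric, t_metric = to_metric_pose(R, t_unit, scale)

# Both configurations project to the same pixels.
pixels_unit = project(unit_points, R, t_unit)
pixels_metric = project(metric_points, R_metric, t_metric)
max_pixel_gap = max(
    abs(a - b)
    for pu, pm in zip(pixels_unit, pixels_metric)
    for a, b in zip(pu, pm)
)
print(max_pixel_gap)
```

Because the two projections agree up to floating-point error, an image alone cannot distinguish the two configurations. An error in the estimated scale therefore affects only the translation magnitude in this toy setup, which is consistent with the motivation for estimating scale in its own branch.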
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Robotics and Sensor-Based Localization
