RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance
Sheng Yu, Di-Hua Zhai, Yuanqing Xia

TL;DR
This paper introduces RCGNet, a transformer-based neural network that estimates category-level 6D object pose from RGB images alone. It uses geometric guidance when predicting object features and RANSAC-PnP to recover the pose, aiming for better accuracy and efficiency in real-world scenarios.
Contribution
The paper presents a novel RGB-only category-level pose estimation method with a geometric feature-guided algorithm and transformer architecture, eliminating the need for depth data.
Findings
Achieves superior accuracy over previous RGB-based methods
Demonstrates high efficiency on benchmark datasets
Effectively handles variable object scales
Abstract
While most current RGB-D-based category-level object pose estimation methods achieve strong performance, they face significant challenges in scenes lacking depth information. In this paper, we propose a novel category-level object pose estimation approach that relies solely on RGB images. This method enables accurate pose estimation in real-world scenarios without the need for depth data. Specifically, we design a transformer-based neural network for category-level object pose estimation, where the transformer is employed to predict and fuse the geometric features of the target object. To ensure that these predicted geometric features faithfully capture the object's geometry, we introduce a geometric feature-guided algorithm, which enhances the network's ability to effectively represent the object's geometric information. Finally, we utilize the RANSAC-PnP algorithm to compute the…
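To make the RANSAC-PnP step mentioned in the abstract concrete, here is a minimal NumPy sketch of the general technique, not the authors' implementation. It assumes that 2D-3D correspondences (pixel locations paired with object-frame points) and camera intrinsics `K` are already available. The abstract does not say which PnP solver RCGNet uses, so a plain 6-point DLT solver stands in here. All function names (`pnp_dlt`, `reprojection_error`, `ransac_pnp`) and the parameters `iters`, `thresh_px`, and `seed` are hypothetical.

```python
import numpy as np


def pnp_dlt(obj_pts, img_pts, K):
    """Estimate R, t from >= 6 non-coplanar 3D-2D correspondences via DLT.

    Assumption: a linear DLT solver is used as a simple stand-in for
    whichever minimal PnP solver a real pipeline would choose.
    """
    n = len(obj_pts)
    # Convert pixels to normalized camera coordinates.
    uv1 = np.hstack([img_pts, np.ones((n, 1))])
    xy = (np.linalg.inv(K) @ uv1.T).T[:, :2]
    Xh = np.hstack([obj_pts, np.ones((n, 1))])
    # Two linear equations per correspondence in the 12 entries of [R|t].
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xy[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xy[:, [1]] * Xh
    # Null-space vector = right singular vector of the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:  # resolve the sign ambiguity
        P = -P
    # Project the 3x3 block onto the nearest rotation and undo the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t


def reprojection_error(obj_pts, img_pts, K, R, t):
    """Per-point pixel error; points behind the camera get infinite error."""
    cam = obj_pts @ R.T + t
    z = cam[:, 2]
    safe_z = np.where(z > 1e-9, z, 1.0)
    proj = (cam @ K.T)[:, :2] / safe_z[:, None]
    err = np.linalg.norm(proj - img_pts, axis=1)
    return np.where(z > 1e-9, err, np.inf)


def ransac_pnp(obj_pts, img_pts, K, iters=200, thresh_px=2.0, seed=0):
    """Fit poses to random 6-point samples and keep the largest consensus set."""
    rng = np.random.default_rng(seed)
    n = len(obj_pts)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, size=6, replace=False)
        R, t = pnp_dlt(obj_pts[idx], img_pts[idx], K)
        inliers = reprojection_error(obj_pts, img_pts, K, R, t) < thresh_px
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise RuntimeError("RANSAC found no consensus set")
    # Refit on all inliers of the best hypothesis.
    R, t = pnp_dlt(obj_pts[best], img_pts[best], K)
    return R, t, best
```

Calling `ransac_pnp(obj_pts, img_pts, K)` with an `(N, 3)` array of object points, an `(N, 2)` array of pixels, and a `3x3` intrinsics matrix returns a rotation, a translation, and a boolean inlier mask. The RANSAC loop is what lets the pose estimate tolerate a fraction of badly predicted correspondences, since such points are excluded from the final refit.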
