Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image
Zhaoxin Fan, Zhenbo Song, Jian Xu, Zhicheng Wang, Kejian Wu, Hongyan Liu, Jun He

TL;DR
This paper introduces OLD-Net, a novel RGB-only approach for category-level 6D object pose estimation that predicts object-level depth directly from monocular images, eliminating the need for depth sensors.
Contribution
The paper proposes a new RGB-only method with two modules, Normalized Global Position Hints (NGPH) and Shape-aware Decoupled Depth Reconstruction (SDDR), which learn high-fidelity object-level depth and shape representations for category-level 6D pose estimation.
Findings
Achieves state-of-the-art results on the CAMERA25 and REAL275 datasets.
Effectively predicts object-level depth and shape from monocular RGB images.
Removes the depth-sensor requirement of RGBD-based methods by taking only RGB images as input.
Abstract
Recently, RGBD-based category-level 6D object pose estimation has achieved promising improvements in performance; however, the requirement of depth information prohibits broader applications. To relieve this problem, this paper proposes a novel approach named Object Level Depth reconstruction Network (OLD-Net), which takes only RGB images as input for category-level 6D object pose estimation. We propose to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation. Two novel modules, Normalized Global Position Hints (NGPH) and Shape-aware Decoupled Depth Reconstruction (SDDR), are introduced to learn high-fidelity object-level depth and delicate shape representations. Finally, the 6D object pose is solved by aligning the predicted canonical representation with the…
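The abstract is cut off before it names what the canonical representation is aligned with. As an illustration only, the sketch below assumes a setup common in NOCS-style pipelines: the predicted object-level depth is back-projected into a camera-frame point cloud with a pinhole model, and a similarity transform (scale, rotation, translation) is fitted between the canonical points and that cloud using the Umeyama least-squares method. The function names, the intrinsics parameters (fx, fy, cx, cy), and the choice of Umeyama alignment are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def backproject_depth(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to 3D camera-frame points (pinhole model).

    depth: (H, W) array of depth values; mask: (H, W) boolean object mask.
    fx, fy, cx, cy are hypothetical camera intrinsics.
    """
    v, u = np.nonzero(mask)            # pixel rows (v) and columns (u) on the object
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3)


def umeyama_alignment(src, dst):
    """Least-squares similarity transform such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points, e.g. canonical (NOCS)
    coordinates and back-projected camera-frame points.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    cov = dst_c.T @ src_c / n           # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n    # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Under these assumptions, R and t would give the object's rotation and translation and s its size relative to the canonical space. With noisy network predictions, a fit like this is usually wrapped in an outlier-rejection loop such as RANSAC, which is omitted here for brevity.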
Taxonomy
Topics: Robot Manipulation and Learning · Industrial Vision Systems and Defect Detection · Human Pose and Action Recognition
