Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image
Thanh-Toan Do, Ming Cai, Trung Pham, Ian Reid

TL;DR
Deep-6DPose is an end-to-end deep learning framework that detects objects, segments them, and directly estimates their 6D poses from a single RGB image, enabling fast and accurate robotic perception.
Contribution
It extends Mask R-CNN with a novel pose estimation branch that directly regresses 6D poses using a Lie algebra representation, eliminating the need for post-refinement.
Findings
Outperforms state-of-the-art RGB-based pose estimation methods.
Operates at 10 fps, suitable for real-time robotic applications.
Demonstrates effective 6D pose recovery on benchmark datasets.
Abstract
Detecting objects and their 6D poses from only RGB images is an important task for many robotic applications. While deep learning methods have made significant progress in visual object detection and segmentation, the object pose estimation task is still challenging. In this paper, we introduce an end-toend deep learning framework, named Deep-6DPose, that jointly detects, segments, and most importantly recovers 6D poses of object instances from a single RGB image. In particular, we extend the recent state-of-the-art instance segmentation network Mask R-CNN with a novel pose estimation branch to directly regress 6D object poses without any post-refinements. Our key technical contribution is the decoupling of pose parameters into translation and rotation so that the rotation can be regressed via a Lie algebra representation. The resulting pose regression loss is differential and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Robotics and Sensor-Based Localization · Robot Manipulation and Learning
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Region Proposal Network · Softmax · Convolution · RoIAlign · Mask R-CNN
