Yolo-Key-6D: Single Stage Monocular 6D Pose Estimation with Keypoint Enhancements
Kemal Alperen Çetiner, Hazım Kemal Ekenel

TL;DR
Yolo-Key-6D is a fast, single-stage monocular 6D pose estimation framework that enhances YOLO with keypoint detection and continuous rotation regression, achieving high accuracy and real-time performance.
Contribution
The paper introduces a novel end-to-end single-stage approach combining keypoint detection and continuous rotation regression for improved 6D pose estimation.
Findings
Achieves 96.24% accuracy on the LINEMOD benchmark.
Operates in real time with competitive accuracy.
Effectively balances speed and precision for practical deployment.
Abstract
Estimating the 6D pose of objects from a single RGB image is a critical task for robotics and extended reality applications. However, state-of-the-art multi-stage methods often suffer from high latency, making them unsuitable for real-time use. In this paper, we present Yolo-Key-6D, a novel single-stage, end-to-end framework for monocular 6D pose estimation designed for both speed and accuracy. Our approach enhances a YOLO-based architecture by integrating an auxiliary head that regresses the 2D projections of an object's 3D bounding box corners. This keypoint detection task significantly improves the network's understanding of 3D geometry. For stable end-to-end training, we directly regress rotation using a continuous 9D representation projected to SO(3) via singular value decomposition. On the LINEMOD and LINEMOD-Occluded benchmarks, Yolo-Key-6D achieves competitive accuracy scores of…
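The two geometric ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it shows one common way to map a raw 9D vector to a rotation matrix via SVD, and how the eight corners of a 3D bounding box project to 2D keypoints under a pinhole camera. The function names (`project_to_so3`, `box_corners`, `project_points`), the camera intrinsics `K`, the box extents, and the translation `t` are all assumptions made up for illustration.

```python
import numpy as np

def project_to_so3(m9):
    """Map a raw 9D vector to the closest rotation matrix (Frobenius sense)."""
    M = np.asarray(m9, dtype=float).reshape(3, 3)
    U, _, Vt = np.linalg.svd(M)
    # Flip the last singular direction when needed so that det(R) = +1,
    # which keeps the result a proper rotation rather than a reflection.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def box_corners(extents):
    """Return the 8 corners (8x3) of an origin-centered, axis-aligned box."""
    half = np.asarray(extents, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return signs * half

def project_points(points, R, t, K):
    """Project Nx3 object-frame points to Nx2 pixels with a pinhole model."""
    cam = points @ R.T + t           # object frame -> camera frame
    uvw = cam @ K.T                  # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide

# Hypothetical values, chosen only to exercise the functions.
raw = np.array([0.9, -0.1, 0.0, 0.1, 1.1, 0.0, 0.0, 0.0, 0.8])  # stand-in for a network output
R = project_to_so3(raw)
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])      # made-up intrinsics
t = np.array([0.0, 0.0, 1.0])        # object 1 m in front of the camera
keypoints_2d = project_points(box_corners((0.1, 0.1, 0.1)), R, t, K)
```

In a training setup of the kind the abstract describes, 2D keypoints like `keypoints_2d` would presumably serve as regression targets for the auxiliary head, while the SVD step keeps the predicted rotation on SO(3) and remains differentiable in common deep learning frameworks.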
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
