TL;DR
This paper introduces a modular 6D pose estimation framework that regresses keypoint heatmaps from RGB images and can additionally fuse depth data. On LINEMOD it reaches a mean ADD-based accuracy of 84.50% with RGB only and 92.41% with RGB-D fusion, and it compares keypoint selection strategies and fusion methods.
Contribution
It presents a novel RGB-D cross-fusion architecture in which RGB and depth features interact at multiple stages, and it evaluates how different keypoint selection strategies affect pose accuracy.
Findings
The best RGB-only model achieved a mean ADD-based accuracy of 84.50% on LINEMOD.
The RGB-D fusion model reached a mean ADD-based accuracy of 92.41% on LINEMOD.
Incorporating depth data improved accuracy by 7.91 percentage points over the best RGB-only model.
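As a rough illustration of what an ADD-based accuracy measures, the sketch below computes the mean distance between model points transformed by the ground-truth pose and by the predicted pose, then counts a pose as correct when that distance falls below a fraction of the object diameter. The 10% threshold is a common convention on LINEMOD and is an assumption here, since the summary does not state the threshold used. The function names are hypothetical, and symmetric objects are often scored with a closest-point variant that this sketch does not cover.

```python
import numpy as np

def add_error(model_pts, R_gt, t_gt, R_pred, t_pred):
    """Mean distance between model points under the ground-truth and predicted poses.

    model_pts: (N, 3) object model points; R_*: (3, 3) rotations; t_*: (3,) translations.
    """
    gt = model_pts @ R_gt.T + t_gt
    pred = model_pts @ R_pred.T + t_pred
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def add_accuracy(errors, diameters, threshold=0.1):
    """Fraction of poses whose ADD error is below threshold * object diameter.

    threshold=0.1 is an assumed default (the usual 10%-of-diameter convention).
    """
    errors = np.asarray(errors, dtype=float)
    diameters = np.asarray(diameters, dtype=float)
    return float(np.mean(errors < threshold * diameters))
```

With this definition, a reported accuracy such as 92.41% would mean that fraction of test poses fell under the threshold, averaged over objects.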
Abstract
In this paper, we propose a modular framework for 6D pose estimation based on keypoint heatmap regression. Our approach combines YOLOv10m for object detection with a ResNet18-based network that predicts 2D heatmaps from RGB images. Keypoints extracted from these heatmaps are used to estimate the 6D object pose via the PnP RANSAC algorithm. We compare different keypoint selection strategies to assess their impact on pose accuracy. Additionally, we extend the baseline by incorporating depth data using a cross-fusion architecture, which enables interaction between RGB and depth features at multiple stages. We further explore general training improvements, such as experimenting with activation functions and learning rate scheduling strategies to improve model performance. Our best RGB-only model achieved a mean ADD-based accuracy of 84.50%, while the RGB-D fusion model reached 92.41% on the…
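The step from heatmaps to 2D keypoints described in the abstract can be sketched as follows. This is a minimal example that assumes one heatmap per keypoint, predicted on a crop given by the detector's bounding box, and that takes the peak of each heatmap as the keypoint location. The function name and the bounding-box layout are hypothetical, and the paper's actual decoding may differ.

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps, bbox):
    """Convert per-keypoint heatmaps into 2D image coordinates.

    heatmaps: (K, H, W) array, one heatmap per keypoint (assumed layout).
    bbox: (x_min, y_min, x_max, y_max) of the detected object in image pixels.
    Returns (K, 2) pixel coordinates and (K,) peak values as confidences.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)                # peak index of each heatmap
    conf = flat[np.arange(k), idx]           # peak value as a confidence score
    ys, xs = np.unravel_index(idx, (h, w))   # back to (row, col) in the heatmap
    x_min, y_min, x_max, y_max = bbox
    # Map heatmap cell centres back to full-image pixel coordinates.
    u = x_min + (xs + 0.5) * (x_max - x_min) / w
    v = y_min + (ys + 0.5) * (y_max - y_min) / h
    return np.stack([u, v], axis=1), conf
```

The resulting 2D points, paired with the corresponding 3D keypoints on the object model and the camera intrinsics, would then be passed to a PnP RANSAC solver such as OpenCV's `cv2.solvePnPRansac` to recover rotation and translation.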
