Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion
Adam Bethell, Ravi Garg, Ian Reid

TL;DR
This paper presents a category-level 6D object pose estimation method that works from a single RGB image using diffusion models. It needs no object-specific models or depth data and outperforms the current state of the art on the REAL275 dataset.
Contribution
It introduces a diffusion-based approach combined with Mean Shift for robust pose estimation without requiring object models or depth information.
Findings
Outperforms the current state of the art on the REAL275 dataset
Eliminates need for object-specific models or depth data
Uses diffusion models and Mean Shift for pose estimation
Abstract
Estimating the 6D pose and 3D size of an object from an image is a fundamental task in computer vision. Most current approaches are restricted to specific instances with known models or require ground truth depth information or point cloud captures from LIDAR. We tackle the harder problem of pose estimation for category-level objects from a single RGB image. We propose a novel solution that eliminates the need for specific object models or depth information. Our method utilises score-based diffusion models to generate object pose hypotheses to model the distribution of possible poses for the object. Unlike previous methods that rely on costly trained likelihood estimators to remove outliers before pose aggregation using mean pooling, we introduce a simpler approach using Mean Shift to estimate the mode of the distribution as the final pose estimate. Our approach outperforms the current…
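The aggregation step described in the abstract, taking the mode of a set of sampled pose hypotheses with Mean Shift rather than mean pooling, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes hypotheses are plain 3D translation vectors, uses a Gaussian kernel, and the function name `mean_shift_mode` and the `bandwidth` value are hypothetical choices made for illustration. Rotations would need a variant that respects the geometry of the rotation group, which is not shown here.

```python
import numpy as np

def mean_shift_mode(samples, bandwidth=0.05, iters=100, tol=1e-7):
    """Estimate the densest mode of `samples` (shape (N, D)) with Gaussian Mean Shift.

    Illustrative only: the name and default bandwidth are assumptions, not from the paper.
    """
    samples = np.asarray(samples, dtype=float)
    points = samples.copy()  # start one search point at every hypothesis
    for _ in range(iters):
        # Squared distances from each search point to every original sample.
        d2 = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
        weights = np.exp(-0.5 * d2 / bandwidth**2)
        # Move each search point to the kernel-weighted mean of the samples.
        new_points = (weights @ samples) / weights.sum(axis=1, keepdims=True)
        shift = np.abs(new_points - points).max()
        points = new_points
        if shift < tol:
            break
    # Score converged points by kernel density and keep the densest one.
    d2 = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    density = np.exp(-0.5 * d2 / bandwidth**2).sum(axis=1)
    return points[np.argmax(density)]

# Synthetic hypotheses: a tight inlier cluster plus scattered outliers.
rng = np.random.default_rng(0)
true_t = np.array([0.1, -0.2, 0.9])
inliers = true_t + 0.01 * rng.standard_normal((80, 3))
outliers = rng.uniform(-1.0, 1.0, size=(20, 3))
hypotheses = np.vstack([inliers, outliers])

mode_estimate = mean_shift_mode(hypotheses)
mean_estimate = hypotheses.mean(axis=0)  # mean pooling, pulled toward the outliers
```

In this toy setting the mode stays close to the inlier cluster while the plain mean is dragged away by the outliers, which is the motivation for using a mode estimate without a separate outlier-removal stage.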
Taxonomy
Topics: Image Processing Techniques and Applications · Image and Object Detection Techniques · Advanced Vision and Imaging
Methods: Diffusion
