Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes
Tom Fischer, Xiaojie Zhang, Eddy Ilg

TL;DR
This paper presents a unified neural network framework for category-level object detection and 3D pose estimation from RGB images, achieving state-of-the-art accuracy and robustness without requiring depth data.
Contribution
The paper introduces the first unified model that combines category-level object detection and pose estimation for RGB images in a single framework, using neural mesh models with learned features and multi-model RANSAC.
Findings
Improves on the previous state of the art for RGB category-level pose estimation on REAL275 by 22.9%, averaged across all scale-agnostic metrics.
Demonstrates greater robustness compared to single-stage baselines.
Provides open-source code and models for the community.
Abstract
Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use separate models and representations for detection and pose estimation. For the first time, we introduce a unified model that integrates detection and pose estimation into a single framework for RGB images by leveraging neural mesh models with learned features and multi-model RANSAC. Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275, improving on the current state-of-the-art by 22.9% averaged across all scale-agnostic metrics. Finally, we demonstrate that our unified method exhibits greater robustness compared to single-stage…
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Human Pose and Action Recognition
