Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes

Tom Fischer; Xiaojie Zhang; Eddy Ilg

arXiv:2508.02157·cs.CV·August 5, 2025

TL;DR

This paper presents a unified neural network framework for category-level object detection and 3D pose estimation from RGB images, achieving state-of-the-art accuracy and robustness without requiring depth data.

Contribution

The paper introduces what the authors describe as the first unified model that integrates detection and pose estimation for RGB images in a single framework, using neural mesh models with learned features and multi-model RANSAC.
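
Multi-model RANSAC fits several model instances to one set of observations rather than a single best model. As rough intuition only, the toy sketch below uses sequential RANSAC on 2D points to recover multiple lines. It is not the authors' implementation: every function name, parameter, and threshold is an illustrative assumption, and how the paper applies the idea to detection and pose estimation is not detailed in this summary.

```python
import random


def fit_line(p, q):
    """Line through p and q as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: both points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold, iterations, rng):
    """Single-model RANSAC: return the sampled line with the most inliers."""
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # A point is an inlier if its distance to the line is within the threshold.
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


def sequential_ransac(points, threshold=0.05, iterations=200,
                      min_inliers=10, max_models=5, seed=0):
    """Toy multi-model fitting: run RANSAC repeatedly, removing inliers each round.

    All defaults are hypothetical values chosen for this example.
    """
    rng = random.Random(seed)
    remaining = list(points)
    models = []
    while len(remaining) >= 2 and len(models) < max_models:
        line, inliers = ransac_line(remaining, threshold, iterations, rng)
        if line is None or len(inliers) < min_inliers:
            break  # no further model has enough support
        models.append((line, [remaining[i] for i in inliers]))
        inlier_set = set(inliers)
        remaining = [pt for i, pt in enumerate(remaining) if i not in inlier_set]
    return models
```

Called on points drawn from two lines plus a few outliers, `sequential_ransac` returns one `(line, inlier_points)` pair per recovered line and stops once no remaining candidate reaches `min_inliers`.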

Findings

01 Improves on the previous state of the art for RGB category-level pose estimation on REAL275 by 22.9%, averaged across all scale-agnostic metrics.

02 Demonstrates greater robustness than single-stage baselines.

03 Provides open-source code and models for the community.

Abstract

Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use separate models and representations for detection and pose estimation. For the first time, we introduce a unified model that integrates detection and pose estimation into a single framework for RGB images by leveraging neural mesh models with learned features and multi-model RANSAC. Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275, improving on the current state-of-the-art by 22.9% averaged across all scale-agnostic metrics. Finally, we demonstrate that our unified method exhibits greater robustness compared to single-stage…

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Human Pose and Action Recognition