FastPoseCNN: Real-Time Monocular Category-Level Pose and Size Estimation Framework

Eduardo Davalos; Mehran Aminian

arXiv:2406.11063·cs.CV·June 18, 2024

TL;DR

FastPoseCNN is a real-time framework for category-level pose and size estimation from a single RGB image. It raises inference speed from the 2-4 fps of earlier methods to 25+ fps.

Contribution

The paper presents a fast framework that pairs a ResNet-FPN backbone with separate decoders for translation, rotation, and size regression, enabling real-time pose and size estimation for unseen objects.

Findings

01 Achieves real-time inference at 25+ fps

02 Outperforms previous methods in accuracy and speed

03 Estimates pose and size in a global context, covering multiple objects in the image

Abstract

The primary focus of this paper is the development of a framework for pose and size estimation of unseen objects given a single RGB image, all in real time. In 2019, the first category-level pose and size estimation framework was proposed alongside two novel datasets called CAMERA and REAL. However, current methodologies are restricted from practical use because of their long inference time (2-4 fps). The original approach incurred significant delays because it used the computationally expensive Mask R-CNN framework and the Umeyama algorithm. To optimize our method and yield real-time results, our framework uses the efficient ResNet-FPN framework and decouples the translation, rotation, and size regression problems by using distinct decoders. Moreover, our methodology performs pose and size estimation in a global context, i.e., estimating the involved parameters of all captured…
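
To put the quoted frame rates in perspective, the following minimal sketch converts them into per-frame time budgets and speed-up ratios. It uses only the figures stated on this page (2-4 fps for the earlier approach, 25+ fps for FastPoseCNN). The function name `frame_time_ms` and the constants are illustrative and do not come from the paper or its repository, and treating 25 fps as the FastPoseCNN rate is an assumption, since the page reports it only as a lower bound.

```python
def frame_time_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures quoted on this page; 25.0 is the stated lower bound ("25+ fps").
PRIOR_FPS_RANGE = (2.0, 4.0)
FASTPOSE_FPS = 25.0

# Per-frame time for the earlier approach at each end of its quoted range.
prior_ms = [frame_time_ms(fps) for fps in PRIOR_FPS_RANGE]

# Per-frame time at the FastPoseCNN lower bound.
fast_ms = frame_time_ms(FASTPOSE_FPS)

# Ratio of frame rates, i.e. how many times faster 25 fps is than 2 and 4 fps.
speedups = [FASTPOSE_FPS / fps for fps in PRIOR_FPS_RANGE]

print(f"earlier approach: {prior_ms[1]:.0f}-{prior_ms[0]:.0f} ms per frame")
print(f"FastPoseCNN (at 25 fps): {fast_ms:.0f} ms per frame")
print(f"speed-up: {speedups[1]:.2f}x to {speedups[0]:.1f}x")
```

Under these assumptions, the earlier approach spends 250-500 ms on each frame, while 25 fps leaves a budget of 40 ms, which corresponds to a speed-up of roughly 6x to 12x. Any rate above 25 fps would shrink the budget and raise the ratio further.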

Code & Models

Repositories

edavalosanaya/FastPoseCNN
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Handwritten Text Recognition Techniques · Robot Manipulation and Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus