FastPoseCNN: Real-Time Monocular Category-Level Pose and Size Estimation Framework
Eduardo Davalos, Mehran Aminian

TL;DR
FastPoseCNN introduces a real-time, efficient framework for monocular category-level pose and size estimation from a single RGB image, significantly improving inference speed over previous methods.
Contribution
The paper presents a novel, fast framework that pairs a ResNet-FPN backbone with decoupled decoders for translation, rotation, and size, enabling real-time pose and size estimation for unseen objects.
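To make the decoupling idea concrete, here is a minimal NumPy sketch. It assumes a shared feature map (a random tensor standing in for ResNet-FPN output) that feeds three independent heads, one each for rotation, translation, and size. Each head is implemented as a single 1x1 convolution. The class `DecoupledHeads`, the helper `conv1x1`, and the per-head channel counts (a 4-channel quaternion for rotation, 3 channels each for translation and size) are hypothetical choices made for illustration. The paper's actual decoders are deeper convolutional networks.

```python
import numpy as np

def conv1x1(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution: features (C, H, W), weight (K, C), bias (K,) -> (K, H, W)."""
    return np.einsum("kc,chw->khw", weight, features) + bias[:, None, None]

class DecoupledHeads:
    """Hypothetical sketch: one shared feature map, three independent regression heads."""

    # Output channels per head are assumptions for illustration only.
    OUT_CHANNELS = {"rotation": 4, "translation": 3, "size": 3}

    def __init__(self, in_channels: int, rng: np.random.Generator):
        # Each head owns its own parameters; only the input features are shared.
        self.params = {
            name: (0.1 * rng.standard_normal((k, in_channels)), np.zeros(k))
            for name, k in self.OUT_CHANNELS.items()
        }

    def __call__(self, features: np.ndarray) -> dict:
        out = {name: conv1x1(features, w, b) for name, (w, b) in self.params.items()}
        # Normalize the 4-channel rotation output per pixel into unit quaternions.
        norm = np.linalg.norm(out["rotation"], axis=0, keepdims=True)
        out["rotation"] = out["rotation"] / np.maximum(norm, 1e-8)
        return out

# Usage: a random tensor stands in for a ResNet-FPN feature map (64 channels, 32x40 grid).
rng = np.random.default_rng(0)
features = rng.standard_normal((64, 32, 40))
predictions = DecoupledHeads(in_channels=64, rng=rng)(features)
for name, tensor in predictions.items():
    print(name, tensor.shape)
```

Because the heads share only the encoder output, each regression target gets its own parameters and can be trained with its own loss. All three outputs still come out of a single forward pass.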
Findings
Achieves real-time inference at 25+ fps
Outperforms previous methods in accuracy and speed
Performs pose and size estimation in a global context, i.e., for all captured objects in the image at once
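The frame rates quoted here and in the abstract translate directly into per-frame time budgets. The sketch below assumes latency is simply the inverse of throughput, with no batching or pipelining. Under that assumption, 25 fps leaves 40 ms per frame, compared with 250 to 500 ms at the 2 to 4 fps attributed to the earlier method. That is roughly a 6x to 12x difference. `frame_budget_ms` is a hypothetical helper name.

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds, assuming latency = 1 / throughput."""
    return 1000.0 / fps

# Frame rates quoted in this summary and its abstract.
rates = {"prior method (low)": 2.0, "prior method (high)": 4.0, "FastPoseCNN (reported)": 25.0}
for label, fps in rates.items():
    print(f"{label}: {fps:g} fps -> {frame_budget_ms(fps):.0f} ms/frame")
# prior method (low): 2 fps -> 500 ms/frame
# prior method (high): 4 fps -> 250 ms/frame
# FastPoseCNN (reported): 25 fps -> 40 ms/frame
```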
Abstract
The primary focus of this paper is the development of a framework for pose and size estimation of unseen objects from a single RGB image, all in real time. In 2019, the first category-level pose and size estimation framework was proposed alongside two novel datasets called CAMERA and REAL. However, current methodologies are impractical because of their long inference times (2-4 fps). That approach's inference suffered significant delays because it relied on the computationally expensive Mask R-CNN framework and the Umeyama algorithm. To optimize our method and achieve real-time results, our framework uses the efficient ResNet-FPN architecture and decouples the translation, rotation, and size regression problems by using distinct decoders. Moreover, our methodology performs pose and size estimation in a global context, i.e., estimating the involved parameters of all captured…
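The abstract names the Umeyama algorithm as one source of delay in the earlier pipeline. Umeyama's method is a closed-form, SVD-based least-squares fit of a similarity transform (scale `s`, rotation `R`, translation `t`) between two sets of corresponding 3D points. The NumPy sketch below implements that fit. The function name and the synthetic test data are illustrative assumptions. The sketch does not reproduce the prior work's exact usage, which may wrap the fit in additional steps such as outlier rejection. Running a fit like this for every detected object is the kind of post-processing that directly regressing translation, rotation, and size avoids.

```python
import numpy as np

def umeyama(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity transform with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points.
    Returns (s, R, t): scalar scale, (3, 3) rotation, (3,) translation.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    var_src = (src_c ** 2).sum() / n          # variance of the source point set
    cov = dst_c.T @ src_c / n                 # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # flip one axis to avoid a reflection
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Usage: recover a known transform from synthetic, noise-free correspondences.
rng = np.random.default_rng(0)
src = rng.uniform(-0.5, 0.5, size=(100, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0                           # make Q a proper rotation (det = +1)
s_true, t_true = 0.3, np.array([0.1, -0.2, 1.5])
dst = s_true * src @ Q.T + t_true
s, R, t = umeyama(src, dst)
print(round(s, 6), np.allclose(R, Q), np.allclose(t, t_true))
```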
