FAST GDRNPP: Improving the Speed of State-of-the-Art 6D Object Pose Estimation

Thomas Pöllabauer; Ashwin Pramod; Volker Knauthe; Michael Wahl

arXiv:2409.12720 · cs.CV · September 20, 2024 · 2 cites

TL;DR

This paper presents FAST GDRNPP, which accelerates the GDRNPP 6D object pose estimation model through model compression, keeping accuracy high while making inference fast enough to suit industrial applications.

Contribution

The paper introduces a novel approach to speed up GDRNPP by using smaller backbones, pruning, and distillation, maintaining accuracy while reducing inference time.

Findings

01

Maintains accuracy comparable to state-of-the-art models.

02

Significantly improves inference speed.

03

Enables practical deployment in industrial scenarios.

Abstract

6D object pose estimation involves determining the three-dimensional translation and rotation of an object within a scene and relative to a chosen coordinate system. This problem is of particular interest for many practical applications in industrial tasks such as quality control, bin picking, and robotic manipulation, where both speed and accuracy are critical for real-world deployment. Current models, both classical and deep-learning-based, often struggle with the trade-off between accuracy and latency. Our research focuses on enhancing the speed of a prominent state-of-the-art deep learning model, GDRNPP, while keeping its high accuracy. We employ several techniques to reduce the model size and improve inference time. These techniques include using smaller and quicker backbones, pruning unnecessary parameters, and distillation to transfer knowledge from a large, high-performing model…
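
Two of the compression techniques named in the abstract, pruning and distillation, can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not specify their pruning criterion or distillation loss, so the code below assumes plain magnitude pruning and the classic temperature-softened logit distillation as stand-ins. The function names `magnitude_prune` and `distillation_loss` and all parameter values are hypothetical.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: unstructured magnitude pruning; the paper may use a
    different (e.g. structured) criterion.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature**2.

    Assumption: logit-level distillation; a pose network could instead be
    distilled on intermediate features or regressed outputs.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Usage: prune half of a toy weight vector, then compare student and teacher.
pruned = magnitude_prune([0.5, -0.1, 0.05, -2.0], sparsity=0.5)
loss = distillation_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0])
```

In practice the pruned network is usually fine-tuned afterwards, and the distillation term is combined with the ordinary task loss so that the smaller student learns from both the ground truth and the larger teacher.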

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Advanced Vision and Imaging

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings