ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers

Philipp Ausserlechner; David Haberger; Stefan Thalhammer; Jean-Baptiste Weibel; Markus Vincze

arXiv:2309.11986 · cs.CV · September 22, 2023 · 1 citation

TL;DR

This paper introduces ZS6D, a zero-shot 6D object pose estimation method that uses pre-trained Vision Transformers. It eliminates the need for task-specific fine-tuning and improves Average Recall over state-of-the-art methods on the LMO, YCBV, and TLESS datasets.

Contribution

ZS6D leverages pre-trained Vision Transformers for zero-shot 6D pose estimation, avoiding the expensive training and fine-tuning procedures required by prior methods.

Findings

01

Improves Average Recall on the LMO, YCBV, and TLESS datasets.

02

Outperforms state-of-the-art methods without task-specific fine-tuning.

03

Uses ViT descriptors for robust template matching and pose estimation.
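
Turning a matched template into a pose needs point correspondences between the query and the template. One common way to obtain them from ViT patch descriptors is mutual nearest-neighbour ("best buddy") matching under cosine similarity. This page does not spell out ZS6D's exact procedure, so treat the following as an assumed illustration of the idea, not the authors' code. The function names and the toy one-hot descriptors are hypothetical stand-ins for real ViT features.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mutual_nearest_neighbors(query_desc: np.ndarray, template_desc: np.ndarray):
    """Return (i, j) pairs where query patch i and template patch j are each
    other's most similar patch under cosine similarity.

    query_desc:    (N, D) patch descriptors of the query crop (hypothetical input)
    template_desc: (M, D) patch descriptors of the rendered template
    """
    sim = l2_normalize(query_desc) @ l2_normalize(template_desc).T  # (N, M)
    best_t_for_q = sim.argmax(axis=1)  # best template patch for each query patch
    best_q_for_t = sim.argmax(axis=0)  # best query patch for each template patch
    # Keep only pairs that agree in both directions.
    return [(i, int(j)) for i, j in enumerate(best_t_for_q) if best_q_for_t[j] == i]


# Toy descriptors (hypothetical): the query patches are a shuffled copy of the template's.
template = np.eye(4)
query = template[[2, 0, 3, 1]]
print(mutual_nearest_neighbors(query, template))  # → [(0, 2), (1, 0), (2, 3), (3, 1)]
```

Each surviving pair links a query image location to a template patch. Assuming the template patch can be lifted to a 3D model point (for example from the rendered depth), this yields the 2D-3D correspondences that a PnP solver consumes.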

Abstract

As robotic systems increasingly encounter complex and unconstrained real-world scenarios, there is a demand to recognize diverse objects. State-of-the-art 6D object pose estimation methods rely on object-specific training and therefore do not generalize to unseen objects. Recent methods for novel object pose estimation address this issue with task-specific fine-tuned CNNs for deep template matching. This adaptation for pose estimation still requires expensive data rendering and training procedures. MegaPose, for example, is trained on a dataset of two million images showing 20,000 different objects to reach such generalization capabilities. To overcome this shortcoming, we introduce ZS6D for zero-shot novel object 6D pose estimation. Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images of…
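
The template-matching step in the abstract can be pictured as nearest-neighbour retrieval: each rendered template, assumed to come from a known viewpoint, is summarized by one descriptor vector, and templates are ranked by cosine similarity to the query's descriptor. This is a minimal sketch under that assumption, not the authors' implementation. `rank_templates` and the three-dimensional toy vectors are hypothetical; real ViT descriptors have hundreds of dimensions.

```python
import numpy as np


def rank_templates(query_feat: np.ndarray, template_feats: np.ndarray, k: int = 3):
    """Rank templates by cosine similarity to one query descriptor.

    query_feat:     (D,) descriptor of the query crop (hypothetical input)
    template_feats: (T, D) one descriptor per rendered template (rows must be non-zero)
    Returns the indices of the k best templates and their similarity scores.
    """
    q = query_feat / np.linalg.norm(query_feat)
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    scores = t @ q                   # cosine similarity per template
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Toy 3-D descriptors for four hypothetical templates.
templates = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.7, 0.7, 0.0],
                      [0.0, 0.0, 1.0]])
order, scores = rank_templates(np.array([1.0, 0.1, 0.0]), templates, k=2)
print(order.tolist())  # → [0, 2]
```

Under the same assumption, the viewpoint of the top-ranked template would serve as a coarse rotation hypothesis for the query object.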

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Robotics and Sensor-Based Localization

Methods: PnP
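
PnP (Perspective-n-Point) recovers an object's rotation R and translation t in the camera frame from 3D model points, their 2D image projections, and the camera intrinsics K. To illustrate the principle, the sketch below solves it with the linear Direct Linear Transform (DLT) on noise-free synthetic data. This page does not say which PnP variant ZS6D uses, and practical pipelines usually add outlier rejection, for example OpenCV's `cv2.solvePnPRansac`. The function names, intrinsics, and pose values here are hypothetical.

```python
import numpy as np


def rodrigues(rvec: np.ndarray) -> np.ndarray:
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * kx + (1.0 - np.cos(theta)) * kx @ kx


def pnp_dlt(obj_pts: np.ndarray, img_pts: np.ndarray, K: np.ndarray):
    """Linear PnP via DLT. obj_pts: (n, 3) model points, img_pts: (n, 2) pixels.
    Needs n >= 6 non-coplanar points; no outlier handling or refinement."""
    n = obj_pts.shape[0]
    if n < 6:
        raise ValueError("linear DLT needs at least 6 correspondences")
    # Undo the intrinsics so the unknown projection is [R | t] up to scale.
    uv1 = np.hstack([img_pts, np.ones((n, 1))])
    xn = (np.linalg.inv(K) @ uv1.T).T
    Xh = np.hstack([obj_pts, np.ones((n, 1))])
    # Two linear equations per correspondence in the 12 entries of P.
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xn[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xn[:, [1]] * Xh
    # Least-squares null vector: right singular vector of the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:  # fix the overall sign so the scale is positive
        P = -P
    # Project the left 3x3 block onto SO(3) and remove the common scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t


# Synthetic check with made-up intrinsics and pose.
rng = np.random.default_rng(0)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R_true = rodrigues(np.array([0.1, -0.2, 0.3]))
t_true = np.array([0.05, -0.02, 0.8])
obj = rng.uniform(-0.1, 0.1, size=(20, 3))
cam = obj @ R_true.T + t_true           # model points in the camera frame
img = (cam @ K.T)[:, :2] / cam[:, 2:3]  # pinhole projection to pixels
R_est, t_est = pnp_dlt(obj, img, K)
print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))
```

With exact correspondences the linear solution recovers the pose to numerical precision; with noisy or partly wrong matches, a robust estimator such as RANSAC around a minimal solver is the usual choice.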