ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers
Philipp Ausserlechner, David Haberger, Stefan Thalhammer, Jean-Baptiste Weibel, Markus Vincze

TL;DR
This paper introduces ZS6D, a zero-shot 6D object pose estimation method using pre-trained Vision Transformers, eliminating the need for task-specific fine-tuning and achieving superior results on multiple datasets.
Contribution
ZS6D leverages pre-trained Vision Transformers for zero-shot 6D pose estimation, avoiding expensive training and fine-tuning procedures of prior methods.
Findings
Improves Average Recall on the LM-O, YCB-V, and T-LESS datasets.
Outperforms state-of-the-art methods without task-specific fine-tuning.
Uses ViT descriptors for robust template matching and pose estimation.
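To make the template-matching idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each rendered template and the query crop have already been reduced to a single descriptor vector by a pre-trained ViT, and it assumes cosine similarity as the matching score. The function name best_template and the toy vectors are hypothetical.

```python
import numpy as np


def best_template(query_desc, template_descs):
    """Return (index, score) of the template most similar to the query.

    Assumption: descriptors are fixed-length vectors and cosine
    similarity is the matching criterion (illustrative choice only).
    """
    query_desc = np.asarray(query_desc, dtype=float)
    template_descs = np.asarray(template_descs, dtype=float)

    # L2-normalise so that a dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc)
    t = template_descs / np.linalg.norm(template_descs, axis=1, keepdims=True)

    scores = t @ q                 # one similarity score per template
    idx = int(np.argmax(scores))   # best-matching template
    return idx, float(scores[idx])


# Toy usage: three hypothetical template descriptors and one query.
templates = np.eye(3)
query = np.array([0.1, 0.9, 0.0])
index, score = best_template(query, templates)
```

In a full pipeline, the pose used to render the selected template would serve as a starting point, and 2D-3D correspondences between the query and that template could then be passed to a PnP solver to recover the 6D pose.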
Abstract
As robotic systems increasingly encounter complex and unconstrained real-world scenarios, there is a demand to recognize diverse objects. The state-of-the-art 6D object pose estimation methods rely on object-specific training and therefore do not generalize to unseen objects. Recent novel object pose estimation methods are solving this issue using task-specific fine-tuned CNNs for deep template matching. This adaptation for pose estimation still requires expensive data rendering and training procedures. MegaPose for example is trained on a dataset consisting of two million images showing 20,000 different objects to reach such generalization capabilities. To overcome this shortcoming we introduce ZS6D, for zero-shot novel object 6D pose estimation. Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images of…
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
Methods: PnP
