Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects

Stefan Thalhammer; Jean-Baptiste Weibel; Markus Vincze; Jose Garcia-Rodriguez

arXiv:2306.00129·cs.CV·June 2, 2023·2 cites

TL;DR

This paper compares self-supervised Vision Transformers and CNNs for 3D object pose estimation of unseen objects, showing Transformers often outperform CNNs in matching accuracy without fine-tuning.

Contribution

It evaluates and demonstrates the advantages of Vision Transformers over CNNs in deep template matching for novel object pose estimation, including cases where fine-tuning is unnecessary.

Findings

1. Vision Transformers improve matching accuracy over CNNs.

2. In some cases, pre-trained Vision Transformers perform well without fine-tuning.

3. Vision Transformers and CNNs differ in optimization behavior and architectural benefits for deep template matching.

Abstract

Object pose estimation is important for object manipulation and scene understanding. In order to improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is, objects unseen during training. Such works use deep template matching strategies to retrieve the closest template connected to a query image. This template retrieval implicitly provides object class and pose. Despite the recent success and improvements of Vision Transformers over CNNs for many vision tasks, the state of the art uses CNN-based approaches for novel object pose estimation. This work evaluates and demonstrates the differences between self-supervised CNNs and Vision Transformers for deep template matching. In detail, both types of approaches are trained using contrastive learning to match training images against rendered templates of isolated objects. At…
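
The template retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that a feature extractor (CNN or Vision Transformer) has already mapped the query image and every rendered template to an embedding vector, and that cosine similarity is the matching score. The abstract does not state the similarity measure, so that choice, the function names, and the toy embeddings below are all hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple

# (object_id, pose_label, embedding); the layout is an assumption for illustration.
Template = Tuple[str, str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_template(
    query: Sequence[float], templates: List[Template]
) -> Tuple[str, str, float]:
    """Return (object_id, pose_label, score) of the most similar template."""
    if not templates:
        raise ValueError("at least one template is required")
    scored = [(cosine_similarity(query, emb), obj, pose) for obj, pose, emb in templates]
    score, obj, pose = max(scored, key=lambda item: item[0])
    return obj, pose, score


# Toy embeddings standing in for network outputs (hypothetical values).
example_templates: List[Template] = [
    ("mug", "view_000", [1.0, 0.0, 0.0]),
    ("mug", "view_090", [0.0, 1.0, 0.0]),
    ("drill", "view_000", [0.0, 0.0, 1.0]),
]

obj, pose, score = retrieve_template([0.1, 0.9, 0.1], example_templates)
print(obj, pose)  # the retrieved template carries both object class and pose
```

Because each template is stored with its object identity and rendering viewpoint, picking the best match yields class and pose together, which is the sense in which retrieval "implicitly provides" both.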

Code & Models

Repositories

sthalham/tram3d
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning

Methods: Test · Contrastive Learning