Diffusion Features for Zero-Shot 6DoF Object Pose Estimation

Bernd Von Gimborn; Philipp Ausserlechner; Markus Vincze; Stefan Thalhammer

arXiv:2411.16668 · cs.CV · November 26, 2024

Open Access

TL;DR

This paper investigates Latent Diffusion Model (LDM) backbones for zero-shot 6DoF object pose estimation and reports improvements of up to 27% in Average Recall over a Vision Transformer (ViT) baseline on three standard datasets.

Contribution

The paper presents a template-based, multi-stage method that uses LDM backbones for zero-shot pose estimation, and it compares LDM and ViT backbones within a common framework.

Findings

01 Up to 27% improvement in Average Recall over the ViT baseline

02 Effective adaptation of LDMs for pose estimation tasks

03 Empirical validation on three standard datasets

Abstract

Zero-shot object pose estimation enables the retrieval of object poses from images without object-specific training. Recent approaches achieve this with vision foundation models (VFMs), pre-trained models that serve as general-purpose feature extractors. The characteristics of these VFMs vary with the training data, network architecture, and training paradigm. The prevailing choice in this field is the self-supervised Vision Transformer (ViT). This study assesses the influence of Latent Diffusion Model (LDM) backbones on zero-shot pose estimation. To compare the two families of models on common ground, we adopt and modify a recent approach, which yields a template-based, multi-stage method for estimating poses in a zero-shot fashion using LDMs. The efficacy of the proposed approach is…
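As a rough illustration of what "template-based" can mean in this setting, the sketch below retrieves the closest template for a query by cosine similarity between feature descriptors. It is not the authors' pipeline: the descriptors are random stand-ins for backbone features, and `cosine_similarity`, `retrieve_template`, `DIM`, and `NUM_TEMPLATES` are hypothetical names chosen for this example. In a real system each template would be a rendered view of the object with a known pose, and the retrieved pose would be refined in later stages.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero descriptor as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_template(query, templates):
    """Return (index of the most similar template, list of all scores)."""
    scores = [cosine_similarity(query, t) for t in templates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Assumption: random vectors stand in for descriptors that a frozen
# backbone (LDM or ViT) would produce for rendered templates and a query crop.
random.seed(0)
DIM = 64            # hypothetical descriptor size
NUM_TEMPLATES = 20  # hypothetical number of rendered views

templates = [[random.gauss(0, 1) for _ in range(DIM)]
             for _ in range(NUM_TEMPLATES)]

# Simulate a query that looks like template 7 plus a little noise.
query = [v + 0.05 * random.gauss(0, 1) for v in templates[7]]

best_index, scores = retrieve_template(query, templates)
print("best template:", best_index)
```

Because the comparison only needs descriptors, the same retrieval step works unchanged whichever backbone produces them, which is what makes a like-for-like comparison of feature extractors possible in a shared framework.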

Taxonomy

Topics: Image and Object Detection Techniques · Advanced Vision and Imaging · Robot Manipulation and Learning

Methods: Latent Diffusion Model · Diffusion · ADaptive gradient method with the OPTimal convergence rate