Coarse-to-Fine Multi-Scene Pose Regression with Transformers

Yoli Shavit; Ron Ferens; Yosi Keller

arXiv:2308.11783 · cs.CV · August 24, 2023

TL;DR

This paper introduces a Transformer-based approach to multi-scene absolute camera pose regression. The model attends to features that are informative for localization while embedding multiple scenes in parallel, and it outperforms existing methods on standard benchmarks.

Contribution

The work presents a novel Transformer architecture with mixed classification-regression for multi-scene pose estimation, improving accuracy over prior models.

Findings

01 Outperforms state-of-the-art single-scene regressors

02 Effective multi-scene localization on benchmark datasets

03 Transformer-based architecture enhances feature focus

Abstract

Absolute camera pose regressors estimate the position and orientation of a camera given the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended to learn multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention and decoders transform latent features and scenes encoding into pose predictions. This allows our model to focus on general features that are informative for localization, while embedding multiple scenes in parallel. We extend our previous MS-Transformer approach \cite{shavit2021learning} by introducing a mixed…
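
The encoder step described in the abstract, aggregating an activation map with self-attention, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes a single attention head, identity query/key/value projections instead of learned ones, and no positional encoding. The names flatten_activation_map and self_attention, and the toy 2x2 activation map, are hypothetical and exist only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_activation_map(fmap):
    """Turn an H x W x C nested list into a sequence of H*W tokens of size C."""
    return [cell for row in fmap for cell in row]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this position to every position in the map.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens.
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, tokens))
            for j in range(dim)
        ])
    return outputs


# Toy 2 x 2 activation map with 2 channels (hypothetical values).
activation_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
tokens = flatten_activation_map(activation_map)
aggregated = self_attention(tokens)
```

Each entry of aggregated mixes information from every position of the activation map, which is the sense in which self-attention aggregates it. In the architecture the abstract describes, decoders would then transform such latent features together with a scene encoding into pose predictions; that stage is not sketched here.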

Code & Models

Repositories

yolish/c2f-ms-transformer (PyTorch, official)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Human Pose and Action Recognition

Methods: Focus