Learning Multi-Scene Absolute Pose Regression with Transformers

Yoli Shavit; Ron Ferens; Yosi Keller

arXiv:2103.11468 · cs.CV · July 27, 2021

TL;DR

This paper introduces a Transformer-based approach to multi-scene absolute camera pose regression, in which a single model uses self-attention to localize images across multiple environments.

Contribution

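The work presents a Transformer architecture for multi-scene absolute pose regression. Self-attention lets the model focus on general features that are informative for localization while embedding multiple scenes in parallel, improving over earlier multi-scene regressors that attach a set of fully connected layers to a convolutional backbone.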

Findings

01. Outperforms existing multi-scene pose regressors
02. Surpasses state-of-the-art single-scene methods
03. Effective on indoor and outdoor datasets

Abstract

Absolute camera pose regressors estimate the position and orientation of a camera from the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron head is trained with images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended for learning multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention and decoders transform latent features and scenes encoding into candidate pose predictions. This mechanism allows our model to focus on general features that are informative for localization while embedding multiple scenes in parallel. We evaluate our method on commonly benchmarked indoor and outdoor datasets and show that it…
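The encoder/decoder mechanism described in the abstract can be made concrete with a small NumPy sketch. It is not the authors' implementation. The class name `MultiSceneAPRSketch`, the single-layer, single-head attention (no layer norm, feed-forward blocks, or positional encodings), the one-learned-query-per-scene decoder, the scene classifier that picks among the candidate poses, and the 7-value pose output (3D position plus a unit quaternion) are all illustrative assumptions. The weights are random, so running it only demonstrates the data flow and tensor shapes.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention: q (n, d), k and v (m, d) -> (n, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


class MultiSceneAPRSketch:
    """Hypothetical one-layer encoder/decoder head with one query per scene."""

    def __init__(self, d_model, num_scenes, rng):
        scale = 1.0 / np.sqrt(d_model)

        def proj(out_dim=d_model):
            return rng.normal(0.0, scale, (d_model, out_dim))

        # Encoder self-attention projections.
        self.enc_q, self.enc_k, self.enc_v = proj(), proj(), proj()
        # Decoder cross-attention projections.
        self.dec_q, self.dec_k, self.dec_v = proj(), proj(), proj()
        # Assumed learned scene encodings: one decoder query per scene.
        self.scene_queries = rng.normal(0.0, 1.0, (num_scenes, d_model))
        self.w_cls = proj(1)   # one scene logit per decoded latent
        self.w_pose = proj(7)  # assumed pose format: xyz position + quaternion

    def encode(self, tokens):
        """Aggregate activation-map tokens with self-attention (plus a residual)."""
        q, k, v = tokens @ self.enc_q, tokens @ self.enc_k, tokens @ self.enc_v
        return tokens + attention(q, k, v)

    def decode(self, memory):
        """Each scene query attends to the encoder output: one latent per scene."""
        q = self.scene_queries @ self.dec_q
        k, v = memory @ self.dec_k, memory @ self.dec_v
        return self.scene_queries + attention(q, k, v)

    def __call__(self, activation_map):
        # Assumes the channel count already equals d_model.
        c, h, w = activation_map.shape
        tokens = activation_map.reshape(c, h * w).T   # (h*w, c) spatial tokens
        latents = self.decode(self.encode(tokens))    # (num_scenes, c)
        scene_probs = softmax((latents @ self.w_cls)[:, 0])
        candidates = latents @ self.w_pose            # one candidate pose per scene
        scene = int(np.argmax(scene_probs))           # keep the most likely scene
        position = candidates[scene, :3]
        orientation = candidates[scene, 3:] / np.linalg.norm(candidates[scene, 3:])
        return scene, position, orientation, scene_probs


rng = np.random.default_rng(0)
model = MultiSceneAPRSketch(d_model=32, num_scenes=4, rng=rng)
activation_map = rng.normal(size=(32, 7, 7))  # stand-in for a CNN backbone output
scene, position, orientation, scene_probs = model(activation_map)
print(scene, position.shape, orientation.shape)
```

Because every scene is represented by its own query in this sketch, adding a scene only adds one query row rather than a new regression head, which illustrates the contrast the abstract draws with the set of per-scene fully connected layers.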

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Human Pose and Action Recognition