Activating Self-Attention for Multi-Scene Absolute Pose Regression

Miso Lee, Jihwan Kim, and Jae-Pil Heo

arXiv:2411.01443 · cs.CV · November 19, 2024

TL;DR

This paper identifies the issue of collapsed self-attention in transformer-based multi-scene pose regression models and proposes solutions to activate self-attention, leading to improved camera pose estimation accuracy.

Contribution

The work reveals the query-key embedding space distortion problem and introduces an auxiliary loss and fixed positional encoding to enhance self-attention activation in pose regression.
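For readers unfamiliar with the term, a fixed positional encoding is one that is computed from a formula rather than learned during training. The summary above does not say which form the authors use, so the following is only a minimal sketch that assumes the standard one-dimensional sinusoidal encoding from the original Transformer; the function name `sinusoidal_encoding` and its parameters are illustrative and are not taken from the paper or its repository.

```python
import math


def sinusoidal_encoding(num_positions, dim):
    """Return a num_positions x dim table of fixed (non-learned) encodings.

    Assumption: standard 1D sinusoidal form; the paper's exact variant
    is not specified in this summary.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even so sin/cos values pair up")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            # Each sin/cos pair uses a different wavelength.
            angle = pos / (10000 ** (i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


# Small example table: 4 positions, 6 channels.
pe = sinusoidal_encoding(num_positions=4, dim=6)
for pos, row in enumerate(pe):
    print(pos, [round(v, 3) for v in row])
```

Because the table depends only on position and channel index, it is identical for every scene and adds no trainable parameters.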

Findings

01 Outperforms existing methods in outdoor scenes

02 Outperforms existing methods in indoor scenes

03 Effectively activates self-attention in transformer models

Abstract

Multi-scene absolute pose regression addresses the demand for fast and memory-efficient camera pose estimation across various real-world environments. Recently, a transformer-based model has been devised to regress the camera pose directly across multiple scenes. Despite its potential, the transformer encoders are underutilized because the self-attention map collapses, which leaves it with low representation capacity. This work highlights the problem and investigates it from a new perspective: distortion of the query-key embedding space. Based on a statistical analysis, we reveal that queries and keys are mapped into completely different spaces, while only a few keys are blended into the query region. This leads to the collapse of the self-attention map, as all queries are considered similar to those few keys. Therefore, we propose simple but effective solutions to activate self-attention. Concretely, we present an…
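The collapse mechanism described in the abstract can be illustrated with a toy experiment. The sketch below is not the authors' analysis: it uses synthetic Gaussian queries and keys, scaled dot-product attention, and the mean row entropy of the attention map as an assumed proxy for "collapse". All names (`attention_map`, `mean_row_entropy`, `sample`) and all sizes and offsets are hypothetical.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights, one row per query."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def mean_row_entropy(attn):
    """Average entropy of the rows; low values mean peaked attention."""
    entropies = [-sum(p * math.log(p) for p in row if p > 0) for row in attn]
    return sum(entropies) / len(entropies)


def sample(center, count, dim, rng):
    """Draw `count` vectors around `center` with unit Gaussian noise."""
    return [[center + rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(count)]


rng = random.Random(0)
n, d = 16, 8  # toy sizes, chosen only for illustration

# Healthy case: queries and keys share one region of the space.
healthy_attn = attention_map(sample(0.0, n, d, rng), sample(0.0, n, d, rng))

# Distorted case: most keys sit far from the queries, and only two
# keys are "blended" into the query region.
queries = sample(2.0, n, d, rng)
keys = sample(2.0, 2, d, rng) + sample(-2.0, n - 2, d, rng)
collapsed_attn = attention_map(queries, keys)

healthy_entropy = mean_row_entropy(healthy_attn)
collapsed_entropy = mean_row_entropy(collapsed_attn)
# Share of attention mass that lands on the two blended keys.
blended_mass = sum(sum(row[:2]) for row in collapsed_attn) / n

print("healthy entropy:", round(healthy_entropy, 3))
print("collapsed entropy:", round(collapsed_entropy, 3))
print("mass on blended keys:", round(blended_mass, 3))
```

In the distorted case nearly every query places its attention on the same two keys, so the rows of the attention map become almost identical and carry little token-specific information.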

Code & Models

Repositories

dlalth557/ActMST (PyTorch, official)

Videos

Activating Self-Attention for Multi-Scene Absolute Pose Regression · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Robot Manipulation and Learning · Human Pose and Action Recognition