Paying Attention to Activation Maps in Camera Pose Regression
Yoli Shavit, Ron Ferens, Yosi Keller

TL;DR
This paper introduces an attention-based approach to camera pose regression that feeds convolutional activation maps to Transformers as sequences, with separate heads for position and orientation. It reports outperforming existing pose regressors, including sub-meter accuracy in outdoor scenes.
Contribution
The paper proposes a Transformer-based pose regressor that treats convolutional activation maps as sequential inputs and uses two Transformer heads to attend separately to the spatial features most informative for position and for orientation.
Findings
Outperforms existing pose regressors on multiple benchmarks.
Achieves sub-meter accuracy in outdoor scenes.
Demonstrates the effectiveness of attention mechanisms in pose estimation.
Abstract
Camera pose regression methods apply a single forward pass to the query image to estimate the camera pose. As such, they offer a fast and light-weight alternative to traditional localization schemes based on image retrieval. Pose regression approaches simultaneously learn two regression tasks, aiming to jointly estimate the camera position and orientation using a single embedding vector computed by a convolutional backbone. We propose an attention-based approach for pose regression, where the convolutional activation maps are used as sequential inputs. Transformers are applied to encode the sequential activation maps as latent vectors, used for camera pose regression. This allows us to pay attention to spatially-varying deep features. Using two Transformer heads, we separately focus on the features for camera position and orientation, based on how informative they are per task. Our…
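The data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention, the mean pooling over tokens, the unit-quaternion orientation output, the random untrained weights, and all function names (`flatten_activation_map`, `init_branch`, `run_branch`, `estimate_pose`) are assumptions made for illustration. It only shows how a convolutional activation map can be flattened into a sequence and passed through two separate attention branches, one per task.

```python
import numpy as np


def flatten_activation_map(act_map):
    """Turn a (C, H, W) activation map into a sequence of H*W tokens of size C."""
    c, h, w = act_map.shape
    return act_map.reshape(c, h * w).T


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def init_branch(rng, dim, out_dim):
    """Random, untrained parameters for one branch (hypothetical layout)."""
    scale = 1.0 / np.sqrt(dim)
    return {
        "w_q": rng.standard_normal((dim, dim)) * scale,
        "w_k": rng.standard_normal((dim, dim)) * scale,
        "w_v": rng.standard_normal((dim, dim)) * scale,
        "w_out": rng.standard_normal((dim, out_dim)) * scale,
    }


def run_branch(tokens, params):
    """Attend over spatial tokens, pool to a latent vector, regress linearly."""
    encoded, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    latent = encoded.mean(axis=0)  # assumption: mean pooling over tokens
    return latent @ params["w_out"], weights


def estimate_pose(act_map, pos_params, ori_params):
    """Run separate position and orientation branches on the same tokens."""
    tokens = flatten_activation_map(act_map)
    position, pos_attn = run_branch(tokens, pos_params)
    quat, ori_attn = run_branch(tokens, ori_params)
    orientation = quat / np.linalg.norm(quat)  # assumption: unit quaternion
    return position, orientation, pos_attn, ori_attn


rng = np.random.default_rng(0)
act_map = rng.standard_normal((8, 4, 5))  # stand-in for backbone features
pos_params = init_branch(rng, dim=8, out_dim=3)
ori_params = init_branch(rng, dim=8, out_dim=4)
position, orientation, pos_attn, ori_attn = estimate_pose(
    act_map, pos_params, ori_params
)
print(position.shape, orientation.shape, pos_attn.shape)
```

Because each branch has its own projection weights, the two attention matrices differ, which is the sense in which position and orientation can weight the same spatial features differently. A trained model would learn these weights and would typically stack several encoder layers with positional encodings.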
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · 3D Surveying and Cultural Heritage
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Residual Connection · Layer Normalization · Adam · Dense Connections · Softmax · Dropout
