TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

Oliver Grainge; Michael Milford; Indu Bodala; Sarvapali D. Ramchurn; Shoaib Ehsan

arXiv:2505.16447·cs.CV·May 23, 2025

TL;DR

TAT-VPR introduces a ternary-quantized transformer that dynamically balances accuracy and efficiency in visual place recognition, enabling deployment on resource-constrained platforms without sacrificing localization performance.

Contribution

It proposes a novel ternary-quantized transformer with a learned activation-sparsity gate and a two-stage distillation pipeline for efficient and accurate visual SLAM loop-closure.
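The two ingredients named above, ternary weights and an activation-sparsity gate, can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the quantization rule or the gate design, so this assumes absmean scaling for the ternary weights and substitutes a fixed top-k magnitude gate for the learned gate. The names `ternarize`, `topk_gate`, `ternary_dot`, and `keep_ratio` are hypothetical.

```python
def ternarize(weights):
    """Map each weight to {-1, 0, +1} with one shared scale.

    Assumption: absmean scaling (mean absolute value); the paper's
    exact quantizer is not stated in this summary.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def topk_gate(activations, keep_ratio):
    """Zero all but the largest-magnitude activations.

    Assumption: a fixed top-k rule stands in for the learned gate;
    keep_ratio is the fraction of activations kept.
    """
    k = max(1, int(round(keep_ratio * len(activations))))
    order = sorted(range(len(activations)),
                   key=lambda i: abs(activations[i]), reverse=True)
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def ternary_dot(ternary, scale, activations):
    """Dot product that needs only additions and subtractions.

    Terms with a zero weight or a gated-off activation are skipped,
    which is where the compute saving would come from.
    """
    acc = 0.0
    for t, a in zip(ternary, activations):
        if t == 0 or a == 0.0:
            continue
        acc += a if t > 0 else -a
    return scale * acc


weights = [0.9, -0.05, -1.1, 0.4]
activations = [0.2, -3.0, 0.1, 1.5]

ternary, scale = ternarize(weights)
gated = topk_gate(activations, keep_ratio=0.5)
output = ternary_dot(ternary, scale, gated)
```

In this sketch, lowering `keep_ratio` skips more terms in `ternary_dot`, which is one way a single set of weights could trade accuracy for computation at run-time.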

Findings

1. Reduces computation by up to 40% at run-time.
2. Maintains state-of-the-art localization accuracy.
3. Operates effectively on micro-UAV and embedded systems.

Abstract

TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop-closure. By fusing ternary weights with a learned activation-sparsity gate, the model can adjust its computation at run-time, reducing it by up to 40% without degrading Recall@1. The proposed two-stage distillation pipeline preserves descriptor quality, letting the model run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
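Recall@1, the metric named in the abstract, is the fraction of queries whose single best-matching database image is a correct match. The sketch below shows one common way to compute it. It assumes dot-product similarity between descriptors and a per-query set of correct database indices; the function name `recall_at_1` and the toy descriptors are illustrative and not taken from the paper.

```python
def recall_at_1(query_desc, db_desc, ground_truth):
    """Fraction of queries whose top-1 retrieved database index is correct.

    query_desc:   list of query descriptors (lists of floats)
    db_desc:      list of database descriptors (lists of floats)
    ground_truth: one set of correct database indices per query
    Assumption: similarity is a plain dot product.
    """
    hits = 0
    for q, positives in zip(query_desc, ground_truth):
        scores = [sum(a * b for a, b in zip(q, d)) for d in db_desc]
        best = max(range(len(scores)), key=scores.__getitem__)
        if best in positives:
            hits += 1
    return hits / len(query_desc)


# Toy data: three database places and three queries.
db = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
truth = [{0}, {1}, {1}]

r1 = recall_at_1(queries, db, truth)
```

Here the first two queries retrieve their correct place, while the third retrieves database index 2 instead of 1, so two of the three queries count as hits.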

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications