TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition
Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan

TL;DR
TAT-VPR introduces a ternary-quantized transformer that dynamically balances accuracy and efficiency in visual place recognition, enabling deployment on resource-constrained platforms without sacrificing localization performance.
Contribution
It proposes a novel ternary-quantized transformer with a learned activation-sparsity gate and a two-stage distillation pipeline for efficient and accurate visual SLAM loop-closure.
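To make the two ingredients named above concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The absolute-mean scaling rule in `ternarize` is an assumed, commonly used ternarization scheme. The top-k magnitude mask in `topk_activation_mask` is a simple stand-in for the paper's learned activation-sparsity gate. The function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def ternarize(w, eps=1e-8):
    """Quantize a weight tensor to {-1, 0, +1} plus one per-tensor scale.

    Assumption: absolute-mean scaling; the paper may use a different rule.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def topk_activation_mask(x, keep_ratio):
    """Keep the largest-magnitude fraction of each row and zero the rest.

    Stand-in for a learned gate: keep_ratio is a hypothetical run-time knob.
    """
    k = max(1, int(round(keep_ratio * x.shape[-1])))
    idx = np.argpartition(np.abs(x), -k, axis=-1)[..., -k:]
    mask = np.zeros(x.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, x, 0.0)

# Toy example: ternarize a small weight matrix and sparsify an activation row.
w = np.array([[0.9, -0.05, -1.2], [0.02, 0.4, -0.7]])
q, scale = ternarize(w)
x = np.array([[1.0, -3.0, 0.5, 2.0, -0.1]])
x_sparse = topk_activation_mask(x, keep_ratio=0.6)
```

Lowering `keep_ratio` zeroes more activations, which is the kind of run-time accuracy-efficiency control the summary describes. How the real gate is learned and applied is not specified here.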
Findings
Reduces computation by up to 40% at run-time.
Maintains state-of-the-art localization accuracy.
Operates effectively on micro-UAV and embedded systems.
Abstract
TAT-VPR is a ternary-quantized transformer that brings dynamic accuracy-efficiency trade-offs to visual SLAM loop-closure. By fusing ternary weights with a learned activation-sparsity gate, the model can reduce computation by up to 40% at run-time without degrading Recall@1. The proposed two-stage distillation pipeline preserves descriptor quality, letting the model run on micro-UAV and embedded SLAM stacks while matching state-of-the-art localization accuracy.
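Recall@1, the metric cited above, is the fraction of queries whose single best-matching database image is a correct place match. The sketch below shows one common way to compute it from global descriptors using cosine similarity. The function name `recall_at_1`, the toy descriptors, and the ground-truth sets are illustrative assumptions, not data or code from the paper.

```python
import numpy as np

def recall_at_1(query_desc, db_desc, ground_truth):
    """Fraction of queries whose top-1 retrieved database index is correct.

    ground_truth[i] is the set of database indices accepted as matches
    for query i (assumed to be given, e.g. from a distance threshold).
    """
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    top1 = np.argmax(q @ d.T, axis=1)  # best cosine-similarity match per query
    hits = [int(t) in gt for t, gt in zip(top1, ground_truth)]
    return float(np.mean(hits))

# Toy example: three database descriptors and three queries.
db = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 1.1]])
gt = [{0}, {2}, {2}]
r1 = recall_at_1(queries, db, gt)
```

In this toy case the first and third queries retrieve an accepted match and the second does not, so `r1` is 2/3.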
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
