DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing
Gaoshuang Huang, Yang Zhou, Xiaofei Hu, Chenglong Zhang, Luying Zhao, Wenjian Gan, Mingbo Hou

TL;DR
This paper introduces DINO-Mix, a novel visual place recognition architecture that combines foundational vision models with feature mixing, significantly improving accuracy in complex environments with lighting, seasonal, and occlusion challenges.
Contribution
The paper proposes DINO-Mix, a new VPR method that leverages foundational vision models and feature aggregation to enhance robustness and accuracy in challenging conditions.
Findings
DINO-Mix achieves top-1 accuracy of 91.75% on Tokyo24/7.
It outperforms state-of-the-art methods with an average accuracy gain of 5.14%.
It demonstrates robustness to lighting, seasonal, and occlusion variations.
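For readers unfamiliar with the metric, the sketch below shows one common way a top-1 accuracy figure is computed in image retrieval: a query counts as correct when its single nearest database image is a known true match. This is an illustrative assumption about the evaluation protocol, not the paper's evaluation code; the function names, the cosine similarity measure, and the toy 2-D descriptors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1_accuracy(queries, database, positives):
    """Fraction of queries whose nearest database entry is a true match.

    queries, database: lists of descriptor vectors (hypothetical toy data).
    positives: positives[i] is the set of database indices that depict the
    same place as query i (ground truth, assumed given).
    """
    hits = 0
    for qi, q in enumerate(queries):
        # Index of the most similar database descriptor.
        best = max(range(len(database)), key=lambda di: cosine(q, database[di]))
        if best in positives[qi]:
            hits += 1
    return hits / len(queries)

# Toy example: three database places, four queries.
database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8], [1.0, 0.2]]
positives = [{0}, {1}, {2}, {2}]  # the last query is retrieved incorrectly

score = top1_accuracy(queries, database, positives)
print(f"top-1 accuracy: {score:.2%}")
```

In this toy setup three of the four queries retrieve a correct place. Real benchmarks typically define a true match by a geographic distance threshold, which is not modeled here.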
Abstract
Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is generally unsatisfactory. In this study, we utilize the DINOv2 model as the backbone network for trimming and fine-tuning to extract robust image features. We propose a novel VPR architecture called DINO-Mix, which combines a foundational vision model with feature aggregation. This architecture relies on the powerful image feature extraction capabilities of foundational vision models. We employ an MLP-Mixer-based mix module to aggregate image features, resulting in globally robust and…
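To make the aggregation idea concrete, here is a minimal NumPy sketch of an MLP-Mixer-style mixing step that turns a set of patch features into one L2-normalized global descriptor. It is a sketch under stated assumptions, not the authors' implementation: the random `tokens` array stands in for backbone output, and the layer count, sizes, ReLU activation, parameter-free layer normalization, and the final projection `w_out` are all hypothetical choices.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize the last axis to zero mean and unit variance (no learned scale)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mix_block(x, w1, w2):
    """Residual two-layer MLP applied along the token axis (last axis of x)."""
    hidden = np.maximum(layer_norm(x) @ w1, 0.0)  # ReLU, an assumed activation
    return x + hidden @ w2                        # skip connection

def aggregate(tokens, blocks, w_out):
    """Mix patch features and flatten them into one unit-length descriptor.

    tokens: (num_tokens, dim) patch features (stand-in for backbone output).
    blocks: list of (w1, w2) weight pairs, one per mixing block.
    w_out:  (num_tokens, out_tokens) projection that shrinks the token axis.
    """
    x = tokens.T                      # (dim, num_tokens): mix across tokens per channel
    for w1, w2 in blocks:
        x = mix_block(x, w1, w2)
    desc = (x @ w_out).reshape(-1)    # (dim * out_tokens,)
    return desc / np.linalg.norm(desc)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
num_tokens, dim, hidden, out_tokens = 16, 8, 32, 4
blocks = [
    (rng.normal(scale=0.1, size=(num_tokens, hidden)),
     rng.normal(scale=0.1, size=(hidden, num_tokens)))
    for _ in range(2)
]
w_out = rng.normal(scale=0.1, size=(num_tokens, out_tokens))
tokens = rng.normal(size=(num_tokens, dim))

descriptor = aggregate(tokens, blocks, w_out)
print(descriptor.shape)
```

Because the descriptor has unit length, two images can be compared with a plain dot product, which is what makes nearest-neighbor retrieval over a database cheap.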
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Indoor and Outdoor Localization Technologies · Robotics and Sensor-Based Localization
