DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

Gaoshuang Huang; Yang Zhou; Xiaofei Hu; Chenglong Zhang; Luying Zhao; Wenjian Gan; Mingbo Hou

arXiv:2311.00230·cs.CV·December 6, 2023·1 cites

TL;DR

This paper introduces DINO-Mix, a novel visual place recognition architecture that combines foundational vision models with feature mixing, significantly improving accuracy in complex environments with lighting, seasonal, and occlusion challenges.

Contribution

The paper proposes DINO-Mix, a new VPR method that leverages foundational vision models and feature aggregation to enhance robustness and accuracy in challenging conditions.

Findings

01 DINO-Mix achieves top-1 accuracy of 91.75% on Tokyo24/7.

02 It outperforms state-of-the-art methods with an average accuracy gain of 5.14%.

03 It demonstrates robustness across lighting, seasonal, and occlusion variations.
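
For readers unfamiliar with how a top-1 figure like the one above is typically computed in VPR, here is a minimal sketch. It assumes "top-1 accuracy" means Recall@1 in the usual retrieval sense (a query counts as correct if its best-ranked database image is a true match); the page does not spell out the exact protocol. The function name `recall_at_k` and the toy data are illustrative only and are not taken from the paper or its repository.

```python
def recall_at_k(ranked_ids, positives, k=1):
    """Fraction of queries with at least one true match in the top-k results.

    ranked_ids: per-query list of database ids, best match first.
    positives:  per-query set of database ids that count as the same place.
    """
    hits = 0
    for retrieved, truth in zip(ranked_ids, positives):
        if any(db_id in truth for db_id in retrieved[:k]):
            hits += 1
    return hits / len(ranked_ids)


# Toy example (hypothetical data): 4 queries, 3 retrieved candidates each.
ranked = [[3, 7, 1], [2, 5, 9], [8, 0, 4], [6, 1, 2]]
truth = [{3}, {9}, {8, 4}, {5}]

r1 = recall_at_k(ranked, truth, k=1)  # queries 0 and 2 hit at rank 1
r3 = recall_at_k(ranked, truth, k=3)  # query 1 additionally hits at rank 3
```

On this toy data `r1` is 0.5 and `r3` is 0.75; the last query never retrieves its true match, so it counts as a miss at every k.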

Abstract

Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is generally unsatisfactory. In this study, we utilize the DINOv2 model as the backbone network for trimming and fine-tuning to extract robust image features. We propose a novel VPR architecture called DINO-Mix, which combines a foundational vision model with feature aggregation. This architecture relies on the powerful image feature extraction capabilities of foundational vision models. We employ an MLP-Mixer-based mix module to aggregate image features, resulting in globally robust and…
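
The aggregation idea in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' module: it assumes an MLP-Mixer-style block that applies a residual two-layer MLP along the spatial axis of each channel, followed by a per-channel projection, flattening, and L2 normalization. All names (`mix_block`, `aggregate`), the sizes, and the random weights are hypothetical, and the backbone features are replaced by random numbers so that the example runs with the standard library alone.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix; stands in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def mix_block(features, W1, W2):
    """Residual MLP applied along the spatial axis of every channel.

    features: list of C rows, each holding N spatial values.
    """
    out = []
    for row in features:
        hidden = [max(0.0, h) for h in matvec(W1, row)]  # ReLU
        delta = matvec(W2, hidden)
        out.append([r + d for r, d in zip(row, delta)])  # residual connection
    return out


def aggregate(features, blocks, W_proj):
    """Stack mixing blocks, project each channel, flatten, L2-normalize."""
    for W1, W2 in blocks:
        features = mix_block(features, W1, W2)
    flat = [v for row in features for v in matvec(W_proj, row)]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


rng = random.Random(0)
C, N, hidden, out_per_channel = 8, 16, 16, 4  # illustrative sizes only

# Stand-in for backbone output: C channels, N spatial positions each.
feature_map = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(C)]
blocks = [(rand_matrix(hidden, N, rng), rand_matrix(N, hidden, rng)) for _ in range(2)]
W_proj = rand_matrix(out_per_channel, N, rng)

descriptor = aggregate(feature_map, blocks, W_proj)  # global descriptor, length C * 4
```

The resulting `descriptor` is a unit-length vector, so two images can be compared with a plain dot product. A real implementation would use learned weights and a tensor library; the official repository listed under Code & Models is the reference for the actual architecture.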

Code & Models

Repositories

GaoShuang98/DINO-Mix (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Indoor and Outdoor Localization Technologies · Robotics and Sensor-Based Localization