ProGEO: Generating Prompts through Image-Text Contrastive Learning for Visual Geo-localization
Chen Mao, Jingqi Hu

TL;DR
ProGEO introduces a two-stage contrastive learning approach leveraging multi-modal prompts to improve visual geo-localization accuracy, addressing the challenge of fine-grained image features and limited descriptions.
Contribution
It proposes a novel method using learnable text prompts and contrastive learning to enhance visual feature extraction for geo-localization tasks.
Findings
Achieves competitive results on large-scale geo-localization datasets.
Demonstrates the effectiveness of multi-modal prompts in improving visual feature learning.
Validates the approach's generalizability across multiple datasets.
Abstract
Visual Geo-localization (VG) is the process of identifying the location depicted in query images. It is widely applied in robotics and computer vision tasks such as autonomous driving, the metaverse, augmented reality, and SLAM. For fine-grained images that lack specific text descriptions, directly applying purely visual methods to represent neighborhood features often leads the model to focus on overly fine-grained features, so it cannot fully mine the semantic information in the images. We therefore propose a two-stage training method to enhance visual performance and use contrastive learning to mine challenging samples. We first leverage the multi-modal description capability of CLIP (Contrastive Language-Image Pretraining) to create a set of learnable text prompts for each geographic image feature, forming vague descriptions. Then, by utilizing dynamic text prompts to assist…
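The abstract is cut off before the training objective is stated, so the following is only a minimal sketch of a generic CLIP-style symmetric image-text contrastive loss, not the authors' exact formulation. It assumes each image feature in a batch is paired with the text-prompt feature at the same index. The function names, the temperature value of 0.07, and the toy 2-D features are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_rows(logits, targets):
    """Mean cross-entropy over rows, using a numerically stable log-sum-exp."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[t]
    return total / len(logits)


def image_text_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss; pair i of images and texts is the positive (assumption)."""
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # logits[i][j] is the temperature-scaled cosine similarity of image i and text j.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Transposed logits give the text-to-image direction.
    logits_t = [list(col) for col in zip(*logits)]
    targets = list(range(len(imgs)))
    return 0.5 * (cross_entropy_rows(logits, targets)
                  + cross_entropy_rows(logits_t, targets))


# Toy features: two images and two prompt embeddings in a 2-D space.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[2.0, 0.0], [0.0, 3.0]]      # each prompt matches its image
mismatched_text = [[0.0, 3.0], [2.0, 0.0]]   # prompts swapped

aligned_loss = image_text_contrastive_loss(image_feats, aligned_text)
mismatched_loss = image_text_contrastive_loss(image_feats, mismatched_text)
```

In this toy setup the loss is near zero when each prompt embedding points in the same direction as its image and large when the pairs are swapped, which is the signal a learnable prompt would be optimized against under this assumed objective.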
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · Contrastive Learning
