Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention
Fengyi Xu, Jun Ma, Waishan Qiu, Cui Guo, Jack C.P. Cheng

TL;DR
This paper introduces VPR-AttLLM, a framework that enhances visual place recognition for crowdsourced flood images by integrating LLM-guided attention. It improves recall by 1-3% on standard benchmarks and yields up to 8% improvement on real flood imagery, without retraining the underlying models.
Contribution
It presents a novel, model-agnostic approach that leverages LLMs for attention-guided descriptor enhancement, improving robustness and accuracy in flood image geo-localization.
Findings
Improves recall by 1-3% on standard benchmarks.
Achieves up to 8% improvement on real flood imagery.
Enhances cross-source robustness without retraining models.
Abstract
Crowdsourced social media imagery provides real-time visual evidence of urban flooding but often lacks reliable geographic metadata for emergency response. Existing Visual Place Recognition (VPR) models struggle to geo-localize these images due to cross-source domain shifts and visual distortions. We present VPR-AttLLM, a model-agnostic framework integrating the semantic reasoning and geospatial knowledge of Large Language Models (LLMs) into VPR pipelines via attention-guided descriptor enhancement. VPR-AttLLM uses LLMs to isolate location-informative regions and suppress transient noise, improving retrieval without model retraining or new data. We evaluate this framework across San Francisco and Hong Kong using established queries, synthetic flooding scenarios, and real social media flood images. Integrating VPR-AttLLM with state-of-the-art models (CosPlace, EigenPlaces, SALAD)…
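The idea of attention-guided descriptor enhancement can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, purely for illustration, that an image is represented by a few per-region feature vectors and that an LLM has already supplied one non-negative relevance score per region. The function names, the two-dimensional features, and the scores below are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def weighted_descriptor(local_features, attention):
    """Pool per-region features into one global descriptor.

    local_features: list of equal-length feature vectors, one per image region.
    attention: one non-negative weight per region (assumed here to come from
               an LLM's judgment of how location-informative the region is).
    """
    if len(local_features) != len(attention):
        raise ValueError("need exactly one attention weight per region")
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    dim = len(local_features[0])
    pooled = [0.0] * dim
    for feat, weight in zip(local_features, attention):
        for i, value in enumerate(feat):
            pooled[i] += weight * value
    return l2_normalize([p / total for p in pooled])

def retrieve(query_desc, database):
    """Return the database key whose unit-length descriptor best matches the query."""
    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(database, key=lambda name: cosine(query_desc, database[name]))

# Toy query: region 0 is a building facade, region 1 is floodwater.
query_regions = [[1.0, 0.0], [0.0, 1.0]]

# Toy reference database of unit-length descriptors.
database = {
    "place_a": [1.0, 0.0],   # the true location, dominated by the facade
    "place_b": [0.6, 0.8],   # a distractor that resembles the water region
}

# Uniform pooling treats floodwater as equally informative.
baseline_match = retrieve(weighted_descriptor(query_regions, [1.0, 1.0]), database)

# Down-weighting the transient region shifts the descriptor toward the facade.
guided_match = retrieve(weighted_descriptor(query_regions, [0.9, 0.1]), database)
```

In this toy setup the uniform descriptor matches the distractor, while the re-weighted descriptor matches the true location. The retrieval step and the reference descriptors are untouched, which is the sense in which such re-weighting requires no retraining.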
