Image-Based Geolocation Using Large Vision-Language Models

Yi Liu; Junchen Ding; Gelei Deng; Yuekang Li; Tianwei Zhang; Weisong Sun; Yaowen Zheng; Jingquan Ge; Yang Liu

arXiv:2408.09474·cs.CR·August 20, 2024

TL;DR

This paper introduces \tool{}, a novel framework that uses large vision-language models and a chain-of-thought approach to significantly improve image-based geolocation accuracy, outperforming traditional methods and human benchmarks while addressing privacy concerns.

Contribution

The paper presents \tool{}, a new framework that enhances geolocation accuracy using LVLMs and a systematic reasoning approach, addressing privacy risks and dataset issues.

Findings

01

\tool{} achieves an average score of 4550.5 in GeoGuessr.

02

\tool{} attains an 85.37% win rate in geolocation tasks.

03

Closest predictions are accurate within 0.3 km.
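
To make the "within 0.3 km" figure concrete, the sketch below shows one common way to measure the error between a predicted and a true coordinate: the great-circle (haversine) distance. This is an illustration only, not the paper's evaluation code. The function names, the mean Earth radius of 6371 km, and the sample coordinates are all assumptions introduced here.

```python
import math

# Assumed constant: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_km(pred, truth, threshold_km):
    """True if the predicted (lat, lon) lies within threshold_km of the true one."""
    return haversine_km(pred[0], pred[1], truth[0], truth[1]) <= threshold_km


# Hypothetical prediction and ground truth, about 0.24 km apart.
predicted = (48.8584, 2.2945)
actual = (48.8606, 2.2945)
print(within_km(predicted, actual, 0.3))
```

A spherical-Earth distance is accurate enough at sub-kilometre thresholds for a check like this; an ellipsoidal model would change the result only marginally.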

Abstract

Geolocation is now a vital aspect of modern life, offering numerous benefits but also presenting serious privacy concerns. The advent of large vision-language models (LVLMs) with advanced image-processing capabilities introduces new risks, as these models can inadvertently reveal sensitive geolocation information. This paper presents the first in-depth study analyzing the challenges posed by traditional deep learning and LVLM-based geolocation methods. Our findings reveal that LVLMs can accurately determine geolocations from images, even without explicit geographic training. To address these challenges, we introduce \tool{}, an innovative framework that significantly enhances image-based geolocation accuracy. \tool{} employs a systematic chain-of-thought (CoT) approach, mimicking human geoguessing strategies by carefully analyzing visual and contextual cues such as vehicle types,…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications