AI Sees Your Location, But With A Bias Toward The Wealthy World
Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao

TL;DR
This paper evaluates visual-language models' ability to recognize geographic locations from images, revealing significant biases favoring wealthy, densely populated regions and raising privacy concerns.
Contribution
Introduces a benchmark of 1,200 images paired with detailed geographic metadata and evaluates four VLMs on it, highlighting regional biases and privacy implications in geographic recognition tasks.
Findings
VLMs achieve up to 53.8% accuracy in city prediction.
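Performance is substantially higher for economically developed and densely populated regions than for less developed (-12.5%) and sparsely populated (-17.0%) areas.
Models frequently over-predict certain locations; for example, they consistently predict Sydney for images taken in Australia.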
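As a rough illustration of how a per-group accuracy gap like the ones above could be computed, here is a minimal sketch. The record layout, the group labels, the function names, and the toy data are all assumptions made for this example; they are not the paper's benchmark format, evaluation code, or results, and the paper may match predictions or define the gap differently.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return city-prediction accuracy per group.

    records: iterable of (group, true_city, predicted_city) tuples.
    Matching is a case-insensitive exact string comparison (an assumption).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, true_city, predicted_city in records:
        totals[group] += 1
        if predicted_city.strip().lower() == true_city.strip().lower():
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(acc, group, reference):
    """Accuracy of `group` minus accuracy of `reference` (negative = worse)."""
    return acc[group] - acc[reference]


# Hypothetical toy data, invented purely for illustration.
toy_records = [
    ("developed", "Tokyo", "Tokyo"),
    ("developed", "Paris", "Paris"),
    ("developed", "Toronto", "Chicago"),
    ("developed", "Berlin", "Berlin"),
    ("less_developed", "Lusaka", "Nairobi"),
    ("less_developed", "Dhaka", "Dhaka"),
    ("less_developed", "La Paz", "Lima"),
    ("less_developed", "Hanoi", "Hanoi"),
]

acc = accuracy_by_group(toy_records)
gap = accuracy_gap(acc, "less_developed", "developed")
print(acc, gap)
```

On this toy data the gap is expressed as a difference in accuracy fractions; whether the figures reported in the paper are percentage-point or relative differences is not stated in this summary.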
Abstract
Visual-Language Models (VLMs) have shown remarkable performance across various tasks, particularly in recognizing geographic information from images. However, VLMs still show regional biases in this task. To systematically evaluate these issues, we introduce a benchmark consisting of 1,200 images paired with detailed geographic metadata. Evaluating four VLMs, we find that while these models demonstrate the ability to recognize geographic information from images, achieving up to 53.8% accuracy in city prediction, they exhibit significant biases. Specifically, performance is substantially higher for economically developed and densely populated regions compared to less developed (-12.5%) and sparsely populated (-17.0%) areas. Moreover, regional biases of frequently over-predicting certain locations remain. For instance, they consistently predict Sydney for images taken in Australia, shown…
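One simple way to surface the kind of over-prediction described above is to compare how often a city is predicted with how often it is actually the ground truth. The sketch below assumes a list of (true city, predicted city) pairs; the function name and the toy Australian data are hypothetical, and this is not necessarily the measure used in the paper.

```python
from collections import Counter


def overprediction_ratio(pairs):
    """For each ground-truth city, return predicted_count / true_count.

    pairs: iterable of (true_city, predicted_city) tuples.
    A ratio above 1 means the city is predicted more often than it occurs.
    Cities that are predicted but never appear as ground truth are skipped.
    """
    pairs = list(pairs)
    true_counts = Counter(true_city for true_city, _ in pairs)
    pred_counts = Counter(predicted_city for _, predicted_city in pairs)
    return {city: pred_counts[city] / n for city, n in true_counts.items()}


# Hypothetical toy data, invented purely for illustration.
toy_pairs = [
    ("Sydney", "Sydney"),
    ("Melbourne", "Sydney"),
    ("Perth", "Sydney"),
    ("Brisbane", "Brisbane"),
]

ratios = overprediction_ratio(toy_pairs)
print(ratios)
```

In this toy case Sydney is predicted three times but is the true city only once, so its ratio is well above 1, while Melbourne and Perth are never predicted at all.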
Taxonomy
Topics: Big Data and Business Intelligence
