Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views
Xiaonan Wang, Bo Shao, Hansaem Kim

TL;DR
This paper introduces KoreaGEO Bench, a fine-grained multimodal benchmark for Korean street view geolocation. It shows how input modality shifts localization accuracy and exposes spatial biases in current vision-language models, with implications for location privacy.
Contribution
It provides the first fine-grained, multimodal geolocation benchmark for Korean street views, including a new evaluation protocol and analysis of model biases.
Findings
Modality affects localization accuracy.
Models exhibit structural biases toward core cities.
Benchmark enables nuanced evaluation of VLMs.
Abstract
Recent advances in vision-language models (VLMs) have enabled accurate image-based geolocation, raising serious concerns about location privacy risks in everyday social media posts. However, current benchmarks remain coarse-grained and linguistically biased, and they lack multimodal and privacy-aware evaluations. To address these gaps, we present KoreaGEO Bench, the first fine-grained, multimodal geolocation benchmark for Korean street views. Our dataset comprises 1,080 high-resolution images sampled across four urban clusters and nine place types, enriched with multi-contextual annotations and two styles of Korean captions simulating real-world privacy exposure. We introduce a three-path evaluation protocol to assess ten mainstream VLMs under varying input modalities and analyze their accuracy, spatial bias, and reasoning behavior. Results reveal modality-driven shifts in localization…
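The abstract excerpt does not say how localization accuracy is scored. As a minimal sketch, assuming the common practice in image geolocation work of measuring great-circle error between predicted and true coordinates and reporting the share of predictions within a distance threshold, the scoring could look like the following. The function names, the threshold, and the sample coordinates are illustrative assumptions, not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(preds, truths, threshold_km):
    """Fraction of predictions whose error is at most threshold_km.

    preds and truths are equal-length lists of (lat, lon) tuples.
    """
    if len(preds) != len(truths) or not truths:
        raise ValueError("preds and truths must be non-empty and the same length")
    hits = sum(
        haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return hits / len(truths)


# Hypothetical example: one exact prediction, one that confuses Busan with Seoul.
seoul = (37.5665, 126.9780)
busan = (35.1796, 129.0756)
print(accuracy_within([seoul, seoul], [seoul, busan], threshold_km=1.0))
```

Computing the same score separately per input modality or per city cluster would be one way to surface the modality-driven shifts and spatial bias the abstract describes, though the paper's own protocol may differ.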
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Mobility and Location-Based Analysis · Advanced Neural Network Applications
