Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views

Xiaonan Wang; Bo Shao; Hansaem Kim

arXiv:2506.03371 · cs.CV · June 5, 2025

TL;DR

This paper introduces KoreaGEO Bench, a detailed multimodal benchmark for Korean street view geolocation. It reveals how input modality influences localization and exposes biases in current vision-language models, with implications for both privacy and accuracy.

Contribution

It provides the first fine-grained, multimodal geolocation benchmark for Korean street views, including a new evaluation protocol and analysis of model biases.

Findings

1. Modality affects localization accuracy.
2. Models exhibit structural biases toward core cities.
3. The benchmark enables nuanced evaluation of VLMs.

Abstract

Recent advances in vision-language models (VLMs) have enabled accurate image-based geolocation, raising serious concerns about location privacy risks in everyday social media posts. However, current benchmarks remain coarse-grained, linguistically biased, and lack multimodal and privacy-aware evaluations. To address these gaps, we present KoreaGEO Bench, the first fine-grained, multimodal geolocation benchmark for Korean street views. Our dataset comprises 1,080 high-resolution images sampled across four urban clusters and nine place types, enriched with multi-contextual annotations and two styles of Korean captions simulating real-world privacy exposure. We introduce a three-path evaluation protocol to assess ten mainstream VLMs under varying input modalities and analyze their accuracy, spatial bias, and reasoning behavior. Results reveal modality-driven shifts in localization…
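The abstract excerpt does not define how localization accuracy is scored. Geolocation benchmarks commonly measure the great-circle (haversine) distance between a predicted and a ground-truth coordinate and report the share of predictions that fall within fixed radii. The sketch below assumes that convention. The function names `haversine_km` and `accuracy_at` and the default radii (the street/city/region tiers used in earlier image-geolocation work) are hypothetical and are not taken from KoreaGEO Bench.

```python
import math
from typing import Dict, Sequence, Tuple

# Mean Earth radius in kilometres (IUGG value); an assumption for this sketch.
EARTH_RADIUS_KM = 6371.0088

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at(
    preds: Sequence[Coord],
    golds: Sequence[Coord],
    thresholds_km: Sequence[float] = (1, 25, 200),  # hypothetical radii
) -> Dict[float, float]:
    """Fraction of predictions whose error is within each radius (hypothetical metric)."""
    if not preds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equally long")
    errors = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(preds, golds)]
    n = len(errors)
    # Each boolean (error <= radius) counts as 1 when summed.
    return {t: sum(e <= t for e in errors) / n for t in thresholds_km}


if __name__ == "__main__":
    seoul, busan = (37.5663, 126.9779), (35.1796, 129.0756)
    # One exact hit and one prediction placed in Seoul for an image taken in Busan.
    print(accuracy_at([seoul, seoul], [seoul, busan], thresholds_km=(1, 750)))
```

With one exact hit and one Seoul-for-Busan miss (roughly 325 km apart), accuracy is 0.5 at the 1 km radius and 1.0 at 750 km. Reporting several radii at once distinguishes street-level precision from coarse regional guesses, which matters for a privacy-oriented benchmark.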

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Mobility and Location-Based Analysis · Advanced Neural Network Applications