Vision Language Models are Confused Tourists
Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji

TL;DR
This paper introduces ConfusedTourist, a benchmark to evaluate Vision-Language Models' robustness to cultural perturbations, revealing significant vulnerabilities and attention shifts that impair model stability across diverse cultural inputs.
Contribution
The paper presents a novel adversarial robustness suite for cultural evaluation of VLMs, exposing their weaknesses in handling mixed cultural cues and highlighting the need for improved cultural robustness.
Findings
VLM accuracy drops under simple cultural perturbations
Image-generation-based perturbations worsen model performance
Attention shifts cause models to focus on distracting cues
Abstract
Although the cultural dimension has been one of the key aspects in evaluating Vision-Language Models (VLMs), their ability to remain stable across diverse cultural inputs remains largely untested, despite being crucial for supporting diversity and multicultural societies. Existing evaluations often rely on benchmarks featuring only a single cultural concept per image, overlooking scenarios where multiple, potentially unrelated cultural cues coexist. To address this gap, we introduce ConfusedTourist, a novel cultural adversarial robustness suite designed to assess VLMs' stability against perturbed geographical cues. Our experiments reveal a critical vulnerability: accuracy drops heavily under simple image-stacking perturbations and drops further under the image-generation-based variant. Interpretability analyses further show that these failures stem from systematic attention shifts…
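As a rough illustration of what an image-stacking perturbation and the resulting accuracy drop might look like, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how images are stacked, so the vertical layout, the list-of-rows image representation, and the names `stack_vertically`, `accuracy`, and `accuracy_drop` are all assumptions made for illustration.

```python
from typing import List

# Hypothetical stand-in for a real image type: rows of grayscale pixel values.
Image = List[List[int]]


def stack_vertically(target: Image, distractor: Image, pad: int = 0) -> Image:
    """Place the distractor below the target (both assumed non-empty).

    Narrower rows are padded on the right so every row has the same width.
    """
    rows = target + distractor
    width = max(len(row) for row in rows)
    return [row + [pad] * (width - len(row)) for row in rows]


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def accuracy_drop(clean_preds: List[str],
                  perturbed_preds: List[str],
                  labels: List[str]) -> float:
    """Accuracy on clean inputs minus accuracy on perturbed inputs."""
    return accuracy(clean_preds, labels) - accuracy(perturbed_preds, labels)
```

In this sketch, a model would be queried once on each original image and once on the stacked version, and `accuracy_drop` would summarize how much the added cultural cue hurts. A positive value means the perturbation reduced accuracy.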
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Ethics and Social Impacts of AI
