Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries

Tushar Pranav; Eshan Pandey; Austria Lyka Diane Bala; Aman Chadha; Indriyati Atmosukarto; Donny Soh Cheng Lock

arXiv:2512.01419 · cs.CV · December 2, 2025

TL;DR

This paper introduces RICE-VL, a benchmark of VQA and visual grounding tasks for evaluating vision-language models' cultural understanding across 11 ASEAN countries. Evaluations of six models reveal performance gaps in low-resource countries and Western-centric biases.

Contribution

The paper presents RICE-VL, a new benchmark of culturally annotated VQA and visual grounding tasks, and proposes SEA-LAVE, an extension of the LAVE metric that assesses textual accuracy, cultural alignment, and country identification.

Findings

01

VLMs show performance gaps in low-resource countries.

02

Models struggle with culturally significant visual grounding.

03

Current VLMs exhibit Western-centric biases.
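
As a rough illustration of how a per-country performance gap like the one in finding 01 could be measured, the sketch below aggregates correctness by country and reports the spread between the best- and worst-scoring country. This is not the paper's evaluation code or the SEA-LAVE metric: the function names, the record format, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def per_country_accuracy(records):
    """Map each country to its fraction of correct answers.

    `records` is an iterable of (country, is_correct) pairs; this
    format is a hypothetical choice, not one defined by the paper.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for country, is_correct in records:
        totals[country] += 1
        hits[country] += int(bool(is_correct))
    return {country: hits[country] / totals[country] for country in totals}


def accuracy_gap(acc_by_country):
    """Difference between the best- and worst-scoring country."""
    values = list(acc_by_country.values())
    return max(values) - min(values)


# Toy, made-up records purely for illustration; not results from the paper.
toy_records = [
    ("Country A", True), ("Country A", True),
    ("Country A", False), ("Country A", True),
    ("Country B", True), ("Country B", False),
    ("Country B", False), ("Country B", False),
]
toy_acc = per_country_accuracy(toy_records)
toy_gap = accuracy_gap(toy_acc)
```

A max-minus-min spread is only one simple way to summarize a gap; a fuller analysis would also need to account for how many samples each country contributes.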

Abstract

Vision-Language Models (VLMs) excel in multimodal tasks but often exhibit Western-centric biases, limiting their effectiveness in culturally diverse regions like Southeast Asia (SEA). To address this, we introduce RICE-VL, a novel benchmark evaluating VLM cultural understanding across 11 ASEAN countries. RICE-VL includes over 28,000 human-curated Visual Question Answering (VQA) samples -- covering True or False, Fill-in-the-Blank, and open-ended formats -- and 1,000 image-bounding box pairs for Visual Grounding, annotated by culturally informed experts across 14 sub-ground categories. We propose SEA-LAVE, an extension of the LAVE metric, assessing textual accuracy, cultural alignment, and country identification. Evaluations of six open- and closed-source VLMs reveal significant performance gaps in low-resource countries and abstract cultural domains. The Visual Grounding task tests…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Digital Storytelling and Education