CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

David Romero; Chenyang Lyu; Haryo Akbarianto Wibowo; Teresa Lynn; Injy Hamed; Aditya Nanda Kishore; Aishik Mandal; Alina Dragonetti; Artem Abzaliev; Atnafu Lambebo Tonja; Bontu Fufa Balcha; Chenxi Whitehouse; Christian Salamea; Dan John Velasco; David Ifeoluwa Adelani; David Le Meur; Emilio Villa-Cueva; Fajri Koto; Fauzan Farooqui; Frederico Belcavello; Ganzorig Batnasan; Gisela Vallejo; Grainne Caulfield; Guido Ivetta; Haiyue Song; Henok Biadglign Ademtew; Hernán Maina; Holy Lovenia; Israel Abebe Azime; Jan Christian Blaise Cruz; Jay Gala; Jiahui Geng; Jesus-German Ortiz-Barajas; Jinheon Baek; Jocelyn Dunstan; Laura Alonso Alemany; Kumaranage Ravindu Yasas Nagasinghe; Luciana Benotti; Luis Fernando D'Haro; Marcelo Viridiano; Marcos Estecha-Garitagoitia; Maria Camila Buitrago Cabrera; Mario Rodríguez-Cantelar; Mélanie Jouitteau; Mihail Mihaylov; Mohamed Fazli Mohamed Imam; Muhammad Farid Adilazuarda; Munkhjargal Gochoo; Munkh-Erdene Otgonbold; Naome Etori; Olivier Niyomugisha; Paula Mónica Silva; Pranjal Chitale; Raj Dabre; Rendi Chevi; Ruochen Zhang; Ryandito Diandaru; Samuel Cahyawijaya; Santiago Góngora; Soyeong Jeong; Sukannya Purkayastha; Tatsuki Kuribayashi; Teresa Clifford; Thanmay Jayakumar; Tiago Timponi Torrent; Toqeer Ehsan; Vladimir Araujo; Yova Kementchedjhieva; Zara Burzo; Zheng Wei Lim; Zheng Xin Yong; Oana Ignat; Joan Nwatu; Rada Mihalcea; Thamar Solorio; Alham Fikri Aji

arXiv:2406.05967 · cs.CV · November 5, 2024

TL;DR

CVQA is a new multilingual and culturally-diverse VQA benchmark with images and questions from 30 countries, designed to evaluate and improve the cultural understanding of multimodal AI models.

Contribution

The paper introduces CVQA, a culturally-diverse multilingual VQA dataset with native speaker input, covering 31 languages and 30 countries, and benchmarks current models on this challenging dataset.

Findings

01

Current models struggle with CVQA's cultural and linguistic diversity.

02

CVQA reveals biases and limitations in existing multimodal models.

03

The dataset encourages development of culturally-aware AI models.

Abstract

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of…

Videos

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training