MINERVA-Cultural: A Benchmark for Cultural and Multilingual Long Video Reasoning

Darshan Singh; Arsha Nagrani; Kawshik Manikantan; Harman Singh; Dinesh Tewari; Tobias Weyand; Cordelia Schmid; Anelia Angelova; Shachi Dave

arXiv:2601.10649 · cs.CV · April 8, 2026

TL;DR

MINERVA-Cultural is a new multicultural and multilingual video reasoning benchmark with complex native-language questions, aiming to evaluate and improve long video understanding across diverse cultural contexts.

Contribution

The paper introduces a culturally diverse dataset whose questions, answers, and multi-step reasoning are written in native languages, and proposes a graph-based error analysis method for video reasoning models.

Findings

01

State-of-the-art models perform significantly below human accuracy.

02

Errors mainly arise from visual perception of cultural elements.

03

The benchmark highlights the need for culturally aware video understanding models.
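
The comparison behind finding 01 comes down to accuracy computed per locale and set against a human baseline. The following is a minimal Python sketch of that bookkeeping only. The record fields (`locale`, `predicted`, `answer`), the exact-match scoring rule, the locale codes, and the toy data are all assumptions made for illustration; they are not the benchmark's actual schema, metric, or results.

```python
from collections import defaultdict


def accuracy_by_locale(records):
    """Return {locale: fraction of records whose prediction matches the answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["locale"]] += 1
        # Exact string match is an assumed scoring rule, not the paper's metric.
        if r["predicted"] == r["answer"]:
            correct[r["locale"]] += 1
    return {loc: correct[loc] / total[loc] for loc in total}


# Toy records for illustration only; not drawn from MINERVA-Cultural.
toy_records = [
    {"locale": "hi-IN", "predicted": "B", "answer": "B"},
    {"locale": "hi-IN", "predicted": "A", "answer": "C"},
    {"locale": "ja-JP", "predicted": "D", "answer": "D"},
]

scores = accuracy_by_locale(toy_records)
```

Running the same function over model predictions and over human answers would give two per-locale tables that can be compared directly.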

Abstract

Recent advancements in video models have shown tremendous progress, particularly in long video understanding. However, current benchmarks predominantly feature Western-centric data and English as the dominant language, introducing significant biases in evaluation. To address this, we introduce MINERVA-Cultural, a challenging benchmark for multicultural and multilingual video reasoning. MINERVA-Cultural comprises high-quality, entirely human-generated annotations from diverse, region-specific cultural videos across 18 global locales. Unlike prior work that relies on automatic translations, MINERVA-Cultural provides complex questions, answers, and multi-step reasoning steps, all crafted in native languages. Making progress on MINERVA-Cultural requires a deeply situated understanding of visual cultural context. Furthermore, we leverage MINERVA-Cultural's reasoning traces to construct…

Code & Models

Repositories

GitHub: google-deepmind/neptune?tab=readme-ov-file#minerva-cultural