Scaling Cultural Resources for Improving Generative Models

Hayk Stepanyan; Aishwarya Verma; Andrew Zaldivar; Rutledge Chin Feman; Erin MacMurray van Liemt; Charu Kalia; Vinodkumar Prabhakaran; Sunipa Dev

arXiv:2510.25167 · cs.CY · October 30, 2025

TL;DR

This paper presents a scalable pipeline for collecting multilingual, culturally relevant data to enhance and evaluate the global cultural competence of generative AI models.

Contribution

It introduces a repeatable, scalable method for expanding cultural resources, addressing cross-cultural gaps in generative models.

Findings

01 Develops a pipeline for collecting multilingual, culturally salient data

02 Enables assessment of the global applicability of models

03 Supports targeted improvements in models' cross-cultural competence

Abstract

Generative models are known to perform worse across many global cultural contexts and languages. While training data are continually updated to improve overall model performance, bolstering and evaluating the cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a repeatable, scalable, multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can be used to assess the global applicability of our models and thus, in turn, help identify and close cross-cultural gaps.