Toward Culturally Grounded Natural Language Processing

Sina Bagheri Nezhad

arXiv:2603.26013·cs.CL·May 5, 2026

TL;DR

This paper surveys the challenges of multilingual NLP, argues that linguistic coverage does not guarantee cultural competence, and proposes a layered evaluation framework for modeling and validating language use within diverse communities.

Contribution

The paper synthesizes over 50 papers on cultural and linguistic diversity in NLP and introduces an evaluation agenda centered on ecological validity and community validation.

Findings

01

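Training data coverage is important but not sufficient on its own: outcomes also depend on tokenization, prompt language, translated benchmark design, culturally grounded supervision, modality, and who authors or validates evaluation data.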

02

Culturally grounded NLP requires modeling communicative ecologies: the institutions, scripts, domains, modalities, and communities through which language is used.

03

The proposed layered evaluation emphasizes community validation and ecological validity.

Abstract

Multilingual NLP is often treated as a route to global inclusion, but linguistic coverage and cultural competence frequently diverge. This paper synthesizes over 50 papers spanning multilingual performance inequality, cross-lingual transfer, culture-aware evaluation, cultural alignment, multimodal benchmarks, benchmark-design critique, and community-grounded data practices. Across this literature, training data coverage remains important, but outcomes are also shaped by tokenization, prompt language, translated benchmark design, culturally grounded supervision, modality, and who authors or validates evaluation data. We argue that culturally grounded NLP should move beyond treating languages as isolated rows in benchmark tables and instead model communicative ecologies: the institutions, scripts, domains, modalities, and communities through which language is used. We propose a layered…
