Toward Culturally Grounded Natural Language Processing
Sina Bagheri Nezhad

TL;DR
This paper reviews multilingual NLP challenges, emphasizing cultural competence, and proposes a layered evaluation framework to better model and validate language use within diverse communities.
Contribution
It synthesizes existing research on cultural and linguistic diversity in NLP and introduces a comprehensive evaluation agenda centered on ecological validity and community validation.
Findings
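Training data coverage remains important but is not sufficient on its own: outcomes are also shaped by tokenization, prompt language, translated benchmark design, culturally grounded supervision, modality, and who authors or validates evaluation data.
Culturally grounded NLP requires modeling communicative ecologies: the institutions, scripts, domains, modalities, and communities through which language is used.
The proposed layered evaluation emphasizes community validation and ecological validity.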
Abstract
Multilingual NLP is often treated as a route to global inclusion, but linguistic coverage and cultural competence frequently diverge. This paper synthesizes over 50 papers spanning multilingual performance inequality, cross-lingual transfer, culture-aware evaluation, cultural alignment, multimodal benchmarks, benchmark-design critique, and community-grounded data practices. Across this literature, training data coverage remains important, but outcomes are also shaped by tokenization, prompt language, translated benchmark design, culturally grounded supervision, modality, and who authors or validates evaluation data. We argue that culturally grounded NLP should move beyond treating languages as isolated rows in benchmark tables and instead model communicative ecologies: the institutions, scripts, domains, modalities, and communities through which language is used. We propose a layered…
