Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation
Chuancheng Shi, Shangze Li, Shiming Guo, Simiao Xie, Wenhua Wu, Jingtong Dou, Chao Wu, Canran Xiao, Cong Wang, Zifeng Cheng, Fei Shen, Tat-Seng Chua

TL;DR
This paper investigates the cultural biases in multilingual text-to-image models, revealing that they often produce culturally neutral or English-biased images, and proposes methods to enhance cross-cultural consistency.
Contribution
It traces cultural bias not to missing cultural knowledge but to insufficient activation of culture-related neurons, and introduces two complementary alignment strategies to improve cultural fidelity in generated images.
Findings
Under multilingual prompts, current T2I models often produce culturally neutral or English-biased images.
Proposed methods improve cultural consistency without sacrificing image quality.
Localization of culture-sensitive signals to specific neurons enables targeted enhancement.
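The neuron-localization finding can be illustrated with a toy sketch. This is not the paper's probing method, whose details are not given here. It assumes per-neuron activations have already been recorded for two prompt sets (culture-specific and neutral), and it ranks neurons by a standardized gap in mean activation. The names `rank_culture_neurons` and `amplify`, the synthetic data, and the scaling factor are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_culture_neurons(cultural_acts, neutral_acts, top_k=5):
    """Rank neurons by the standardized gap in mean activation.

    Both inputs have shape (n_prompts, n_neurons). The scoring rule is an
    assumption for illustration, not the probing method from the paper.
    """
    cultural_acts = np.asarray(cultural_acts, dtype=float)
    neutral_acts = np.asarray(neutral_acts, dtype=float)
    gap = cultural_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    pooled_std = np.sqrt(
        0.5 * (cultural_acts.var(axis=0) + neutral_acts.var(axis=0))
    ) + 1e-8  # avoid division by zero for constant neurons
    scores = np.abs(gap) / pooled_std
    top = np.argsort(scores)[::-1][:top_k]  # indices of highest scores first
    return top, scores

def amplify(acts, neuron_ids, scale=2.0):
    """Return a copy of acts with the selected neurons scaled (hypothetical)."""
    out = np.array(acts, dtype=float, copy=True)
    out[..., neuron_ids] *= scale
    return out

# Synthetic activations: three "planted" neurons respond to cultural prompts.
rng = np.random.default_rng(0)
n_prompts, n_neurons = 200, 64
planted = np.array([3, 17, 42])
neutral = rng.normal(size=(n_prompts, n_neurons))
cultural = rng.normal(size=(n_prompts, n_neurons))
cultural[:, planted] += 3.0

top, scores = rank_culture_neurons(cultural, neutral, top_k=3)
boosted = amplify(cultural, top, scale=2.0)
print(sorted(top.tolist()))
```

In a real model the activations would come from hooks on specific layers, and the selection rule and any intervention would follow the paper rather than this simplified scoring.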
Abstract
Multilingual text-to-image (T2I) models have advanced rapidly in terms of visual realism and semantic alignment, and are now widely utilized. Yet outputs vary across cultural contexts: because language carries cultural connotations, images synthesized from multilingual prompts should preserve cross-lingual cultural consistency. We conduct a comprehensive analysis showing that current T2I models often produce culturally neutral or English-biased results under multilingual prompts. Analyses of two representative models indicate that the issue stems not from missing cultural knowledge but from insufficient activation of culture-related representations. We propose a probing method that localizes culture-sensitive signals to a small set of neurons in a few fixed layers. Guided by this finding, we introduce two complementary alignment strategies: (1) inference-time cultural activation that…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Language and cultural evolution · Multimodal Machine Learning Applications
