On the Cultural Gap in Text-to-Image Generation
Bingshuai Liu, Longyue Wang, Chenyang Lyu, Yong Zhang, Jinsong Su, Shuming Shi, Zhaopeng Tu

TL;DR
This paper introduces a new benchmark and evaluation method for assessing and improving the cultural diversity of text-to-image models, addressing the challenge of cultural gaps in generated images.
Contribution
The paper proposes the C3 benchmark and a multi-modal metric for better evaluation and fine-tuning of T2I models to enhance cross-cultural image generation.
Findings
Stable Diffusion often fails to generate certain cultural objects.
The multi-modal metric outperforms existing metrics in data selection.
Fine-tuning with culturally aligned data improves cross-cultural generation.
Abstract
One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the…
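The abstract does not give the metric's formula, so the following is only a minimal sketch of how an object-aware score might be used to rank and filter fine-tuning candidates. Everything in it is an assumption for illustration: the function names, the `weight` parameter, the use of cosine similarity over precomputed embeddings, and the idea of mixing a whole-image score with the best object-level score. It is not the paper's method.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def alignment_score(text_emb, image_emb, object_embs, weight=0.5):
    """Hypothetical score: mix image-text similarity with the best object-text similarity.

    `object_embs` holds one embedding per detected object region; an image
    with no detected objects gets 0.0 for the object term.
    """
    image_text = cosine(text_emb, image_emb)
    object_text = max((cosine(text_emb, o) for o in object_embs), default=0.0)
    return (1 - weight) * image_text + weight * object_text


def select_top_k(candidates, k):
    """Keep the k candidates with the highest precomputed 'score' field."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]
```

In this sketch an image that matches the caption as a whole but contains no matching object is penalized relative to one whose detected objects also align with the text, which is the intuition the abstract describes for filtering fine-tuning data.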
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Video Analysis and Summarization
Methods: Diffusion
