On the Cultural Gap in Text-to-Image Generation

Bingshuai Liu; Longyue Wang; Chenyang Lyu; Yong Zhang; Jinsong Su; Shuming Shi; Zhaopeng Tu

arXiv:2307.02971 · cs.CV · July 7, 2023 · 2 cites

TL;DR

This paper introduces a new benchmark and evaluation method for assessing and improving the cultural diversity of text-to-image models, addressing the challenge of cultural gaps in generated images.

Contribution

The paper proposes the Challenging Cross-Cultural (C3) benchmark and a multi-modal metric, used to evaluate text-to-image (T2I) models and to select data for fine-tuning them toward better cross-cultural image generation.

Findings

1. Stable Diffusion often fails to generate certain cultural objects.

2. The multi-modal metric outperforms existing metrics in data selection.

3. Fine-tuning with culturally aligned data improves cross-cultural generation.

Abstract

One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the…
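The abstract is cut off before it says how the proposed metric is computed, so the following is only an illustrative sketch of what filtering fine-tuning data by object-text alignment could look like. It assumes precomputed embedding vectors for the caption, the whole image, and each detected object, and it assumes a simple weighted sum of two cosine similarities. The names `alignment_score`, `select_top_k`, and the weight `alpha` are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def alignment_score(
    text_emb: Sequence[float],
    image_emb: Sequence[float],
    object_embs: List[Sequence[float]],
    alpha: float = 0.5,
) -> float:
    """Hypothetical combined score (an assumption, not the paper's formula).

    Weighted sum of whole-image/text similarity and the best
    object/text similarity among the detected objects.
    """
    image_text = cosine(text_emb, image_emb)
    # If no objects were detected, the object term contributes 0.0.
    object_text = max((cosine(text_emb, o) for o in object_embs), default=0.0)
    return alpha * image_text + (1.0 - alpha) * object_text


def select_top_k(samples: List[Dict], k: int, alpha: float = 0.5) -> List[Dict]:
    """Keep the k samples with the highest combined score."""
    ranked = sorted(
        samples,
        key=lambda s: alignment_score(s["text"], s["image"], s["objects"], alpha),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch each sample is a dictionary with `text`, `image`, and `objects` keys holding embedding vectors; how those embeddings are produced (for example, by a vision-language encoder plus an object detector) is left open, since the source does not specify it.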

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Video Analysis and Summarization

Methods: Diffusion