TL;DR
T2I-BiasBench is an evaluation framework of thirteen metrics for auditing demographic and cultural bias in text-to-image (T2I) models. Its evaluation of three open-source models against a Gemini 2.5 Flash baseline reveals bias amplification and cultural collapse.
Contribution
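The paper introduces the first unified multi-metric framework that addresses demographic bias, element omission, and cultural collapse in T2I models together. It combines six established metrics with seven additional measures, four of them newly proposed (Composite Bias Score, Grounded Missing Rate, Implicit Element Missing Rate, Cultural Accuracy Ratio), and evaluates them on 1,574 generated images across five prompt categories.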
Findings
Stable Diffusion v1.5 and BK-SDM Base amplify bias on beauty-related prompts.
Adding contextual constraints to prompts reduces gender bias in depictions of professional roles.
All models exhibit cultural collapse: their outputs show limited cultural diversity.
Abstract
Text-to-image (T2I) generative models achieve impressive visual fidelity but inherit and amplify demographic imbalances and cultural biases embedded in training data. We introduce T2I-BiasBench, a unified evaluation framework of thirteen complementary metrics that jointly captures demographic bias, element omission, and cultural collapse in diffusion models, the first framework to address all three dimensions simultaneously. We evaluate three open-source models (Stable Diffusion v1.5, BK-SDM Base, and Koala Lightning) against Gemini 2.5 Flash (RLHF-aligned) as a reference baseline. The benchmark comprises 1,574 generated images across five structured prompt categories. T2I-BiasBench integrates six established metrics with seven additional measures: four newly proposed (Composite Bias Score, Grounded Missing Rate, Implicit Element Missing Rate, Cultural Accuracy Ratio) and three…
