TL;DR
BiasIG is a comprehensive benchmark for measuring multi-dimensional social biases in text-to-image (T2I) models. It enables fine-grained diagnosis of these biases and evaluation of bias mitigation methods.
Contribution
The paper introduces BiasIG, a new benchmark with an automated evaluation pipeline that disentangles biases in T2I models across four dimensions.
Findings
BiasIG effectively diagnoses biases in 8 T2I models.
Debiasing methods often cause unintended effects on unrelated demographics.
Bias mitigation tends to reduce discrimination but can also introduce new biases.
Abstract
Text-to-Image (T2I) generative models have revolutionized content creation, yet they inherently risk amplifying societal biases. While sociological research provides systematic classifications of bias, existing T2I benchmarks largely conflate these nuances or focus narrowly on occupational stereotypes, leaving the multi-dimensional nature of generative bias inadequately measured. In this paper, we introduce BiasIG, a unified benchmark that quantifies social biases across a curated dataset of 47,040 prompts. Grounded in sociological and machine ethics frameworks, BiasIG disentangles biases across 4 dimensions to enable fine-grained diagnosis. To facilitate scalable and reliable evaluation, we propose a fully automated pipeline powered by a fine-tuned multi-modal large language model, achieving high alignment accuracy comparable to that of human experts. Extensive experiments on 8 T2I models and…
