Fair Text-to-Image Diffusion via Fair Mapping
Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, Di Wang

TL;DR
This paper introduces Fair Mapping, a lightweight, model-agnostic method to improve demographic fairness in text-to-image diffusion models by controlling prompts and debiasing embeddings, with minimal impact on image quality.
Contribution
We propose a novel, efficient linear network approach that debiases conditioning embeddings, enhancing fairness in human-related image generation without retraining entire models.
Findings
Significantly improves demographic fairness in face image generation
Maintains comparable image quality to standard diffusion models
Requires only minimal additional parameters and computation
Abstract
In this paper, we address the limitations of existing text-to-image diffusion models in generating demographically fair results when given human-related descriptions. These models often struggle to disentangle the target language context from sociocultural biases, resulting in biased image generation. To overcome this challenge, we propose Fair Mapping, a flexible, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image diffusion model by controlling the prompt to achieve fair image generation. One key advantage of our approach is its high efficiency. It only requires updating an additional linear network with few parameters at a low computational cost. By developing a linear network that maps conditioning embeddings into a debiased space, we enable the generation of relatively balanced demographic results based on the specified text condition. With…
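The abstract's central idea, a small linear network that maps conditioning embeddings from a frozen text encoder into a new space, can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `LinearMapping`, the near-identity initialization, and the toy embedding size are all hypothetical, and the training objective that would make the mapped space debiased is not shown.

```python
import numpy as np


class LinearMapping:
    """Hypothetical linear layer applied to frozen text-encoder embeddings."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: start near the identity so mapped embeddings
        # initially stay close to the original prompt embeddings.
        self.weight = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
        self.bias = np.zeros(dim)

    def __call__(self, embeddings):
        # embeddings: (num_tokens, dim) array from the frozen text encoder.
        return embeddings @ self.weight.T + self.bias

    def num_parameters(self):
        # Only these parameters would be trained; the diffusion model
        # and text encoder stay untouched.
        return self.weight.size + self.bias.size


dim = 8  # toy size; real text encoders use much larger embeddings
prompt_embedding = np.random.default_rng(1).standard_normal((4, dim))
mapper = LinearMapping(dim)
mapped = mapper(prompt_embedding)
```

In this sketch the mapped embedding keeps the shape of the original, so it could be passed to the diffusion model wherever the original conditioning embedding was used, which is what makes such a module pluggable.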
Peer Reviews
Decision · Submitted to ICLR 2024
1. The paper is well-written and easy to follow.
2. The method is intuitive and reasonable.
3. The experimental results seem promising.
1. This paper only considers bias within text embeddings and does not extend to biases that may be inherent in the diffusion model itself. This limitation is significant as it suggests that the system could be susceptible to manipulation if the text embedding model is altered. A more holistic approach that also scrutinizes and corrects for biases within the diffusion model could potentially offer a more robust and less vulnerable solution.
2. The experiments are limited to biases related to gend
- The paper tackles a timely and practically relevant problem supported by a fair amount of experiments. Building fair diffusion models is an area with limited prior research, making this work particularly valuable.
- The proposed method is simple yet effective, and pluggable without modifying the pre-trained model.
- Overall, the paper is clearly written and easy to follow.
- Although the paper covers a good amount of relevant prior work, it lacks baseline experiments. For example, although [1] focuses on fair guidance while this work focuses on a pluggable mapping module, the authors can still calculate FairScore and compare the two with respect to training time, memory overhead, etc.
- While the unfairness is largely resolved through the proposed mapping module, such a result may not come as a surprise, since FairScore and the employed fairness loss term are quite similar.
- Th
- The proposed method is simple and easy to understand.
- The mapping network is trained on top of a frozen text encoder, making it widely applicable to other text-conditioned models.
- The experiments demonstrate that Fair Diffusion reduces language biases and improves generation of more diverse people while maintaining the semantics outlined in the input prompt.
- Ablation studies are provided to highlight the significance of both loss terms in the training objective.
- In Table 1, the improvement of Fair Mapping over the baselines is relatively small for race. It is difficult to understand what a 0.01 improvement actually looks like in terms of qualitative performance.
- The authors mention that the value of the loss weight hyperparameter can affect the visual quality. Including image quality metrics like FID would help quantify how much degradation the debiasing network introduces.
- Some details that are necessary for cl
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Vietnamese History and Culture Studies · Computational and Text Analysis Methods
Methods: Diffusion
