PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
Shufan Li, Harkanwar Singh, Aditya Grover

TL;DR
PopAlign introduces a population-level preference optimization method that reduces demographic biases in text-to-image (T2I) models while largely preserving image quality.
Contribution
The paper presents PopAlign, a population-level alignment approach for T2I models. It targets biases that pairwise preference methods such as RLHF and DPO cannot capture, because those biases only become measurable across many samples rather than in any single image.
Findings
Significantly reduces gender and ethnicity biases in generated images.
Maintains high image quality comparable to baseline models.
Demonstrates effectiveness through human evaluation and bias metrics.
Abstract
Text-to-image (T2I) models achieve high-fidelity generation through extensive training on large datasets. However, these models may unintentionally pick up undesirable biases from their training data, such as over-representation of particular identities in gender- or ethnicity-neutral prompts. Existing alignment methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) fail to address this problem effectively because they operate on pairwise preferences consisting of individual samples, while the aforementioned biases can only be measured at a population level. For example, a single sample for the prompt "doctor" could be male or female, but a model that generates predominantly male doctors across repeated sampling reflects a gender bias. To address this limitation, we introduce PopAlign, a novel approach for population-level preference…
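The following minimal sketch illustrates why this bias is a population-level quantity. It is not PopAlign's evaluation code. It measures bias for one prompt by drawing many samples and comparing the observed distribution of a perceived attribute against a uniform target.

It relies on several assumptions:
- Each generated image has already been labeled, for example by a hypothetical attribute classifier.
- The function name `population_bias` is illustrative.
- Total variation distance from uniform is an illustrative choice of metric.

```python
from collections import Counter
from typing import Iterable, Sequence


def population_bias(labels: Iterable[str], groups: Sequence[str]) -> float:
    """Total variation distance between the observed attribute distribution
    and a uniform distribution over `groups`.

    Returns 0.0 for a perfectly balanced population and reaches its maximum,
    (k - 1) / k for k groups, when every sample falls into a single group.

    `labels` are per-image attribute labels for repeated samples of ONE prompt
    (assumed to come from a hypothetical attribute classifier). Labels that are
    not listed in `groups` are ignored.
    """
    counts = Counter(labels)
    n = sum(counts[g] for g in groups)
    if n == 0:
        raise ValueError("no labeled samples for the given groups")
    uniform = 1.0 / len(groups)
    return 0.5 * sum(abs(counts[g] / n - uniform) for g in groups)


# Repeated samples for the prompt "doctor": 90 perceived male, 10 perceived female.
biased = ["male"] * 90 + ["female"] * 10
balanced = ["male"] * 50 + ["female"] * 50
print(round(population_bias(biased, ["male", "female"]), 3))  # 0.4
print(population_bias(balanced, ["male", "female"]))          # 0.0
```

Any single image in either population looks equally acceptable on its own. The imbalance only appears once many samples are aggregated. This is the abstract's argument for why preferences over individual sample pairs cannot directly express this kind of bias.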
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
1. The paper is straightforward and easy to follow.
1. **Lengthy Background and Related Work Sections:** These sections are overly detailed, detracting from the main methodology and potentially causing readers to lose focus on the core contributions of the paper.
2. **Limited Novelty:** The proposed method is relatively straightforward, primarily offering a way to create a fairness-focused pairwise dataset for preference learning. The training technique resembles the approach introduced in Diffusion-DPO, raising concerns about the extent of novel
- S1: The work proposes a solution to a critical and open research problem.
- S2: Although previous work has shown that there is an inherent trade-off between diversity and quality [1], PopAlign was shown to improve fairness without substantially harming other quality aspects.
- S3: The authors compared PopAlign with a diverse set of techniques for mitigating biases in T2I models.
- S4: Beyond automatic evaluation metrics, the authors also evaluated the proposed approach via human evaluation.

[1] Astolfi et al
- W1: Unclear generalization of findings to other text-to-image models. The authors only performed experiments with a single base model, namely SDXL, which makes it impossible to assess to what extent the efficacy of PopAlign would transfer to other models.
- W2: Choice of evaluation metrics. CLIP has been extensively shown not to correlate well with human perception of text-to-image alignment [1], making it a poor metric for evaluating prompt-image alignment. Authors could consider, for example,
- The paper was well written and easy to follow. - The method is clear and easy to use and does not degrade the quality of the generations.
- More details are needed on how well the method generalizes to new prompts. Sec 6.3 mentions that 100 prompts were manually written for evaluation; how similar are these prompts to those used for training? Does the method generalize to actions, to the more general prompts from the LAION Aesthetics dataset, to the style and personal descriptors used by Shen et al., or to multiple people? Sec 6.5 evaluates on generic prompts but not for fairness.
- Scalability of the method. The categorical attributes of gender and r
