AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
Xinyu Hou, Xiaoming Li, Chen Change Loy

TL;DR
This paper introduces AITTI, a novel method for reducing stereotypical biases in text-to-image generation by learning adaptive inclusive tokens that do not require explicit attribute specification or prior bias knowledge.
Contribution
We propose a lightweight adaptive mapping network to generate inclusive tokens for de-biasing, which generalizes to unseen concepts without explicit attribute labels.
Findings
Outperforms previous bias mitigation methods in settings without attribute specification.
Maintains alignment between generated images and text descriptions.
Achieves results comparable to attribute-specific de-biasing models.
Abstract
Despite the high-quality results of text-to-image generation, stereotypical biases have been observed in the generated content, compromising the fairness of generative models. In this work, we propose to learn adaptive inclusive tokens to shift the attribute distribution of the final generative outputs. Unlike existing de-biasing approaches, our method requires neither explicit attribute specification nor prior knowledge of the bias distribution. Specifically, the core of our method is a lightweight adaptive mapping network, which can customize the inclusive tokens for the concepts to be de-biased, making the tokens generalizable to unseen concepts regardless of their original bias distributions. This is achieved by tuning the adaptive mapping network with a handful of balanced and inclusive samples using an anchor loss. Experimental results demonstrate that our method outperforms…
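The abstract describes the mechanism only at a high level. As a rough illustration, the pure-Python sketch below assumes a small two-layer MLP (`AdaptiveMapper`, hypothetical) that maps the embedding of the concept token (e.g. "doctor") to an inclusive token embedding, which is then inserted into the prompt directly before the concept. The abstract does not define the anchor loss, so `anchor_style_loss` is a stand-in mean-squared distance to an anchor embedding, included only to show where such a term would sit. Embedding sizes, token placement, and the loss form are all assumptions, and the training step (tuning the mapper on a handful of balanced samples) is omitted.

```python
import math
import random

# Hypothetical sizes; real text-encoder embeddings are far larger.
EMBED_DIM = 8
HIDDEN_DIM = 16


def linear(x, weights, bias):
    """Dense layer: y_j = sum_i x_i * weights[i][j] + bias[j]."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


class AdaptiveMapper:
    """Hypothetical two-layer MLP: concept embedding -> inclusive token embedding."""

    def __init__(self, dim=EMBED_DIM, hidden=HIDDEN_DIM, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(hidden)]
                   for _ in range(dim)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(dim)]
                   for _ in range(hidden)]
        self.b2 = [0.0] * dim

    def __call__(self, concept_embedding):
        # ReLU hidden layer, then a linear projection back to the embedding size.
        hidden = [max(0.0, v) for v in linear(concept_embedding, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)


def insert_inclusive_token(prompt, concept_index, token):
    """Assumed placement: the inclusive token goes directly before the concept token."""
    return prompt[:concept_index] + [token] + prompt[concept_index:]


def mean_pool(tokens):
    """Average token embeddings into a single prompt-level vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def anchor_style_loss(pooled, anchor):
    """Illustrative mean-squared distance to an anchor vector (not the paper's exact loss)."""
    return sum((p - a) ** 2 for p, a in zip(pooled, anchor)) / len(anchor)


# Usage: a toy 4-token prompt with random embeddings; the last token stands for the concept.
rng = random.Random(1)
prompt = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(4)]
mapper = AdaptiveMapper()
concept_idx = 3
token = mapper(prompt[concept_idx])  # the token is computed from the concept it precedes
debiased = insert_inclusive_token(prompt, concept_idx, token)
loss = anchor_style_loss(mean_pool(debiased), mean_pool(prompt))
print(f"{len(debiased)} tokens, anchor-style loss = {loss:.4f}")
```

Because the token is computed from the concept embedding rather than stored as one fixed vector, the same mapper yields a different token for a concept it was not tuned on, which is the property the abstract credits for generalization to unseen concepts.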
Peer Reviews
Decision · Submitted to ICLR 2025
1. The paper focuses on addressing the bias problem in current T2I models, particularly gender, race, and age biases, which has important social benefits. 2. The proposed method is simple and model-agnostic, as stated by the authors, and can be easily adopted in existing approaches to help reduce bias issues. 3. Extensive experiments were conducted to evaluate the effectiveness of the proposed method in mitigating biases.
1. The biases present in current T2I models are not primarily due to the models themselves but stem from other non-technical issues, such as dataset limitations or unclear prompts. Additionally, for evaluation, the authors use a CLIP zero-shot classifier to classify sensitive attributes. However, since CLIP is likely trained on biased datasets, the classifier may more accurately identify attributes that are highly frequent in its training data, which could, in turn, affect the accuracy of evalua
The paper introduces a novel approach that uses adaptive inclusive tokens to mitigate bias in text-to-image generation models without requiring explicit attribute specification or prior knowledge of the bias distribution, which sounds interesting. Experimental results demonstrate that the proposed method outperforms previous bias mitigation techniques in scenarios without attribute specification. The method seems to generalize well, and the paper is easy to read.
Potential Introduction of Factual Errors: The proposed method may introduce factual inaccuracies when dealing with non-neutral concepts, for example by generating an image of a female U.S. president, which does not align with historical facts. The authors do not discuss or evaluate the conflict between accuracy and fairness in their approach. Lack of Motivation and Theoretical Foundation: The adaptive mapping network proposed by the authors lacks a clear motivation and theoretical discussion. Thi
1. The paper introduces a novel method for reducing biases in text-to-image generation, which is critical to AI ethics and fairness. The concept of learning adaptive inclusive tokens that can shift attribute distributions in generative outputs is a creative solution. 2. The method's ability to generalize to unseen concepts and handle multiple attributes is a major advantage, showcasing its potential for broad application across various domains.
1. While the authors claim to have developed a lightweight adaptive mapping network to address bias mitigation, the training time of nearly one hour suggests that it may not be as lightweight as initially proposed. This process raises concerns about the practicality and efficiency of the solution, especially in contexts where rapid deployment or real-time adjustments are required. 2. While the authors demonstrate quantitative improvements over the baseline in comparative analysis, the qualitativ
Taxonomy
Topics: Augmented Reality Applications · Video Analysis and Summarization · Handwritten Text Recognition Techniques
