Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder

Chao Wu; Zhenyi Wang; Kangxian Xie; Naresh Kumar Devulapally; Vishnu Suresh Lokhande; Mingchen Gao

arXiv:2507.20973 · cs.LG · November 24, 2025

TL;DR

SAE Debias is a lightweight, model-agnostic framework that uses sparse autoencoders to identify and suppress gender bias in text-to-image models, improving fairness without retraining or architectural changes.

Contribution

This work introduces SAE Debias, the first application of sparse autoencoders for debiasing in T2I models, providing interpretable and reusable bias control across multiple models.

Findings

01. Significantly reduces gender bias in T2I outputs.

02. Maintains high image generation quality.

03. Operates without retraining or model-specific adjustments.

Abstract

Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a lightweight and model-agnostic framework for mitigating such bias in T2I generation. Unlike prior approaches that rely on CLIP-based filtering or prompt engineering, which often require model-specific adjustments and offer limited control, SAE Debias operates directly within the feature space without retraining or architectural modifications. By leveraging a k-sparse autoencoder pre-trained on a gender bias dataset, the method identifies gender-relevant directions within the sparse latent space, capturing professional stereotypes. Specifically, a biased direction per profession is constructed from sparse latents and suppressed during inference to steer generations toward more gender-balanced…
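
The abstract names three steps: encoding features with a k-sparse autoencoder, building a biased direction per profession from the sparse latents, and suppressing that direction at inference. The sketch below illustrates one plausible reading of those steps in NumPy. It is not the authors' implementation: the function names, the difference-of-means construction of the direction, the projection-based suppression, and the `strength` parameter are all assumptions made for illustration.

```python
import numpy as np

def topk_encode(x, W_enc, b_enc, k):
    """k-sparse encoding: keep the k largest pre-activations per row, zero the rest.

    x: (n, d) features; W_enc: (d, m) and b_enc: (m,) are hypothetical encoder weights.
    """
    pre = x @ W_enc + b_enc
    # Indices of the k largest entries in each row (order within the top k is arbitrary).
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z

def bias_direction(z_male, z_female):
    """Assumed construction: normalized difference of mean sparse latents
    for male-coded and female-coded prompts of a single profession."""
    d = z_male.mean(axis=0) - z_female.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-8)

def suppress(z, direction, strength=1.0):
    """Assumed suppression: subtract the component of z along the biased direction.

    strength=1.0 removes the component entirely; smaller values remove part of it.
    """
    coeff = z @ direction
    return z - strength * coeff[..., None] * direction
```

In this reading, the suppressed latent would then be passed through the autoencoder's decoder and handed back to the T2I model in place of the original feature, which is consistent with the claim that the diffusion model itself needs no retraining.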
