MineTheGap: Automatic Mining of Biases in Text-to-Image Models

Noa Cohen; Nurit Spingarn-Eliezer; Inbar Huberman-Spiegelglas; Tomer Michaeli

arXiv:2512.13427·cs.CV·December 16, 2025

TL;DR

This paper introduces MineTheGap, an automated method that uses a genetic algorithm to find prompts that expose biases in Text-to-Image (TTI) models, covering both societal biases and a lack of diversity across generated images.

Contribution

MineTheGap is the first approach to automatically mine bias-inducing prompts in TTI models using a genetic algorithm and a novel bias scoring method.

Findings

01 Successfully identifies biased prompts in TTI models

02 Validates bias severity scoring on known bias datasets

03 Enhances understanding of biases in generative image models

Abstract

Text-to-Image (TTI) models generate images based on text prompts, which often leave certain aspects of the desired image ambiguous. When faced with these ambiguities, TTI models have been shown to exhibit biases in their interpretations. These biases can have societal impacts, e.g., when showing only a certain race for a stated occupation. They can also affect user experience by creating redundancy within a set of generated images instead of spanning diverse possibilities. Here, we introduce MineTheGap, a method for automatically mining prompts that cause a TTI model to generate biased outputs. Our method goes beyond merely detecting bias for a given prompt. Rather, it leverages a genetic algorithm to iteratively refine a pool of prompts, seeking those that expose biases. This optimization process is driven by a novel bias score, which ranks biases according to their severity, as…
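The iterative refinement described in the abstract can be pictured with a minimal genetic-algorithm sketch. This is not the authors' implementation: the function `mine_prompts`, its parameters, and the stand-ins `toy_score` and `toy_mutate` are all hypothetical names introduced here for illustration. In particular, `toy_score` simply counts words and does not measure bias; in the paper, the score is a bias score computed from generated images, and the reviews below note that prompt mutations come from an LLM.

```python
import random
from typing import Callable, List


def mine_prompts(
    initial_pool: List[str],
    bias_score: Callable[[str], float],            # hypothetical: higher means more biased
    mutate: Callable[[str, random.Random], str],   # hypothetical: e.g. an LLM rewrite
    generations: int = 5,
    keep: int = 4,
    seed: int = 0,
) -> List[str]:
    """Generic select-and-mutate loop over a pool of prompts (illustrative only)."""
    rng = random.Random(seed)
    pool = list(dict.fromkeys(initial_pool))  # drop duplicates, keep order
    for _ in range(generations):
        # Selection: keep the highest-scoring prompts.
        survivors = sorted(pool, key=bias_score, reverse=True)[:keep]
        # Variation: derive one child prompt from each survivor.
        children = [mutate(p, rng) for p in survivors]
        # Survivors stay in the pool, so the best score never decreases.
        pool = list(dict.fromkeys(survivors + children))
    return sorted(pool, key=bias_score, reverse=True)[:keep]


def toy_score(prompt: str) -> float:
    # Stand-in score (assumption): word count. It is NOT a bias measure.
    return float(len(prompt.split()))


def toy_mutate(prompt: str, rng: random.Random) -> str:
    # Stand-in mutation (assumption): append a random phrase.
    return prompt + " " + rng.choice(["at work", "outdoors", "smiling"])


best = mine_prompts(
    ["a doctor", "a nurse", "a photo of a chef"],
    toy_score,
    toy_mutate,
    generations=3,
    keep=2,
)
```

Passing the score and the mutation operator as callables keeps the loop independent of any particular TTI model or LLM, which is the only design choice this sketch is meant to show.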

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 5

Strengths

- The paper presents an interesting idea that moves beyond traditional bias detection toward bias discovery, introducing an automated framework for mining prompts that reveal hidden or emergent biases in text-to-image models.
- Demonstrates the framework across multiple TTI models (Stable Diffusion 1.4–3 and FLUX), validating the metric through correlation with real-world statistics and showing model-specific bias discovery.

Weaknesses

- The method relies heavily on LLMs to generate prompts and mutations, making its performance dependent on the quality and biases of the chosen LLM. Although the authors acknowledge this limitation, they do not analyze how different LLMs might affect the results or the stability of the mining process.
- The proposed metric depends on CLIP embeddings, which can carry demographic biases of their own. Because it measures similarity in this embedding space, it may wrongly treat harmless stylistic or

Reviewer 02·Rating 4·Confidence 4

Strengths

* The automatic prompt mining technique is novel in TTI model bias evaluation.
* The bias measurement method, which ranks prompts by bias score, is also novel and appears effective, judging by the visual examples.

Weaknesses

My concerns about this paper are mainly about its problem formulation, experiment design, and empirical results. While the core methodology is novel, the overall experiment design and result analysis lack systematic rigor. The current presentation makes it difficult to fully assess the efficacy and generalizability of the proposed approach across diverse scenarios and TTI architectures.

* **Ambiguity in Bias Definition and Scope**: The paper does not provide a clear, formal definition of "bi

Reviewer 03·Rating 6·Confidence 5

Strengths

1. The approach goes beyond fixed biases and tackles an important and challenging problem: automatic mining of prompts that induce biases.
2. The experimental results are robust, spanning SD 1.4/2.1/3 and FLUX models, which suggests the approach is architecture-agnostic.

Weaknesses

1. The bias score cannot be considered a parity metric. It doesn’t check how often each group appears, only whether every text considered has at least one matching image (coverage) and every image matches some text (relevance). I think there is a loophole here. For example, with the texts “female doctor,” “female doctor in ER,” “male doctor,” and “male doctor in office,” a batch containing three male office portraits and one female ER photo will still “pass” both coverage and relevan
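The loophole this reviewer describes can be checked with a small toy model. The sketch below is only a reading of the reviewer's wording, not the paper's actual score: it assumes, purely for illustration, that an image "matches" a text when the image shows every attribute the text names, whereas the real method is reported to rely on embedding similarity. The names `matches`, `coverage`, `relevance`, `TEXTS`, and `IMAGES` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Attrs = FrozenSet[str]


def matches(text: Attrs, image: Attrs) -> bool:
    # Toy matching rule (assumption): the image shows every attribute in the text.
    return text <= image


def coverage(texts: Dict[str, Attrs], images: List[Attrs]) -> bool:
    # Every text has at least one matching image.
    return all(any(matches(t, img) for img in images) for t in texts.values())


def relevance(texts: Dict[str, Attrs], images: List[Attrs]) -> bool:
    # Every image matches at least one text.
    return all(any(matches(t, img) for t in texts.values()) for img in images)


# The four texts from the reviewer's example, reduced to attribute sets.
TEXTS: Dict[str, Attrs] = {
    "female doctor": frozenset({"female"}),
    "female doctor in ER": frozenset({"female", "ER"}),
    "male doctor": frozenset({"male"}),
    "male doctor in office": frozenset({"male", "office"}),
}

# Three male office portraits and one female ER photo.
IMAGES: List[Attrs] = [frozenset({"male", "office"})] * 3 + [frozenset({"female", "ER"})]

# Group frequencies, which neither check above looks at.
GENDER_COUNTS = Counter("female" if "female" in img else "male" for img in IMAGES)
```

Under this toy rule, both `coverage(TEXTS, IMAGES)` and `relevance(TEXTS, IMAGES)` hold even though the batch is split three to one, which is the gap between a coverage-style check and a parity check that the reviewer points to.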

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Sentiment Analysis and Opinion Mining