Neutral Prompts, Non-Neutral People: Quantifying Gender and Skin-Tone Bias in Gemini Flash 2.5 Image and GPT Image 1.5
Roberto Balestri

TL;DR
This paper uses a large-scale colorimetric analysis to measure gender and skin-tone bias in two commercial image generators. Even with neutral prompts, both models show strong default biases, and their gender preferences diverge.
Contribution
It introduces a comprehensive framework combining color normalization and skin tone quantification to audit biases in state-of-the-art image generation models.
Findings
Both models default to white-presenting subjects in more than 96% of outputs.
Gemini favors female-presenting subjects.
GPT favors male-presenting subjects with lighter skin tones.
Abstract
This study quantifies gender and skin-tone bias in two widely deployed commercial image generators - Gemini Flash 2.5 Image (NanoBanana) and GPT Image 1.5 - to test the assumption that neutral prompts yield demographically neutral outputs. We generated 3,200 photorealistic images using four semantically neutral prompts. The analysis employed a rigorous pipeline combining hybrid color normalization, facial landmark masking, and perceptually uniform skin tone quantification using the Monk (MST), PERLA, and Fitzpatrick scales. Neutral prompts produced highly polarized defaults. Both models exhibited a strong "default white" bias (>96% of outputs). However, they diverged sharply on gender: Gemini favored female-presenting subjects, while GPT favored male-presenting subjects with lighter skin tones. This research provides a large-scale, comparative audit of state-of-the-art models using an…
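The abstract describes quantifying skin tone in a perceptually uniform color space against reference scales. The paper's actual implementation is not given here, so the following is only a minimal sketch of one common approach: convert a sampled sRGB skin color to CIELAB and pick the nearest reference swatch by Euclidean distance (CIE76). The function names and the three-swatch palette are hypothetical placeholders, not the published MST, PERLA, or Fitzpatrick values.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def to_linear(c):
        # Undo the sRGB transfer function.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def nearest_swatch(rgb, palette):
    """Return the palette key whose color is closest to rgb in CIELAB (CIE76)."""
    lab = srgb_to_lab(rgb)
    return min(palette, key=lambda name: math.dist(lab, srgb_to_lab(palette[name])))

# Hypothetical placeholder palette (assumption): NOT the official scale values.
PLACEHOLDER_PALETTE = {
    "light": (240, 220, 200),
    "medium": (180, 130, 90),
    "dark": (60, 45, 40),
}

if __name__ == "__main__":
    print(nearest_swatch((235, 215, 195), PLACEHOLDER_PALETTE))
```

In a real audit the input color would presumably come from normalized, landmark-masked skin pixels, and the palette would be replaced by the reference swatches of the chosen scale. A more refined distance such as CIEDE2000 could be substituted for the Euclidean one.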
Taxonomy
Topics: Evolutionary Psychology and Human Behavior · Face Recognition and Perception · Skin Protection and Aging
