Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations
Ilona van der Linden, Sahana Kumar, Arnav Dixit, Aadi Sudan, Smruthi Danda, David C. Anastasiu, Kai Lukoff

TL;DR
This study audits four large language models for racial and gender bias in generated occupational personas and finds systematic distortions and overgeneralizations relative to U.S. Bureau of Labor Statistics (BLS) data.
Contribution
It provides a comprehensive cross-model analysis of demographic biases in AI-generated occupational personas, highlighting shared structural biases.
Findings
Models underrepresent White (-31pp) and Black (-9pp) workers relative to BLS data.
Models overrepresent Hispanic (+17pp) and Asian (+12pp) workers, amplifying stereotypes.
These biases are consistent across all four models despite their different origins.
Abstract
As generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31pp) and Black (-9pp) workers are consistently underrepresented, while Hispanic (+17pp) and Asian (+12pp) workers are overrepresented, with stereotype exaggeration amplifying…
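The percentage-point (pp) figures in the abstract are gaps between a group's share among generated personas and its share in the BLS baseline. The sketch below shows that arithmetic only. It is not the authors' pipeline and does not reproduce their shift/exaggeration decomposition, which the excerpt above does not define. The function name `pp_gap`, the group labels, and all numbers are hypothetical placeholders, not values from the paper.

```python
def pp_gap(generated_counts, baseline_shares):
    """Return generated share minus baseline share, in percentage points.

    generated_counts: dict of group -> number of generated personas (hypothetical input)
    baseline_shares:  dict of group -> reference share in [0, 1] (e.g. from a labor survey)
    """
    total = sum(generated_counts.values())
    if total == 0:
        raise ValueError("no generated personas to compare")
    gaps = {}
    for group, baseline in baseline_shares.items():
        generated_share = generated_counts.get(group, 0) / total
        # Negative gap = underrepresented, positive gap = overrepresented.
        gaps[group] = round((generated_share - baseline) * 100, 1)
    return gaps


# Toy, made-up numbers for a single occupation (NOT data from the paper).
toy_counts = {"group_a": 50, "group_b": 30, "group_c": 20}
toy_baseline = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}

gaps = pp_gap(toy_counts, toy_baseline)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1f}pp")
```

A negative value means the group appears less often among generated personas than in the baseline, which is how figures such as -31pp and +17pp in the abstract should be read.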
