Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations

Ilona van der Linden; Sahana Kumar; Arnav Dixit; Aadi Sudan; Smruthi Danda; David C. Anastasiu; Kai Lukoff

arXiv:2510.21011·cs.HC·March 30, 2026

TL;DR

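This study audits four large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) for racial and gender bias in generated occupational personas and finds that they systematically distort demographics relative to U.S. Bureau of Labor Statistics (BLS) data, compressing each occupation toward a dominant demographic profile.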

Contribution

The paper provides a cross-model analysis of demographic bias in LLM-generated occupational personas across 41 U.S. occupations, highlighting structural biases that the models share.

Findings

01

Models underrepresent White (-31pp) and Black (-9pp) workers relative to BLS data.

02

Models overrepresent Hispanic (+17pp) and Asian (+12pp) workers, amplifying stereotypes.

03

Biases are consistent across models with different origins.

Abstract

As generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models - GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium - across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31pp) and Black (-9pp) workers are consistently underrepresented, while Hispanic (+17pp) and Asian (+12pp) workers are overrepresented, with stereotype exaggeration amplifying…
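The comparison against BLS data described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline or their shift/exaggeration decomposition. It assumes two things: that representation gaps are read as generated share minus baseline share in percentage points (pp), and that Shannon entropy is an acceptable stand-in for "variation" within one occupation. The function names `pp_gap` and `entropy_bits`, the group labels, and all numbers are hypothetical toy values, not results from the paper.

```python
import math


def pp_gap(generated, baseline):
    """Gap per group in percentage points: generated share minus baseline share."""
    return {g: 100.0 * (generated[g] - baseline[g]) for g in baseline}


def entropy_bits(shares):
    """Shannon entropy (in bits) of a demographic distribution.

    Used here only as an assumed proxy for demographic variation;
    lower entropy means the distribution is more concentrated.
    """
    return -sum(p * math.log2(p) for p in shares.values() if p > 0)


# Hypothetical shares for a single occupation (toy values, NOT from the paper).
baseline = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}   # stand-in for BLS data
generated = {"group_a": 0.90, "group_b": 0.05, "group_c": 0.05}  # stand-in for LLM personas

gaps = pp_gap(generated, baseline)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1f} pp")

print(f"baseline entropy:  {entropy_bits(baseline):.3f} bits")
print(f"generated entropy: {entropy_bits(generated):.3f} bits")
```

In this toy case the generated distribution has lower entropy than the baseline, which is one way to picture an occupation being compressed toward a dominant demographic profile. Because shares in each distribution sum to one, the per-group gaps sum to zero, so overrepresenting one group necessarily underrepresents others.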
