Sports and Women's Sports: Gender Bias in Text Generation with Olympic Data

Laura Biester

arXiv:2502.04218·cs.CL·February 7, 2025

TL;DR

This paper investigates gender bias in large language models using Olympic data, finding that models are consistently biased against women when the prompt leaves gender ambiguous.

Contribution

It defines three metrics for measuring gender bias and applies them to data from parallel men's and women's Olympic events, showing pervasive bias in LLMs in the context of athletics.

Findings

01. Models are consistently biased against women when gender is ambiguous in the prompt.

02. With ambiguous prompts, models frequently retrieve only the men's results, with or without acknowledging them as such.

03. Gender bias is pervasive in LLMs in the context of athletics.

Abstract

Large Language Models (LLMs) have been shown to be biased in prior work, as they generate text that is in line with stereotypical views of the world or that is not representative of the viewpoints and values of historically marginalized demographic groups. In this work, we propose using data from parallel men's and women's events at the Olympic Games to investigate different forms of gender bias in language models. We define three metrics to measure bias, and find that models are consistently biased against women when the gender is ambiguous in the prompt. In this case, the model frequently retrieves only the results of the men's event with or without acknowledging them as such, revealing pervasive gender bias in LLMs in the context of athletics.
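The ambiguous-prompt setup described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it does not reproduce the paper's three metrics, which this page does not define. The `ParallelEvent` structure, the prompt wording, the name-matching heuristic, and the `men_only_rate` summary are all assumptions made for illustration, and the athlete names in the usage note are placeholders rather than real results.

```python
from dataclasses import dataclass


@dataclass
class ParallelEvent:
    """A men's and a women's event sharing one name (hypothetical structure)."""
    name: str
    year: int
    men_medalists: list[str]
    women_medalists: list[str]


def ambiguous_prompt(event: ParallelEvent) -> str:
    # The prompt omits gender on purpose, so either event is a valid answer.
    # The wording is an assumption, not the prompt used in the paper.
    return f"Who won medals in the {event.name} at the {event.year} Olympics?"


def classify_response(event: ParallelEvent, response: str) -> str:
    """Label which event's medalists are named in the response text."""
    text = response.lower()
    has_men = any(name.lower() in text for name in event.men_medalists)
    has_women = any(name.lower() in text for name in event.women_medalists)
    if has_men and has_women:
        return "both"
    if has_men:
        return "men_only"
    if has_women:
        return "women_only"
    return "neither"


def men_only_rate(labels: list[str]) -> float:
    """Fraction of responses that name only the men's medalists."""
    return labels.count("men_only") / len(labels) if labels else 0.0
```

As a usage example, with `event = ParallelEvent("100 m", 2020, ["Athlete M1"], ["Athlete W1"])`, a response that names only "Athlete M1" is labeled `men_only`. A high `men_only_rate` over many ambiguous prompts would correspond to the behavior the abstract describes, in which the model returns the men's results alone.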

Taxonomy

Topics: Sports Analytics and Performance · Natural Language Processing Techniques