Fairness Dynamics During Training

Krishna Patel; Nivedha Sivakumar; Barry-John Theobald; Luca Zappella; Nicholas Apostoloff

arXiv:2506.01709·cs.CL·June 3, 2025

TL;DR

This paper studies how biases in large language models evolve during training, introduces two metrics for tracking fairness over the course of training, and shows that early stopping can improve fairness at a small cost in accuracy.

Contribution

The paper introduces two metrics, Average Rank and Jensen-Shannon Divergence by Parts, for monitoring fairness dynamics during pre-training, and shows that early stopping can mitigate bias in large language models.

Findings

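01. Biases can emerge suddenly during training and do not always follow common performance metrics.

02. Early stopping can substantially improve fairness: Pythia-6.9b can exchange 1.7% accuracy on LAMBADA for a 92.5% increase in fairness.

03. Larger models can exhibit more bias.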
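The early-stopping finding can be illustrated with a minimal sketch of checkpoint selection under an accuracy budget. This is not the authors' procedure: the `Checkpoint` fields, the `pick_checkpoint` helper, the bias score, and all numbers below are hypothetical and chosen only to show the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    step: int        # training step at which the checkpoint was saved
    accuracy: float  # task accuracy in [0, 1]; higher is better
    bias: float      # hypothetical bias score; lower is fairer


def pick_checkpoint(checkpoints, max_accuracy_drop):
    """Return the least-biased checkpoint whose accuracy is within
    max_accuracy_drop of the best accuracy seen during training."""
    best_accuracy = max(c.accuracy for c in checkpoints)
    eligible = [
        c for c in checkpoints
        if best_accuracy - c.accuracy <= max_accuracy_drop
    ]
    # Lowest bias wins; ties are broken in favour of higher accuracy.
    return min(eligible, key=lambda c: (c.bias, -c.accuracy))


# Made-up training history (assumption, not data from the paper).
history = [
    Checkpoint(step=1000, accuracy=0.40, bias=0.05),
    Checkpoint(step=2000, accuracy=0.60, bias=0.10),
    Checkpoint(step=3000, accuracy=0.66, bias=0.12),
    Checkpoint(step=4000, accuracy=0.67, bias=0.30),
]

chosen = pick_checkpoint(history, max_accuracy_drop=0.02)
print(chosen.step)
```

With a zero budget the helper simply returns the most accurate checkpoint; widening the budget lets an earlier, less biased checkpoint win.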

Abstract

We investigate fairness dynamics during Large Language Model (LLM) training to enable the diagnosis of biases and their mitigation through training interventions like early stopping; we find that biases can emerge suddenly and do not always follow common performance metrics. We introduce two new metrics to evaluate fairness dynamics holistically during model pre-training: Average Rank and Jensen-Shannon Divergence by Parts. These metrics provide insights into the Pythia models' progression of biases in gender prediction of occupations on the WinoBias dataset. By monitoring these dynamics, we find that (1) Pythia-6.9b is biased towards men; it becomes more performant and confident predicting "male" than "female" during training, (2) via early stopping, Pythia-6.9b can exchange 1.7% accuracy on LAMBADA for a 92.5% increase in fairness, and (3) larger models can exhibit more bias; Pythia-6.9b…
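For orientation, the standard Jensen-Shannon divergence between two discrete distributions can be sketched as follows. This is the textbook quantity only; the paper's "by Parts" variant is not defined in this excerpt and is not reproduced here. The function names and the example distributions are assumptions for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.
    Terms with p_i == 0 contribute nothing and are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Standard Jensen-Shannon divergence in bits (range 0 to 1).
    m is the pointwise average of p and q, so m_i > 0 wherever
    p_i > 0 or q_i > 0 and the KL terms are always finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical predicted-gender distributions over ("male", "female")
# for two groups of occupations; the values are made up.
group_a = [0.8, 0.2]
group_b = [0.3, 0.7]

print(js_divergence(group_a, group_b))
```

The divergence is symmetric, equals 0 for identical distributions, and reaches 1 bit when the two distributions have no overlapping support.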

Taxonomy

Topics: Human Resource Development and Performance Evaluation · Complex Systems and Decision Making