Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models

Felix Herron; Solange Rossato; Alexandre Allauzen; François Portet

arXiv:2604.22631 · cs.CL · April 27, 2026

TL;DR

This paper investigates the sources of demographic unfairness in speech recognition models by analyzing phoneme embeddings, identifying types of errors, and evaluating fairness-enhancing training methods.

Contribution

The paper introduces a framework that distinguishes two types of error in phoneme embeddings, random error (high variance) and systematic error (embedding bias), and finds evidence that both contribute to unfairness in ASR systems.

Findings

01

Training phoneme classification probes only on a single, typically disadvantaged speaker group sometimes improves performance for that group, which is evidence of group-level bias in the embeddings.

02

Higher variance in phoneme embeddings correlates with lower phoneme prediction accuracy.

03

Fairness-enhancing finetuning does not reduce random embedding errors or improve phoneme classification benefits.

Abstract

Modern automatic speech recognition (ASR) systems have been observed to function better for certain speaker groups (SGs) than others, despite recent gains in overall performance. One potential impediment to progress towards fairer ASR is the lack of a more nuanced understanding of the types of modeling errors that speech encoder models make, and in particular of the difference between the structure of embeddings for high-performance and low-performance SGs. This paper proposes a framework typifying two types of error that can occur in modeling phonemes in ASR systems: random error (high variance in phoneme embeddings) versus systematic error (embedding bias). We find that training phoneme classification probes only on a single, typically disadvantaged SG sometimes improves performance for that SG, which is evidence for the existence of SG-level bias in phoneme embeddings. On the other hand, we find that…
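The distinction between the two error types can be illustrated with a toy sketch. This is not the authors' code or data: it assumes synthetic 2-D Gaussian "embeddings" for two phonemes, uses a nearest-centroid classifier as a stand-in for a phoneme classification probe, and all names (`make_group`, `centroids_by_phoneme`, the group labels) are hypothetical. Under those assumptions, a group with a systematic offset should gain from a probe trained on that group alone, while a group with high variance should not.

```python
import random
from statistics import fmean

def centroids_by_phoneme(embeddings, labels):
    """Fit a nearest-centroid probe: one mean vector per phoneme label."""
    grouped = {}
    for vec, lab in zip(embeddings, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: [fmean(dim) for dim in zip(*vecs)] for lab, vecs in grouped.items()}

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def within_phoneme_variance(embeddings, labels):
    """Mean squared distance of each embedding to its own phoneme centroid."""
    cents = centroids_by_phoneme(embeddings, labels)
    return fmean(sq_dist(vec, cents[lab]) for vec, lab in zip(embeddings, labels))

def probe_accuracy(cents, embeddings, labels):
    """Fraction of embeddings whose nearest centroid carries the gold label."""
    hits = sum(
        min(cents, key=lambda lab: sq_dist(vec, cents[lab])) == gold
        for vec, gold in zip(embeddings, labels)
    )
    return hits / len(labels)

def make_group(rng, shift, noise, n):
    """Toy data (assumption): `shift` models systematic bias, `noise` random error."""
    base = {"a": (0.0, 0.0), "i": (2.0, 0.0)}
    embs, labs = [], []
    for lab, (cx, cy) in base.items():
        for _ in range(n):
            embs.append([cx + shift + rng.gauss(0, noise), cy + rng.gauss(0, noise)])
            labs.append(lab)
    return embs, labs

rng = random.Random(0)
# Hypothetical speaker groups: (shift, noise, samples per phoneme).
spec = {"reference": (0.0, 0.3, 600), "biased": (1.5, 0.3, 200), "noisy": (0.0, 1.5, 200)}
train = {g: make_group(rng, s, z, n) for g, (s, z, n) in spec.items()}
test = {g: make_group(rng, s, z, n) for g, (s, z, n) in spec.items()}

# Probe trained on all groups pooled together.
pooled_embs = [v for embs, _ in train.values() for v in embs]
pooled_labs = [lab for _, labs in train.values() for lab in labs]
pooled_probe = centroids_by_phoneme(pooled_embs, pooled_labs)

results = {}
for g in spec:
    own_probe = centroids_by_phoneme(*train[g])  # probe trained on this group only
    results[g] = {
        "variance": within_phoneme_variance(*test[g]),
        "pooled_acc": probe_accuracy(pooled_probe, *test[g]),
        "own_acc": probe_accuracy(own_probe, *test[g]),
    }
    print(g, {k: round(v, 3) for k, v in results[g].items()})
```

In this toy setup, a large gap between `own_acc` and `pooled_acc` is the signature of systematic error, whereas a high `variance` with no such gap is the signature of random error. Real encoder embeddings are high-dimensional and real probes are typically trained classifiers, so the sketch conveys only the logic of the comparison.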
