Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models
Felix Herron, Solange Rossato, Alexandre Allauzen, François Portet

TL;DR
This paper investigates the sources of demographic unfairness in speech recognition models by analyzing phoneme embeddings, identifying types of errors, and evaluating fairness-enhancing training methods.
Contribution
It introduces a framework for typifying errors in phoneme embeddings, revealing the presence of bias and variance issues contributing to unfairness in ASR systems.
Findings
Training phoneme classification probes only on a single, typically disadvantaged speaker group sometimes improves performance for that group, which is evidence of group-level bias in phoneme embeddings.
Higher variance in phoneme embeddings correlates with lower phoneme prediction accuracy.
Fairness-enhancing finetuning does not reduce random embedding errors or improve phoneme classification benefits.
Abstract
Modern automatic speech recognition (ASR) systems have been observed to function better for certain speaker groups (SGs) than others, despite recent gains in overall performance. One potential impediment to progress towards fairer ASR is the lack of a nuanced understanding of the types of modeling errors that speech encoder models make, and in particular of the difference between the structure of embeddings for high-performance and low-performance SGs. This paper proposes a framework typifying two types of error that can occur in modeling phonemes in ASR systems: random error (high variance in phoneme embeddings) versus systematic error (embedding bias). We find that training phoneme classification probes only on a single, typically disadvantaged SG sometimes improves performance for that SG, which is evidence for the existence of SG-level bias in phoneme embeddings. On the other hand, we find that…
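The distinction between systematic error (bias) and random error (variance) can be illustrated with a toy simulation. The sketch below is not the authors' method or data: the 2-D synthetic "embeddings", the centroid positions, the shift and noise magnitudes, and the nearest-centroid probe are all assumptions chosen for illustration. It shows why a gain from a group-specific probe points to bias: a probe trained on the group's own data can compensate for a consistent shift in embedding space, but it cannot remove random scatter.

```python
import random

def sample_group(n, shift=0.0, std=0.5, seed=0):
    """Draw n synthetic 2-D 'phoneme embeddings' per phoneme for one speaker group.

    shift models systematic error (bias); std models random error (variance).
    All values are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    centers = {0: (-1.0, 0.0), 1: (1.0, 0.0)}  # hypothetical phoneme centroids
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n):
            data.append(((rng.gauss(cx + shift, std), rng.gauss(cy, std)), label))
    return data

def fit_probe(data):
    """Nearest-centroid probe: store one mean vector per phoneme."""
    sums = {}
    for (x, y), label in data:
        sx, sy, k = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, k + 1)
    return {label: (sx / k, sy / k) for label, (sx, sy, k) in sums.items()}

def accuracy(probe, data):
    """Fraction of points whose closest centroid carries the true label."""
    correct = 0
    for (x, y), label in data:
        pred = min(probe, key=lambda c: (x - probe[c][0]) ** 2 + (y - probe[c][1]) ** 2)
        correct += pred == label
    return correct / len(data)

reference = sample_group(2000, seed=1)                # well-modeled group
biased_train = sample_group(2000, shift=1.0, seed=2)  # systematic error
biased_test = sample_group(2000, shift=1.0, seed=3)
noisy_train = sample_group(2000, std=1.5, seed=4)     # random error
noisy_test = sample_group(2000, std=1.5, seed=5)

ref_probe = fit_probe(reference)
# Gain from replacing the reference-group probe with a group-specific probe.
gain_bias = accuracy(fit_probe(biased_train), biased_test) - accuracy(ref_probe, biased_test)
gain_var = accuracy(fit_probe(noisy_train), noisy_test) - accuracy(ref_probe, noisy_test)
print(f"gain from group-specific probe, biased group: {gain_bias:+.3f}")
print(f"gain from group-specific probe, high-variance group: {gain_var:+.3f}")
```

In this toy setting the group-specific probe helps the shifted group substantially and leaves the high-variance group essentially unchanged, which mirrors the reasoning in the abstract under the stated assumptions.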
