A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems
Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

TL;DR
This study investigates how the gender distribution of pre-training data affects the performance of self-supervised speech models on speech-to-text tasks. It finds that the effect depends on how the model is integrated downstream, and that a fairness metric varies little across models.
Contribution
It provides a comparative analysis of gender-specific and gender-balanced wav2vec 2.0 models in French, highlighting the impact of pre-training data gender composition on downstream speech tasks.
Findings
Gender-specific pre-training lowers overall ASR performance.
Balanced pre-training does not always yield the best results.
The fairness metric shows little variation across models.
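One generic way to look at ASR performance per speaker gender is to compute word error rate (WER) separately for female and male test utterances and compare the two. The sketch below is an illustration under that assumption only. The page does not state which fairness metric the paper uses, so the per-gender WER and the absolute gap here are hypothetical stand-ins, and the names `word_errors` and `wer_by_gender` are invented for this example.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distance from empty reference prefix
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution or match
        prev = cur
    return prev[-1]


def wer_by_gender(samples):
    """samples: iterable of (gender, reference, hypothesis).

    Returns {gender: WER}, pooling errors and reference words per gender
    (hypothetical helper, not the paper's evaluation code).
    """
    errors, words = {}, {}
    for gender, ref, hyp in samples:
        errors[gender] = errors.get(gender, 0) + word_errors(ref, hyp)
        words[gender] = words.get(gender, 0) + len(ref.split())
    # Skip genders whose references contain no words to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


samples = [
    ("F", "le chat dort", "le chat dort"),
    ("M", "il fait beau", "il fait chaud"),
]
wer = wer_by_gender(samples)
gap = abs(wer["F"] - wer["M"])  # hypothetical gap between genders
```

Pooling errors over all words of a gender, rather than averaging per-utterance WER, keeps long utterances from being under-weighted.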
Abstract
Self-supervised models for speech processing have recently emerged as popular building blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in downstream speech processing tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to understand the impact of features such as the gender distribution within pre-training data. Using French as our investigation language, we train and compare gender-specific wav2vec 2.0 models against models containing different degrees of gender balance in their pre-training data. The comparison is performed by applying these models to two speech-to-text downstream tasks: ASR and ST. Results show the type of downstream integration matters. We observe lower overall performance using…
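Pre-training sets with "different degrees of gender balance" can be pictured as subsets drawn from a gender-annotated corpus under a fixed audio budget, with a chosen share of female speech. The sketch below assumes exactly that: utterances tagged `"F"` or `"M"` with durations in seconds, and a greedy random fill of each gender's budget. The function name `build_pretraining_subset` and its parameters are hypothetical; the page does not describe how the paper actually built its subsets.

```python
import random


def build_pretraining_subset(utterances, total_seconds, female_ratio, seed=0):
    """Select utterances so that roughly `female_ratio` of `total_seconds`
    comes from female speech and the rest from male speech.

    utterances: list of (utt_id, gender, duration_seconds), gender in {"F", "M"}.
    Returns (selected_ids, seconds_used_per_gender). Hypothetical helper.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    budgets = {"F": total_seconds * female_ratio,
               "M": total_seconds * (1 - female_ratio)}
    used = {"F": 0.0, "M": 0.0}
    selected = []
    pool = list(utterances)
    rng.shuffle(pool)
    for utt_id, gender, dur in pool:
        # Greedily add an utterance only if it still fits its gender's budget.
        if gender in budgets and used[gender] + dur <= budgets[gender]:
            selected.append(utt_id)
            used[gender] += dur
    return selected, used


corpus = [(f"f{i}", "F", 1.0) for i in range(10)] + \
         [(f"m{i}", "M", 1.0) for i in range(10)]
balanced_ids, balanced_used = build_pretraining_subset(corpus, 10, 0.5)
female_ids, female_used = build_pretraining_subset(corpus, 10, 1.0)
```

Holding the total budget fixed while varying only `female_ratio` keeps the amount of pre-training audio constant, so differences between models can be attributed to gender composition rather than data size.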
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
