On the Encoding of Gender in Transformer-based ASR Representations
Aravind Krishnan, Badr M. Abdullah, Dietrich Klakow

TL;DR
This paper investigates how gender information is encoded in transformer-based ASR models, demonstrating that gender can be erased with minimal impact on performance and revealing where gender information is concentrated within the models.
Contribution
It applies linear erasure to remove gender information from each layer of two transformer-based ASR models, Wav2Vec2 and HuBERT, and analyzes how gender is encoded in their latent representations, pointing to the prospect of gender-neutral embeddings.
Findings
Gender information is concentrated in the first and last frames of the final layers.
Removing gender information has minimal impact on ASR performance.
Gender can be effectively erased from model representations using linear methods.
Abstract
While existing literature relies on performance differences to uncover gender biases in ASR models, a deeper analysis is essential to understand how gender is encoded and utilized during transcript generation. This work investigates the encoding and utilization of gender in the latent representations of two transformer-based ASR models, Wav2Vec2 and HuBERT. Using linear erasure, we demonstrate the feasibility of removing gender information from each layer of an ASR model and show that such an intervention has minimal impact on ASR performance. Additionally, our analysis reveals a concentration of gender information within the first and last frames in the final layers, explaining the ease of erasing gender in these layers. Our findings suggest the prospect of creating gender-neutral embeddings that can be integrated into ASR frameworks without compromising their efficacy.
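To make the idea of linear erasure concrete, here is a minimal sketch on synthetic data. It is a simplified stand-in and not necessarily the erasure method used in the paper: it removes a binary attribute by projecting out the single direction that separates the two class means. The function names (`fit_mean_difference_eraser`, `erase`), the feature dimension, and the synthetic "gender" labels are all assumptions made for illustration; in practice `X` would hold frame-level or pooled hidden states from one layer of the ASR model.

```python
import numpy as np


def fit_mean_difference_eraser(X, y):
    """Fit a one-direction linear eraser for a binary label.

    Returns the global mean `mu` and the unit vector `u` pointing from
    the class-0 mean to the class-1 mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("class means already coincide; nothing to erase")
    return mu, diff / norm


def erase(X, mu, u):
    """Remove the component of each centered row of X along direction u."""
    X = np.asarray(X, dtype=float)
    coeffs = (X - mu) @ u          # projection coefficient per sample
    return X - np.outer(coeffs, u)


# Synthetic stand-in for one layer's representations (hypothetical sizes).
rng = np.random.default_rng(0)
n_samples, dim = 200, 16
labels = rng.integers(0, 2, size=n_samples)    # synthetic binary attribute
feats = rng.normal(size=(n_samples, dim))
feats[:, 3] += 2.0 * labels                    # leak the attribute into one feature

mu, u = fit_mean_difference_eraser(feats, labels)
feats_erased = erase(feats, mu, u)

gap_before = np.linalg.norm(
    feats[labels == 1].mean(axis=0) - feats[labels == 0].mean(axis=0)
)
gap_after = np.linalg.norm(
    feats_erased[labels == 1].mean(axis=0) - feats_erased[labels == 0].mean(axis=0)
)
print(f"class-mean gap before: {gap_before:.3f}, after: {gap_after:.2e}")
```

After the projection, the two class means coincide on the data the eraser was fitted on, while every direction orthogonal to `u` is left untouched. To mirror the analysis described in the abstract, one would fit such an eraser separately per layer and then compare ASR performance with and without the intervention.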
Taxonomy
Topics: Ultrasonics and Acoustic Wave Propagation · Speech and Audio Processing · Fault Detection and Control Systems
