Language Bias in Self-Supervised Learning For Automatic Speech Recognition

Edward Storey; Naomi Harte; Peter Bell

arXiv:2501.19321·eess.AS·February 3, 2025

Open Access

TL;DR

This paper investigates language bias in multilingual self-supervised speech models and finds that, during fine-tuning, these models rely primarily on weights learned from data-rich languages rather than on traditional linguistic knowledge.

Contribution

The paper introduces a novel application of the Lottery Ticket Hypothesis to identify language-specific subnetworks within XLS-R, exposing a bias towards data-rich languages.

Findings

01 XLS-R's subnetworks are language-specific.

02 Fine-tuning relies on weights from dominant languages.

03 Language bias affects model performance across languages.

Abstract

Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have utilised SSL to train on over one hundred different languages simultaneously. However, deeper investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we utilise the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages. We are able to show that when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the…
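
The idea of identifying language-specific subnetworks and comparing them can be illustrated with a small sketch. It is not the authors' procedure: it assumes one-shot magnitude pruning (a common way to extract Lottery Ticket style subnetworks) and uses random matrices as hypothetical stand-ins for a layer fine-tuned on different languages. The names `magnitude_mask`, `mask_iou`, and the `w_lang_*` arrays are illustrative only.

```python
import numpy as np


def magnitude_mask(weights, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    flat = np.abs(weights).ravel()
    n_keep = int(round(flat.size * (1.0 - sparsity)))
    if n_keep == 0:
        return np.zeros(weights.shape, dtype=bool)
    # argpartition places the n_keep largest magnitudes in the last n_keep slots.
    keep_idx = np.argpartition(flat, flat.size - n_keep)[flat.size - n_keep:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(weights.shape)


def mask_iou(mask_a, mask_b):
    """Intersection over union of two subnetwork masks (1.0 means identical)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(mask_a, mask_b).sum()) / float(union)


# Hypothetical stand-ins for one layer after fine-tuning on three languages.
# Languages A and B share most of their weights; language C does not.
rng = np.random.default_rng(0)
base = rng.normal(size=(64, 64))
w_lang_a = base + 0.1 * rng.normal(size=base.shape)
w_lang_b = base + 0.1 * rng.normal(size=base.shape)
w_lang_c = rng.normal(size=base.shape)

SPARSITY = 0.7  # assumed pruning level: drop 70% of the weights
mask_a = magnitude_mask(w_lang_a, SPARSITY)
mask_b = magnitude_mask(w_lang_b, SPARSITY)
mask_c = magnitude_mask(w_lang_c, SPARSITY)

print("IoU(A, B):", mask_iou(mask_a, mask_b))
print("IoU(A, C):", mask_iou(mask_a, mask_c))
```

In this toy setting, a high overlap between two masks suggests that the two fine-tuned models retain largely the same weights, while a low overlap suggests distinct subnetworks. In a real experiment the masks would come from pruning a fine-tuned XLS-R checkpoint per language, and each subnetwork would then be evaluated on other languages.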

Taxonomy

Topics: Speech Recognition and Synthesis