The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions

George Close; Thomas Hain; Stefan Goetze

arXiv:2307.14502·eess.AS·October 23, 2023·1 cites

TL;DR

This paper investigates how the language used to train self-supervised speech representations influences speech enhancement performance, finding that training data quantity impacts results more than language match.

Contribution

The paper systematically evaluates how the language and the quantity of the data used to train a self-supervised representation affect speech enhancement models trained with that representation in the loss function.

Findings

1. Language match has only a minor impact on enhancement performance.

2. The quantity of training data significantly affects enhancement results.

3. Models trained with more data perform better across languages.

Abstract

Recent work in the field of speech enhancement (SE) has involved the use of self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, in prior work, very little attention has been paid to the relationship between the language of the audio used to train the self-supervised representation and that used to train the SE system. Enhancement models trained using a loss function which incorporates a self-supervised representation that shares exactly the language of the noisy data used to train the SE system show better performance than those which do not match exactly. This may lead to enhancement systems which are language specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time domain loss functions. In this work, SE models are trained and tested on a number of different languages,…
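The idea of using a representation as a "feature transformation in a loss function" can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the encoder is a toy frozen random projection standing in for a pretrained SSSR model, the distance is a plain mean squared error, and the names `make_toy_encoder`, `representation_loss`, `FRAME_LEN`, and `FEAT_DIM` are hypothetical. It is not the paper's implementation.

```python
import math
import random

FRAME_LEN = 8  # hypothetical frame size, chosen only for illustration
FEAT_DIM = 4   # hypothetical feature dimension


def make_toy_encoder(seed=0):
    """Return a frozen stand-in for a pretrained SSSR encoder.

    Assumption: a real system would load a pretrained self-supervised
    model. This fixed random projection followed by tanh only mimics
    the role of a frozen feature transformation.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]
               for _ in range(FEAT_DIM)]

    def encode(signal):
        feats = []
        # Split into non-overlapping frames, dropping any incomplete tail.
        for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN):
            frame = signal[start:start + FRAME_LEN]
            feats.append([math.tanh(sum(w * x for w, x in zip(row, frame)))
                          for row in weights])
        return feats

    return encode


def representation_loss(encode, enhanced, clean):
    """Mean squared distance between encoder features of two signals."""
    f_enh, f_ref = encode(enhanced), encode(clean)
    total, count = 0.0, 0
    for a, b in zip(f_enh, f_ref):
        for x, y in zip(a, b):
            total += (x - y) ** 2
            count += 1
    return total / count


encode = make_toy_encoder()
clean = [math.sin(0.3 * n) for n in range(64)]
noise_rng = random.Random(1)
noisy = [s + noise_rng.gauss(0.0, 0.5) for s in clean]

# The loss is zero when the "enhanced" signal equals the clean reference
# and positive when the signals differ in the encoder's feature space.
loss_identical = representation_loss(encode, clean, clean)
loss_noisy = representation_loss(encode, noisy, clean)
```

In this sketch the encoder stays fixed and only the signal passed in as `enhanced` changes. Whatever the encoder learned from its own training data, including the language of that data, therefore shapes what the loss rewards, which is the dependency the abstract is concerned with.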

Code & Models

Repositories

leto19/commonvoice-demand (official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development