Analysis of XLS-R for Speech Quality Assessment

Bastiaan Tamm; Rik Vandenberghe; Hugo Van hamme

arXiv:2308.12077·eess.AS·August 24, 2023

TL;DR

This paper investigates the use of XLS-R pre-trained embeddings for automated speech quality assessment, analyzing layer-specific features, model sizes, and their relation to noise and speech content for improved MOS prediction.

Contribution

It provides an in-depth analysis of XLS-R embeddings across layers and model sizes, revealing optimal feature regions and their roles in speech quality prediction.

Findings

01 Lower-level features capture noise and acoustics

02 High-level features focus on speech content

03 Fusion of features improves prediction accuracy

Abstract

In online conferencing applications, estimating the perceived quality of an audio signal is crucial to ensure high quality of experience for the end user. The most reliable way to assess the quality of a speech signal is through human judgments in the form of the mean opinion score (MOS) metric. However, such an approach is labor intensive and not feasible for large-scale applications. The focus has therefore shifted towards automated speech quality assessment through end-to-end training of deep neural networks. Recently, it was shown that leveraging pre-trained wav2vec-based XLS-R embeddings leads to state-of-the-art performance for the task of speech quality prediction. In this paper, we perform an in-depth analysis of the pre-trained model. First, we analyze the performance of embeddings extracted from each layer of XLS-R and also for each size of the model (300M, 1B, 2B parameters).…
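As a small illustration of the MOS metric mentioned in the abstract, the sketch below averages listener ratings per clip and compares the resulting scores with a model's predictions. It is a minimal example under stated assumptions, not the authors' evaluation code: the 1 to 5 rating scale, the clip names, the ratings, the predicted values, and the choice of Pearson correlation as the agreement measure are all hypothetical and do not come from the paper.

```python
from math import sqrt


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings for one clip."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


# Hypothetical listener ratings on an assumed 1-5 scale.
ratings_per_clip = {
    "clip_a": [4, 5, 4, 4],
    "clip_b": [2, 1, 2, 3],
    "clip_c": [3, 3, 4, 3],
}

# Hypothetical model outputs for the same clips.
predicted = {"clip_a": 4.1, "clip_b": 2.2, "clip_c": 3.0}

# MOS per clip is the average of that clip's ratings.
mos = {clip: mean_opinion_score(r) for clip, r in ratings_per_clip.items()}

# Agreement between human MOS and predictions across clips.
clips = sorted(mos)
r = pearson([mos[c] for c in clips], [predicted[c] for c in clips])

print(mos)
print(round(r, 3))
```

A higher correlation means the predictions rank and scale the clips more like the human listeners did. In practice the predictions would come from a trained model rather than hard-coded values.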

Code & Models

Repositories

lcn-kul/xls-r-analysis-sqa (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing