More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations

Alessandro Ragano; Emmanouil Benetos; Andrew Hines

arXiv:2108.08745·eess.AS·August 20, 2021

TL;DR

This paper introduces two multi-task learning models that leverage large unlabelled datasets to improve non-intrusive speech quality assessment with limited annotated data, outperforming existing baselines.

Contribution

The paper proposes two multi-task models for MOS prediction when annotations are scarce: one pre-trained with a degradation classifier and one pre-trained with deep clustering, both on a large dataset without MOS labels.

Findings

01 Deep clustering-based model outperforms degradation classifier model.

02 Models outperform baseline methods on TCD-VoIP dataset.

03 Multi-task learning with unlabelled data improves assessment accuracy.

Abstract

Non-intrusive speech quality assessment is a crucial operation in multimedia applications. The scarcity of annotated data and the lack of a reference signal represent some of the main challenges for designing efficient quality assessment metrics. In this paper, we propose two multi-task models to tackle the problems above. In the first model, we first learn a feature representation with a degradation classifier on a large dataset. Then we perform MOS prediction and degradation classification simultaneously on a small dataset annotated with MOS. In the second approach, the initial stage consists of learning features with a deep clustering-based unsupervised feature representation on the large dataset. Next, we perform MOS prediction and cluster label classification simultaneously on a small dataset. The results show that the deep clustering-based model outperforms the degradation…
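
The joint training described in the abstract, where MOS prediction and a classification task (degradation class or cluster label) are learned simultaneously, can be illustrated with a combined loss. The following is a minimal sketch, not the authors' implementation: the weighted sum, the `alpha` parameter, the choice of mean squared error and cross-entropy, and all function names are assumptions made for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and annotated MOS values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, labels):
    """Mean cross-entropy of integer class labels given unnormalised logits."""
    total = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp over the class scores.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def multitask_loss(mos_pred, mos_true, class_logits, class_labels, alpha=0.5):
    """Weighted sum of a regression loss and a classification loss.

    alpha is a hypothetical weighting hyperparameter; the source does not
    state how the two task losses are combined.
    """
    regression = mse(mos_pred, mos_true)
    classification = cross_entropy(class_logits, class_labels)
    return alpha * regression + (1 - alpha) * classification


if __name__ == "__main__":
    # Two clips, each with a predicted MOS and scores over three classes.
    loss = multitask_loss(
        mos_pred=[3.2, 4.1],
        mos_true=[3.0, 4.5],
        class_logits=[[2.0, 0.1, -1.0], [0.3, 1.5, 0.2]],
        class_labels=[0, 1],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the class labels would be degradation types for the first model and cluster assignments for the second; only the source of the labels changes, not the form of the loss.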
