More for Less: Non-Intrusive Speech Quality Assessment with Limited Annotations
Alessandro Ragano, Emmanouil Benetos, Andrew Hines

TL;DR
This paper introduces two multi-task learning models that leverage large unlabelled datasets to improve non-intrusive speech quality assessment with limited annotated data, outperforming existing baselines.
Contribution
It proposes novel multi-task models using deep clustering and degradation classification to enhance speech quality prediction with scarce annotations.
Findings
The deep clustering-based model outperforms the degradation classifier-based model.
The proposed models outperform baseline methods on the TCD-VoIP dataset.
Multi-task learning with unlabelled data improves assessment accuracy.
Abstract
Non-intrusive speech quality assessment is a crucial operation in multimedia applications. The scarcity of annotated data and the lack of a reference signal represent some of the main challenges for designing efficient quality assessment metrics. In this paper, we propose two multi-task models to tackle the problems above. In the first model, we first learn a feature representation with a degradation classifier on a large dataset. Then we perform MOS prediction and degradation classification simultaneously on a small dataset annotated with MOS. In the second approach, the initial stage consists of learning features with a deep clustering-based unsupervised feature representation on the large dataset. Next, we perform MOS prediction and cluster label classification simultaneously on a small dataset. The results show that the deep clustering-based model outperforms the degradation…
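The abstract describes a second stage in which MOS prediction and a classification task (degradation labels or cluster labels) are trained simultaneously. One common way to express such a joint objective is a weighted sum of a regression loss and a classification loss. The sketch below illustrates that idea only; the choice of mean squared error and cross-entropy, the weight `alpha`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and annotated MOS values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch.

    logits: list of per-class score lists, one per utterance.
    labels: list of integer class indices (e.g. degradation type or cluster id).
    """
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_norm - row[y]
    return total / len(labels)


def multitask_loss(mos_pred, mos_true, class_logits, class_labels, alpha=0.5):
    """Weighted sum of the MOS regression loss and the auxiliary classification loss.

    alpha is a hypothetical trade-off weight: alpha=1.0 trains on MOS only,
    alpha=0.0 trains on the auxiliary classification task only.
    """
    return alpha * mse(mos_pred, mos_true) + (1.0 - alpha) * cross_entropy(
        class_logits, class_labels
    )


if __name__ == "__main__":
    # Toy batch of two utterances with two auxiliary classes.
    loss = multitask_loss(
        mos_pred=[3.0, 4.0],
        mos_true=[3.0, 2.0],
        class_logits=[[0.0, 0.0], [0.0, 0.0]],
        class_labels=[0, 1],
    )
    print(f"joint loss: {loss:.4f}")
```

In a setup like this, the shared feature extractor would receive gradients from both terms, so the auxiliary task can regularise the representation when few MOS annotations are available.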
