Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

Ryandhimas E. Zezario; Bo-Ren Brian Bai; Chiou-Shann Fuh; Hsin-Min Wang; Yu Tsao

arXiv:2308.09262·eess.AS·March 14, 2024

TL;DR

This paper introduces MTQ-Net, a non-intrusive speech quality assessment model that leverages multi-task pseudo-label learning to improve prediction accuracy by combining pseudo-labels from a pretrained model with ground-truth data.

Contribution

The study presents a novel multi-task pseudo-label learning framework for speech quality assessment, demonstrating improved performance over existing SSL-based models.

Findings

01. Multi-task pseudo-label learning (MPL) outperforms training from scratch and direct knowledge transfer.

02. Using the Huber loss as the training loss improves predictive accuracy.

03. MTQ-Net achieves higher predictive power than other SSL-based models.

Abstract

This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is utilized to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning is then employed to train MTQ-Net by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), where the Huber loss is employed as the loss…
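The loss combination described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the weight `alpha`, and the Huber threshold `delta` are assumptions introduced here, since the abstract does not say how the two terms are weighted or which threshold is used.

```python
from typing import Sequence


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss for one prediction: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def mean_huber(preds: Sequence[float], targets: Sequence[float], delta: float = 1.0) -> float:
    """Average Huber loss over paired predictions and targets."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and of equal length")
    return sum(huber(p, t, delta) for p, t in zip(preds, targets)) / len(preds)


def mpl_loss(
    pred_mos: Sequence[float],    # estimated S-MOS, N-MOS, G-MOS
    true_mos: Sequence[float],    # ground-truth 3QUEST labels
    pred_aux: Sequence[float],    # estimated PESQ, STOI, SDI
    pseudo_aux: Sequence[float],  # pseudo labels from the pretrained model
    alpha: float = 1.0,           # hypothetical weight on the pseudo-label term
    delta: float = 1.0,           # hypothetical Huber threshold
) -> float:
    """Supervised Huber loss plus a weighted pseudo-label (semi-supervised) Huber loss."""
    supervised = mean_huber(pred_mos, true_mos, delta)
    semi_supervised = mean_huber(pred_aux, pseudo_aux, delta)
    return supervised + alpha * semi_supervised


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    loss = mpl_loss(
        pred_mos=[3.1, 2.4, 2.8],
        true_mos=[3.5, 2.0, 2.9],
        pred_aux=[2.2, 0.81, 0.10],
        pseudo_aux=[2.0, 0.85, 0.12],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the pseudo-label term acts as an auxiliary objective: setting `alpha` to 0 reduces training to the supervised 3QUEST targets alone.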

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Huber loss