Predicting score distribution to improve non-intrusive speech quality estimation

Abu Zaher Md Faridee; Hannes Gamper

arXiv:2204.06616·cs.SD·April 15, 2022

Open Access

TL;DR

This paper enhances non-intrusive speech quality estimation by integrating opinion score distributions, leading to more accurate MOS predictions with minimal modifications to existing models.

Contribution

It introduces methods to incorporate opinion score distribution information into neural network-based MOS estimation models, improving their accuracy.
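As a rough illustration of what "opinion score distribution information" can look like as training labels, the sketch below summarizes one clip's individual ratings into a MOS, a variance, and a normalized histogram. The function name `score_distribution_targets`, the 1 to 5 rating scale, and the sample ratings are assumptions made for illustration; the page does not specify how the paper encodes these labels.

```python
from collections import Counter
from statistics import fmean, pvariance


def score_distribution_targets(ratings, scale=(1, 2, 3, 4, 5)):
    """Summarize one clip's individual opinion scores.

    Assumption: ratings are integers on a 1-5 scale (hypothetical here).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    n = len(ratings)
    return {
        # Mean opinion score: the usual single training label.
        "mos": fmean(ratings),
        # Population variance of the ratings (spread of listener opinion).
        "variance": pvariance(ratings),
        # Fraction of listeners who chose each point on the scale.
        "histogram": [counts[s] / n for s in scale],
    }


# Made-up ratings from five listeners for a single clip.
targets = score_distribution_targets([4, 5, 3, 4, 4])
print(targets)
```

A model with extra output heads could then be trained against `variance` or `histogram` alongside `mos`; whether that matches the paper's setup is not stated on this page.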

Findings

01 Up to 0.016 RMSE improvement in MOS prediction

02 Up to 1% SRCC improvement

03 Effective integration with minimal model modification

Abstract

Deep noise suppressors (DNS) have become an attractive solution to remove background noise, reverberation, and distortions from speech and are widely used in telephony/voice applications. They are also occasionally prone to introducing artifacts and lowering the perceptual quality of the speech. Subjective listening tests that use multiple human judges to derive a mean opinion score (MOS) are a popular way to measure these models' performance. Deep neural network based non-intrusive MOS estimation models have recently emerged as a popular cost-efficient alternative to these tests. These models are trained with only the MOS labels, often discarding the secondary statistics of the opinion scores. In this paper, we investigate several ways to integrate the distribution of opinion scores (e.g. variance, histogram information) to improve the MOS estimation performance. Our model is trained…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Ultrasonics and Acoustic Wave Propagation