Predicting score distribution to improve non-intrusive speech quality estimation
Abu Zaher Md Faridee, Hannes Gamper

TL;DR
This paper improves non-intrusive speech quality estimation by integrating opinion score distributions, yielding more accurate mean opinion score (MOS) predictions with minimal modifications to existing models.
Contribution
It introduces methods to incorporate opinion score distribution information into neural network-based MOS estimation models, improving their accuracy.
Findings
Up to 0.016 improvement in root-mean-square error (RMSE) of MOS prediction
Up to 1% improvement in Spearman's rank correlation coefficient (SRCC)
Effective integration with minimal model modification
Abstract
Deep noise suppressors (DNS) have become an attractive solution to remove background noise, reverberation, and distortions from speech and are widely used in telephony/voice applications. They are also occasionally prone to introducing artifacts and lowering the perceptual quality of the speech. Subjective listening tests that use multiple human judges to derive a mean opinion score (MOS) are a popular way to measure these models' performance. Deep neural network based non-intrusive MOS estimation models have recently emerged as a popular cost-efficient alternative to these tests. These models are trained with only the MOS labels, often discarding the secondary statistics of the opinion scores. In this paper, we investigate several ways to integrate the distribution of opinion scores (e.g. variance, histogram information) to improve the MOS estimation performance. Our model is trained…
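As a rough illustration of the "secondary statistics" the abstract refers to, the sketch below derives the MOS, the variance, and a normalized histogram from one clip's individual ratings. The 1 to 5 rating scale, the function names, and the dictionary layout are assumptions made for this example; how the paper feeds these quantities into the network is not covered by this summary.

```python
from collections import Counter
from statistics import mean, pvariance


def opinion_score_targets(ratings, scale=(1, 2, 3, 4, 5)):
    """Summarize one clip's ratings as MOS, variance, and a normalized histogram.

    `scale` is an assumed discrete rating scale; the histogram has one bin
    per scale value and sums to 1.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in scale for r in ratings):
        raise ValueError("every rating must be a value on the scale")
    counts = Counter(ratings)
    n = len(ratings)
    histogram = [counts.get(s, 0) / n for s in scale]
    return {
        "mos": mean(ratings),            # the usual training label
        "variance": pvariance(ratings),  # spread of the judges' opinions
        "histogram": histogram,          # full distribution over the scale
    }


def histogram_mos(histogram, scale=(1, 2, 3, 4, 5)):
    """Recover the MOS as the expected value of a score histogram."""
    return sum(p * s for p, s in zip(histogram, scale))


targets = opinion_score_targets([4, 5, 3, 4, 4])
```

Because the MOS is the expected value of the histogram, a model that predicts the histogram still yields a MOS estimate through `histogram_mos`, which is one plausible reason such targets can be added with few changes to an existing model.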
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Ultrasonics and Acoustic Wave Propagation
