Speech Quality Assessment through MOS using Non-Matching References

Pranay Manocha; Anurag Kumar

arXiv:2206.12285 · eess.AS · June 27, 2022 · 1 citation

TL;DR

This paper introduces NORESQA-MOS, a neural network framework that estimates the MOS of a speech signal by conditioning on non-matching references. The authors report better generalization and more robust MOS estimation than DNSMOS and NISQA, despite using a smaller training set.

Contribution

The paper presents a MOS estimation framework that conditions the network on non-matching references, which the authors report improves robustness and generalization over prior deep learning approaches.

Findings

01. NORESQA-MOS generalizes better and gives more robust MOS estimates than DNSMOS and NISQA.

02. It achieves this with a smaller training set.

03. It can be combined with other learning methods such as self-supervised learning.

Abstract

Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals. However, several recent attempts to automatically estimate MOS using deep learning approaches lack robustness and generalization capabilities, limiting their use in real-world applications. In this work, we present a novel framework, NORESQA-MOS, for estimating the MOS of a speech signal. Unlike prior works, our approach uses non-matching references as a form of conditioning to ground the MOS estimation by neural networks. We show that NORESQA-MOS provides better generalization and more robust MOS estimation than previous state-of-the-art methods such as DNSMOS and NISQA, even though we use a smaller training set. Moreover, we also show that our generic framework can be combined with other learning methods such as self-supervised learning and can further…
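
As a rough illustration of how a non-matching reference could ground an estimate, the sketch below assumes a model that predicts how many MOS points a test signal falls below a reference of known quality, and averages the resulting estimates over several references. This is an assumption made for exposition, not the authors' architecture or the API of facebookresearch/Noresqa; `estimate_mos`, `GapModel`, `roughness`, and `toy_gap_model` are hypothetical names, and the toy model stands in for a trained network.

```python
from statistics import mean
from typing import Callable, Sequence

Signal = Sequence[float]
# Hypothetical interface: returns how many MOS points `test` falls below `ref`.
GapModel = Callable[[Signal, Signal], float]


def estimate_mos(test: Signal, references: Sequence[Signal],
                 reference_mos: Sequence[float], gap_model: GapModel,
                 lo: float = 1.0, hi: float = 5.0) -> float:
    """Average the reference-anchored estimates, clamped to the 1-5 MOS scale."""
    if not references or len(references) != len(reference_mos):
        raise ValueError("need at least one reference and one MOS label per reference")
    estimates = [ref_mos - gap_model(test, ref)
                 for ref, ref_mos in zip(references, reference_mos)]
    return min(hi, max(lo, mean(estimates)))


def roughness(x: Signal) -> float:
    """Mean absolute sample-to-sample change; a crude stand-in for a quality feature."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / max(len(x) - 1, 1)


def toy_gap_model(test: Signal, ref: Signal) -> float:
    """Stand-in for a trained network (assumption): a rougher test signal gives a larger gap."""
    return max(0.0, 4.0 * (roughness(test) - roughness(ref)))


if __name__ == "__main__":
    # The references carry different content from the test signal (non-matching).
    clean_a = [0.0, 0.1, 0.2, 0.1, 0.0]
    clean_b = [0.3, 0.2, 0.1, 0.2, 0.3]
    noisy_test = [0.0, 0.5, -0.2, 0.4, -0.1]
    print(estimate_mos(noisy_test, [clean_a, clean_b], [5.0, 5.0], toy_gap_model))
```

Averaging over several references is one plausible way to reduce the dependence of the estimate on any single reference; the paper should be consulted for the procedure the authors actually use.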

Code & Models

Repositories

facebookresearch/Noresqa (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing