Residual-Guided Non-Intrusive Speech Quality Assessment
Zhe Ye, Jiahao Chen, Diqun Yan

TL;DR
This paper introduces a residual-guided approach to enhance non-intrusive speech quality assessment by leveraging residuals between impaired and enhanced speech, significantly improving prediction accuracy.
Contribution
It proposes a novel residual-based method that incorporates enhanced speech to improve non-intrusive speech quality evaluation without reference audio.
Findings
31.3% improvement in Pearson linear correlation coefficient (PLCC)
14.1% reduction in root mean squared error (RMSE)
Better correlation with human subjective scores
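For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how PLCC and RMSE are commonly computed between predicted and subjective MOS values. The function names `plcc` and `rmse` and the sample scores are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol is not stated here.

```python
import numpy as np

def plcc(pred, target):
    """Pearson linear correlation coefficient between two score vectors."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()      # centre predictions
    tc = target - target.mean()  # centre ground-truth scores
    return float((pc * tc).sum() / np.sqrt((pc ** 2).sum() * (tc ** 2).sum()))

def rmse(pred, target):
    """Root mean squared error between two score vectors."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Made-up MOS values, for illustration only (not results from the paper).
predicted_mos = [3.1, 2.4, 4.0, 1.8]
subjective_mos = [3.0, 2.0, 4.2, 2.1]
print(plcc(predicted_mos, subjective_mos), rmse(predicted_mos, subjective_mos))
```

A higher PLCC and a lower RMSE both indicate that predicted scores track the subjective scores more closely.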
Abstract
This paper proposes an approach to improve non-intrusive speech quality assessment (NI-SQA) based on the residuals between impaired speech and enhanced speech. The main difficulty in this task is the lack of information, because the corresponding reference speech is absent. We generate enhanced speech from the impaired speech to compensate for the absence of the reference audio, then pair the residual information with the impaired speech. Compared to feeding the impaired speech directly into the model, the residuals can bring extra helpful information from the contrast introduced by enhancement. The human ear is sensitive to certain noises in a way that differs from a deep learning model, so the Mean Opinion Score (MOS) predicted by the model does not fit human subjective sensitivity well, which causes deviation. These residuals have a close relationship to reference speech and then improve the…
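The pipeline described in the abstract (enhance the impaired speech, take the residual, pair it with the impaired speech) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_enhance` is a hypothetical moving-average stand-in for a real speech enhancement model, the residual is taken in the time domain, and the pairing is a simple two-channel stack. The abstract does not specify any of these choices.

```python
import numpy as np

def toy_enhance(impaired, kernel_size=5):
    """Hypothetical placeholder for an enhancement model (moving-average filter)."""
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(impaired, kernel, mode="same")

def residual_features(impaired, enhance=toy_enhance):
    """Pair the impaired signal with its residual against the enhanced signal.

    Returns an array of shape (2, num_samples):
    channel 0 is the impaired speech, channel 1 is impaired minus enhanced.
    """
    impaired = np.asarray(impaired, dtype=float)
    enhanced = enhance(impaired)    # stands in for the missing reference
    residual = impaired - enhanced  # what the enhancer removed or changed
    return np.stack([impaired, residual], axis=0)

# Synthetic example: a sine tone with additive noise (assumed data, not from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1600)
impaired = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(t.shape)
features = residual_features(impaired)
print(features.shape)
```

In a real system the two-channel array would be fed to the quality prediction model in place of the impaired signal alone; the model itself is out of scope for this sketch.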
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
