RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

Hui Wang; Shiwan Zhao; Xiguang Zheng; Yong Qin

arXiv:2308.16488·eess.AS·September 1, 2023

TL;DR

The paper introduces RAMP, a retrieval-augmented method for MOS prediction that dynamically adjusts retrieval scope and fusion weights based on confidence, improving performance in synthetic speech quality evaluation.

Contribution

RAMP enhances MOS prediction by integrating retrieval-augmented features with a confidence-based dynamic weighting mechanism, addressing data scarcity for the decoder.

Findings

01 Outperforms existing methods in multiple scenarios

02 Improves decoder performance under data scarcity

03 Demonstrates effectiveness of confidence-based dynamic weighting

Abstract

Automatic Mean Opinion Score (MOS) prediction is crucial for evaluating the perceptual quality of synthetic speech. While recent approaches using pre-trained self-supervised learning (SSL) models have shown promising results, they address the data scarcity issue only partly, namely for the feature extractor. This leaves the data scarcity issue for the decoder unresolved, leading to suboptimal performance. To address this challenge, we propose a retrieval-augmented MOS prediction method, dubbed RAMP, to enhance the decoder's ability against the data scarcity issue. A fusing network is also proposed to dynamically adjust the retrieval scope for each instance and the fusion weights based on the predictive confidence. Experimental results show that our proposed method outperforms the existing methods in multiple scenarios.
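
The abstract gives no architectural details, so the following is only a minimal sketch of the general idea: a retrieval-based estimate fused with a model prediction, with the fusion controlled by confidence. Everything in it is an assumption made for illustration and not the paper's method: the function names (`retrieve`, `knn_mos`, `fuse`), the toy datastore, the Euclidean distance, the rule that lower confidence widens the retrieval scope, and the use of the confidence value directly as the fusion weight. In particular, the paper proposes a learned fusing network, which is replaced here by a fixed hand-written rule.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, datastore, k):
    """Return the k (embedding, mos) pairs closest to the query embedding."""
    ranked = sorted(datastore, key=lambda item: euclidean(query, item[0]))
    return ranked[:k]

def knn_mos(query, neighbors):
    """Inverse-distance-weighted average of the neighbors' MOS labels."""
    weights = [1.0 / (euclidean(query, emb) + 1e-8) for emb, _ in neighbors]
    total = sum(weights)
    return sum(w * mos for w, (_, mos) in zip(weights, neighbors)) / total

def fuse(query, model_mos, confidence, datastore, k_min=1, k_max=4):
    """Blend a model prediction with a retrieval-based estimate.

    Hypothetical rules (assumptions, not taken from the paper):
    - retrieval scope k grows linearly as confidence drops;
    - confidence in [0, 1] is used directly as the weight on model_mos.
    """
    k = k_min + round((1.0 - confidence) * (k_max - k_min))
    k = min(k, len(datastore))
    neighbors = retrieve(query, datastore, k)
    retrieved_mos = knn_mos(query, neighbors)
    return confidence * model_mos + (1.0 - confidence) * retrieved_mos

# Toy datastore of (embedding, MOS label) pairs; the values are made up.
DATASTORE = [
    ([0.0, 0.0], 2.0),
    ([1.0, 0.0], 4.0),
    ([10.0, 10.0], 1.0),
]

if __name__ == "__main__":
    # A fully confident prediction ignores the retrieved estimate.
    print(fuse([0.0, 0.0], model_mos=3.5, confidence=1.0, datastore=DATASTORE))
    # A low-confidence prediction leans on retrieved neighbors instead.
    print(fuse([0.0, 0.0], model_mos=3.5, confidence=0.2, datastore=DATASTORE))
```

In this sketch the two extremes behave as expected: with confidence 1 the output equals the model prediction, and with confidence 0 it equals the retrieval-based estimate over the widest scope.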
