Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion

Yu-Fei Shi; Yang Ai; Ye-Xin Lu; Hui-Peng Du; Zhen-Hua Ling

arXiv:2411.11123 · cs.SD · December 24, 2024

TL;DR

This paper presents PS-SQA, a pitch-and-spectrum-aware singing quality assessment method that combines a self-supervised learning (SSL) MOS predictor with bias correction and model fusion, achieving state-of-the-art accuracy in predicting the mean opinion score (MOS) of singing samples.
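
Model fusion, in its simplest form, averages the MOS predictions of several independently trained predictors. The sketch below assumes plain or weighted averaging of per-utterance scores; this page does not say which fusion rule PS-SQA actually uses, and `fuse_predictions` is a hypothetical helper, not code from the paper.

```python
from typing import List, Optional, Sequence


def fuse_predictions(model_outputs: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Fuse per-utterance MOS predictions from several models.

    Assumption: fusion is a (weighted) arithmetic mean of the model outputs.
    """
    n_models = len(model_outputs)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_utts = len(model_outputs[0])
    if any(len(p) != n_utts for p in model_outputs):
        raise ValueError("all models must score the same utterances")
    if weights is None:
        weights = [1.0] * n_models  # equal weights give the plain mean
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    # Weighted average of the models' scores for each utterance.
    return [sum(w * p[i] for w, p in zip(weights, model_outputs)) / total
            for i in range(n_utts)]
```

For example, `fuse_predictions([[3.0, 4.0], [4.0, 2.0]])` averages two models over two utterances. Averaging is a common default because it needs no extra training data; learned fusion weights would need a held-out set.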

Contribution

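The paper introduces PS-SQA, which builds on the authors' first-place submission to track 2 of the VoiceMOS Challenge 2024 and extends an SSL-based MOS predictor with pitch features (from a pitch histogram), spectral features (from a non-quantized neural codec), a bias correction strategy, and model fusion.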
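
A pitch histogram summarizes how often a singer occupies each pitch over an utterance. The sketch below is a minimal illustration under stated assumptions: the F0 track is already extracted (one value in Hz per frame, 0 for unvoiced frames), bins are one semitone wide and span MIDI 36 to 83 (C2 to B5), and the histogram is normalized to sum to 1. The paper's actual bin resolution and range are not given on this page, and `pitch_histogram` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def pitch_histogram(f0_track: Sequence[float], low_midi: int = 36,
                    high_midi: int = 84) -> List[float]:
    """Normalized histogram of voiced frames over semitone bins [low_midi, high_midi).

    Assumptions: unvoiced frames carry F0 <= 0 and are skipped; voiced frames
    outside the bin range are dropped rather than clipped.
    """
    n_bins = high_midi - low_midi
    counts = [0] * n_bins
    for f0 in f0_track:
        if f0 <= 0.0:  # unvoiced frame
            continue
        idx = int(round(hz_to_midi(f0))) - low_midi  # nearest semitone bin
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins  # no voiced frames in range
    return [c / total for c in counts]
```

With `pitch_histogram([0.0, 440.0, 440.0, 220.0])`, two thirds of the voiced mass falls in the A4 bin and one third in the A3 bin. A fixed-length vector like this can be concatenated with pooled SSL features before a regression head, although how PS-SQA fuses the features is not described here.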

Findings

1. PS-SQA outperforms all competing systems on system-level metrics.
2. Incorporating pitch and spectral information enhances prediction accuracy.
3. Bias correction and model fusion significantly improve robustness and performance.
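
System-level metrics compare per-system averages rather than individual utterances: utterance scores are first averaged within each synthesis system, then the true and predicted system means are compared. The sketch below computes MSE, Pearson's linear correlation (LCC), and Spearman's rank correlation (SRCC) this way. It assumes the usual VoiceMOS-style setup; the helper names are hypothetical and Kendall's tau is omitted for brevity.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def system_means(system_ids: Sequence[str], scores: Sequence[float]) -> Dict[str, float]:
    """Average utterance-level scores within each system."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for sid, s in zip(system_ids, scores):
        groups[sid].append(s)
    return {sid: mean(v) for sid, v in groups.items()}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def system_level_metrics(system_ids: Sequence[str], true_mos: Sequence[float],
                         pred_mos: Sequence[float]) -> Dict[str, float]:
    """MSE, LCC and SRCC between true and predicted system-level MOS."""
    t = system_means(system_ids, true_mos)
    p = system_means(system_ids, pred_mos)
    systems = sorted(t)
    tv = [t[s] for s in systems]
    pv = [p[s] for s in systems]
    mse = mean((a - b) ** 2 for a, b in zip(tv, pv))
    return {"MSE": mse, "LCC": pearson(tv, pv),
            "SRCC": pearson(ranks(tv), ranks(pv))}
```

Because SRCC only depends on ranks, a predictor with a constant offset can score perfect SRCC while still having a large MSE; this is one reason to report several metrics together.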

Abstract

We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured first place among all participating teams, excluding the official baseline. In this paper, we further improve our submission and propose a novel Pitch-and-Spectrum-aware Singing Quality Assessment (PS-SQA) method. PS-SQA builds on the self-supervised learning (SSL) MOS predictor and incorporates singing pitch and spectral information, extracted using a pitch histogram and a non-quantized neural codec, respectively. Additionally, PS-SQA introduces a bias correction strategy to address prediction biases caused by low-resource training samples, and employs model fusion to further enhance prediction accuracy. Experimental results confirm that our proposed PS-SQA significantly outperforms all competing…
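
The abstract does not describe how the bias correction works. As one generic illustration only, and not the PS-SQA strategy, a systematic offset or scale error in predictions can be corrected post hoc by fitting a least-squares line from predicted to true MOS on a held-out calibration set and clipping the corrected scores to the rating scale. The sketch assumes a 1 to 5 MOS scale; `fit_linear_correction` and `apply_correction` are hypothetical helpers.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_linear_correction(pred: Sequence[float],
                          true: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of true ~= a * pred + b on a calibration set."""
    mp, mt = mean(pred), mean(true)
    var_p = sum((p - mp) ** 2 for p in pred)
    if var_p == 0.0:
        return 1.0, mt - mp  # constant predictions: correct the offset only
    a = sum((p - mp) * (t - mt) for p, t in zip(pred, true)) / var_p
    b = mt - a * mp
    return a, b


def apply_correction(pred: Sequence[float], a: float, b: float,
                     lo: float = 1.0, hi: float = 5.0) -> List[float]:
    """Apply the fitted mapping and clip to the assumed [lo, hi] MOS range."""
    return [min(hi, max(lo, a * p + b)) for p in pred]
```

If a model consistently under-predicts by one point, e.g. predictions `[2, 3, 4]` against true scores `[3, 4, 5]`, the fit recovers `a = 1.0, b = 1.0`. A linear map cannot fix biases that vary by system or score region, so treat this as a baseline rather than a substitute for the paper's method.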

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Voice and Speech Disorders