MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

Wangjin Zhou; Zhengdong Yang; Chenhui Chu; Sheng Li; Raj Dabre; Yi Zhao; Tatsuya Kawahara

arXiv:2401.13249 · eess.AS · January 26, 2024 · 1 citation

Open Access

TL;DR

This paper introduces MOS-FAD, an approach that uses automatic MOS prediction at two points in fake audio detection: filtering training data and gating model fusion. Both uses improve detection performance.

Contribution

The study extends MOS prediction to fake audio detection, demonstrating its effectiveness in data filtering and model fusion to improve detection performance.

Findings

01 MOS improves training data selection for FAD.

02 Incorporating MOS in model fusion enhances detection accuracy.

03 MOS-based filtering balances datasets effectively.

Abstract

Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of synthetic speech. This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD), as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion. In training data selection, we demonstrate that MOS enables effective filtering of samples from unbalanced datasets. In model fusion, our results demonstrate that incorporating MOS as a gating mechanism enhances overall performance.
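
The abstract names two places where predicted MOS enters the pipeline but gives no implementation detail. The sketch below is one possible reading, not the authors' method. Everything in it is an assumption made for illustration: the `Utterance` record, the choice to filter only the fake class by a MOS range, the sigmoid form of the gate, and the `midpoint` and `sharpness` parameters. The predicted MOS values are assumed to come from some external MOS predictor on a 1 to 5 scale.

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    is_fake: bool
    predicted_mos: float  # hypothetical: output of an external MOS predictor (1-5 scale)


def filter_by_mos(utterances, mos_min, mos_max):
    """Keep every bona fide sample; keep a fake sample only if its
    predicted MOS lies in [mos_min, mos_max].

    Assumption: the fake class is the over-represented one, so trimming it
    by a MOS range is one way to rebalance the training set.
    """
    return [
        u for u in utterances
        if (not u.is_fake) or (mos_min <= u.predicted_mos <= mos_max)
    ]


def mos_gated_fusion(score_a, score_b, predicted_mos, midpoint=3.0, sharpness=2.0):
    """Blend two detector scores with a weight computed from predicted MOS.

    Assumption: the gate is a sigmoid of (predicted_mos - midpoint), so
    detector A dominates for high-MOS input and detector B for low-MOS input.
    """
    gate = 1.0 / (1.0 + math.exp(-sharpness * (predicted_mos - midpoint)))
    return gate * score_a + (1.0 - gate) * score_b


if __name__ == "__main__":
    samples = [
        Utterance("a", False, 4.5),
        Utterance("b", True, 1.5),
        Utterance("c", True, 3.8),
    ]
    kept = filter_by_mos(samples, mos_min=3.0, mos_max=5.0)
    print([u.utt_id for u in kept])
    print(mos_gated_fusion(0.2, 0.8, predicted_mos=3.0))
```

In this sketch the gate is a fixed function of MOS; a learned gate would be another option, and the abstract does not say which the authors use.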

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing