Fully Automated End-to-End Fake Audio Detection

Chenglong Wang; Jiangyan Yi; Jianhua Tao; Haiyang Sun; Xun Chen; Zhengkun Tian; Haoxin Ma; Cunhang Fan; Ruibo Fu

arXiv:2208.09618·cs.SD·August 23, 2022

Open Access

TL;DR

This paper introduces a fully automated end-to-end fake audio detection system that leverages wav2vec for speech representation and a modified DARTS for neural architecture search, achieving state-of-the-art results.

Contribution

It presents a novel automated framework combining wav2vec and light-DARTS for fake audio detection, eliminating manual feature and hyperparameter tuning.

Findings

01

Achieves an equal error rate (EER) of 1.08% on the ASVspoof 2019 LA dataset

02

Outperforms existing state-of-the-art single systems

03

Demonstrates effectiveness of automated neural architecture search
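
The EER cited in the findings is the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how such a figure can be approximated from detector scores. It assumes that higher scores mean "more likely bona fide"; the function name `compute_eer` and the threshold sweep are illustrative choices, not the paper's evaluation code.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate by sweeping over observed scores.

    Assumption: a higher score means the detector believes the
    utterance is more likely bona fide (genuine).
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their average as the EER estimate.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    bonafide = [0.9, 0.8, 0.7, 0.6]
    spoof = [0.1, 0.2, 0.3, 0.65]
    print(compute_eer(bonafide, spoof))
```

With only a handful of scores the sweep is coarse; evaluation toolkits typically interpolate between thresholds, so values from this sketch may differ slightly from officially reported numbers.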

Abstract

Existing fake audio detection systems often rely on expert experience to design the acoustic features or on manual tuning of the network's hyperparameters. However, manual adjustment of these parameters can noticeably influence the results, and it is almost impossible to find the best set of parameters by hand. Therefore, this paper proposes a fully automated end-to-end fake audio detection method. We first use the wav2vec pre-trained model to obtain a high-level representation of the speech. For the network structure, we use a modified version of differentiable architecture search (DARTS) named light-DARTS. It learns deep speech representations while automatically learning and optimizing complex neural structures consisting of convolutional operations and residual blocks. The experimental results on the ASVspoof 2019 LA dataset show that our…
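
The abstract does not spell out how light-DARTS changes the search, so the sketch below illustrates only the core idea of standard DARTS that it builds on: each edge in the network computes a softmax-weighted mixture of candidate operations, and the learned weights are later discretized by keeping the strongest operation. The toy scalar operations and the names `CANDIDATE_OPS`, `mixed_op`, and `discretize` are hypothetical and chosen for illustration; the real candidates are convolutional operations and residual blocks.

```python
import math


def softmax(alphas):
    """Numerically stable softmax over architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy candidates acting on a scalar input.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def mixed_op(x, alphas):
    """Continuous relaxation: weighted sum of every candidate's output."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After the search, keep only the operation with the largest weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


if __name__ == "__main__":
    alphas = [0.1, 2.0, -1.0]  # one architecture parameter per candidate
    print(mixed_op(3.0, alphas))
    print(discretize(alphas))
```

Because the mixture is differentiable in the architecture parameters, they can be optimized by gradient descent alongside the network weights, which is what removes the need to hand-pick the structure.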

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection