Audio Deep Fake Detection System with Neural Stitching for ADD 2022

Rui Yan; Cheng Wen; Shuran Zhou; Tingwei Guo; Wei Zou; Xiangang Li

arXiv:2204.08720 · eess.AS · April 21, 2022

TL;DR

This paper presents a ResNet-based audio deep fake detection system enhanced with neural stitching. The system achieved a 10.1% equal error rate (EER) in Track 3.2 of the ADD 2022 challenge, with neural stitching used to improve generalization across different fake audio generation methods.

Contribution

The paper introduces neural stitching to enhance model generalization in audio deep fake detection, demonstrating superior performance in the ADD 2022 challenge.
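The summary does not say how the stitching is carried out, so the sketch below only illustrates the generic model-stitching idea the term refers to: keep the lower layers of one trained network and the upper layers of another frozen, and train only a small stitching layer that maps one representation into the other. Everything here is an assumption for illustration. The toy two-feature "models", the `StitchedModel` class, `train_stitch`, and the squared-error objective are hypothetical and are not the authors' ResNet-34 procedure. Pure Python is used so the example runs without dependencies.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


class StitchedModel:
    """Hypothetical toy: frozen lower layers of model A -> trainable stitch -> frozen top of model B.

    bottom_a: frozen function mapping an input to model A's intermediate features.
    top_w:    frozen final linear layer of model B (one row per output).
    stitch:   the only trainable part, a square matrix applied to A's features
              (on a feature map this would be a 1x1 convolution).
    """

    def __init__(self, bottom_a, top_w, feat_dim):
        self.bottom_a = bottom_a
        self.top_w = top_w
        # Identity initialization: the stitch starts by passing A's features through unchanged.
        self.stitch = [[1.0 if i == j else 0.0 for j in range(feat_dim)]
                       for i in range(feat_dim)]

    def forward(self, x):
        """Return (output, A's features) for input x."""
        a = self.bottom_a(x)
        return matvec(self.top_w, matvec(self.stitch, a)), a

    def train_stitch(self, data, lr=0.05, epochs=200):
        """Plain SGD on 0.5 * ||output - target||^2, updating only self.stitch."""
        for _ in range(epochs):
            for x, target in data:
                y, a = self.forward(x)
                err = [yi - ti for yi, ti in zip(y, target)]
                # Gradient w.r.t. the stitched features: top_w^T @ err.
                g = [sum(self.top_w[o][i] * err[o] for o in range(len(err)))
                     for i in range(len(self.stitch))]
                # Gradient w.r.t. stitch[i][j] is g[i] * a[j]; A and B stay frozen.
                for i in range(len(self.stitch)):
                    for j in range(len(a)):
                        self.stitch[i][j] -= lr * g[i] * a[j]
```

In a toy run where model A's lower layer is `lambda x: relu([x[1], x[0]])` (model B's features in swapped order) and model B's top layer is `[[1.0, -1.0]]`, the identity stitch initially maps `[0.7, 0.2]` to about -0.5. After `train_stitch` on a handful of `(input, x0 - x1)` pairs, the same input maps to about 0.5, so only the small stitching layer had to be learned to reuse both frozen halves.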

Findings

01. Achieved a 10.1% equal error rate (EER) in Track 3.2 of ADD 2022.
02. Neural stitching improves cross-task generalization.
03. A ResNet with multi-head attention pooling effectively detects fake audio.
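The first finding is stated as an equal error rate (EER): the error rate at the decision threshold where the false acceptance rate (fake audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). Below is a minimal sketch of computing it from detection scores. It assumes label 1 marks bona fide audio, label 0 marks fake audio, and higher scores mean "more likely bona fide". `compute_eer` is a hypothetical helper, not the official ADD 2022 scoring script.

```python
def compute_eer(scores, labels):
    """Return (eer, threshold) from detection scores.

    Assumptions (not from the paper): label 1 = bona fide, 0 = fake, and a
    higher score means "more likely bona fide". A trial is accepted as bona
    fide when score >= threshold.
    """
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    fake = [s for s, y in zip(scores, labels) if y == 0]
    if not bona_fide or not fake:
        raise ValueError("need at least one bona fide and one fake trial")

    best = None  # (|FAR - FRR|, EER estimate, threshold)
    # Sweep every observed score as a candidate threshold.
    for t in sorted(set(scores)):
        far = sum(s >= t for s in fake) / len(fake)           # fake accepted
        frr = sum(s < t for s in bona_fide) / len(bona_fide)  # bona fide rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Average the two rates at the threshold where they are closest.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

For example, scores `[0.9, 0.8, 0.7, 0.4, 0.35, 0.2]` with labels `[1, 1, 0, 1, 0, 0]` give an EER of 1/3 at threshold 0.7. On small trial lists the exact crossing point usually falls between two thresholds, so averaging the two rates at the closest threshold is an approximation. ROC interpolation can give a slightly different value.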

Abstract

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge [Yi2022ADD]. The same system was used in both rounds of evaluation in Track 3.2, with a similar training methodology. The first-round Track 3.2 data was generated by text-to-speech (TTS) or voice conversion (VC) algorithms. The second-round data consists of fake audio generated by other participants in Track 3.1 with the aim of spoofing our systems. Our system uses a standard 34-layer ResNet with multi-head attention pooling [india2019self] to learn discriminative embeddings for fake audio and spoof detection. We further use neural stitching to boost the model's generalization capability so that it performs equally well across different tasks. More details are given in the following sections. The experiments show that our proposed…
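The abstract names multi-head attention pooling [india2019self] as the layer that turns frame-level ResNet outputs into one utterance-level embedding. Below is a minimal sketch of that idea. It assumes the frame features are split into equal-sized heads, each head scores frames against its own learnable query vector, a softmax over time gives the weights, and the weighted head averages are concatenated. The function names, the 1/sqrt(head_dim) scaling, and the plain-list representation are assumptions for illustration. The exact parameterization in the paper and in the cited work may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(frames, queries):
    """Pool T frame vectors (each of length D) into one D-dim embedding.

    frames:  T x D list, e.g. frame-level encoder outputs over time.
    queries: K x (D // K) list, one (hypothetical) learnable query per head.
    """
    num_heads = len(queries)
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("feature dim must be divisible by the number of heads")
    head_dim = dim // num_heads

    pooled = []
    for k, u in enumerate(queries):
        # Slice the k-th head's sub-vector out of every frame.
        heads = [f[k * head_dim:(k + 1) * head_dim] for f in frames]
        # Scaled dot-product score of each frame against this head's query.
        scores = [sum(h_i * u_i for h_i, u_i in zip(h, u)) / math.sqrt(head_dim)
                  for h in heads]
        weights = softmax(scores)  # attention weights over time, summing to 1
        # Attention-weighted average of this head's sub-vectors.
        for j in range(head_dim):
            pooled.append(sum(w * h[j] for w, h in zip(weights, heads)))
    return pooled
```

With all-zero queries every frame gets the same weight, so the result is the plain mean over time. A large query value concentrates the weight on the frames that align with it, which is how such a pooling layer can emphasize the frames most useful for the fake-versus-bona-fide decision.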

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Attention Pooling · Softmax · Linear Layer · Average Pooling · Batch Normalization · Kaiming Initialization · Residual Connection · Max Pooling · 1x1 Convolution