Audio Deep Fake Detection System with Neural Stitching for ADD 2022
Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

TL;DR
This paper presents a ResNet-based audio deep fake detection system enhanced with neural stitching. The system reaches a 10.1% equal error rate (EER) in Track 3.2 of the ADD 2022 challenge, and neural stitching improves its generalization across different fake-audio generation methods.
Contribution
The paper introduces neural stitching to improve the generalization of audio deep fake detection models and evaluates the resulting system in both evaluation rounds of Track 3.2 of the ADD 2022 challenge.
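This summary does not say which layers the authors join or how their stitching layer is trained. To make the general idea concrete, here is a hedged plain-Python sketch of generic model stitching: the lower layers of one trained network feed a small stitching layer, which in turn feeds the upper layers of another network. In CNNs the stitching layer is often a 1x1 convolution, which acts as a per-position affine map over channels. All names here (`affine`, `compose`, `stitch`, `identity_affine`, `bottom_a`, `top_b`) are hypothetical, and the toy layers only stand in for ResNet blocks.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]
Layer = Callable[[Vector], Vector]


def affine(weight: Matrix, bias: Vector) -> Layer:
    """Return the map x -> W x + b. Applied independently at every
    time-frequency position, this is what a 1x1 convolution computes
    across channels."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weight, bias)]
    return apply


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def compose(*layers: Layer) -> Layer:
    """Chain layers from left to right."""
    def apply(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return apply


def identity_affine(dim: int) -> Layer:
    """A stitching layer initialised to the identity map (same channel count)."""
    eye = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    return affine(eye, [0.0] * dim)


def stitch(bottom_of_a: Layer, stitching_layer: Layer, top_of_b: Layer) -> Layer:
    """Generic model stitching: lower layers of network A, then a small
    stitching layer, then the upper layers of network B. Commonly A and B
    stay frozen and only the stitching layer's weights are trained."""
    return compose(bottom_of_a, stitching_layer, top_of_b)


# Toy stand-ins for the two halves (hypothetical, not real ResNet blocks).
bottom_a = compose(affine([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]],
                          [0.0, 0.0, 0.0]), relu)


def top_b(h: Vector) -> Vector:
    return [sum(h)]  # a single "score" output


model = stitch(bottom_a, identity_affine(3), top_b)
print(model([1.0, -2.0, 3.0]))  # [4.0]
```

Starting the stitching layer at the identity means the stitched model initially behaves like the two halves joined directly; training would then adapt only that layer so the representation of network A matches what network B expects.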
Findings
Achieved 10.1% EER in Track 3.2 of ADD 2022.
Neural stitching improves cross-task generalization.
ResNet with multi-head attention effectively detects fake audio.
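The 10.1% figure is an equal error rate (EER): the error rate at the decision threshold where the false acceptance rate (fake audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following minimal sketch assumes that higher scores mean "more likely genuine" and sweeps only the observed scores as thresholds. Official challenge scoring tools may interpolate the ROC curve instead, so their values can differ slightly. The function name `compute_eer` is hypothetical.

```python
from typing import List, Tuple


def compute_eer(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    labels: 1 = genuine (bona fide) trial, 0 = fake trial.
    A trial is accepted as genuine when score >= threshold.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    fake = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not fake:
        raise ValueError("need at least one genuine and one fake trial")

    best_gap, best_eer, best_threshold = float("inf"), 1.0, 0.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in fake) / len(fake)       # fakes accepted
        frr = sum(s < t for s in genuine) / len(genuine)  # genuine rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Closest point to FAR == FRR so far; report the average of the two.
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold


scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(compute_eer(scores, labels))  # (0.25, 0.6)
```

In this toy set, one fake trial (0.6) is accepted and one genuine trial (0.4) is rejected at threshold 0.6, so both error rates are 1/4 and the EER is 25%. An EER of 10.1% means the two error rates meet at roughly one error in ten trials of each class.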
Abstract
This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge\cite{Yi2022ADD}. The same system, trained with a similar methodology, was used in both evaluation rounds of Track 3.2. The first-round data of Track 3.2 was generated by text-to-speech (TTS) or voice conversion (VC) algorithms, while the second-round data consists of fake audio generated by participants in Track 3.1 with the aim of spoofing our systems. Our systems use a standard 34-layer ResNet with multi-head attention pooling \cite{india2019self} to learn discriminative embeddings for fake audio and spoof detection. We further use neural stitching to improve the model's generalization so that it performs equally well across different tasks; more details are given in the following sections. The experiments show that our proposed…
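The abstract names multi-head attention pooling \cite{india2019self} as the step that turns the ResNet's frame-level outputs into a fixed-length embedding. Here is a minimal plain-Python sketch of that idea. It assumes one learnable query vector per head, scaling by the square root of the head size, and frames already flattened to vectors. The function names are hypothetical, and the authors' actual configuration (number of heads, embedding size) is not given in this summary.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(frames: List[Vector], head_queries: List[Vector]) -> Vector:
    """Pool T frame-level vectors (each of size D) into one utterance-level vector.

    Each frame is split into K = len(head_queries) equal chunks ("heads").
    Head k scores every frame by the dot product of its chunk with a query
    u_k (scaled by the square root of the head size), normalises the scores
    over time with softmax, and returns the weighted average of its chunks.
    The K head outputs are concatenated, so the result also has size D.
    """
    num_heads = len(head_queries)
    dim = len(frames[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by the number of heads")
    head_dim = dim // num_heads

    pooled: Vector = []
    for k, query in enumerate(head_queries):
        if len(query) != head_dim:
            raise ValueError("each query must match the head size")
        chunks = [f[k * head_dim:(k + 1) * head_dim] for f in frames]
        scores = [sum(c_i * q_i for c_i, q_i in zip(c, query)) / math.sqrt(head_dim)
                  for c in chunks]
        weights = softmax(scores)  # attention weights over time for this head
        pooled.extend(sum(w * c[i] for w, c in zip(weights, chunks))
                      for i in range(head_dim))
    return pooled


# Two frames of a 4-dim feature map and two heads with zero queries:
# zero queries give uniform weights, i.e. plain average pooling.
frames = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(multi_head_attention_pool(frames, [[0.0, 0.0], [0.0, 0.0]]))  # [2.0, 3.0, 4.0, 5.0]
```

In a trained model the queries would be learned jointly with the ResNet, so each head can focus on different frames. Here they are fixed by hand only to show that attention pooling reduces to average pooling when all scores are equal.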
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Attention Pooling · Softmax · Linear Layer · Average Pooling · Batch Normalization · Kaiming Initialization · Residual Connection · Max Pooling · 1x1 Convolution
