An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao, Qilong Yuan, Yibo Duan, Zhuoyue Chen

TL;DR
This paper presents an end-to-end multi-module synthetic speech generation system that integrates speaker encoding, Tacotron2-based synthesis, and WaveRNN vocoding, achieving top performance in the ADD 2023 challenge.
Contribution
It introduces a novel integrated multi-module model for synthetic speech generation and demonstrates its effectiveness through extensive experiments and winning the ADD 2023 challenge.
Findings
Achieved a weighted deception success rate (WDSR) of 44.97%, placing first in ADD 2023 challenge Track 1.1
Ran comparative experiments across different datasets and model structures
Outperformed the competing submissions on deception success rate
Abstract
The task of synthetic speech generation is to produce speech for a given text that imitates a human voice. The key factors that determine the quality of synthetic speech generation include generation speed, word segmentation accuracy, and the naturalness of the synthesized speech. This paper builds an end-to-end multi-module synthetic speech generation model consisting of a speaker encoder, a synthesizer based on Tacotron2, and a vocoder based on WaveRNN. In addition, we run extensive comparative experiments on different datasets and model structures. Finally, we won first place in the ADD 2023 challenge Track 1.1 with a weighted deception success rate (WDSR) of 44.97%.
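The three modules named in the abstract form a chain: a reference recording is reduced to a speaker embedding, the synthesizer turns text plus that embedding into a mel spectrogram, and the vocoder turns the spectrogram into a waveform. The sketch below only illustrates that data flow and the array shapes involved. Every function body is a numerical placeholder, not the authors' networks, and the constants `EMBED_DIM`, `N_MELS`, `HOP_LENGTH`, `FRAMES_PER_CHAR`, and `SAMPLE_RATE` are assumed values that do not come from the paper.

```python
import numpy as np

# All constants below are hypothetical; the summary does not state them.
EMBED_DIM = 256        # assumed speaker-embedding size
N_MELS = 80            # assumed number of mel channels
HOP_LENGTH = 200       # assumed audio samples per mel frame
FRAMES_PER_CHAR = 5    # assumed mel frames emitted per input character
SAMPLE_RATE = 16000    # assumed output sample rate in Hz


def encode_speaker(reference_wav: np.ndarray) -> np.ndarray:
    """Placeholder for the speaker encoder: waveform -> unit-norm embedding."""
    rng = np.random.default_rng(0)  # fixed seed keeps the stub deterministic
    projection = rng.standard_normal((EMBED_DIM, 4))
    stats = np.array([reference_wav.mean(), reference_wav.std(),
                      reference_wav.min(), reference_wav.max()])
    embedding = projection @ stats
    return embedding / (np.linalg.norm(embedding) + 1e-8)


def synthesize_mel(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    """Placeholder for the Tacotron2-based synthesizer.

    Returns an array of shape (n_frames, N_MELS) that depends on both the
    text length and the speaker embedding.
    """
    n_frames = max(1, len(text)) * FRAMES_PER_CHAR
    speaker_bias = speaker_embedding[:N_MELS]
    return np.tile(speaker_bias, (n_frames, 1))


def vocode(mel: np.ndarray) -> np.ndarray:
    """Placeholder for the WaveRNN-based vocoder: mel frames -> waveform."""
    n_samples = mel.shape[0] * HOP_LENGTH
    envelope = np.repeat(np.tanh(mel.mean(axis=1)), HOP_LENGTH)
    t = np.arange(n_samples)
    return envelope * np.sin(2 * np.pi * 220.0 * t / SAMPLE_RATE)


def generate(text: str, reference_wav: np.ndarray) -> np.ndarray:
    """Chain the three stages: speaker encoder -> synthesizer -> vocoder."""
    embedding = encode_speaker(reference_wav)
    mel = synthesize_mel(text, embedding)
    return vocode(mel)


if __name__ == "__main__":
    reference = np.sin(np.linspace(0.0, 100.0, SAMPLE_RATE))
    waveform = generate("hello", reference)
    print(waveform.shape)
```

In a real system each placeholder would be replaced by a trained network. The interfaces (waveform in, embedding out; text and embedding in, mel out; mel in, waveform out) are the part this sketch is meant to convey.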
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Softmax · Tanh Activation · Sigmoid Activation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · WaveRNN
