How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey
Zahra Khanjani, Gabrielle Watson, and Vandana P. Janeja

TL;DR
This survey critically examines audio deepfake generation and detection methods, highlighting the research gap and emphasizing the need for more robust detection techniques in this overlooked domain.
Contribution
According to the authors, it is the first English-language survey focused on audio deepfakes. It analyzes methods published mostly between 2016 and 2020 and identifies key trends and research gaps.
Findings
GANs, CNNs, and DNNs are commonly used both to create and to detect deepfakes.
Most research focuses on video deepfakes; audio deepfakes receive less attention.
Detection methods for audio deepfakes are less developed and need further research.
Abstract
A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods to be passed off as real; it can include audio, video, image, and text synthesis. This survey takes a different perspective from existing survey papers, which mostly focus on video and image deepfakes. It not only evaluates generation and detection methods in the different deepfake categories but focuses mainly on audio deepfakes, which most existing surveys overlook. This paper critically analyzes audio deepfake research, mostly from 2016 to 2020, and provides a unique source for it. To the best of our knowledge, this is the first survey focusing on audio deepfakes in English. This survey provides readers with a summary of 1) different deepfake categories, 2) how they could be created and detected, 3) the most…
