Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models
Lam Pham, Phat Lam, Truong Nguyen, Huyen Nguyen, Alexander Schindler

TL;DR
This paper introduces a spectrogram-based deep learning ensemble system for detecting deepfake audio, achieving state-of-the-art performance on the ASVspoof 2019 benchmark with an EER of 0.03.
Contribution
It proposes a novel combination of spectrogram transformations, multiple deep learning models, and ensemble techniques for improved deepfake audio detection.
Findings
The best ensemble model achieved an EER of 0.03 on the ASVspoof 2019 dataset
Spectrogram transformations and deep learning models significantly enhance detection accuracy
Ensemble approach outperforms individual models in deepfake audio detection
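To make the metric and the ensemble idea in these findings concrete, here is a minimal sketch of an equal error rate (EER) computation and a simple score-level fusion. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how the models are combined, so averaging scores (`mean_fusion`) is an assumed fusion rule, and the function names and the toy scores are hypothetical. Higher scores are taken to mean "more likely bona fide".

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumes a higher score means "more likely bona fide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is read off where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def mean_fusion(model_scores):
    """Average per-utterance scores across models (assumed fusion rule)."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


# Hypothetical scores from two models: 2 bona fide utterances, then 2 spoofed.
model_a = [0.9, 0.4, 0.1, 0.6]
model_b = [0.7, 0.8, 0.5, 0.2]
fused = mean_fusion([model_a, model_b])

eer_a = compute_eer(model_a[:2], model_a[2:])
eer_fused = compute_eer(fused[:2], fused[2:])
```

In this toy data, model A ranks one spoofed utterance above one bona fide utterance, while the averaged scores separate the two classes, which is the kind of complementary behaviour an ensemble relies on.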
Abstract
In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods, Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT), combined with different auditory-based filters: Mel, Gammatone, linear filters (LF), and discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train directly on the spectrograms using our proposed baseline models: a CNN-based model (CNN-baseline), an RNN-based model (RNN-baseline), and a C-RNN model (C-RNN baseline). Meanwhile, the second approach is transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, ShuffleNet-V2, Swin-T,…
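As a rough illustration of the front end described in the abstract, the following sketch computes one of the named spectrogram types: an STFT followed by a Mel filter bank, on a log scale. It is a minimal sketch under assumptions: the sample rate, FFT size, hop length, and number of Mel bands are illustrative values not taken from the paper, the function names are hypothetical, and the CQT, WT, Gammatone, LF, and DCT variants are not shown.

```python
import numpy as np


def stft_magnitude(signal, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def mel_filterbank(sr, n_fft, n_mels=64):
    """Triangular filters spaced evenly on the mel scale (HTK-style formula)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=64):
    """Log-scaled Mel spectrogram in dB; shape (n_mels, n_frames)."""
    power = stft_magnitude(signal, n_fft, hop) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return 10.0 * np.log10(mel + 1e-10)  # small offset avoids log(0)


# One second of a synthetic 1 kHz tone stands in for real audio.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
spec = log_mel_spectrogram(tone, sr=sr)
```

The resulting 2-D array can be treated like a single-channel image, which is what makes both the CNN-style baselines and the transfer-learning route from computer vision models applicable.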
Taxonomy
Topics: Music and Audio Processing
Methods: GoogLeNet · Convolution · 1x1 Convolution · Auxiliary Classifier · Average Pooling · Dropout · Dense Connections · Inception Module · Softmax
