Improved DeepFake Detection Using Whisper Features
Piotr Kawa, Marcin Plata, Michał Czuba, Piotr Szymański, Piotr Syga

TL;DR
This paper explores using Whisper speech recognition features as a front-end for DeepFake audio detection, demonstrating improved accuracy and reduced error rates across multiple detection models and datasets.
Contribution
It introduces Whisper features as a novel front-end for DeepFake detection and reports results that outperform recent methods on the In-The-Wild dataset.
Findings
Whisper features improve detection accuracy for all tested models.
Using Whisper reduces the Equal Error Rate (EER) by 21% on the In-The-Wild dataset compared with recent results.
Whisper-based front-ends outperform traditional features in DeepFake audio detection.
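The findings above are stated in terms of Equal Error Rate, the operating point at which the rate of accepted fakes equals the rate of rejected genuine samples. The following is a minimal sketch of how such a value could be computed from detector scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the score convention (higher means more likely genuine), and the label encoding (1 for genuine, 0 for fake) are assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER from detector scores.

    Assumed conventions (hypothetical, not from the paper):
      scores: higher value = more likely genuine (bona fide)
      labels: 1 = genuine, 0 = fake
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    genuine = scores[labels == 1]
    fake = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A sample is accepted as genuine when its score >= threshold.
    false_accept = np.array([(fake >= t).mean() for t in thresholds])
    false_reject = np.array([(genuine < t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; take the closest point.
    idx = np.argmin(np.abs(false_accept - false_reject))
    return (false_accept[idx] + false_reject[idx]) / 2.0


# Toy scores, invented purely for illustration.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

On the toy data, one genuine sample scores below one fake sample, so no threshold separates the classes cleanly. A lower EER indicates a better detector, which is the sense in which the reported reduction is an improvement.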
Abstract
With the recent influx of voice generation methods, the threat posed by audio DeepFakes (DF) is ever-increasing. Several detection methods have been presented as countermeasures. Many of them rely on so-called front-ends, which transform the raw audio to emphasize features crucial for assessing the genuineness of an audio sample. Our contribution is an investigation of the state-of-the-art Whisper automatic speech recognition model as a DF detection front-end. We compare various combinations of Whisper and well-established front-ends by training three detection models (LCNN, SpecRNet, and MesoNet) on the widely used ASVspoof 2021 DF dataset and later evaluating them on the DF In-The-Wild dataset. We show that using Whisper-based features improves the detection for each model and outperforms recent results on the In-The-Wild dataset by reducing Equal…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
