Every Breath You Don't Take: Deepfake Speech Detection Using Breath
Seth Layton, Thiago De Andrade, Daniel Olszewski, Kevin Warren, Kevin Butler, Patrick Traynor

TL;DR
This paper introduces a novel breath-based detector for deepfake speech that outperforms complex models, demonstrating breath as a key indicator of natural speech authenticity.
Contribution
The paper presents a simple breath detection method for deepfake speech, creating a new dataset and showing its effectiveness over state-of-the-art deep learning models.
Findings
Breath-based detection achieves perfect AUPRC and zero EER on test data.
Complex SSL-wav2vec model fails to classify in-the-wild deepfakes effectively.
Public dataset facilitates future research in deepfake detection.
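For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AUPRC (computed here as average precision) and EER can be derived from per-sample detector scores. It is not the authors' evaluation code. The function names, the label convention (1 = deepfake, 0 = real), and the toy scores are all assumptions made for illustration, and ties between scores are not given special treatment.

```python
import math


def average_precision(labels, scores):
    """Approximate AUPRC as the mean precision at each true positive.

    Assumes labels are 1 (deepfake) or 0 (real) and that a higher
    score means "more likely deepfake". Tied scores are not handled
    specially; they are taken in the order produced by the sort.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def equal_error_rate(labels, scores):
    """Return the error rate where false-positive and false-negative rates meet.

    Sweeps every observed score as a threshold (predict deepfake when
    score >= threshold) and reports the mean of the two rates at the
    threshold where they are closest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes to compute EER")
    best_gap, eer = math.inf, 1.0
    for t in sorted(set(scores)) + [math.inf]:
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        fpr, fnr = fp / n_neg, fn / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer


# Hypothetical, perfectly separated scores (not data from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(average_precision(toy_labels, toy_scores))  # 1.0
print(equal_error_rate(toy_labels, toy_scores))   # 0.0
```

A result of 1.0 AUPRC and 0.0 EER, as in the toy example, means some threshold exists that separates every deepfake sample from every real one in the evaluated set.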
Abstract
Deepfake speech represents a real and growing threat to systems and society. Many detectors have been created to aid in defense against speech deepfakes. While these detectors implement myriad methodologies, many rely on low-level fragments of the speech generation process. We hypothesize that breath, a higher-level part of speech, is a key component of natural speech and thus improper generation in deepfake speech is a performant discriminator. To evaluate this, we create a breath detector and leverage this against a custom dataset of online news article audio to discriminate between real/deepfake speech. Additionally, we make this custom dataset publicly available to facilitate comparison for future work. Applying our simple breath detector as a deepfake speech discriminator on in-the-wild samples allows for accurate classification (perfect 1.0 AUPRC and 0.0 EER on test data) across…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
