Every Breath You Don't Take: Deepfake Speech Detection Using Breath

Seth Layton; Thiago De Andrade; Daniel Olszewski; Kevin Warren; Kevin Butler; Patrick Traynor

arXiv:2404.15143·cs.SD·April 30, 2024·1 cites

Open Access

TL;DR

This paper introduces a breath-based detector for deepfake speech that outperforms more complex deep learning models on in-the-wild samples, supporting breath as a key indicator of natural speech.

Contribution

The paper presents a simple breath detection method for deepfake speech, creating a new dataset and showing its effectiveness over state-of-the-art deep learning models.

Findings

01

Breath-based detection achieves a perfect 1.0 AUPRC and 0.0 EER on test data.

02

Complex SSL-wav2vec model fails to classify in-the-wild deepfakes effectively.

03

Public dataset facilitates future research in deepfake detection.

Abstract

Deepfake speech represents a real and growing threat to systems and society. Many detectors have been created to aid in defense against speech deepfakes. While these detectors implement myriad methodologies, many rely on low-level fragments of the speech generation process. We hypothesize that breath, a higher-level part of speech, is a key component of natural speech and thus improper generation in deepfake speech is a performant discriminator. To evaluate this, we create a breath detector and leverage this against a custom dataset of online news article audio to discriminate between real/deepfake speech. Additionally, we make this custom dataset publicly available to facilitate comparison for future work. Applying our simple breath detector as a deepfake speech discriminator on in-the-wild samples allows for accurate classification (perfect 1.0 AUPRC and 0.0 EER on test data) across…
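The two metrics quoted in the abstract can be made concrete with a small sketch. The code below is not the authors' evaluation code. It is a minimal, dependency-free illustration that assumes deepfake is the positive class (label 1), that a higher score means "more likely deepfake", and that no two samples share a score. The function names and the toy scores are hypothetical and are not data from the paper.

```python
def average_precision(labels, scores):
    """Approximate AUPRC as average precision.

    Assumes label 1 is the positive class and scores have no ties.
    """
    # Rank samples from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


def equal_error_rate(labels, scores):
    """Error rate at the threshold where the false negative rate and
    false positive rate are closest (their mean is reported)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = None, None
    # A sample is predicted positive when its score is >= the threshold.
    for t in sorted(set(scores)) + [float("inf")]:
        fnr = sum(s < t for s in positives) / len(positives)
        fpr = sum(s >= t for s in negatives) / len(negatives)
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer


if __name__ == "__main__":
    # Toy, made-up scores: the two classes are perfectly separable.
    sep_labels = [1, 1, 1, 0, 0, 0]
    sep_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    print(average_precision(sep_labels, sep_scores))
    print(equal_error_rate(sep_labels, sep_scores))

    # Toy, made-up scores: one real sample outranks one deepfake sample.
    ov_labels = [1, 1, 0, 1, 0, 0]
    ov_scores = [0.9, 0.8, 0.75, 0.6, 0.3, 0.1]
    print(average_precision(ov_labels, ov_scores))
    print(equal_error_rate(ov_labels, ov_scores))
```

In the separable toy case a single threshold splits the classes with no errors, so the sketch yields an AUPRC of 1.0 and an EER of 0.0, which is what a "perfect" result means for these metrics. Any overlap between the classes pushes AUPRC below 1.0 and EER above 0.0.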

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis