Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection

Inbal Rimon; Oren Gal; Haim Permuter

arXiv:2501.05545·cs.SD·November 14, 2025

Open Access

TL;DR

This paper introduces a hybrid deepfake speech detection framework that combines novel spectrogram and feature masking augmentations with compression-aware self-supervised learning, achieving state-of-the-art results on the ASVspoof5, ASVspoof2019, and ASVspoof2021 DF benchmarks.

Contribution

It proposes a dual-stage masking approach and a compression-aware training strategy within a unified model for improved deepfake speech detection.

Findings

01  Achieved 4.08% EER on ASVspoof5 Challenge (Track 1)

02  Obtained 0.18% EER on ASVspoof2019 evaluation set

03  Reached 2.92% EER on ASVspoof2021 DF task
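
The findings above are reported as equal error rate (EER): the operating point at which the rate of spoofed utterances accepted equals the rate of bona fide utterances rejected. As a minimal sketch of how such a number can be computed from detector scores, the function below sweeps every observed score as a threshold. The function name, the label convention (1 = bona fide, 0 = spoof), and the assumption that higher scores mean "more likely bona fide" are illustrative choices, not taken from the paper or from the official ASVspoof scoring tools.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER from scores (higher = more bona fide) and labels (1 = bona fide, 0 = spoof)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona_fide = scores[labels == 1]
    spoof = scores[labels == 0]

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.concatenate([np.unique(scores), [np.inf]])

    # An utterance is accepted as bona fide when score >= threshold.
    false_reject = np.array([(bona_fide < t).mean() for t in thresholds])
    false_accept = np.array([(spoof >= t).mean() for t in thresholds])

    # EER is where the two error curves cross; take the closest sampled point.
    idx = int(np.argmin(np.abs(false_accept - false_reject)))
    return float((false_accept[idx] + false_reject[idx]) / 2.0)

# Hypothetical toy scores: one spoof (0.40) outranks one bona fide utterance (0.35).
toy_eer = equal_error_rate([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

On real evaluation sets the two error curves are usually interpolated rather than sampled at discrete thresholds, so official tooling may report slightly different values than this sketch.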

Abstract

Deepfake speech detection presents a growing challenge as generative audio technologies continue to advance. We propose a hybrid training framework that advances detection performance through novel augmentation strategies. First, we introduce a dual-stage masking approach that operates both at the spectrogram level (MaskedSpec) and within the latent feature space (MaskedFeature), providing complementary regularization that improves tolerance to localized distortions and enhances generalization. Second, we introduce a compression-aware strategy during self-supervised pretraining to increase variability in low-resource scenarios while preserving the integrity of learned representations, thereby improving the suitability of pretrained features for deepfake detection. The framework integrates a learnable self-supervised feature extractor with a ResNet classification head in a unified training…
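
The abstract does not spell out how MaskedSpec and MaskedFeature select the regions they mask, so the following is only a rough sketch of the general idea of masking at two stages. It assumes SpecAugment-style random contiguous bands set to zero; the helper `mask_bands`, the array shapes, and all widths and counts are hypothetical and are not the authors' settings.

```python
import numpy as np

def mask_bands(x, axis, max_width, n_masks, rng, fill=0.0):
    """Return a copy of x with n_masks random contiguous bands along `axis` set to `fill`."""
    out = np.array(x, dtype=float, copy=True)
    size = out.shape[axis]
    for _ in range(n_masks):
        # Band width is drawn uniformly from 0..max_width (clipped to the axis length).
        width = min(int(rng.integers(0, max_width + 1)), size)
        start = int(rng.integers(0, size - width + 1))
        index = [slice(None)] * out.ndim
        index[axis] = slice(start, start + width)
        out[tuple(index)] = fill
    return out

rng = np.random.default_rng(0)

# Stage 1 (spectrogram level): assumed shape (freq_bins, frames), values kept strictly positive.
spec = np.abs(rng.normal(size=(80, 200))) + 1.0
masked_spec = mask_bands(spec, axis=0, max_width=8, n_masks=2, rng=rng)          # frequency bands
masked_spec = mask_bands(masked_spec, axis=1, max_width=20, n_masks=2, rng=rng)  # time bands

# Stage 2 (latent feature level): assumed shape (frames, feature_dim) from a feature extractor.
features = rng.normal(size=(200, 256))
masked_features = mask_bands(features, axis=1, max_width=32, n_masks=1, rng=rng)
```

In a training pipeline the first call would typically run on the input representation before the feature extractor and the second on its output before the classification head, which is one way to read the "complementary regularization" described above.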

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing