Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes
Ivan Kukanov, Janne Laakkonen, Tomi Kinnunen, Ville Hautamäki

TL;DR
This paper applies meta-learning to the detection of unseen speech deepfakes, reducing the EER on the InTheWild dataset from 21.67% to 10.42% with only 96 samples from the unseen data.
Contribution
The study introduces a meta-learning framework for attack-invariant feature learning, enabling effective few-shot adaptation to unseen speech deepfake attacks.
Findings
EER reduced from 21.67% to 10.42% on the InTheWild dataset
Improvement achieved with only 96 samples from the unseen dataset
Continuous adaptation maintains detection performance over time
Abstract
Current speech deepfake detection approaches perform satisfactorily against known adversaries; however, generalization to unseen attacks remains an open challenge. The proliferation of speech deepfakes on social media underscores the need for systems that can generalize to attacks not observed during training. We address this problem from the perspective of meta-learning, aiming to learn attack-invariant features that allow adaptation to unseen attacks with very few samples. This approach is promising because generating a large-scale training dataset is often expensive or infeasible. Our experiments demonstrate an improvement in the Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset, using just 96 samples from the unseen dataset. Continuous few-shot adaptation ensures that the system remains up to date.
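For readers unfamiliar with the metric reported above, the EER is the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The following is a minimal sketch of how it could be computed from detector scores, not the authors' evaluation code. The function name `compute_eer`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate from two lists of detector scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bona fide (genuine) speech.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer, best_threshold = None, None, None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Average the two rates, since they rarely coincide exactly
            # on a finite set of trials.
            best_gap, eer, best_threshold = gap, (far + frr) / 2.0, t
    return eer, best_threshold


# Toy scores (hypothetical, not from the paper).
bonafide = [0.9, 0.8, 0.7, 0.6]
spoof = [0.1, 0.2, 0.3, 0.65]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On a finite trial list the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; evaluation toolkits typically interpolate instead, which can give slightly different values. In these terms, the reported drop from 21.67% to 10.42% is a relative EER reduction of roughly 52%.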
Taxonomy
Topics: Speech Recognition and Synthesis · Anomaly Detection Techniques and Applications · Speech and Audio Processing
