Generalizable speech deepfake detection via meta-learned LoRA
Janne Laakkonen, Ivan Kukanov, Ville Hautamäki

TL;DR
This paper introduces a meta-learning approach using LoRA adapters for speech deepfake detection, achieving strong zero-shot performance and robustness across different attack types with minimal parameter updates.
Contribution
It combines LoRA adapters with meta-learning for domain generalization in speech deepfake detection, sharply reducing the number of trainable parameters while improving cross-domain accuracy.
Findings
Lowers the average equal error rate (EER) to 5.30% with the best MLDG-LoRA configuration, versus 8.84% for full fine-tuning.
Updates about 3.6 million parameters, roughly 1.1% of the 318 million updated in full fine-tuning.
Outperforms fully fine-tuned models on five of six evaluation corpora.
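The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract (3.6 million vs. 318 million parameters, 8.84% vs. 5.30% average EER); the variable names are illustrative, and the relative EER reduction is a derived quantity that the summary itself does not state.

```python
# Numbers quoted in the abstract; variable names are illustrative only.
lora_params = 3.6e6      # parameters updated with MLDG-LoRA (approximate)
full_params = 318e6      # parameters updated in full fine-tuning
eer_full = 8.84          # average EER (%) of the fully fine-tuned model
eer_lora = 5.30          # average EER (%) of the best MLDG-LoRA configuration

param_fraction = lora_params / full_params
abs_eer_drop = eer_full - eer_lora          # in percentage points
rel_eer_drop = abs_eer_drop / eer_full      # derived here, not quoted in the paper summary

print(f"trainable fraction: {param_fraction:.1%}")      # 1.1%
print(f"absolute EER drop:  {abs_eer_drop:.2f} points")  # 3.54 points
print(f"relative EER drop:  {rel_eer_drop:.1%}")         # 40.0%
```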
Abstract
Reliable detection of speech deepfakes (spoofs) must remain effective when the distribution of spoofing attacks shifts. We frame the task as domain generalization and show that inserting Low-Rank Adaptation (LoRA) adapters into every attention head of a self-supervised (SSL) backbone, then training only those adapters with Meta-Learning Domain Generalization (MLDG), yields strong zero-shot performance. The resulting model updates about 3.6 million parameters, roughly 1.1% of the 318 million updated in full fine-tuning, yet surpasses a fully fine-tuned counterpart on five of six evaluation corpora. A first-order MLDG loop encourages the adapters to focus on cues that persist across attack types, lowering the average EER from 8.84% for the fully fine-tuned model to 5.30% with our best MLDG-LoRA configuration. Our findings show that combining meta-learning with parameter-efficient…
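To make the two ingredients of the abstract concrete, here is a minimal NumPy sketch of a LoRA-style low-rank update trained with a first-order MLDG step. It is a toy under stated assumptions, not the authors' implementation: a single frozen linear classifier `W` stands in for the SSL backbone, the synthetic "domains" differ only in feature scale, and every name (`loss_and_grads`, `mldg_step`, `make_domain`) and hyperparameter is hypothetical. The first-order update used here (sum the meta-train gradient and the meta-test gradient evaluated after one virtual inner step) is one common approximation and may differ from the exact loop in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(W, A, B, scale, x, y):
    """Binary cross-entropy of a LoRA-adapted linear classifier.

    Effective weight is W + scale * B @ A; gradients are returned for the
    adapter matrices A and B only, so W stays frozen.
    """
    w_eff = W + scale * (B @ A)                  # shape (1, d)
    p = sigmoid(x @ w_eff.T)[:, 0]               # shape (n,)
    p_safe = np.clip(p, 1e-9, 1 - 1e-9)          # avoid log(0)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    g_w = ((p - y)[None, :] @ x) / len(y)        # dLoss/dw_eff, shape (1, d)
    grad_A = scale * (B.T @ g_w)                 # shape (rank, d)
    grad_B = scale * (g_w @ A.T)                 # shape (1, rank)
    return loss, grad_A, grad_B

def mldg_step(W, A, B, scale, domains, rng, lr=0.05, inner_lr=0.05, beta=1.0):
    """One first-order MLDG update of the adapters (assumed variant)."""
    order = rng.permutation(len(domains))
    x_te, y_te = domains[order[0]]               # held-out meta-test domain
    x_tr = np.concatenate([domains[i][0] for i in order[1:]])
    y_tr = np.concatenate([domains[i][1] for i in order[1:]])
    # Meta-train gradient at the current adapters.
    _, gA_tr, gB_tr = loss_and_grads(W, A, B, scale, x_tr, y_tr)
    # Virtual inner step, then meta-test gradient at the stepped adapters.
    A_fast, B_fast = A - inner_lr * gA_tr, B - inner_lr * gB_tr
    _, gA_te, gB_te = loss_and_grads(W, A_fast, B_fast, scale, x_te, y_te)
    # First-order approximation: no second-order term through the inner step.
    return A - lr * (gA_tr + beta * gA_te), B - lr * (gB_tr + beta * gB_te)

def make_domain(rng, w_true, noise_scale, n=200):
    """Synthetic domain: same labelling rule, different feature scale."""
    x = rng.normal(scale=noise_scale, size=(n, w_true.size))
    y = (x @ w_true > 0).astype(float)
    return x, y

rng = np.random.default_rng(0)
d, rank, alpha = 8, 2, 2.0
scale = alpha / rank
w_true = rng.normal(size=d)
W = rng.normal(size=(1, d))                      # frozen "backbone" weight
W_before = W.copy()
A = rng.normal(scale=0.1, size=(rank, d))        # LoRA down-projection
B = np.zeros((1, rank))                          # LoRA up-projection, zero init

source_domains = [make_domain(rng, w_true, s) for s in (0.5, 1.0, 1.5)]
unseen_domain = make_domain(rng, w_true, 1.2)    # never used for training

initial_loss, _, _ = loss_and_grads(W, A, B, scale, *unseen_domain)
for _ in range(300):
    A, B = mldg_step(W, A, B, scale, source_domains, rng)
final_loss, _, _ = loss_and_grads(W, A, B, scale, *unseen_domain)

print(f"unseen-domain loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"trainable adapter params: {A.size + B.size}, frozen params: {W.size}")
```

The adapter here is larger than the frozen weight only because the toy layer has a single output; for a realistic d-by-k projection, a rank-r adapter adds r(d + k) parameters against d·k frozen ones, which is where the small trainable fraction reported in the abstract comes from.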
Taxonomy
Topics: Speech Recognition and Synthesis · Anomaly Detection Techniques and Applications · Speech and Audio Processing
