Synthetic speech detection using meta-learning with prototypical loss
Monisankha Pal, Aditya Raikar, Ashish Panda, Sunil Kumar Kopparapu

TL;DR
This paper introduces a meta-learning approach with prototypical loss for synthetic speech detection, improving generalization to unseen spoofing attacks and achieving competitive results on ASVspoof datasets.
Contribution
It proposes a novel anti-spoofing system using prototypical loss within a meta-learning framework, enhancing detection of unseen spoofing attacks without relying on data augmentation.
Findings
Achieves competitive performance on the ASVspoof 2019 LA task without data augmentation.
Outperforms the baseline on the ASVspoof 2021 LA task with data augmentation.
Attains significant improvements in min-tDCF compared to challenge baselines.
Abstract
Recent works on speech spoofing countermeasures still lack generalization ability to unseen spoofing attacks. This is one of the key issues of the ASVspoof challenges, especially with the rapid development of diverse and high-quality spoofing algorithms. In this work, we address the generalizability of spoofing detection by proposing prototypical loss under the meta-learning paradigm to mimic the unseen test scenario during training. Prototypical loss with metric-learning objectives can learn the embedding space directly and emerges as a strong alternative to prevailing classification loss functions. We propose an anti-spoofing system based on the squeeze-excitation residual network (SE-ResNet) architecture with prototypical loss. We demonstrate that the proposed single system without any data augmentation can achieve performance competitive with the recent best anti-spoofing systems on ASVspoof…
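As a rough illustration of the prototypical loss the abstract refers to, the sketch below computes it for one training episode. It assumes the standard formulation from prototypical networks (class-mean prototypes, squared Euclidean distance, softmax over negative distances); the paper's actual episode construction and its SE-ResNet embedding network are not reproduced here. The function names and the two-dimensional toy embeddings are hypothetical and chosen only for readability.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(support, query):
    """Average prototypical loss over one episode.

    support, query: dicts mapping a class label to a list of embeddings.
    Assumption: every label in `query` also appears in `support`.
    """
    # One prototype per class: the mean of its support embeddings.
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    labels = sorted(prototypes)

    total, count = 0.0, 0
    for true_label, vecs in query.items():
        for v in vecs:
            # Logits are negative squared distances to each prototype.
            logits = [-squared_distance(v, prototypes[l]) for l in labels]
            # Numerically stable log-sum-exp for the softmax normalizer.
            m = max(logits)
            log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
            # Negative log-probability of the true class.
            total += log_norm - logits[labels.index(true_label)]
            count += 1
    return total / count


# Toy episode with hypothetical 2-D embeddings.
support = {
    "bonafide": [[0.0, 0.0], [0.0, 2.0]],
    "spoof": [[4.0, 0.0], [4.0, 2.0]],
}
well_separated = {"bonafide": [[0.0, 1.0]], "spoof": [[4.0, 1.0]]}
mislabeled = {"bonafide": [[4.0, 1.0]], "spoof": [[0.0, 1.0]]}

low_loss = prototypical_loss(support, well_separated)
high_loss = prototypical_loss(support, mislabeled)
```

Query embeddings that sit on their own class prototype give a loss near zero, while queries that sit on the other class's prototype give a large loss. Minimizing this quantity over many episodes is what pushes the embedding network to form compact, well-separated class clusters.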
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
