Synthetic speech detection using meta-learning with prototypical loss

Monisankha Pal; Aditya Raikar; Ashish Panda; Sunil Kumar Kopparapu

arXiv:2201.09470·eess.AS·January 25, 2022·5 cites

Open Access

TL;DR

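This paper trains a synthetic speech detector with prototypical loss under a meta-learning paradigm to improve generalization to unseen spoofing attacks. A single system achieves performance competitive with recent best anti-spoofing systems on the ASVspoof 2019 LA task without data augmentation, and outperforms the baseline on the ASVspoof 2021 LA task with data augmentation.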

Contribution

The paper proposes an anti-spoofing system based on an SE-ResNet architecture trained with prototypical loss in a meta-learning framework, which mimics the unseen-attack test scenario during training to improve detection of unseen spoofing attacks.

Findings

01 Achieves competitive performance on ASVspoof 2019 LA task without data augmentation.

02 Outperforms baseline on ASVspoof 2021 LA task with data augmentation.

03 Attains significant improvements in min-tDCF compared to challenge baselines.

Abstract

Recent works on speech spoofing countermeasures still lack generalization ability to unseen spoofing attacks. This is one of the key issues of ASVspoof challenges especially with the rapid development of diverse and high-quality spoofing algorithms. In this work, we address the generalizability of spoofing detection by proposing prototypical loss under the meta-learning paradigm to mimic the unseen test scenario during training. Prototypical loss with metric-learning objectives can learn the embedding space directly and emerges as a strong alternative to prevailing classification loss functions. We propose an anti-spoofing system based on squeeze-excitation Residual network (SE-ResNet) architecture with prototypical loss. We demonstrate that the proposed single system without any data augmentation can achieve competitive performance to the recent best anti-spoofing systems on ASVspoof…
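As a rough illustration of the prototypical loss the abstract refers to, the sketch below computes it for one training episode with NumPy. It is not the paper's implementation: the squared Euclidean distance follows the original prototypical-network formulation and is assumed here, and the function name `prototypical_loss`, the toy embeddings, and the two-class episode are hypothetical. In a real system the embeddings would come from the network (SE-ResNet in the paper) and the loss would be backpropagated.

```python
import numpy as np

def prototypical_loss(support, support_labels, query, query_labels):
    """Prototypical loss for one episode (illustrative sketch).

    support: (n_support, dim) embeddings used to build class prototypes
    query:   (n_query, dim) embeddings classified against the prototypes
    Assumes every query label also appears among the support labels.
    """
    classes = np.unique(support_labels)  # sorted class ids
    # Prototype of a class = mean of its support embeddings.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    # Squared Euclidean distance from each query to each prototype.
    diffs = query[:, None, :] - prototypes[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)  # shape (n_query, n_classes)
    # Softmax over negative distances, computed stably in log space.
    logits = -sq_dists
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Column index of the true class for each query.
    target_idx = np.searchsorted(classes, query_labels)
    loss = -log_probs[np.arange(len(query)), target_idx].mean()
    predictions = classes[log_probs.argmax(axis=1)]
    return loss, predictions

# Toy episode: two well-separated classes (e.g. 0 = bona fide, 1 = spoof).
rng = np.random.default_rng(0)
support = np.concatenate([rng.normal(0.0, 0.1, (5, 4)),
                          rng.normal(3.0, 0.1, (5, 4))])
support_labels = np.array([0] * 5 + [1] * 5)
query = np.concatenate([rng.normal(0.0, 0.1, (3, 4)),
                        rng.normal(3.0, 0.1, (3, 4))])
query_labels = np.array([0] * 3 + [1] * 3)

loss, predictions = prototypical_loss(support, support_labels,
                                      query, query_labels)
```

Because each episode draws its own support and query sets, the classes in an episode can be varied across training, which is how this kind of objective can mimic facing unseen classes at test time. How the paper composes its episodes is not stated in the text above.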

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders