SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing
Siwen Ding, You Zhang, Zhiyao Duan

TL;DR
This paper introduces SAMO, a novel speaker attractor multi-center one-class learning approach that enhances voice anti-spoofing by better modeling speaker diversity and improving detection of unseen attacks.
Contribution
SAMO clusters bona fide speech around multiple speaker attractors and co-optimizes clustering with spoof detection, advancing anti-spoofing performance.
Findings
Outperforms state-of-the-art systems with a 38% relative reduction in equal error rate (EER).
Effectively handles speakers without enrollment.
Improves generalization to unseen speech synthesis attacks.
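A relative EER reduction compares the new system's EER against a baseline EER, rather than reporting the absolute difference. The short sketch below shows the arithmetic only; the two EER values are hypothetical numbers picked for illustration and are not taken from the paper.

```python
def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the fractional reduction of new_eer relative to baseline_eer."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values in percent, chosen only to illustrate the formula.
baseline = 2.00
improved = 1.24

reduction = relative_reduction(baseline, improved)
print(f"relative EER reduction: {reduction:.0%}")
```

With these illustrative inputs, an absolute drop of 0.76 percentage points corresponds to a relative reduction of 0.76 / 2.00.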
Abstract
Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is caused by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness lacks consideration of the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without…
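The idea of clustering bona fide speech around speaker attractors can be sketched as a scoring rule. This is a minimal illustration, not the authors' implementation: it assumes each attractor is the mean of one speaker's bona fide embeddings, that closeness is measured by cosine similarity, and that a speaker without enrollment is scored against the nearest attractor. The function names and the toy 2-D embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def speaker_attractors(embeddings_by_speaker):
    """Assumption: one attractor per speaker, the mean of that speaker's bona fide embeddings."""
    attractors = {}
    for speaker, embeddings in embeddings_by_speaker.items():
        dim = len(embeddings[0])
        attractors[speaker] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return attractors


def bona_fide_score(embedding, attractors, speaker=None):
    """Higher means more bona fide-like.

    If the claimed speaker has an attractor, compare against it; otherwise
    (assumption for speakers without enrollment) use the closest attractor.
    """
    if speaker is not None and speaker in attractors:
        return cosine(embedding, attractors[speaker])
    return max(cosine(embedding, a) for a in attractors.values())


# Toy 2-D embeddings, purely illustrative.
attractors = speaker_attractors({
    "spk_a": [[1.0, 0.0], [0.9, 0.1]],
    "spk_b": [[0.0, 1.0], [0.1, 0.9]],
})
bona_score = bona_fide_score([0.95, 0.05], attractors, speaker="spk_a")
spoof_score = bona_fide_score([-1.0, -1.0], attractors)  # no enrolled speaker
print(bona_score > spoof_score)
```

In this toy setup the bona fide embedding sits next to its speaker's attractor, while the spoofed embedding points away from every attractor and receives a low score regardless of which attractor is nearest.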
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
