Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

Haibin Wu; Jiawen Kang; Lingwei Meng; Yang Zhang; Xixin Wu; Zhiyong Wu; Hung-yi Lee; Helen Meng

arXiv:2206.09131·cs.SD·June 22, 2022

Open Access

TL;DR

This paper introduces a multi-model fusion framework for spoofing-aware speaker verification (SASV) that improves robustness against spoofing attacks, reducing SASV-EER from 8.75% to 1.17%, an 86% relative reduction.

Contribution

It proposes a novel fusion-based SASV system combining multiple state-of-the-art models, substantially improving spoofing detection and speaker verification performance.

Findings

01 SASV-EER reduced from 8.75% to 1.17%.

02 Achieved 86% relative improvement over baseline.

03 Demonstrated effectiveness of multi-model fusion in SASV.
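The relative improvement quoted above follows directly from the two SASV-EER values. A minimal sketch of the arithmetic, with variable names chosen here for illustration only:

```python
# SASV-EER values as reported in the findings (in percent).
baseline_eer = 8.75
fused_eer = 1.17

# Relative reduction = (old - new) / old.
relative_reduction = (baseline_eer - fused_eer) / baseline_eer

# Format as a percentage with one decimal place.
print(f"Relative reduction: {relative_reduction:.1%}")
```

The result is roughly 86.6%, which the page reports as 86%.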

Abstract

Recent years have witnessed the extraordinary development of automatic speaker verification (ASV). However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process. How to integrate CM and ASV remains an open question. A spoofing-aware speaker verification (SASV) challenge has recently taken place with the argument that better performance can be delivered when both CM and ASV subsystems are optimized jointly. Under the challenge's scenario, the integrated systems proposed by the participants are required to reject both impostor speakers and spoofing attacks from target speakers, which intuitively and effectively matches the expectation of…
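The requirement described in the abstract, accepting only genuine target trials while rejecting both impostor speakers and spoofed target speech, is what an SASV-style equal error rate measures. The sketch below is an illustrative approximation, not the authors' or the challenge's evaluation code: the function name `sasv_eer`, the simple threshold sweep, and the toy scores are all assumptions made here.

```python
import numpy as np

def sasv_eer(target_scores, nontarget_scores, spoof_scores):
    """Approximate EER with target trials as positives and both
    non-target (impostor) and spoofed trials as negatives.
    Hypothetical helper for illustration only."""
    pos = np.asarray(target_scores, dtype=float)
    neg = np.concatenate([
        np.asarray(nontarget_scores, dtype=float),
        np.asarray(spoof_scores, dtype=float),
    ])
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([pos, neg]))
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(pos < t).mean() for t in thresholds])
    # False acceptance rate: negative trials scoring at or above it.
    far = np.array([(neg >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy scores (made up): higher means "more likely a genuine target".
eer_separable = sasv_eer([0.9, 0.8], [0.1], [0.2])
eer_overlap = sasv_eer([0.2, 0.8], [0.5], [0.9])
print(eer_separable, eer_overlap)
```

Pooling impostor and spoofed trials into one negative class means a system is penalized whether it accepts the wrong speaker or a spoofed recording of the right one.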

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Attentive Walk-Aggregating Graph Neural Network