Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System
Yuanjun Zhao, Roberto Togneri, Victor Sreeram

TL;DR
This paper introduces a multi-task learning-based automatic speaker verification system that jointly detects spoofing attacks and verifies speakers, significantly improving robustness against diverse spoofing methods.
Contribution
It presents a novel multi-task deep learning architecture that integrates spoofing detection with speaker verification, enhancing performance over standalone solutions.
Findings
Reports a substantial improvement over other state-of-the-art systems on the ASVspoof 2017 and 2019 corpora.
Targets diverse spoofing attacks, from synthetic speech to replay presentations.
Demonstrates the effectiveness of joint training for anti-spoofing and speaker verification.
Abstract
Spoofing attacks posed by generating artificial speech can severely degrade the performance of a speaker verification system. Recently, many anti-spoofing countermeasures have been proposed for detecting varying types of attacks, from synthetic speech to replay presentations. While numerous effective defenses have been reported for standalone anti-spoofing solutions, integrating speaker verification and spoofing detection into one system has obvious benefits. In this paper, we propose a spoofing-robust automatic speaker verification (SR-ASV) system for diverse attacks based on a multi-task learning architecture. This deep learning based model is jointly trained with time-frequency representations from utterances to provide recognition decisions for both tasks simultaneously. Compared with other state-of-the-art systems on the ASVspoof 2017 and 2019 corpora, a substantial improvement of…
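The joint training described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify. The sketch assumes a single shared layer over a time-averaged spectrogram, a speaker-classification head used as a stand-in for the verification task, a binary spoofing head, and a weighted sum of the two losses. All names, layer sizes, and the `spoof_weight` parameter are hypothetical.

```python
import math
import random

def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def pool_over_time(spectrogram):
    """Average a (frames x bins) time-frequency representation over frames."""
    n_frames = len(spectrogram)
    return [sum(frame[k] for frame in spectrogram) / n_frames
            for k in range(len(spectrogram[0]))]

def forward(params, spectrogram):
    """Shared encoder followed by two task-specific heads (assumed layout)."""
    pooled = pool_over_time(spectrogram)
    hidden = [max(0.0, h) for h in linear(*params["shared"], pooled)]  # ReLU
    speaker_logits = linear(*params["speaker_head"], hidden)
    spoof_logit = linear(*params["spoof_head"], hidden)[0]
    return speaker_logits, spoof_logit

def joint_loss(speaker_logits, spoof_logit, speaker_id, is_spoof, spoof_weight=1.0):
    """Speaker cross-entropy plus weighted spoofing binary cross-entropy."""
    # Numerically stable log-sum-exp for the speaker head.
    m = max(speaker_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in speaker_logits))
    speaker_ce = log_z - speaker_logits[speaker_id]
    # Numerically stable binary cross-entropy on the raw spoofing logit.
    z, y = spoof_logit, float(is_spoof)
    spoof_bce = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return speaker_ce + spoof_weight * spoof_bce

# Hypothetical sizes: 8 frequency bins, 16 hidden units, 4 training speakers.
rng = random.Random(0)
N_BINS, N_HIDDEN, N_SPEAKERS = 8, 16, 4
params = {
    "shared": init_layer(rng, N_HIDDEN, N_BINS),
    "speaker_head": init_layer(rng, N_SPEAKERS, N_HIDDEN),
    "spoof_head": init_layer(rng, 1, N_HIDDEN),
}
# A random 20-frame "spectrogram" stands in for a real utterance.
utterance = [[rng.random() for _ in range(N_BINS)] for _ in range(20)]
speaker_logits, spoof_logit = forward(params, utterance)
loss = joint_loss(speaker_logits, spoof_logit, speaker_id=2, is_spoof=False)
```

Because both heads read the same hidden representation, minimising the summed loss would push gradients from both tasks through the shared layer, which is the general idea behind multi-task training. In this sketch `spoof_weight` controls how strongly the spoofing objective influences the shared features.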
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
