Multi-task Learning Based Spoofing-Robust Automatic Speaker Verification System

Yuanjun Zhao; Roberto Togneri; Victor Sreeram

arXiv:2012.03154·eess.AS·December 8, 2020

Open Access

TL;DR

This paper introduces a multi-task learning-based automatic speaker verification system that jointly detects spoofing attacks and verifies speakers, significantly improving robustness against diverse spoofing methods.

Contribution

It presents a novel multi-task deep learning architecture that integrates spoofing detection with speaker verification, enhancing performance over standalone solutions.
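The joint-training idea can be illustrated with a small sketch: one shared encoder feeds two output heads, and the two task losses are combined into a single objective. Everything specific here is an assumption made for illustration and is not taken from the paper: the mean-pool-plus-ReLU `shared_encoder`, the linear heads, treating speaker verification as closed-set speaker classification, and the weighting factor `alpha` are all hypothetical stand-ins for the actual deep model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def shared_encoder(features):
    """Hypothetical shared trunk: mean-pool over time frames, then ReLU.

    `features` is a list of frames, each a list of frequency-bin values,
    i.e. a toy time-frequency representation of one utterance.
    """
    n_frames = len(features)
    n_bins = len(features[0])
    pooled = [sum(frame[k] for frame in features) / n_frames
              for k in range(n_bins)]
    return [max(0.0, v) for v in pooled]


def joint_loss(features, speaker_id, is_spoof, params, alpha=0.5):
    """Weighted sum of the speaker loss and the spoofing-detection loss.

    `alpha` is an assumed trade-off weight, not a value from the paper.
    """
    emb = shared_encoder(features)
    spk_logits = linear(params["spk_w"], params["spk_b"], emb)
    spoof_logits = linear(params["spoof_w"], params["spoof_b"], emb)
    l_spk = cross_entropy(spk_logits, speaker_id)
    l_spoof = cross_entropy(spoof_logits, is_spoof)
    return alpha * l_spk + (1.0 - alpha) * l_spoof, l_spk, l_spoof


# Toy setup: 2 frequency bins, 3 enrolled speakers, 2 spoof classes
# (0 = bona fide, 1 = spoofed). Zero weights give uniform predictions.
params = {
    "spk_w": [[0.0, 0.0] for _ in range(3)], "spk_b": [0.0] * 3,
    "spoof_w": [[0.0, 0.0] for _ in range(2)], "spoof_b": [0.0] * 2,
}
utterance = [[1.0, -2.0], [3.0, 0.0]]  # two frames, two bins
total, l_spk, l_spoof = joint_loss(utterance, speaker_id=1, is_spoof=0,
                                   params=params)
print(total, l_spk, l_spoof)
```

Because both heads read the same embedding, a gradient step on the combined loss would update the shared encoder for both tasks at once, which is the property that distinguishes joint training from running two standalone systems in sequence.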

Findings

01. Outperforms state-of-the-art systems on ASVspoof 2017 and 2019 datasets.

02. Achieves substantial improvements under various spoofing attack conditions.

03. Demonstrates the effectiveness of joint training for anti-spoofing and speaker verification.

Abstract

Spoofing attacks posed by generating artificial speech can severely degrade the performance of a speaker verification system. Recently, many anti-spoofing countermeasures have been proposed for detecting varying types of attacks from synthetic speech to replay presentations. While there are numerous effective defenses reported on standalone anti-spoofing solutions, the integration for speaker verification and spoofing detection systems has obvious benefits. In this paper, we propose a spoofing-robust automatic speaker verification (SR-ASV) system for diverse attacks based on a multi-task learning architecture. This deep learning based model is jointly trained with time-frequency representations from utterances to provide recognition decisions for both tasks simultaneously. Compared with other state-of-the-art systems on the ASVspoof 2017 and 2019 corpora, a substantial improvement of…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing