Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

Anssi Kanervisto; Ville Hautamäki; Tomi Kinnunen; Junichi Yamagishi

arXiv:2201.09709 · cs.SD · January 25, 2022

TL;DR

This paper optimizes a combined speaker verification and anti-spoofing system directly, by making the tandem detection cost function (t-DCF) differentiable and by applying reinforcement learning techniques. The method gives a 20% relative improvement in t-DCF on the ASVspoof 2019 dataset.

Contribution

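The paper proposes a differentiable version of t-DCF, together with techniques from reinforcement learning, to optimize the tandem of automatic speaker verification (ASV) and spoofing countermeasure (CM) systems directly. The two systems are usually trained separately, using different metrics and data.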

Findings

01

20% relative improvement in t-DCF on the ASVspoof 2019 dataset

02

The proposed approaches give better results than finetuning

03

Optimizing the tandem system directly targets its combined performance, which separate ASV and CM training does not

Abstract

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using different metrics and data, which does not optimize their combined performance. In this work, we propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning. The results indicate that these approaches offer better outcomes than finetuning, with our method providing a 20% relative improvement in the…
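To make the idea of a differentiable tandem cost more concrete, the following is a minimal sketch and not the authors' formulation. It assumes a simplified tandem in which a trial is accepted only if both the CM and the ASV accept it, and it replaces the hard threshold decisions with sigmoids so the cost varies smoothly with the scores. The function name `soft_tandem_cost`, the label encoding, the `sharpness` parameter, and the prior and cost values are all hypothetical choices made for illustration; the official t-DCF has its own definition and parameters.

```python
import numpy as np

# Hypothetical label encoding used only in this sketch.
TARGET, NONTARGET, SPOOF = 0, 1, 2


def sigmoid(x):
    """Logistic function, used as a smooth stand-in for a hard threshold."""
    return 1.0 / (1.0 + np.exp(-x))


def soft_tandem_cost(asv_scores, cm_scores, labels,
                     asv_threshold=0.0, cm_threshold=0.0, sharpness=10.0,
                     priors=(0.90, 0.05, 0.05), costs=(1.0, 10.0, 10.0)):
    """Smooth, simplified tandem cost (an assumption, not the official t-DCF).

    priors: illustrative (target, nontarget, spoof) prior probabilities.
    costs:  illustrative costs for (target miss, nontarget false accept,
            spoof false accept).
    Each of the three trial classes must appear at least once in `labels`.
    """
    asv_scores = np.asarray(asv_scores, dtype=float)
    cm_scores = np.asarray(cm_scores, dtype=float)
    labels = np.asarray(labels)

    # Soft "accept" of the tandem: the CM says bona fide AND the ASV says target.
    cm_accept = sigmoid(sharpness * (cm_scores - cm_threshold))
    asv_accept = sigmoid(sharpness * (asv_scores - asv_threshold))
    accept = cm_accept * asv_accept

    # Soft error rates per trial class.
    miss_target = np.mean(1.0 - accept[labels == TARGET])
    fa_nontarget = np.mean(accept[labels == NONTARGET])
    fa_spoof = np.mean(accept[labels == SPOOF])

    pi_tar, pi_non, pi_spoof = priors
    c_miss, c_fa, c_fa_spoof = costs
    return (c_miss * pi_tar * miss_target
            + c_fa * pi_non * fa_nontarget
            + c_fa_spoof * pi_spoof * fa_spoof)


# Toy trials: two targets, one nontarget, one spoof.
labels = np.array([TARGET, TARGET, NONTARGET, SPOOF])
asv = np.array([2.0, 2.0, -2.0, 2.0])   # the spoof fools the ASV ...
cm = np.array([2.0, 2.0, 2.0, -2.0])    # ... but the CM rejects it
print(soft_tandem_cost(asv, cm, labels))
```

Because every step is a smooth function of the scores, the same expression written in an automatic-differentiation framework could in principle be used as a training loss for both systems at once. Larger values of `sharpness` bring the soft cost closer to the hard-decision cost, at the price of smaller gradients away from the thresholds.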
