Optimizing Tandem Speaker Verification and Anti-Spoofing Systems
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

TL;DR
This paper optimizes a combined speaker verification and anti-spoofing system directly, by making the tandem detection cost function (t-DCF) differentiable and applying reinforcement learning techniques, yielding a 20% relative improvement in t-DCF on the ASVspoof 2019 dataset.
Contribution
It proposes a differentiable version of t-DCF, combined with reinforcement learning techniques, to jointly optimize the speaker verification and anti-spoofing components of a tandem system, which are usually trained separately with different metrics and data.
Findings
20% relative improvement in t-DCF on the ASVspoof 2019 dataset
Outperforms traditional finetuning methods
Enhances security by better tandem system optimization
Abstract
As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using different metrics and data, which does not optimize their combined performance. In this work, we propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning. The results indicate that these approaches offer better outcomes than finetuning, with our method providing a 20% relative improvement in the…
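To make the idea of a differentiable tandem cost concrete, here is a minimal sketch. It is not the authors' formulation and not the official t-DCF: it assumes a simple cascade in which a trial is accepted only if both the CM and the ASV score exceed a threshold, and it replaces each hard threshold with a sigmoid so the resulting error rates become smooth in the scores. The function names, the `sharpness` parameter, and the unit cost weights are all hypothetical choices for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_accept(cm_score, asv_score, cm_thr=0.0, asv_thr=0.0):
    """Cascade decision: accept only if both CM and ASV pass their thresholds."""
    return 1.0 if (cm_score > cm_thr and asv_score > asv_thr) else 0.0


def soft_accept(cm_score, asv_score, cm_thr=0.0, asv_thr=0.0, sharpness=10.0):
    """Smooth surrogate: each step function is replaced by a sigmoid.

    `sharpness` is a hypothetical knob; larger values approach the hard decision.
    """
    return (sigmoid(sharpness * (cm_score - cm_thr))
            * sigmoid(sharpness * (asv_score - asv_thr)))


def tandem_cost(trials, accept_fn, w_miss=1.0, w_fa=1.0, w_spoof=1.0):
    """Weighted sum of tandem error rates (illustrative, NOT the official t-DCF).

    trials: iterable of (cm_score, asv_score, label) with label in
    {"target", "nontarget", "spoof"}; assumes at least one trial per label.
    The weights are placeholders, not the costs and priors used in the paper.
    """
    totals = {"target": 0.0, "nontarget": 0.0, "spoof": 0.0}
    counts = {"target": 0, "nontarget": 0, "spoof": 0}
    for cm_score, asv_score, label in trials:
        totals[label] += accept_fn(cm_score, asv_score)
        counts[label] += 1
    accept_rate = {k: totals[k] / counts[k] for k in totals}
    miss_target = 1.0 - accept_rate["target"]   # genuine speaker rejected
    fa_nontarget = accept_rate["nontarget"]     # wrong speaker accepted
    fa_spoof = accept_rate["spoof"]             # spoofed speech accepted
    return w_miss * miss_target + w_fa * fa_nontarget + w_spoof * fa_spoof
```

With `hard_accept`, the cost is piecewise constant in the scores, so its gradient is zero almost everywhere. With `soft_accept`, the same cost varies smoothly, which is what would allow gradients to reach both systems if the scores came from trainable models in an automatic differentiation framework.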
