Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLR
Kai Tan, Lin Zhang, Ruiteng Zhang, Johan Rohdin, Leibny Paola García-Perera, Zexin Cai, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews

TL;DR
This paper introduces a unified three-class framework for spoofing-robust automatic speaker verification that is more interpretable and adaptable than traditional bi-encoder models, and achieves competitive results on ASVspoof 5 and SpoofCeleb.
Contribution
The paper proposes a novel end-to-end three-class formulation that enables log-likelihood ratio (LLR) inference from class logits, improving the interpretability and flexibility of SASV systems.
Findings
Comparable performance to existing methods on ASVspoof 5
Superior results on the SpoofCeleb dataset
Enhanced interpretability demonstrated through visualization
Abstract
Spoofing-robust automatic speaker verification (SASV) aims to integrate automatic speaker verification (ASV) and spoofing countermeasures (CM). A popular solution is to fuse the scores of independent ASV and CM systems. To better model SASV, some frameworks integrate ASV and CM within a single network. However, these solutions are typically bi-encoder based, offer limited interpretability, and cannot be readily adapted to new evaluation parameters without retraining. To address these limitations, we propose a unified end-to-end framework based on a three-class formulation that enables log-likelihood ratio (LLR) inference from class logits, yielding a more interpretable decision pipeline. Experiments show performance comparable to existing methods on ASVspoof 5 and better results on SpoofCeleb. Visualization and analysis further show that the three-class reformulation is more interpretable.
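The abstract does not spell out how the LLR is obtained from the three class logits. The sketch below shows one standard way to do it, under stated assumptions: (i) the classifier is trained with softmax cross-entropy over three hypothetical labels (bona fide target, bona fide non-target, spoof), (ii) the class proportions in the training data are known, and (iii) the impostor hypothesis is a weighted mixture of the non-target and spoof classes. The function names `sasv_llr` and `sasv_decide` and the prior/cost dictionaries are illustrative and do not come from the paper.

```python
import math

# Assumed label layout for a three-class SASV classifier (illustrative only):
# bona fide target speaker, bona fide non-target speaker, spoofed trial.
CLASSES = ("target", "nontarget", "spoof")


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sasv_llr(logits, train_priors, alt_weights):
    """Target-vs-impostor LLR from three-class logits (a sketch).

    logits:       dict mapping each class to its softmax logit z_c
    train_priors: dict mapping each class to its share of the training data
    alt_weights:  non-negative weights for 'nontarget' and 'spoof' (at least
                  one positive) defining the composite impostor hypothesis
    """
    # z_c equals log P(c | x) up to a constant shared by all classes; removing
    # the training prior gives log p(x | c) up to that same constant.
    loglik = {c: logits[c] - math.log(train_priors[c]) for c in CLASSES}
    total = sum(alt_weights.values())
    # Impostor likelihood: weighted mixture of non-target and spoof likelihoods.
    # Classes with zero weight are skipped because log(0) is undefined.
    impostor = logsumexp([loglik[c] + math.log(alt_weights[c] / total)
                          for c in ("nontarget", "spoof") if alt_weights[c] > 0])
    # The shared constant cancels in this difference.
    return loglik["target"] - impostor


def sasv_decide(logits, train_priors, eval_priors, costs):
    """Bayes decision for a cost-weighted target-vs-impostor criterion.

    Accept iff C_miss * pi_tar * p(x|tar) >
               C_fa_non * pi_non * p(x|non) + C_fa_spf * pi_spf * p(x|spf).
    Returns (accept, llr, threshold).
    """
    weights = {
        "nontarget": costs["fa_nontarget"] * eval_priors["nontarget"],
        "spoof": costs["fa_spoof"] * eval_priors["spoof"],
    }
    llr = sasv_llr(logits, train_priors, weights)
    threshold = math.log(sum(weights.values())
                         / (costs["miss"] * eval_priors["target"]))
    return llr > threshold, llr, threshold


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the paper or any benchmark.
    uniform = {c: 1 / 3 for c in CLASSES}
    logits = {"target": 2.0, "nontarget": 0.0, "spoof": -1.0}
    priors = {"target": 0.5, "nontarget": 0.25, "spoof": 0.25}
    costs = {"miss": 1.0, "fa_nontarget": 1.0, "fa_spoof": 1.0}
    accept, llr, thr = sasv_decide(logits, uniform, priors, costs)
    print(f"accept={accept} llr={llr:.3f} threshold={thr:.3f}")
```

Because evaluation priors and costs enter only through the mixture weights and the threshold, they can be changed at test time without retraining the classifier, which is the kind of adaptability the abstract contrasts with bi-encoder systems. The cost-weighted criterion here is a generic Bayes decision rule; whether it matches the paper's exact scoring and evaluation metric is an assumption.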
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Machine Learning and Data Classification
