Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLR

Kai Tan; Lin Zhang; Ruiteng Zhang; Johan Rohdin; Leibny Paola García-Perera; Zexin Cai; Sanjeev Khudanpur; Matthew Wiesner; Nicholas Andrews

arXiv:2603.13780 · eess.AS · March 19, 2026

TL;DR

This paper introduces a unified three-class framework for spoofing-robust automatic speaker verification that enhances interpretability and adaptability over traditional bi-encoder models, achieving competitive results on standard datasets.

Contribution

The paper proposes a novel end-to-end three-class formulation enabling LLR inference, improving interpretability and flexibility in SASV systems.

Findings

01 Comparable performance to existing methods on ASVSpoof5

02 Superior results on SpoofCeleb dataset

03 Enhanced interpretability demonstrated through visualization

Abstract

Spoofing-robust automatic speaker verification (SASV) aims to integrate automatic speaker verification (ASV) with a spoofing countermeasure (CM). A popular solution is to fuse the scores of independent ASV and CM systems. To model SASV better, some frameworks integrate ASV and CM within a single network. However, these solutions are typically bi-encoder based, offer limited interpretability, and cannot be readily adapted to new evaluation parameters without retraining. To address these limitations, we propose a unified end-to-end framework built on a three-class formulation that enables log-likelihood ratio (LLR) inference from class logits, giving a more interpretable decision pipeline. Experiments show performance comparable to existing methods on ASVSpoof5 and better results on SpoofCeleb. Visualization and analysis further indicate that the three-class reformulation improves interpretability.
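The abstract does not spell out how an LLR is obtained from three-class logits, so the following is only a minimal sketch of one standard way to do it, not the authors' implementation. It assumes the three classes are target, non-target, and spoof, that the logits come from a softmax classifier trained under known class priors, and that the negative hypothesis is a prior-weighted mixture of non-target and spoof. The function names `log_sum_exp` and `sasv_llr` and the dictionary keys are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sasv_llr(logits, train_priors, eval_priors):
    """Sketch: LLR of 'target' against a mixture of 'nontarget' and 'spoof'.

    All three arguments are dicts keyed by 'target', 'nontarget', 'spoof'
    (hypothetical names). train_priors must be strictly positive.
    """
    # Softmax posteriors are proportional to exp(logit); dividing by the
    # training prior gives class likelihoods up to a shared constant,
    # which cancels in the ratio below.
    loglik = {c: logits[c] - math.log(train_priors[c]) for c in logits}

    # Negative hypothesis: mix non-target and spoof with weights taken
    # from the evaluation priors, renormalized over the two classes.
    neg_mass = eval_priors["nontarget"] + eval_priors["spoof"]
    terms = [
        loglik[c] + math.log(eval_priors[c] / neg_mass)
        for c in ("nontarget", "spoof")
        if eval_priors[c] > 0.0  # a zero-weight class drops out of the mixture
    ]
    return loglik["target"] - log_sum_exp(terms)


if __name__ == "__main__":
    uniform = {"target": 1 / 3, "nontarget": 1 / 3, "spoof": 1 / 3}
    logits = {"target": 2.0, "nontarget": 0.0, "spoof": 0.0}
    # Same logits, two different assumed evaluation conditions.
    print(sasv_llr(logits, uniform, uniform))
    print(sasv_llr(logits, uniform, {"target": 0.9, "nontarget": 0.09, "spoof": 0.01}))
```

Under these assumptions, the evaluation priors enter only at inference time, which illustrates how a single trained model could be re-scored for new evaluation parameters without retraining. With equal costs, a Bayes decision would accept a trial when the LLR exceeds the log of the negative-to-target prior odds.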

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Machine Learning and Data Classification