Data Quality as Predictor of Voice Anti-Spoofing Generalization
Bhusan Chettri, Rosa González Hautamäki, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper investigates how data quality factors influence the ability of voice anti-spoofing systems to generalize across different datasets, using a new interpretative framework and multiple experiments.
Contribution
It introduces a novel framework for analyzing data quality's impact on anti-spoofing generalization and evaluates various data quality factors across multiple datasets and models.
Findings
Data quality significantly affects anti-spoofing performance.
Certain voice quality features improve cross-domain generalization.
Long-term spectral information and the speaker population (captured through x-vector speaker embeddings) affect detection accuracy.
Abstract
Voice anti-spoofing aims at classifying a given utterance either as a bonafide human sample or as a spoofing attack (e.g., a synthetic or replayed sample). Many anti-spoofing methods have been proposed, but most of them fail to generalize across domains (corpora), and we do not know why. We outline a novel interpretative framework for gauging the impact of data quality upon anti-spoofing performance. Our within- and between-domain experiments pool data from seven public corpora and three anti-spoofing methods based on Gaussian mixture and convolutive neural network models. We assess the impacts of long-term spectral information, speaker population (through x-vector speaker embeddings), signal-to-noise ratio, and selected voice quality features.
