Data Quality as Predictor of Voice Anti-Spoofing Generalization

Bhusan Chettri; Rosa González Hautamäki; Md Sahidullah; Tomi Kinnunen

arXiv:2103.14602·eess.AS·June 23, 2021

TL;DR

This paper investigates how data quality factors influence the ability of voice anti-spoofing systems to generalize across different datasets, using a new interpretative framework and multiple experiments.

Contribution

It introduces a novel framework for analyzing data quality's impact on anti-spoofing generalization and evaluates various data quality factors across multiple datasets and models.

Findings

01. Data quality significantly affects anti-spoofing performance.

02. Certain voice quality features improve cross-domain generalization.

03. Long-term spectral info and speaker embeddings impact detection accuracy.

Abstract

Voice anti-spoofing aims at classifying a given utterance either as a bonafide human sample, or a spoofing attack (e.g. synthetic or replayed sample). Many anti-spoofing methods have been proposed but most of them fail to generalize across domains (corpora), and we do not know why. We outline a novel interpretative framework for gauging the impact of data quality upon anti-spoofing performance. Our within- and between-domain experiments pool data from seven public corpora and three anti-spoofing methods based on Gaussian mixture and convolutive neural network models. We assess the impacts of long-term spectral information, speaker population (through x-vector speaker embeddings), signal-to-noise ratio, and selected voice quality features.
