Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?
Nicolas M. Müller, Franziska Dieckmann, Pavel Czempin, Roman Canals, Konstantin Böttinger, Jennifer Williams

TL;DR
This paper analyzes an artifact in the ASVspoof 2019/2021 challenge data: silence duration correlates with the bonafide/spoof label, so models may rely on silence duration rather than genuine speech features, which undermines the reliability of spoof detection.
Contribution
The study uncovers the influence of silence-duration artifacts in the dataset and demonstrates that models trained only on silence duration can achieve high accuracy, highlighting a potential bias in spoof detection models.
Findings
Models trained solely on silence duration achieve up to 85% accuracy.
Silence trimming during preprocessing significantly worsens model performance.
Silence duration correlates with spoofing labels, affecting system score interpretation.
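
To make the notion of "silence duration" in the findings above concrete, here is a minimal Python sketch that measures leading and trailing silence in a waveform. It is an illustration only, not the authors' feature extraction: the function names, the amplitude threshold of 0.01, and the synthetic test signal are all assumptions chosen for the example.

```python
import numpy as np

def leading_silence_seconds(wave, sample_rate, threshold=0.01):
    """Seconds before the first sample whose absolute amplitude exceeds
    `threshold` (an assumed, hypothetical silence criterion)."""
    wave = np.asarray(wave, dtype=float)
    loud = np.flatnonzero(np.abs(wave) > threshold)
    if loud.size == 0:
        # The whole clip counts as silence.
        return wave.size / sample_rate
    return loud[0] / sample_rate

def trailing_silence_seconds(wave, sample_rate, threshold=0.01):
    """Trailing silence is the leading silence of the reversed signal."""
    reversed_wave = np.asarray(wave, dtype=float)[::-1]
    return leading_silence_seconds(reversed_wave, sample_rate, threshold)

# Synthetic clip at 16 kHz: 0.5 s silence, 0.5 s "speech", 0.25 s silence.
sample_rate = 16000
clip = np.concatenate([
    np.zeros(8000),
    0.5 * np.ones(8000),
    np.zeros(4000),
])
lead = leading_silence_seconds(clip, sample_rate)
trail = trailing_silence_seconds(clip, sample_rate)
print(lead, trail)
```

A classifier of the kind described in the findings would take only these one or two numbers per utterance as input, with no access to the speech content itself.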
Abstract
We present our analysis of a significant data artifact in the official 2019/2021 ASVspoof Challenge Dataset. We identify an uneven distribution of silence duration in the training and test splits, which tends to correlate with the target prediction label. Bonafide instances tend to have significantly longer leading and trailing silences than spoofed instances. In this paper, we explore this phenomenon and its impact in depth. We compare several types of models trained on a) only the duration of the leading silence and b) only the duration of the leading and trailing silence. Results show that models trained on only the duration of the leading silence perform particularly well, and achieve up to 85% accuracy and an equal error rate (EER) of 15.1%. At the same time, we observe that trimming silence during pre-processing and then training established antispoofing models using…
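
The abstract reports an equal error rate (EER), the operating point at which the rate of spoofs accepted as bonafide equals the rate of bonafide utterances rejected. The sketch below estimates it with a simple threshold sweep, using a single score per utterance (for instance, leading silence duration in seconds, where longer is treated as more likely bonafide). This is an assumed, simplified procedure for illustration, not the challenge's official scoring code, and the toy scores and labels are made up.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a decision threshold.

    scores: higher means "more likely bonafide" (assumption).
    labels: 1 for bonafide, 0 for spoof.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Candidate thresholds: every observed score, plus one that rejects all.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        accept = scores >= t
        far = np.mean(accept[labels == 0])    # spoof accepted as bonafide
        frr = np.mean(~accept[labels == 1])   # bonafide rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy data: silence durations in seconds (hypothetical values).
separable = equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
overlapping = equal_error_rate([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1])
print(separable, overlapping)
```

With perfectly separated scores the EER is zero; when the two classes overlap, it rises toward 50%, the level of a chance classifier on balanced error rates.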
