Dataset artefacts in anti-spoofing systems: a case study on the ASVspoof 2017 benchmark

Bhusan Chettri; Emmanouil Benetos; Bob L. T. Sturm

arXiv:2010.07913 · eess.AS · October 16, 2020

TL;DR

This paper investigates dataset artefacts in the ASVspoof 2017 benchmark, revealing how they influence the perceived success of anti-spoofing systems and proposing methods to improve robustness and establish new baselines.

Contribution

The paper identifies dataset artefacts that affect anti-spoofing system performance, introduces a preprocessing approach (discarding nonspeech segments) to mitigate their impact, and reports new benchmark results.

Findings

1. Artefacts in the dataset can artificially inflate the apparent success of anti-spoofing systems.

2. Discarding nonspeech segments reduces the extent to which models can exploit these artefacts.

3. New baseline results are established for frame-level and utterance-level models.
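The second finding can be illustrated with a simple energy-based trimming step. This is a minimal sketch under stated assumptions, not the authors' preprocessing (their repository is the reference for that): the function names `speech_frame_mask` and `trim_nonspeech`, the 25 ms frames with a 10 ms hop, and the 40 dB threshold are all hypothetical choices. A frame counts as speech if its energy lies within 40 dB of the loudest frame, and the signal is cut from the first to the last such frame, so only leading and trailing nonspeech is discarded.

```python
import numpy as np


def speech_frame_mask(signal, sample_rate=16000, frame_ms=25.0, hop_ms=10.0,
                      threshold_db=40.0):
    """Flag frames whose energy is within `threshold_db` of the loudest frame.

    Hypothetical energy-based detector; framing and threshold are illustrative
    assumptions, not the settings used in the paper.
    Returns (boolean mask per frame, frame length in samples, hop in samples).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    x = np.asarray(signal, dtype=float)
    if len(x) < frame_len:
        # Zero-pad very short inputs so at least one full frame exists.
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx]
    # Frame energy in dB; the epsilon avoids log10(0) on digital silence.
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return energy_db >= energy_db.max() - threshold_db, frame_len, hop


def trim_nonspeech(signal, **kwargs):
    """Drop leading and trailing nonspeech; interior pauses are kept."""
    mask, frame_len, hop = speech_frame_mask(signal, **kwargs)
    speech = np.flatnonzero(mask)  # never empty: the loudest frame always passes
    start = speech[0] * hop
    end = speech[-1] * hop + frame_len
    return np.asarray(signal, dtype=float)[start:end]
```

For example, one second of a 440 Hz tone padded with half a second of digital silence on each side comes back as roughly the one-second tone, with less than one frame (400 samples at 16 kHz) of margin on either side. Trimming only the edges is a deliberate simplification; a frame-level model could instead use the mask directly to drop every low-energy frame.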

Abstract

The Automatic Speaker Verification Spoofing and Countermeasures Challenges motivate research in protecting speech biometric systems against a variety of different access attacks. The 2017 edition focused on replay spoofing attacks, and involved participants building and training systems on a provided dataset (ASVspoof 2017). More than 60 research papers have so far been published with this dataset, but none have sought to answer why countermeasures appear successful in detecting spoofing attacks. This article shows how artefacts inherent to the dataset may be contributing to the apparent success of published systems. We first inspect the ASVspoof 2017 dataset and summarize various artefacts present in the dataset. Second, we demonstrate how countermeasure models can exploit these artefacts to appear successful in this dataset. Third, for reliable and robust performance estimates on this…
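ASVspoof 2017 countermeasures are scored by equal error rate (EER): the operating point at which the false-acceptance rate on spoofed trials equals the false-rejection rate on genuine trials. The sketch below computes an approximate EER by sweeping thresholds over the observed scores. It assumes higher scores mean "more likely genuine", and the function name `equal_error_rate` is hypothetical, not taken from the official challenge tooling.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    Assumed convention: higher score means "more likely genuine", and a trial
    is accepted as genuine when its score is >= the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trials scored below the threshold.
    frr = np.array([np.mean(genuine < t) for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([np.mean(spoof >= t) for t in thresholds])
    # Pick the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(frr - far))
    return float((frr[i] + far[i]) / 2.0)
```

EER only measures how well the scores separate the two classes. A model that separates them by exploiting an artefact, such as the amount of silence in a recording, would still achieve a low EER; the metric cannot tell which cue produced the scores.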

Code & Models

Repositories

BhusanChettri/TASLP-study-on-dataset-artefact (official)