An Initial Investigation for Detecting Partially Spoofed Audio
Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino, Nicholas Evans

TL;DR
This paper introduces a new database for partially-spoofed audio and demonstrates that existing countermeasures struggle with such data, highlighting the need for training on partially-spoofed examples.
Contribution
The paper presents PartialSpoof, a novel database for partially-spoofed speech, and evaluates how current detection methods perform on both utterance- and segment-level labels.
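The relationship between the two label granularities can be illustrated with a short sketch. It assumes a hypothetical convention in which each segment carries a binary label (1 = spoofed, 0 = bona fide) and a countermeasure emits one spoof score per segment (higher = more likely spoofed). Under that assumption, an utterance counts as spoofed if any of its segments is spoofed, so a partially-spoofed utterance is a spoof at the utterance level even when most of it is bona fide. The max-pooling rule for turning segment scores into an utterance score is one illustrative choice, not necessarily the aggregation used in the paper. The names `utterance_label`, `utterance_score`, and `spoofed_fraction` are hypothetical.

```python
from typing import Sequence

# Hypothetical label convention (an assumption, not taken from the database files).
SPOOF, BONAFIDE = 1, 0


def utterance_label(segment_labels: Sequence[int]) -> int:
    """Derive an utterance-level label: spoofed if at least one segment is spoofed."""
    if not segment_labels:
        raise ValueError("an utterance must contain at least one segment")
    return SPOOF if any(label == SPOOF for label in segment_labels) else BONAFIDE


def utterance_score(segment_scores: Sequence[float]) -> float:
    """Illustrative aggregation: max-pool segment spoof scores into one utterance score."""
    if not segment_scores:
        raise ValueError("an utterance must contain at least one segment score")
    return max(segment_scores)


def spoofed_fraction(segment_labels: Sequence[int]) -> float:
    """Share of segments that are spoofed (0.0 = fully bona fide, 1.0 = fully spoofed)."""
    if not segment_labels:
        raise ValueError("an utterance must contain at least one segment")
    return sum(1 for label in segment_labels if label == SPOOF) / len(segment_labels)


if __name__ == "__main__":
    partial = [BONAFIDE, SPOOF, SPOOF, BONAFIDE]  # two of four segments spoofed
    print(utterance_label(partial), spoofed_fraction(partial))
```

Here the partially-spoofed example is labelled as a spoof overall while only half of its segments are spoofed, which is the mismatch a detector trained only on fully-spoofed utterances never sees.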
Findings
Countermeasures trained on fully-spoofed data perform poorly on partially-spoofed data.
Training on partially-spoofed data improves detection of both fully- and partially-spoofed utterances.
Detecting spoofed segments within an utterance remains a challenging task.
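The findings above compare how well countermeasures separate spoofed from bona fide speech, but this summary does not name the metric. The equal error rate (EER), the operating point where the miss rate on spoofed inputs equals the false-alarm rate on bona fide inputs, is the standard measure in ASVspoof-style evaluations and is assumed here purely for illustration. The sketch below is a simple threshold sweep (quadratic in the number of scores, so only suitable for small examples); `compute_eer` is a hypothetical helper, and scores follow the same assumed convention of higher = more likely spoofed. The same function can be applied to utterance scores or to pooled segment scores.

```python
from typing import Sequence, Tuple


def compute_eer(spoof_scores: Sequence[float],
                bonafide_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER by sweeping thresholds over the observed scores.

    An input is flagged as spoofed when its score is >= threshold.
    Returns (eer, threshold) at the point where miss and false-alarm rates are closest.
    """
    if not spoof_scores or not bonafide_scores:
        raise ValueError("need at least one spoofed and one bona fide score")
    thresholds = sorted(set(spoof_scores) | set(bonafide_scores)) + [float("inf")]
    best = None  # (gap between the two error rates, eer estimate, threshold)
    for t in thresholds:
        # Miss rate: spoofed inputs whose score falls below the threshold.
        miss = sum(1 for s in spoof_scores if s < t) / len(spoof_scores)
        # False-alarm rate: bona fide inputs flagged as spoofed.
        false_alarm = sum(1 for b in bonafide_scores if b >= t) / len(bonafide_scores)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, threshold = compute_eer([0.4, 0.6, 0.9], [0.1, 0.5, 0.7])
    print(f"EER = {eer:.3f} at threshold {threshold}")
```

With overlapping score distributions, as in the example, the EER is non-zero; a countermeasure that fails to flag partially-spoofed utterances would show up as a higher EER on that subset.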
Abstract
All existing databases of spoofed speech contain attack data that is spoofed in its entirety. In practice, it is entirely plausible that successful attacks can be mounted with utterances that are only partially spoofed. By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances. This hypothesis raises the obvious question: 'Can we detect partially-spoofed audio?' This paper introduces a new database of partially-spoofed data, named PartialSpoof, to help address this question. This new database enables us to investigate and compare the performance of countermeasures on both utterance- and segmental-level labels. Experimental results using the utterance-level labels reveal that the reliability of countermeasures trained to detect fully-spoofed data…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
