Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
Xin Wang, Junichi Yamagishi

TL;DR
This paper proposes an efficient way to create spoofed training data for speech spoofing countermeasures by copy-synthesizing bona fide utterances with neural vocoders. It pairs this with a contrastive feature loss that exploits the resulting bona fide/spoofed pairs to improve detection of spoofing attacks.
Contribution
It introduces a way to create spoofed training data through neural vocoder copy-synthesis, which reduces reliance on full-fledged TTS and VC systems. It also introduces a contrastive feature loss that can be plugged into the standard training criterion.
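The data-creation idea can be sketched as follows: every bona fide utterance is kept, and each vocoder contributes one copy-synthesized (spoofed) version of it, so each spoofed trial stays linked to its bona fide source. This is a minimal sketch, assuming NumPy. The `Vocoder` interface, `Trial`, `build_copy_synthesis_set`, and `paired_batches` are hypothetical names, and the lambda "vocoders" in the usage lines are toy stand-ins, not neural vocoders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np

# Hypothetical interface: any callable mapping a bona fide waveform to its
# copy-synthesized version (acoustic analysis followed by vocoder resynthesis).
Vocoder = Callable[[np.ndarray], np.ndarray]


@dataclass
class Trial:
    waveform: np.ndarray
    label: int       # 1 = bona fide, 0 = spoofed (vocoded)
    source_id: str   # bona fide utterance this trial was derived from
    system: str      # "bonafide" or the vocoder name


def build_copy_synthesis_set(bona_fide: Dict[str, np.ndarray],
                             vocoders: Dict[str, Vocoder]) -> List[Trial]:
    """Keep every bona fide utterance and add one vocoded copy per vocoder."""
    trials = []
    for utt_id, wav in bona_fide.items():
        trials.append(Trial(wav, 1, utt_id, "bonafide"))
        for name, vocode in vocoders.items():
            trials.append(Trial(vocode(wav), 0, utt_id, name))
    return trials


def paired_batches(trials: List[Trial], sources_per_batch: int) -> List[List[Trial]]:
    """Group trials so a mini-batch holds each bona fide utterance together with
    all of its vocoded copies, letting a pair-aware loss see both sides."""
    by_source: Dict[str, List[Trial]] = {}
    for t in trials:
        by_source.setdefault(t.source_id, []).append(t)
    groups = list(by_source.values())
    return [sum(groups[i:i + sources_per_batch], [])
            for i in range(0, len(groups), sources_per_batch)]


# Usage with random "utterances" and toy stand-in vocoders (illustration only).
rng = np.random.default_rng(0)
bona = {f"utt{i}": rng.standard_normal(16000) for i in range(3)}
toy_vocoders = {"voc_a": lambda w: 0.9 * w, "voc_b": lambda w: np.tanh(w)}
trials = build_copy_synthesis_set(bona, toy_vocoders)
batches = paired_batches(trials, sources_per_batch=2)
print(len(trials), [len(b) for b in batches])  # 9 [6, 3]
```

Keeping the source ID on every trial is the design choice that matters here: it is what lets a training loop place a bona fide utterance and its vocoded copies in the same mini-batch.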
Findings
Spoofed data created by neural vocoder copy-synthesis is effective for training countermeasures.
Fine-tuning vocoders on target domain data improves results.
Contrastive feature loss boosts spoofing detection accuracy.
Abstract
A good training set for speech spoofing countermeasures requires diverse TTS and VC spoofing attacks, but generating TTS and VC spoofed trials for a target speaker may be technically demanding. Instead of using full-fledged TTS and VC systems, this study uses neural-network-based vocoders to do copy-synthesis on bona fide utterances. The output data can be used as spoofed data. To make better use of pairs of bona fide and spoofed data, this study introduces a contrastive feature loss that can be plugged into the standard training criterion. On the basis of the bona fide trials from the ASVspoof 2019 logical access training set, this study empirically compared a few training sets created in the proposed manner using a few neural non-autoregressive vocoders. Results on multiple test sets suggest good practices such as fine-tuning neural vocoders using bona fide data from the target…
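The abstract does not spell out the form of the contrastive feature loss. As one plausible instance, the sketch below uses a supervised contrastive loss (Khosla et al., 2020) on utterance embeddings, where same-class trials in a mini-batch act as positives, and adds it to a standard cross-entropy criterion with a weight. The loss form, `temperature=0.07`, and `weight` are assumptions, not the paper's settings, and the function names are hypothetical. Assumes NumPy.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over a mini-batch (stand-in formulation).

    features: (N, D) embeddings; labels: (N,) ints, 1 = bona fide, 0 = spoofed.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # unit vectors
    logits = z @ z.T / temperature                                   # scaled cosine sims
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Stable log-softmax of each anchor i over all other samples a != i.
    masked = np.where(not_self, logits, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom
    # Positives: other samples with the same label as the anchor.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())


def cross_entropy(class_logits, labels):
    """Standard softmax cross-entropy averaged over the batch."""
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def total_loss(class_logits, embeddings, labels, weight=1.0):
    """Cross-entropy plus a weighted contrastive term (weight is an assumption)."""
    return cross_entropy(class_logits, labels) + weight * supervised_contrastive_loss(embeddings, labels)


# Toy batch: two bona fide and two spoofed embeddings, plus classifier logits.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([1, 1, 0, 0])
class_logits = np.array([[0.2, 2.0], [0.1, 1.5], [1.8, 0.3], [2.2, 0.1]])
loss = total_loss(class_logits, feats, labels, weight=0.5)
```

Because the contrastive term only needs embeddings and labels, it can be added to an existing cross-entropy training loop without changing the classifier head; batching bona fide utterances with their vocoded copies gives each anchor informative positives and negatives.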
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
