Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?

Xin Wang; Junichi Yamagishi

arXiv:2309.06014 · eess.AS · December 29, 2023 · 2 cites

Open Access · 1 Repo

TL;DR

This paper explores whether large-scale vocoded spoofed data, generated by multiple neural vocoders, can improve speech spoofing countermeasures built on a self-supervised learning (SSL) front end, and reports improved detection performance on unseen datasets.

Contribution

The paper shows that large-scale vocoded training data and SSL model distillation improve spoofing detection performance on challenging unseen test sets.

Findings

01. Features extracted by an SSL model continually trained on vocoded data improve countermeasure (CM) performance

02. A distilled SSL model outperforms previous models on multiple test sets

03. Large-scale vocoded data improves generalization to unseen spoofing attacks

Abstract

A speech spoofing countermeasure (CM) that discriminates between unseen spoofed and bona fide data requires diverse training data. While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders were also effective as the spoofed training data. Since many neural vocoders are fast in building and generation, this study used multiple neural vocoders and created more than 9,000 hours of vocoded data on the basis of the VoxCeleb2 corpus. This study investigates how this large-scale vocoded data can improve spoofing countermeasures that use data-hungry self-supervised learning (SSL) models. Experiments demonstrated that the overall CM performance on multiple test sets improved when using features extracted by an SSL model continually trained on the vocoded data. Further improvement was observed when using a new SSL…

Code & Models

Repositories

nii-yamagishilab/project-NN-Pytorch-scripts
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research