Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang, Hiromitsu Nishizaki, Ming Li

TL;DR
This paper introduces a transfer learning approach that pre-trains a Conformer encoder on ASR or ASV tasks to improve spoofed-speech detection, reaching an EER of 0.04% on the FAD clean set and results comparable to Wav2Vec 2.0-based pre-training.
Contribution
It proposes a novel MFA-Conformer transfer learning framework for anti-spoofing, enhancing robustness by leveraging pre-training on different speech tasks.
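The weight-transfer step behind this kind of framework can be illustrated with a minimal sketch. It is not the authors' implementation: parameters are modelled as plain dictionaries, and the names `init_params`, `transfer_encoder`, `encoder.`, `asr_head`, and `cm_head` are hypothetical. The idea shown is only that encoder parameters from a model pre-trained on ASR or ASV are copied into the countermeasure model, while the task-specific head is dropped and the new countermeasure head keeps its fresh initialisation.

```python
import random


def init_params(names, seed):
    """Hypothetical stand-in for random initialisation: name -> short list of floats."""
    rng = random.Random(seed)
    return {n: [rng.uniform(-0.1, 0.1) for _ in range(4)] for n in names}


def transfer_encoder(pretrained, target, prefix="encoder."):
    """Copy parameters under `prefix` from `pretrained` into a copy of `target`.

    Parameters outside the prefix (for example the pre-training head) are ignored,
    and target parameters with no pre-trained counterpart are left untouched.
    """
    merged = dict(target)
    copied = []
    for name, value in pretrained.items():
        if name.startswith(prefix) and name in merged:
            merged[name] = list(value)
            copied.append(name)
    return merged, copied


# Assumed layout: a pre-trained ASR/ASV model with an encoder and a task head.
pretrained = init_params(
    ["encoder.block0.weight", "encoder.block1.weight", "asr_head.weight"], seed=0
)
# Assumed layout: a countermeasure (CM) model with the same encoder and a new head.
cm_model = init_params(
    ["encoder.block0.weight", "encoder.block1.weight", "cm_head.weight"], seed=1
)

cm_init, copied = transfer_encoder(pretrained, cm_model)
print(sorted(copied))
```

In a deep learning framework the same pattern would typically be expressed by filtering a saved checkpoint by key prefix before loading it into the new model; the dictionary version here only mirrors that logic.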
Findings
Achieves an EER of 0.04% on the clean set of FAD, a Chinese spoofing detection dataset
Outperforms baseline models significantly
Comparable to Wav2Vec 2.0 pre-training methods
Abstract
Finding the synthetic artifacts in spoofed data helps an anti-spoofing countermeasure (CM) system discriminate between spoofed and real speech. The Conformer combines the strengths of convolutional neural networks and the Transformer, allowing it to aggregate both global and local information. This may help the CM system capture synthetic artifacts hidden both locally and globally. In this paper, we present a transfer-learning-based MFA-Conformer structure for CM systems. By pre-training the Conformer encoder with different tasks, the robustness of the CM system is enhanced. The proposed method is evaluated on both Chinese and English spoofing detection databases. On the FAD clean set, the proposed method achieves an EER of 0.04%, which dramatically outperforms the baseline. Our system is also comparable to the pre-training methods based on Wav2Vec 2.0. Moreover, we also provide a…
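Since the results are reported as an equal error rate (EER), a small sketch of how such a number can be computed may help. This is a simple threshold sweep written for illustration, not the evaluation tooling used in the paper; the function name `compute_eer` and the convention that a higher score means "more likely real speech" are assumptions.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate with a sweep over observed scores.

    Assumes higher scores mean "more likely bona fide". Returns (eer, threshold),
    where the threshold is the one at which the false acceptance and false
    rejection rates are closest, and eer is their average at that point.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed utterances scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores: one spoofed utterance (0.65) outscores one bona fide utterance (0.6).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On these toy scores one of four bona fide utterances is rejected and one of four spoofed utterances is accepted at the chosen threshold, so the EER is 25%. An EER of 0.04% corresponds to both error rates being about 4 in 10,000 at the operating point where they are equal.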
Taxonomy
Topics: Speech Recognition and Synthesis · Infant Health and Development
