Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure

Yikang Wang; Hiromitsu Nishizaki; Ming Li

arXiv:2307.01546 · cs.SD · October 31, 2023 · 1 cites

TL;DR

This paper introduces a transfer learning approach that uses a Conformer encoder pre-trained on ASR or ASV tasks to improve spoofed-speech detection. It reaches an EER of 0.04% on the FAD clean set and performs comparably to pre-training methods based on Wav2Vec 2.0.

Contribution

The paper proposes a transfer-learning-based MFA-Conformer framework for anti-spoofing countermeasures: the Conformer encoder is first pre-trained on a different speech task (ASR or ASV) and then reused for spoofing detection, which improves robustness.
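The transfer step described above can be illustrated with a minimal PyTorch sketch. This page does not give the paper's architecture details, so everything here is an assumption made for illustration: the class names `MFAEncoder` and `SpoofDetector`, all dimensions, the use of plain Transformer layers as stand-ins for real Conformer blocks, and the simple mean and standard deviation pooling. What the sketch does show is the general pattern: concatenate the outputs of every encoder block (multi-scale feature aggregation), then initialise the countermeasure's encoder from weights trained on another task.

```python
import torch
import torch.nn as nn


class MFAEncoder(nn.Module):
    """Hypothetical encoder: per-block outputs are concatenated (MFA)."""

    def __init__(self, feat_dim=80, d_model=64, num_blocks=3):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        # Stand-in blocks: plain Transformer layers, NOT real Conformer blocks.
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_model, nhead=4, dim_feedforward=128, batch_first=True
            )
            for _ in range(num_blocks)
        ])
        self.norm = nn.LayerNorm(d_model * num_blocks)

    def forward(self, feats):
        # feats: (batch, frames, feat_dim)
        x = self.proj(feats)
        outputs = []
        for block in self.blocks:
            x = block(x)
            outputs.append(x)  # keep the output of every block
        # Concatenate along the feature axis: (batch, frames, d_model * num_blocks)
        return self.norm(torch.cat(outputs, dim=-1))


class SpoofDetector(nn.Module):
    """Hypothetical countermeasure head on top of the (pre-trained) encoder."""

    def __init__(self, encoder, d_model=64, num_blocks=3):
        super().__init__()
        self.encoder = encoder
        # Two output classes: bona fide vs. spoofed.
        self.classifier = nn.Linear(2 * d_model * num_blocks, 2)

    def forward(self, feats):
        h = self.encoder(feats)
        # Assumed pooling: mean and standard deviation over the frame axis.
        pooled = torch.cat([h.mean(dim=1), h.std(dim=1)], dim=-1)
        return self.classifier(pooled)


# Pretend this encoder was already trained on ASR or ASV.
pretrained_encoder = MFAEncoder()

# Transfer: copy the encoder weights into a fresh encoder for the CM system.
cm_encoder = MFAEncoder()
cm_encoder.load_state_dict(pretrained_encoder.state_dict())

model = SpoofDetector(cm_encoder)
model.eval()
with torch.no_grad():
    logits = model(torch.randn(2, 50, 80))  # 2 utterances, 50 frames, 80-dim features
```

Only the encoder weights are transferred; the classifier head starts from random initialisation and would be trained on spoofing data together with (or on top of) the encoder.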

Findings

01. Achieves an EER of 0.04% on the clean set of FAD, a Chinese spoofing detection dataset

02. Substantially outperforms the baseline

03. Performs comparably to pre-training methods based on Wav2Vec 2.0

Abstract

Finding the synthetic artifacts in spoofed data helps anti-spoofing countermeasure (CM) systems discriminate between spoofed and real speech. The Conformer combines the strengths of convolutional neural networks and the Transformer, allowing it to aggregate both global and local information. This may help the CM system capture synthetic artifacts hidden both locally and globally. In this paper, we present a transfer-learning-based MFA-Conformer structure for CM systems. Pre-training the Conformer encoder on different tasks enhances the robustness of the CM system. The proposed method is evaluated on both Chinese and English spoofing detection databases. On the FAD clean set, the proposed method achieves an EER of 0.04%, which dramatically outperforms the baseline. Our system is also comparable to pre-training methods based on Wav2Vec 2.0. Moreover, we also provide a…
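The results above are reported as an equal error rate (EER): the operating point at which the rate of spoofed trials accepted as real equals the rate of real trials rejected as spoofed. The following is a minimal pure-Python sketch of one common way to estimate it from scores, not the paper's evaluation code. The function name `compute_eer` and the convention that a higher score means "more likely bona fide" are assumptions made here.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold); higher scores are assumed to mean bona fide."""
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above all of them
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best = None
    for t in thresholds:
        # A trial is accepted as bona fide when its score is >= t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Toy example: one bona fide trial (0.35) scores below one spoofed trial (0.4).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.35], [0.1, 0.2, 0.3, 0.4])
```

On these toy scores the sketch gives an EER of 0.25 at a threshold of 0.4. For scale, an EER of 0.04% corresponds to a value of 0.0004 from this function. Evaluation toolkits often interpolate between thresholds instead of picking the closest one, so values can differ slightly on small score sets.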

Taxonomy

Topics: Speech Recognition and Synthesis · Infant Health and Development