ConvNeXt Based Neural Network for Audio Anti-Spoofing
Qiaowei Ma, Jinghui Zhong, Yitao Yang, Weiheng Liu, Ying Gao, Wing W.Y. Ng

TL;DR
This paper introduces a lightweight end-to-end neural network based on the ConvNeXt architecture, enhanced with channel attention and focal loss, to improve audio anti-spoofing accuracy in speaker verification systems.
Contribution
It adapts ConvNeXt for audio anti-spoofing, integrating channel attention and focal loss to focus on informative speech sub-bands and hard samples, outperforming existing methods.
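As a rough illustration of how channel attention can reweight feature channels, here is a minimal pure-Python sketch in the squeeze-and-excitation style. The exact attention block used in the paper is not described on this page, so the gating design, the function name `channel_attention`, and the weight matrices `w1` and `w2` are all assumptions made for illustration.

```python
import math

def channel_attention(features, w1, w2):
    """Hypothetical squeeze-and-excitation style gating (not the paper's exact block).

    features: list of C channels, each a list of T values.
    w1: R x C weight matrix (bottleneck), w2: C x R weight matrix (expansion).
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: summarize each channel by its mean over time.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates

# Toy input: 3 channels, 2 time steps, bottleneck size 1 (all values made up).
features = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1 = [[1.0, 0.0, 0.0]]
w2 = [[1.0], [0.0], [-1.0]]
scaled, gates = channel_attention(features, w1, w2)
```

With these toy weights the first channel receives a gate above 0.5 and the third a gate below 0.5, which is the sense in which such a block can emphasize some channels over others.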
Findings
Achieves an equal error rate (EER) of 0.64% on the ASVspoof 2019 LA dataset
Outperforms state-of-the-art anti-spoofing systems
Demonstrates effectiveness of ConvNeXt in audio anti-spoofing
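For readers unfamiliar with the EER metric reported above, the following sketch shows one simple way to approximate it from detection scores: sweep candidate thresholds and take the point where the false acceptance rate and false rejection rate are closest. The function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the toy scores are assumptions for illustration; official ASVspoof evaluation scripts may interpolate differently.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by a threshold sweep; higher score = more likely bona fide."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Made-up scores: one bona fide trial and one spoofed trial are misranked.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

In this toy case one of four trials in each class is misclassified at the balancing threshold, so the sketch returns 0.25.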
Abstract
With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. In recent years, researchers have proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than the raw waveform loses implicit information useful for anti-spoofing. Inspired by the promising performance of ConvNeXt in image classification tasks, we revise the ConvNeXt network architecture and propose a lightweight end-to-end anti-spoofing model. By integrating a channel attention block and using the focal loss function, the proposed model can focus on the most informative sub-bands of speech representations and on the difficult samples that are hard to classify. Experiments show that our proposed system achieves an equal error rate of 0.64% and min-tDCF of 0.0187…
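The focal loss mentioned in the abstract down-weights well-classified samples so that training concentrates on hard ones. A minimal sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), is given below. The values `gamma=2.0` and `alpha=0.25` are common defaults chosen here as assumptions; this page does not state the hyperparameters the authors used, and the function name `binary_focal_loss` is hypothetical.

```python
import math

def binary_focal_loss(p_bonafide, label, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one sample; label is 1 (bona fide) or 0 (spoof).

    gamma and alpha are assumed defaults, not values taken from the paper.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0 - eps)
    # The (1 - p_t)^gamma factor shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.95, 1)  # confidently correct sample
hard = binary_focal_loss(0.30, 1)  # poorly classified sample
```

Setting `gamma=0.0` removes the modulating factor and reduces the expression to an alpha-weighted cross-entropy, which is a quick sanity check on the implementation.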
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
Methods: ConvNeXt · Focal Loss
