ConvNeXt Based Neural Network for Audio Anti-Spoofing

Qiaowei Ma; Jinghui Zhong; Yitao Yang; Weiheng Liu; Ying Gao; Wing W.Y. Ng

arXiv:2209.06434 · cs.SD · December 23, 2022 · 1 cites

TL;DR

This paper introduces a lightweight end-to-end neural network based on the ConvNeXt architecture, enhanced with channel attention and focal loss, to improve audio anti-spoofing accuracy in speaker verification systems.

Contribution

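The paper adapts ConvNeXt for audio anti-spoofing, integrating channel attention so the model focuses on informative speech sub-bands and focal loss so training emphasizes hard-to-classify samples. It reports better results than prior anti-spoofing systems on the ASVspoof 2019 LA dataset.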
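
As a rough illustration of how focal loss shifts attention toward hard samples, here is a minimal plain-Python sketch of the standard binary formulation, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the paper's actual hyperparameters and implementation are not stated on this page.

```python
import math


def focal_loss(p_bonafide, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single sample.

    p_bonafide: predicted probability that the utterance is bona fide (class 1).
    label:      1 for bona fide, 0 for spoof.
    gamma, alpha: illustrative defaults, NOT values taken from the paper.
    """
    eps = 1e-12
    # p_t is the probability the model assigned to the true class.
    p_t = p_bonafide if label == 1 else 1.0 - p_bonafide
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of confidently correct samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.95, 1)  # confident and correct: strongly down-weighted
hard = focal_loss(0.30, 1)  # mostly wrong: keeps most of its loss
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check.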

Findings

01 Achieves an EER of 0.64% on the ASVspoof 2019 LA dataset

02 Outperforms state-of-the-art anti-spoofing systems

03 Demonstrates the effectiveness of ConvNeXt in audio anti-spoofing
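
The equal error rate (EER) quoted in the findings is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The sketch below is a simplified threshold sweep in plain Python, assuming that higher scores mean "more likely bona fide"; the function name `equal_error_rate` and the toy scores are hypothetical, and this is not the official ASVspoof scoring procedure.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    Assumes a trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False acceptance: spoofed trials scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide trials scoring below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores for illustration only (not data from the paper).
bonafide = [0.9, 0.8, 0.3, 0.7]
spoof = [0.1, 0.2, 0.6, 0.75]
print(equal_error_rate(bonafide, spoof))
```

On a finite score set the two error rates may never be exactly equal, which is why the sketch returns the midpoint at the closest crossing rather than interpolating.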

Abstract

With the rapid development of speech conversion and speech synthesis algorithms, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. In recent years, researchers have proposed a number of anti-spoofing methods based on hand-crafted features. However, using hand-crafted features rather than the raw waveform loses implicit information that is useful for anti-spoofing. Inspired by the promising performance of ConvNeXt in image classification tasks, we revise the ConvNeXt network architecture and propose a lightweight end-to-end anti-spoofing model. By integrating a channel attention block and using the focal loss function, the proposed model can focus on the most informative sub-bands of speech representations and on the difficult samples that are hard to classify. Experiments show that our proposed system can achieve an equal error rate of 0.64% and min-tDCF of 0.0187…
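
To make the idea of channel attention concrete, the following plain-Python sketch shows a generic squeeze-and-excitation style gate: each channel is averaged to a single value, a small two-layer network turns those values into per-channel weights between 0 and 1, and the channels are rescaled by those weights. This is an assumption about the general technique, not a reproduction of the paper's block; the function name `channel_attention` and the weight matrices `w1` and `w2` are hypothetical.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel gating (generic sketch).

    feature_map: list of C channels, each a list of rows (H x W floats).
    w1: hidden x C weight matrix, w2: C x hidden weight matrix.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average of each channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: linear layer with ReLU, then linear layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Two channels of size 1 x 2, with toy weights chosen for illustration.
fmap = [[[1.0, 3.0]], [[0.0, 0.0]]]
scaled, gates = channel_attention(fmap, w1=[[1.0, 0.0]], w2=[[2.0], [-2.0]])
print(gates[0] > gates[1])
```

In a trained model the weights are learned, so channels that carry more discriminative information end up with gates closer to 1.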

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders

Methods: ConvNeXt · Focal Loss