A Study On Data Augmentation In Voice Anti-Spoofing
Ariel Cohen, Inbal Rimon, Eran Aflalo, and Haim Permuter

TL;DR
This paper studies how data augmentation improves the detection of synthetic or spoofed audio. It introduces the SpecAverage augmentation and a new spectrogram feature design, and reports state-of-the-art results in the Deep Fake category of the ASVspoof 2021 challenge.
Contribution
The paper presents data augmentation strategies (compression augmentation, channel augmentation, and SpecAverage) together with a new spectrogram feature design. Because the models are kept fixed, the reported improvements in anti-spoofing performance come from changes to the data alone.
Findings
State-of-the-art equal error rate (EER) of 15.46% in the Deep Fake (DF) category
50% reduction in the baseline EER in the Logical Access (LA) category
Improved generalization through SpecAverage augmentation
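The findings above are stated in terms of EER, the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a number can be computed from detector scores, the following uses a simple threshold sweep. The function name `compute_eer` and the convention that higher scores mean "more likely bona fide" are assumptions for illustration, and this is not the challenge's official scoring tool.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate by sweeping thresholds.

    Assumption: a higher score means the utterance is more likely bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection: bona fide audio scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance: spoofed audio scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

A threshold sweep like this is only exact up to the spacing of the observed scores; evaluation toolkits typically interpolate between neighbouring thresholds.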
Abstract
In this paper, we perform an in-depth study of how data augmentation techniques improve synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel variability, different audio compressions, different bandwidths, and unseen spoofing attacks, which have all been shown to significantly degrade the performance of audio-based systems and anti-spoofing systems. Our results are based on the ASVspoof 2021 challenge, in the Logical Access (LA) and Deep Fake (DF) categories. Our study is data-centric, meaning that the models are fixed and we significantly improve the results by making changes in the data. We introduce two forms of data augmentation: compression augmentation for the DF part, and compression and channel augmentation for the LA part. In addition, a new type of online data augmentation, SpecAverage, is introduced in which the audio features are masked…
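The abstract is cut off before it says what the masked features are replaced with, so the following is only a rough sketch of a SpecAverage-style online augmentation. It assumes, based on the method's name, that the masked regions are filled with the average value of the input features, and that one frequency band and one time band are masked per call. The function name `spec_average`, its parameters, and the default band widths are hypothetical and are not taken from the paper.

```python
import numpy as np

def spec_average(features, max_freq_width=10, max_time_width=20, rng=None):
    """Mask one frequency band and one time band of a 2-D feature map.

    Assumptions (not confirmed by the abstract): masked entries are set to
    the mean of the input, and `features` is a float array of shape
    (n_freq, n_time).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = features.copy()            # leave the caller's array untouched
    fill = features.mean()           # assumed fill value: global feature mean
    n_freq, n_time = features.shape

    # Random frequency band (width may be zero, i.e. no masking).
    f_width = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = fill

    # Random time band (width may be zero, i.e. no masking).
    t_width = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = fill

    return out
```

Because the augmentation is described as online, a function like this would be applied to each training example on the fly, so that the same utterance is masked differently in every epoch.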
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
