Audio compression-assisted feature extraction for voice replay attack detection
Xiangyu Shi, Yuhao Luo, Li Wang, Haorui He, Hao Li, Lei Wang, Zhizheng Wu

TL;DR
This paper introduces a novel audio compression-based feature extraction method to improve the detection of voice replay attacks, effectively capturing channel noise information to distinguish genuine speech from spoofing attempts.
Contribution
The study proposes a new feature extraction technique using audio compression to enhance replay attack detection, achieving state-of-the-art results on the ASVspoof 2021 dataset.
Findings
Achieved the lowest equal error rate (EER), 22.71%, on the ASVspoof 2021 PA evaluation set.
Demonstrated robustness of the proposed features with data augmentation.
Confirmed effectiveness across multiple classifiers.
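For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (genuine audio rejected). The sketch below is a minimal illustration, not the challenge's official scoring tool. The function name `equal_error_rate` and the convention that higher scores mean "more likely genuine" are assumptions made for this example.

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER by scanning every observed score as a threshold.

    Assumes higher scores indicate genuine speech (an assumption of this sketch).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

Calling `equal_error_rate(genuine, spoof)` returns a fraction between 0 and 1; multiply by 100 to express it as a percentage as in the findings above.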
Abstract
Replay attack is one of the most effective and simplest voice spoofing attacks. Detecting replay attacks is challenging, according to the Automatic Speaker Verification Spoofing and Countermeasures Challenge 2021 (ASVspoof 2021), because they involve a loudspeaker, a microphone, and acoustic conditions (e.g., background noise). One obstacle to detecting replay attacks is finding robust feature representations that reflect the channel noise information added to the replayed speech. This study proposes a feature extraction approach that uses audio compression for assistance. Audio compression compresses audio to preserve content and speaker information for transmission. The missed information after decompression is expected to contain content- and speaker-independent information (e.g., channel noise added during the replay process). We conducted a comprehensive experiment with a few data…
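The idea in the abstract, that what a codec discards may carry channel information, can be sketched as a residual between a waveform and its compress-then-decompress reconstruction. This is a rough illustration under stated assumptions, not the authors' pipeline: this excerpt does not name the codec or the features used, so a simple mu-law quantizer stands in for the compressor, and the function names `mu_law_roundtrip`, `compression_residual`, and `log_magnitude_frames` are hypothetical.

```python
import numpy as np

def mu_law_roundtrip(x, levels=256):
    """Stand-in lossy codec (assumption): mu-law companding plus uniform quantization.

    Expects a waveform scaled to [-1, 1] and returns the decoded waveform.
    """
    mu = levels - 1
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize the companded signal to `levels` evenly spaced values in [-1, 1].
    quantized = np.round((compressed + 1) / 2 * mu) / mu * 2 - 1
    # Invert the companding to obtain the reconstructed waveform.
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(mu)) / mu

def compression_residual(x, levels=256):
    """Information lost by the codec: original minus reconstruction."""
    return x - mu_law_roundtrip(x, levels)

def log_magnitude_frames(signal, frame_len=512, hop=256):
    """Frame-wise log-magnitude spectrum, one row per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)

# Synthetic one-second tone at 16 kHz as placeholder input (not real speech).
t = np.arange(16000) / 16000
x = 0.5 * np.sin(2 * np.pi * 440 * t)
residual = compression_residual(x)
features = log_magnitude_frames(residual)
```

In this sketch the `features` matrix, computed from the residual rather than from the original waveform, is what a downstream classifier would consume. Any real system would substitute the codec and feature front end the paper actually uses.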
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
