Bridging the Spoof Gap: A Unified Parallel Aggregation Network for Voice Presentation Attacks
Awais Khan, Khalid Mahmood Malik

TL;DR
This paper introduces a unified neural network architecture that effectively detects both logical and physical voice spoofing attacks, reducing error disparities and enhancing security in voice biometric systems.
Contribution
The paper proposes a novel Parallel Stacked Aggregation Network that processes raw audio for unified spoofing detection, addressing the gap between logical and physical attack detection methods.
Findings
Outperforms state-of-the-art solutions on ASVspoof-2019 and VSDC datasets.
Reduces disparities in Equal Error Rate between attack types.
Demonstrates superior generalizability and robustness in spoofing detection.
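For readers unfamiliar with the metric in the findings above: the Equal Error Rate is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how it can be estimated from detector scores. The function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration; none of this is taken from the paper's evaluation code.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means the detector believes the audio is bona fide.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(bonafide, spoof)
print(eer, threshold)  # 0.25 0.6
```

A unified detector would be scored this way separately on logical and physical attack trials, and the disparity referred to above is the difference between those two EER values.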
Abstract
Automatic Speaker Verification (ASV) systems are increasingly used in voice biometrics for user authentication but are susceptible to logical and physical spoofing attacks, posing security risks. Existing research mainly tackles logical or physical attacks separately, leading to a gap in unified spoofing detection. Moreover, when existing systems attempt to handle both types of attacks, they often exhibit significant disparities in the Equal Error Rate (EER). To bridge this gap, we present a Parallel Stacked Aggregation Network that processes raw audio. Our approach employs a split-transform-aggregation technique, dividing utterances into convolved representations, applying transformations, and aggregating the results to identify logical (LA) and physical (PA) spoofing attacks. Evaluation on the ASVspoof-2019 and VSDC datasets shows the effectiveness of the proposed system. It…
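The split-transform-aggregation idea described in the abstract can be illustrated with a small, dependency-free toy. This sketch assumes a ResNeXt-style layout: several parallel branches each convolve the raw samples, apply a nonlinearity, and project back, after which the branch outputs are summed and added to the input. The branch count, kernel sizes, random weights, skip connection, and all function names here are hypothetical; the paper's actual layer configuration is not given in this summary, so this should not be read as the authors' architecture.

```python
import random


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding so the output keeps len(x).

    Assumes an odd-length kernel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def branch(x, k_in, k_out):
    """One parallel path: split (convolve), transform (ReLU), project back."""
    return conv1d_same(relu(conv1d_same(x, k_in)), k_out)


def split_transform_aggregate(x, branches):
    """Sum the outputs of all branches, then add the input as a skip connection."""
    outputs = [branch(x, k_in, k_out) for k_in, k_out in branches]
    aggregated = [sum(vals) for vals in zip(*outputs)]
    return [a + v for a, v in zip(aggregated, x)]


def random_kernel(rng, size=3):
    """Hypothetical untrained weights; a real model would learn these."""
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


rng = random.Random(0)
waveform = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for raw audio samples
branches = [(random_kernel(rng), random_kernel(rng)) for _ in range(4)]  # 4 parallel paths
features = split_transform_aggregate(waveform, branches)
print(len(features))  # 16
```

In a trained network the kernels would be learned, each branch would carry many channels, and blocks like this would be stacked before a classifier that labels the utterance as bona fide or spoofed.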
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing
