Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning
Rahul T P, P R Aravind, Ranjith C, Usamath Nechiyil, Nandakumar, Paramparambath

TL;DR
This paper presents a deep convolutional neural network approach using transfer learning to detect spoofing attacks in automatic speaker verification systems, achieving low error rates on multiple datasets.
Contribution
The study introduces a ResNet-34 based deep learning model utilizing Mel-spectrograms for effective spoofing detection in speaker verification systems.
Findings
Achieved EER of 0.9056% on development set for logical access
Achieved EER of 5.32% on evaluation set for logical access
Achieved EER of 5.87% on development set for physical access
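For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how an EER can be estimated from classifier scores. It is not the authors' evaluation code: the function name, the score convention (higher means more likely bona fide), and the sample scores are assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumed convention (hypothetical): a higher score means the trial is
    more likely bona fide, and a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR where the two curves cross.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Illustrative scores only; these are not data from the paper.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the estimate is coarse; on a full development or evaluation set the threshold sweep gives a much finer approximation of the crossing point.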
Abstract
Automatic Speaker Verification systems are gaining popularity, and spoofing attacks are a prime concern because they make these systems vulnerable. Some spoofing attacks, such as replay attacks, are easy to implement but very hard to detect, creating the need for suitable countermeasures. In this paper, we propose a speech classifier based on a deep convolutional neural network to detect spoofing attacks. Our proposed methodology applies deep residual learning (an adaptation of the ResNet-34 architecture) to an acoustic time-frequency representation of power spectral densities on the Mel frequency scale (Mel-spectrogram). Using a single-model system, we have achieved an equal error rate (EER) of 0.9056% on the development and 5.32% on the evaluation dataset of the logical access scenario and an equal error rate (EER) of 5.87% on the development and 5.74% on the evaluation dataset of physical…
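The Mel frequency scale mentioned in the abstract spaces frequency bands roughly the way human hearing does: narrow bands at low frequencies and wide bands at high frequencies. The sketch below illustrates that spacing using one common Hz-to-Mel formula (the HTK-style convention). The abstract does not say which formula, how many bands, or which frequency range the authors used, so the formula choice, the function names, and the values 8 bands over 0 to 8000 Hz are all assumptions for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Return centre frequencies (Hz) of n_mels bands equally spaced in Mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# Hypothetical settings: 8 bands between 0 Hz and 8000 Hz.
centers = mel_band_centers(8, 0.0, 8000.0)
for i, c in enumerate(centers):
    print(f"band {i}: centre ~ {c:.1f} Hz")
```

A Mel-spectrogram is obtained by summing the power spectrum of each short-time frame within triangular filters centred on frequencies like these; the resulting time-by-band matrix is what an image-style network such as a ResNet takes as input.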
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Residual Connection · Batch Normalization · 1x1 Convolution · Average Pooling · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block · Kaiming Initialization
