Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning

Rahul T P; P R Aravind; Ranjith C; Usamath Nechiyil; Nandakumar Paramparambath

arXiv:2008.03464 · eess.AS · August 11, 2020 · 27 cites

TL;DR

This paper presents a deep convolutional neural network, trained by transfer learning, that detects spoofing attacks on automatic speaker verification systems. It reports equal error rates between 0.9056% and 5.87% across the development and evaluation sets of the logical and physical access scenarios.

Contribution

The study introduces a deep learning model, adapted from ResNet-34 and fed with Mel-spectrograms, for spoofing detection in speaker verification systems.

Findings

01 Achieved an EER of 0.9056% on the development set for logical access

02 Achieved an EER of 5.32% on the evaluation set for logical access

03 Achieved an EER of 5.87% on the development set for physical access
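The findings above are stated as equal error rates. As a rough illustration of the metric only (not the authors' evaluation script), the sketch below sweeps a decision threshold over classifier scores and returns the error rate at the point where the false acceptance rate and false rejection rate are closest. The function name `equal_error_rate`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for this example.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumed convention: higher score = more likely bona fide speech.
    A trial is accepted when its score is >= the threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2.0
    return eer


# Toy scores, invented purely for illustration.
toy_bonafide = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_bonafide, toy_spoof)
print(f"EER on toy scores: {toy_eer:.2%}")
```

With few trials the sweep can only land on a coarse grid of rates; evaluation tooling used in practice typically interpolates between neighbouring thresholds instead.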

Abstract

Automatic speaker verification systems are gaining popularity, and spoofing attacks are a prime concern because they make these systems vulnerable. Some spoofing attacks, such as replay attacks, are easy to implement but very hard to detect, which creates the need for suitable countermeasures. In this paper, we propose a speech classifier based on a deep convolutional neural network to detect spoofing attacks. Our proposed methodology feeds an acoustic time-frequency representation of power spectral densities on the Mel frequency scale (Mel-spectrogram) to a deep residual network (an adaptation of the ResNet-34 architecture). Using a single-model system, we have achieved an equal error rate (EER) of 0.9056% on the development and 5.32% on the evaluation dataset of the logical access scenario, and an EER of 5.87% on the development and 5.74% on the evaluation dataset of physical…
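The abstract's input feature is a spectrogram on the Mel frequency scale. The sketch below shows, as an illustration under stated assumptions, how frequencies can be mapped to that scale and how band edges for a Mel filterbank can be spaced. It uses the common HTK-style formula mel = 2595 · log10(1 + f / 700); the abstract does not say which Mel variant, number of bands, or frequency range the authors used, so the function names and the values 4, 0 Hz, and 8000 Hz are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 edge frequencies (Hz), equally spaced in mels.

    Consecutive triples (edges[i], edges[i+1], edges[i+2]) would give the
    lower edge, centre, and upper edge of the i-th triangular Mel filter.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Hypothetical settings chosen only to keep the printout short.
edges = mel_band_edges(n_mels=4, f_min=0.0, f_max=8000.0)
print([round(e, 1) for e in edges])
```

Because the edges are evenly spaced in mels, the bands widen in Hz as frequency rises, which is why a Mel-spectrogram gives finer resolution at low frequencies than at high ones.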

Code & Models

Repositories

rahul-t-p/ASVspoof-2019 (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Residual Connection · Batch Normalization · 1x1 Convolution · Average Pooling · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block · Kaiming Initialization