Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

Qiongqiong Wang; Kong Aik Lee; Takafumi Koshinaka

arXiv:2008.08865 · eess.AS · August 21, 2020

TL;DR

This paper introduces a multi-resolution spectrogram approach using CNNs for anti-spoofing in speaker verification, enhancing discriminative features while maintaining low computational costs.

Contribution

It proposes stacking spectrograms with different window lengths as multi-resolution inputs for CNNs, improving anti-spoofing performance over single-resolution methods.

Findings

01 Outperforms score fusion methods on the ASVspoof 2019 dataset

02 Improves frequency and time resolution in feature maps

03 Maintains low computational costs

Abstract

This paper presents a simple but effective method that uses multi-resolution feature maps with convolutional neural networks (CNNs) for anti-spoofing in automatic speaker verification (ASV). The central idea is to alleviate the problem that the feature maps commonly used in anti-spoofing networks are insufficient for building discriminative representations of audio segments, as they are often extracted by a single-length sliding window. Resulting trade-offs between time and frequency resolutions restrict the information in single spectrograms. The proposed method improves both frequency resolution and time resolution by stacking multiple spectrograms that are extracted using different window lengths. These are fed into a convolutional neural network in the form of multiple channels, making it possible to extract more information from input signals while only marginally increasing…
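The stacking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the window lengths (256, 512, 1024 samples), the shared hop length and FFT size, the Hann window, the log-magnitude scaling, and the function names `log_spectrogram` and `multi_resolution_map` are all assumptions chosen for illustration. Keeping the hop length and FFT size the same across resolutions is one simple way to give every spectrogram the same shape, so they can be stacked as CNN input channels.

```python
import numpy as np


def log_spectrogram(signal, win_length, hop_length, n_fft):
    """Log-magnitude spectrogram using a Hann window of `win_length` samples.

    Each windowed segment is zero-padded to `n_fft`, so the output shape
    depends only on `n_fft` and `hop_length`, not on `win_length`.
    """
    if win_length > n_fft:
        raise ValueError("win_length must not exceed n_fft")
    window = np.hanning(win_length)
    # Reflect-pad so frames are centred and the frame count is the same
    # for every window length.
    padded = np.pad(signal, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    offset = (n_fft - win_length) // 2
    frames = np.zeros((n_frames, n_fft))
    for i in range(n_frames):
        start = i * hop_length + offset
        frames[i, offset:offset + win_length] = (
            padded[start:start + win_length] * window
        )
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Small constant avoids log(0); the value is an arbitrary assumption.
    return np.log(magnitude + 1e-8).T  # shape: (n_fft // 2 + 1, n_frames)


def multi_resolution_map(signal, win_lengths=(256, 512, 1024),
                         hop_length=128, n_fft=1024):
    """Stack spectrograms from several window lengths along a channel axis."""
    channels = [
        log_spectrogram(signal, w, hop_length, n_fft) for w in win_lengths
    ]
    return np.stack(channels, axis=0)  # shape: (channels, freq_bins, frames)


# Usage: one second of synthetic noise at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
feature_map = multi_resolution_map(audio)
print(feature_map.shape)
```

In this sketch the short window contributes a channel with finer time resolution and the long window a channel with finer frequency resolution. The resulting array has the channels-first layout that many CNN frameworks accept for a single input example.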

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing