Frequency and temporal convolutional attention for text-independent speaker recognition

Sarthak Yadav; Atul Rai

arXiv:1910.07364·cs.SD·October 22, 2019

TL;DR

This paper introduces convolutional attention modules for CNN-based speaker recognition, modeling frequency and temporal information separately, leading to state-of-the-art results on VoxCeleb with improved robustness in real-world conditions.

Contribution

It proposes convolutional attention methods for frequency and temporal modeling in CNNs, enhancing speaker recognition performance over existing baselines.
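As a rough illustration of modeling frequency and temporal attention separately, the sketch below applies a CBAM-style gate along each axis of a single-channel feature map: pool over the other axis (average and max), convolve the pooled descriptors, squash with a sigmoid, and rescale. This summary does not give the authors' exact modification of CBAM, so the function names, the 1-D kernels, the kernel values, and the order of application (frequency first, then time) are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation; expects an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def axis_attention(feature_map, axis, avg_kernel, max_kernel):
    """Return one weight in (0, 1) per frequency bin or per time frame.

    feature_map is a list of F rows (frequency bins), each holding T frames.
    The kernels stand in for learned convolution weights (hypothetical here).
    """
    n_freq, n_time = len(feature_map), len(feature_map[0])
    if axis == "freq":
        lines = feature_map  # each line is one frequency bin across time
    else:
        lines = [[feature_map[f][t] for f in range(n_freq)]
                 for t in range(n_time)]  # each line is one frame across frequency
    avg_desc = [sum(line) / len(line) for line in lines]
    max_desc = [max(line) for line in lines]
    # Summing two single-channel convolutions is equivalent to one
    # convolution over the concatenated (avg, max) descriptors.
    logits = [a + m for a, m in zip(conv1d_same(avg_desc, avg_kernel),
                                    conv1d_same(max_desc, max_kernel))]
    return [sigmoid(z) for z in logits]


def apply_freq_time_attention(feature_map, kernels):
    """Gate the map along frequency, then along time (assumed order)."""
    freq_w = axis_attention(feature_map, "freq", *kernels["freq"])
    scaled = [[v * freq_w[f] for v in row] for f, row in enumerate(feature_map)]
    time_w = axis_attention(scaled, "time", *kernels["time"])
    out = [[v * time_w[t] for t, v in enumerate(row)] for row in scaled]
    return out, freq_w, time_w


# Toy 3-bin x 4-frame map with most of its energy in the middle bin.
toy_map = [[0.1, 0.2, 0.1, 0.0],
           [0.9, 1.0, 0.8, 0.7],
           [0.2, 0.1, 0.3, 0.2]]
toy_kernels = {"freq": ([0.5, 1.0, 0.5], [0.5, 1.0, 0.5]),
               "time": ([0.5, 1.0, 0.5], [0.5, 1.0, 0.5])}
gated, freq_weights, time_weights = apply_freq_time_attention(toy_map, toy_kernels)
```

In a real CNN front-end the kernels would be learned and the pooling would also run over the channel dimension; the toy version keeps one channel so that the two attention axes stay easy to follow.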

Findings

01. Achieves 2.031% EER on the VoxCeleb1 test set, setting a new state of the art.

02. Convolutional attention modules outperform the no-attention and spatial-CBAM baselines.

03. Simultaneous modeling of frequency and temporal attention improves real-world robustness.
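The equal error rate (EER) quoted above is the operating point at which the false acceptance rate and the false rejection rate of a verification system coincide. The following is a minimal sketch of that computation on a handful of made-up trial scores; it is not the authors' evaluation code, the function name is hypothetical, and it uses a simple threshold sweep without the interpolation that production scoring tools usually apply.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a threshold over the observed scores.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    # A trial is accepted when its score is >= the threshold; the extra
    # infinite threshold covers the "reject everything" operating point.
    for threshold in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for s, y in zip(scores, labels)
                            if y == 0 and s >= threshold)
        false_rejects = sum(1 for s, y in zip(scores, labels)
                            if y == 1 and s < threshold)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Made-up trials: four target and four non-target scores.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)  # 0.25 for this toy data
```

The sweep is quadratic in the number of trials, which is fine for a toy list but too slow for a full benchmark trial list; sorting once and accumulating counts would be the usual refinement.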

Abstract

The majority of recent approaches for text-independent speaker recognition apply attention or similar techniques for aggregation of frame-level feature descriptors generated by a deep neural network (DNN) front-end. In this paper, we propose methods of convolutional attention for independently modelling temporal and frequency information in a convolutional neural network (CNN) based front-end. Our system utilizes convolutional block attention modules (CBAMs) [1] appropriately modified to accommodate spectrogram inputs. The proposed CNN front-end fitted with the proposed convolutional attention modules outperforms the no-attention and spatial-CBAM baselines by a significant margin on the VoxCeleb [2, 3] speaker verification benchmark, and our best model achieves an equal error rate of 2.031% on the VoxCeleb1 test set, improving the existing state-of-the-art result by a significant margin.…
