Attention and DCT based Global Context Modeling for Text-independent Speaker Recognition
Wei Xia, John H.L. Hansen

TL;DR
This paper introduces a novel global context modeling approach combining attention mechanisms and DCT techniques to enhance speaker recognition by capturing long-range dependencies and improving feature representation.
Contribution
It proposes a comprehensive global time-frequency context modeling block integrating attention and DCT-based methods for more robust speaker verification.
Findings
Significant performance improvement over standard ResNet and Squeeze-and-Excitation (SE) models.
Effective global context representation enhances speaker verification accuracy.
Multi-DCT attention mechanism boosts modeling capacity.
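To make the link between Squeeze-and-Excitation and DCT-based pooling concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the channel-grouping scheme, the `(C, F, T)` layout, and the two-layer gate with weights `w1` and `w2` are all assumptions made for illustration. It relies on one well-known fact: projecting a feature map onto the lowest-frequency 2D DCT basis, which is constant, is the same as global average pooling. Using several DCT frequencies therefore generalizes the pooling step of an SE block.

```python
import numpy as np

def dct_basis(length, freq):
    """Unnormalized DCT-II basis vector of index `freq` over `length` points."""
    n = np.arange(length)
    return np.cos(np.pi * freq * (n + 0.5) / length)

def multi_dct_pool(x, freqs):
    """Pool a (C, F, T) feature map into one descriptor per channel.

    Channels are split into len(freqs) equal groups (an assumption of this
    sketch); group g is projected onto the 2D DCT basis whose
    (frequency-axis, time-axis) indices are freqs[g].
    """
    C, F, T = x.shape
    assert C % len(freqs) == 0, "channels must divide evenly into groups"
    group = C // len(freqs)
    out = np.empty(C)
    for g, (u, v) in enumerate(freqs):
        basis = np.outer(dct_basis(F, u), dct_basis(T, v))  # shape (F, T)
        sl = slice(g * group, (g + 1) * group)
        out[sl] = (x[sl] * basis).sum(axis=(1, 2)) / (F * T)
    return out

def channel_gate(x, freqs, w1, w2):
    """SE-style gating driven by multi-frequency DCT descriptors.

    w1 has shape (hidden, C) and w2 has shape (C, hidden); both are
    hypothetical weights that would be learned in a real model.
    """
    z = multi_dct_pool(x, freqs)
    h = np.maximum(w1 @ z, 0.0)           # ReLU bottleneck
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))   # sigmoid gate in (0, 1)
    return x * s[:, None, None]           # rescale each channel
```

With `freqs=[(0, 0)]` the pooling step reduces to a plain per-channel mean, so the gate behaves like a standard SE block; adding entries such as `(0, 1)` or `(1, 0)` lets different channel groups summarize variation along the time or frequency axis.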
Abstract
Learning an effective speaker representation is crucial for achieving reliable performance in speaker verification tasks. Speech signals are high-dimensional, long, and variable-length sequences containing diverse information at each time-frequency (TF) location. The standard convolutional layer that operates on neighboring local regions often fails to capture the complex TF global information. Our motivation is to alleviate these challenges by increasing the modeling capacity, emphasizing significant information, and suppressing possible redundancies. We aim to design a more robust and efficient speaker recognition system by incorporating the benefits of attention mechanisms and Discrete Cosine Transform (DCT) based signal processing techniques, to effectively represent the global information in speech signals. To achieve this, we propose a general global time-frequency context…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Batch Normalization · 1x1 Convolution · Residual Connection · Residual Block · Bottleneck Residual Block · Average Pooling · Global Average Pooling · Max Pooling · Kaiming Initialization
