DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification

Yangfu Li; Jiapan Gan; Xiaodan Lin

arXiv:2303.11020 · cs.SD · August 2, 2023 · 1 cites

TL;DR

This paper introduces DS-TDNN, a dual-stream neural network with a global-aware filter layer that captures long-range dependencies, significantly improving speaker verification accuracy especially on longer utterances while reducing computational costs.

Contribution

The paper proposes a novel GF layer and a dual-stream TDNN architecture that effectively model global context and local features simultaneously for speaker verification.

Findings

01 Achieves a 10% relative improvement over ECAPA-TDNN in speaker verification.

02 Reduces computational cost by 20% relative to ECAPA-TDNN.

03 Outperforms residual and attention-based models on variable-length utterances.

Abstract

Conventional time-delay neural networks (TDNNs) struggle to handle long-range context, which limits their ability to represent speaker information in long utterances. Existing solutions either increase model complexity or try to balance local features against global context. To leverage the long-term dependencies of audio signals while constraining model complexity, we introduce a novel module called the Global-aware Filter layer (GF layer), which applies a set of learnable transform-domain filters between a 1D discrete Fourier transform and its inverse to capture global context. Additionally, we develop a dynamic filtering strategy and a sparse regularization method to enhance the performance of the GF layer and prevent overfitting. Based on the GF layer, we present a dual-stream TDNN architecture called DS-TDNN…
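The core idea of the GF layer described above (filtering between a 1D DFT and its inverse) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `global_filter`, the filter array `w`, the tensor shapes, and the use of a real-input FFT are all assumptions made for illustration, and the dynamic filtering strategy and sparse regularization mentioned in the abstract are omitted. In a trained model, `w` would be a learnable parameter.

```python
import numpy as np

def global_filter(x, w):
    """Filter each channel in the frequency domain (hypothetical helper).

    x: real array of shape (channels, frames)
    w: complex array of shape (channels, frames // 2 + 1); stands in for
       the learnable filters, one coefficient per channel and frequency bin
    """
    frames = x.shape[-1]
    spec = np.fft.rfft(x, axis=-1)                # 1D DFT along the time axis
    spec = spec * w                               # element-wise filtering
    return np.fft.irfft(spec, n=frames, axis=-1)  # inverse DFT back to time

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 200))                 # 4 channels, 200 frames

# An all-ones filter leaves the signal unchanged.
w_identity = np.ones((4, 101), dtype=complex)
y_identity = global_filter(x, w_identity)

# Keeping only the DC bin replaces every frame with the channel mean,
# showing that one frequency coefficient affects all time steps at once.
w_dc = np.zeros((4, 101), dtype=complex)
w_dc[:, 0] = 1.0
y_dc = global_filter(x, w_dc)
```

Because each frequency bin depends on every frame of the input, a single multiplication in the transform domain mixes information across the whole utterance, which is the sense in which such a filter is "global".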

Code & Models

Repositories

ychenl/ds-tdnn (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing