MASV: Speaker Verification with Global and Local Context Mamba
Yang Liu, Li Wan, Yiteng Huang, Ming Sun, Yangyang Shi, Florian Metze

TL;DR
The paper introduces MASV, a novel speech verification model that combines local and global context modeling within an efficient architecture, outperforming existing methods in accuracy and computational efficiency.
Contribution
MASV integrates the Mamba module into ECAPA-TDNN, effectively capturing global and local context for improved speech verification performance.
Findings
Surpasses existing models in verification accuracy
Achieves better efficiency compared to transformer-based approaches
Effectively models long-sequence audio with Mamba modules
Abstract
Deep learning models like Convolutional Neural Networks and transformers have shown impressive capabilities in speech verification, gaining considerable attention in the research community. However, CNN-based approaches struggle with modeling long-sequence audio effectively, resulting in suboptimal verification performance. On the other hand, transformer-based methods are often hindered by high computational demands, limiting their practicality. This paper presents the MASV model, a novel architecture that integrates the Mamba module into the ECAPA-TDNN framework. By introducing the Local Context Bidirectional Mamba and Tri-Mamba block, the model effectively captures both global and local context within audio sequences. Experimental results demonstrate that the MASV model substantially enhances verification performance, surpassing existing models in both accuracy and efficiency.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
MethodsSoftmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
