Exploring Speaker Diarization with Mixture of Experts
Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Hang Chen, Jun Du

TL;DR
This paper introduces a neural speaker diarization system that combines memory-aware speaker embeddings, a sequence-to-sequence architecture, and a mixture of experts to improve robustness and accuracy in complex acoustic environments.
Contribution
The paper presents a neural diarization framework that integrates memory-aware speaker embeddings with a mixture of experts and reports state-of-the-art results on several challenging datasets.
Findings
Enhanced robustness and generalization in speaker diarization.
State-of-the-art performance on the CHiME-6, DiPCo, Mixer 6, and DIHARD-III datasets.
Effective mitigation of model bias through the Shared and Soft Mixture of Experts (SS-MoE) module.
Abstract
In this paper, we propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with a sequence-to-sequence architecture (NSD-MS2S), which integrates a memory-aware multi-speaker embedding module with a sequence-to-sequence (Seq2Seq) architecture. The system leverages a memory module to enhance speaker embeddings and employs a Seq2Seq framework to efficiently map acoustic features to speaker labels. Additionally, we explore the application of mixture of experts in speaker diarization and introduce a Shared and Soft Mixture of Experts (SS-MoE) module to further mitigate model bias and enhance performance. Incorporating SS-MoE leads to the extended model NSD-MS2S-SSMoE. Experiments on multiple complex acoustic datasets, including the CHiME-6, DiPCo, Mixer 6, and DIHARD-III evaluation sets, demonstrate meaningful improvements in robustness and generalization. The proposed…
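The abstract does not describe how SS-MoE is built internally. As a minimal NumPy sketch only, assuming that "soft" refers to Soft MoE-style slot routing (every frame is softly dispatched to expert slots and softly recombined, so no frame is dropped) and that "shared" refers to an always-active expert added to the routed output, such a layer could look like the code below. The function names (`shared_soft_moe`, `mlp`), the two-layer ReLU experts, the parameter `phi`, and all sizes are hypothetical illustrations, not the authors' implementation; normalization, residual connections, and training are omitted.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mlp(x, w1, w2):
    """Two-layer ReLU feed-forward network used as a (hypothetical) expert."""
    return np.maximum(x @ w1, 0.0) @ w2


def shared_soft_moe(x, phi, experts, shared, slots_per_expert):
    """Hypothetical shared + soft mixture-of-experts layer (assumed design).

    x:       (T, d) frame-level features
    phi:     (d, E * S) learnable slot parameters
    experts: list of E (w1, w2) weight pairs, one per routed expert
    shared:  (w1, w2) weight pair for the always-active shared expert
    """
    num_experts, s = len(experts), slots_per_expert
    logits = x @ phi                        # (T, E*S) frame-to-slot affinities
    dispatch = softmax(logits, axis=0)      # each slot is a convex mix of frames
    combine = softmax(logits, axis=1)       # each frame is a convex mix of slots
    slot_in = dispatch.T @ x                # (E*S, d) soft slot inputs
    slot_out = np.concatenate(
        [mlp(slot_in[i * s:(i + 1) * s], *experts[i]) for i in range(num_experts)],
        axis=0,
    )                                       # (E*S, d): expert i processes its S slots
    routed = combine @ slot_out             # (T, d): every frame gets an output
    return routed + mlp(x, *shared)         # shared expert sees every frame


# Usage with random weights (all sizes are arbitrary illustrations).
rng = np.random.default_rng(0)
T, d, h, E, S = 50, 16, 32, 4, 2
x = rng.standard_normal((T, d))
phi = 0.1 * rng.standard_normal((d, E * S))
experts = [(0.1 * rng.standard_normal((d, h)), 0.1 * rng.standard_normal((h, d)))
           for _ in range(E)]
shared = (0.1 * rng.standard_normal((d, h)), 0.1 * rng.standard_normal((h, d)))
y = shared_soft_moe(x, phi, experts, shared, S)
print(y.shape)  # (50, 16)
```

Under these assumptions, soft routing avoids discrete top-k expert selection, so no frame is ever dropped or forced onto a single expert, while the shared expert gives every frame a common transformation. This is one plausible way a design could reduce reliance on any single expert; whether it matches the paper's mechanism for mitigating model bias is not confirmed by the abstract.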
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Long Short-Term Memory · Sequence to Sequence
