Dual-Strategy-Enhanced ConBiMamba for Neural Speaker Diarization
Zhen Liao, Gaole Dai, Mengqiao Chen, Wenqing Cheng, Wei Xu

TL;DR
This paper introduces a dual-strategy neural speaker diarization system that combines the Conformer and Mamba architectures with a boundary-enhanced loss and layer-wise feature aggregation, achieving state-of-the-art results on four datasets.
Contribution
It applies the ConBiMamba model to speaker diarization and adds a boundary-enhanced loss and layer-wise feature aggregation, improving both local detail modeling and long-range dependency handling.
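The summary names layer-wise feature aggregation but does not say how the layers are combined. One common way to do this, assumed here rather than taken from the paper, is a learnable softmax-weighted sum of every encoder layer's output. The function names, the `layer_logits` parameter and the tensor sizes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def aggregate_layers(layer_outputs: np.ndarray, layer_logits: np.ndarray) -> np.ndarray:
    """Fuse per-layer encoder features with a softmax-weighted sum (assumed scheme).

    layer_outputs: shape (L, T, D), one (T, D) feature map per encoder layer.
    layer_logits:  shape (L,), hypothetical learnable scores, one per layer.
    Returns shape (T, D).
    """
    if layer_outputs.shape[0] != layer_logits.shape[0]:
        raise ValueError("expected one logit per layer")
    weights = softmax(layer_logits)  # (L,), non-negative, sums to 1
    # Contract the layer axis: result[t, d] = sum_l weights[l] * layer_outputs[l, t, d]
    return np.tensordot(weights, layer_outputs, axes=(0, 0))


# Usage: 4 layers, 100 frames, 256-dim features (sizes chosen for illustration).
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 100, 256))
fused = aggregate_layers(feats, np.zeros(4))  # zero logits give a plain average
print(fused.shape)  # (100, 256)
```

Inspecting the learned weights after training would show which encoder depths contribute most; whether the paper uses this weighting, concatenation, or another fusion is not stated in the summary.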
Findings
Achieves state-of-the-art performance on four datasets
Effectively handles long audio sequences with ExtBiMamba
Improves speaker change point detection accuracy
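The summary credits the boundary-enhanced loss with better speaker change point detection but gives no formula. A minimal sketch under assumed choices: frame-level binary cross-entropy over speaker activities, with frames within `radius` frames of a change point up-weighted by `boundary_weight`. Both parameters, their defaults, and all function names are hypothetical, and the paper's actual loss may be formulated differently.

```python
import numpy as np


def change_points(labels: np.ndarray) -> np.ndarray:
    """Frames t where any speaker's activity differs from frame t - 1.

    labels: (T, S) binary matrix; labels[t, s] == 1 means speaker s is active at frame t.
    """
    changed = np.any(labels[1:] != labels[:-1], axis=1)  # compares frame t+1 with frame t
    return np.nonzero(changed)[0] + 1


def boundary_weights(labels: np.ndarray, radius: int = 2, boundary_weight: float = 3.0) -> np.ndarray:
    """Per-frame weights: boundary_weight within +/- radius frames of a change, 1.0 elsewhere."""
    num_frames = labels.shape[0]
    weights = np.ones(num_frames)
    for t in change_points(labels):
        weights[max(0, t - radius):min(num_frames, t + radius + 1)] = boundary_weight
    return weights


def boundary_weighted_bce(probs, labels, radius=2, boundary_weight=3.0, eps=1e-7) -> float:
    """Frame-level binary cross-entropy with frames near speaker changes up-weighted."""
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))  # (T, S)
    per_frame = bce.mean(axis=1)  # average over speakers -> (T,)
    w = boundary_weights(labels, radius, boundary_weight)
    return float((w * per_frame).sum() / w.sum())  # weighted mean over frames


# Usage: two speakers, speaker 0 talks for 5 frames, then speaker 1 for 5 frames.
labels = np.array([[1, 0]] * 5 + [[0, 1]] * 5)
print(change_points(labels))  # [5]
print(boundary_weights(labels, radius=1))  # frames 4, 5 and 6 get weight 3.0
```

With this weighting, the same prediction error costs more when it falls next to a speaker turn than when it falls in the middle of a single-speaker stretch, which is the behavior a boundary-focused loss is meant to encourage.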
Abstract
Conformer and Mamba have achieved strong performance in speech modeling but face limitations in speaker diarization. Mamba is efficient but struggles with local details and nonlinear patterns. Conformer's self-attention incurs high memory overhead for long speech sequences and may cause instability in long-range dependency modeling. These limitations are critical for diarization, which requires both precise modeling of local variations and robust speaker consistency over extended spans. To address these challenges, we first apply ConBiMamba to speaker diarization. We follow the Pyannote pipeline and propose the Dual-Strategy-Enhanced ConBiMamba neural speaker diarization system. ConBiMamba integrates the strengths of Conformer and Mamba, using Conformer's convolutional and feed-forward structures to improve local feature extraction. By replacing Conformer's self-attention…
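The abstract is cut off before stating what replaces self-attention; the name ConBiMamba suggests a bidirectional Mamba layer. The sketch below assumes that and uses a heavily simplified stand-in: a fixed (non-selective) diagonal linear recurrence run forward and backward in time. Real Mamba makes its state-space parameters input-dependent and adds a short convolution and gating, none of which is modeled here. It also estimates the memory of a T x T attention score matrix at an assumed 100 frames per second (10 ms hop). All function names are hypothetical.

```python
import numpy as np


def linear_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Causal diagonal recurrence h_t = a * h_{t-1} + b * x_t, applied per channel.

    x: (T, D) input frames; a, b: (D,) decay and input gains (0 < a < 1 keeps it stable).
    Returns the hidden state at every frame, shape (T, D). Cost is O(T * D).
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out


def bidirectional_scan(x, a_fwd, b_fwd, a_bwd, b_bwd):
    """Sum of a forward scan and a scan over the time-reversed input (flipped back)."""
    fwd = linear_scan(x, a_fwd, b_fwd)
    bwd = linear_scan(x[::-1], a_bwd, b_bwd)[::-1]
    return fwd + bwd


def attention_scores_bytes(num_frames: int, heads: int = 1, bytes_per_value: int = 4) -> int:
    """Memory for one layer's T x T attention score matrices (float32 by default)."""
    return heads * num_frames * num_frames * bytes_per_value


# Usage: a single impulse at frame 2 shows what each direction can "see".
x = np.zeros((5, 1))
x[2, 0] = 1.0
a, b = np.array([0.5]), np.array([1.0])
y = bidirectional_scan(x, a, b, a, b)
print(y[:, 0])  # forward half covers frames 2-4, backward half covers frames 0-2

# Attention scores for 5 minutes of audio at an assumed 100 frames/s (30,000 frames):
print(attention_scores_bytes(30_000) / 1e9, "GB per head per layer (float32)")
```

The scan's cost grows linearly with the number of frames, whereas the attention score matrix grows quadratically, which is the memory concern the abstract raises for long recordings.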
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Face recognition and analysis
