Mamba-based Segmentation Model for Speaker Diarization

Alexis Plaquet; Naohiro Tawara; Marc Delcroix; Shota Horiguchi; Atsushi Ando; Shoko Araki

arXiv:2410.06459 · cs.SD · October 11, 2024

TL;DR

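This paper proposes a Mamba-based neural segmentation model for speaker diarization. Mamba, a recently proposed architecture that behaves like an RNN with attention-like capabilities, makes it practical to process longer local windows, which improves diarization quality, and the resulting system outperforms the RNN and attention-based models it is compared against.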

Contribution

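The paper assesses Mamba, a recently proposed RNN-like architecture with attention-like capabilities, for speaker diarization by comparing the neural segmentation model of the pyannote pipeline with a Mamba-based variant, and shows that the Mamba-based variant outperforms both a traditional RNN and the tested attention-based model.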

Findings

01 Mamba makes it practical to process longer local windows for diarization.

02 The Mamba-based system achieves state-of-the-art results on three widely used diarization datasets.

03 Mamba outperforms both a traditional RNN and the tested attention-based model.
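The first finding concerns the local windows that the segmentation model processes. As a rough illustration (a generic sketch, not code from the paper, its repository, or pyannote), a long recording can be cut into overlapping fixed-length windows as below. The function name `sliding_windows`, the end-aligned final window (padding is another common choice), and the durations in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    start: float  # seconds
    end: float    # seconds


def sliding_windows(duration: float, window: float, step: float) -> List[Window]:
    """Split [0, duration] into overlapping fixed-length windows (hypothetical helper).

    The last window is aligned to the end of the recording so that every
    sample is covered without padding.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if duration <= window:
        # Recording shorter than one window: a single (shorter) chunk.
        return [Window(0.0, duration)]
    windows = []
    start = 0.0
    while start + window < duration:
        windows.append(Window(start, start + window))
        start += step
    # Final window flush with the end of the recording.
    windows.append(Window(duration - window, duration))
    return windows
```

For example, `sliding_windows(60.0, 10.0, 5.0)` yields 11 windows, while `sliding_windows(60.0, 30.0, 5.0)` yields 7, each covering three times as much audio. Longer windows give the segmentation model more context per chunk, which the abstract credits with making speaker embedding extraction more reliable.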

Abstract

Mamba is a newly proposed architecture which behaves like a recurrent neural network (RNN) with attention-like capabilities. These properties are promising for speaker diarization, as attention-based models have unsuitable memory requirements for long-form audio, and traditional RNN capabilities are too limited. In this paper, we propose to assess the potential of Mamba for diarization by comparing the state-of-the-art neural segmentation of the pyannote pipeline with our proposed Mamba-based variant. Mamba's stronger processing capabilities allow usage of longer local windows, which significantly improve diarization quality by making the speaker embedding extraction more reliable. We find Mamba to be a superior alternative to both traditional RNN and the tested attention-based model. Our proposed Mamba-based system achieves state-of-the-art performance on three widely used diarization…
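The abstract describes Mamba as behaving like an RNN with attention-like capabilities. A toy, single-channel version of the selective state-space recurrence behind that description is sketched below under simplifying assumptions: scalar input, a diagonal negative state matrix `a`, and an input-dependent step size `delta` and vectors `B_t`, `C_t` produced by hypothetical weights `w_b`, `w_c`, `w_delta`, `b_delta`. It omits the convolution, gating, projections, skip term, and parallel scan of a real Mamba block, and it is not the code from the paper's repository.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    a: Sequence[float],    # diagonal of A, assumed negative (N entries)
    w_b: Sequence[float],  # hypothetical weights mapping x_t to B_t (N entries)
    w_c: Sequence[float],  # hypothetical weights mapping x_t to C_t (N entries)
    w_delta: float,        # hypothetical weight for the step size
    b_delta: float,        # hypothetical bias for the step size
) -> List[float]:
    n = len(a)
    h = [0.0] * n  # recurrent state: N values regardless of sequence length
    ys = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        b_t = [w * x_t for w in w_b]               # input-dependent B_t
        c_t = [w * x_t for w in w_c]               # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])              # zero-order-hold discretization of A
            h[i] = a_bar * h[i] + delta * b_t[i] * x_t  # simplified (Euler) discretization of B
        ys.append(sum(c_t[i] * h[i] for i in range(n)))  # readout y_t = C_t . h_t
    return ys
```

The state `h` holds `len(a)` numbers whatever the sequence length, whereas standard self-attention over T frames materializes a T×T score matrix per head, which is the memory issue the abstract raises for long-form audio. The input-dependent `delta` is what makes the recurrence selective: a large step lets the current input overwrite the state, while a small one preserves it.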

Code & Models

Repositories

nttcslab-sp/mamba-diarization
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing