Neural Diarization with Non-autoregressive Intermediate Attractors
Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

TL;DR
This paper introduces a novel non-autoregressive neural diarization model that uses intermediate attractors to improve speaker label dependency modeling, leading to enhanced performance and training efficiency.
Contribution
The paper proposes a new non-autoregressive EEND model whose intermediate attractors refine speaker labels, addressing the lack of label dependency in previous EEND models.
Findings
Improved diarization performance on the two-speaker CALLHOME dataset.
Deeper networks benefit more from intermediate labels.
Higher training throughput than EEND-EDA.
Abstract
End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers with these labels. While the proposed model works in a non-autoregressive manner, the speaker labels are refined by referring to the whole sequence of intermediate labels. The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the…
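The abstract describes the mechanism only at a high level. Below is a minimal NumPy sketch of it, built on stated assumptions. Frame-level speaker posteriors are computed as sigmoid(E Aᵀ) from frame embeddings E and attractors A, as in EEND-EDA. All attractors are produced at once by cross-attention from learned speaker queries. This is an assumption: the abstract says only that they are non-autoregressive. Intermediate labels are fed back to later layers by adding a linear projection of the posteriors to the embeddings, which is also an assumption. The toy layers, `queries`, and `proj` are hypothetical placeholders with random weights. They are not trained parameters or the authors' code.

```python
from typing import Callable, List

import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def nonautoregressive_attractors(emb: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Compute all S attractors in one shot via cross-attention (assumed design).

    emb: (T, D) frame embeddings; queries: (S, D) hypothetical learned queries.
    Returns (S, D) attractors; no attractor depends on a previously emitted one.
    """
    d = emb.shape[1]
    weights = softmax(queries @ emb.T / np.sqrt(d), axis=-1)  # (S, T)
    return weights @ emb  # (S, D)


def speaker_posteriors(emb: np.ndarray, attractors: np.ndarray) -> np.ndarray:
    """EEND-EDA-style frame-level speaker activities: sigmoid(E A^T), shape (T, S)."""
    return sigmoid(emb @ attractors.T)


def condition_on_labels(emb: np.ndarray, post: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Feed intermediate labels back into the embeddings (hypothetical projection)."""
    return emb + post @ proj  # (T, S) @ (S, D) -> (T, D)


def make_toy_layer(rng: np.random.Generator, d: int) -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical stand-in for one encoder layer: tanh(x W) with random W."""
    w = rng.standard_normal((d, d)) / np.sqrt(d)
    return lambda x: np.tanh(x @ w)


def forward(emb, layers, queries, proj) -> List[np.ndarray]:
    """Emit speaker labels after every layer; condition all but the last layer's output."""
    outputs = []
    for i, layer in enumerate(layers):
        emb = layer(emb)
        post = speaker_posteriors(emb, nonautoregressive_attractors(emb, queries))
        outputs.append(post)  # intermediate labels; outputs[-1] is the final estimate
        if i < len(layers) - 1:
            emb = condition_on_labels(emb, post, proj)
    return outputs


# Usage with toy sizes: 100 frames, 16-dim embeddings, 2 speakers, 4 layers.
rng = np.random.default_rng(0)
T, D, S, L = 100, 16, 2, 4
emb = rng.standard_normal((T, D))
layers = [make_toy_layer(rng, D) for _ in range(L)]
queries = rng.standard_normal((S, D))
proj = 0.1 * rng.standard_normal((S, D))
outputs = forward(emb, layers, queries, proj)
```

Because every layer's labels are computed from the whole sequence at once, later layers can refer to the full sequence of intermediate labels without generating frames or speakers one step at a time. That is the sense in which the abstract describes the model as non-autoregressive.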
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: End-to-End Neural Diarization
