TL;DR
This paper introduces an encoder-decoder attractor module for end-to-end neural diarization, enabling flexible speaker counts and improved overlap handling, outperforming traditional cascaded methods.
Contribution
The paper proposes EEND-EDA, a novel method that generates speaker attractors with an encoder-decoder, allowing for variable speaker numbers and better overlap management in neural diarization.
Findings
Outperforms conventional cascaded diarization methods
Handles speaker overlaps effectively
Supports flexible number of speakers
Abstract
This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce an encoder-decoder-based attractor calculation module (EDA) to EEND. Once frame-wise embeddings are obtained, EDA sequentially generates speaker-wise attractors on the basis of a sequence-to-sequence method using an LSTM encoder-decoder. The attractor generation continues until a stopping condition is satisfied; thus, the number of attractors can be flexible. Diarization results are then estimated as dot products of the attractors and embeddings. The embeddings from speaker overlaps result in larger dot product values…
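The attractor pipeline described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The abstract does not specify the stopping condition, the decoder input, or any parameter values, so the following are assumptions made for illustration: randomly initialized weights stand in for trained ones, the decoder is fed zero vectors, generation stops when a sigmoid "existence" score drops below a threshold, and dot products are passed through a sigmoid to obtain per-speaker activity. All function and parameter names (`init_lstm`, `generate_attractors`, `w_stop`, `b_stop`, `threshold`, `max_speakers`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates are stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def init_lstm(dim, rng):
    # Assumption: random weights stand in for trained parameters.
    W = rng.normal(0.0, 0.1, (4 * dim, dim))
    U = rng.normal(0.0, 0.1, (4 * dim, dim))
    b = np.zeros(4 * dim)
    return W, U, b

def generate_attractors(embeddings, enc, dec, w_stop, b_stop,
                        threshold=0.5, max_speakers=10):
    """Encode frame-wise embeddings, then decode attractors one at a time."""
    dim = embeddings.shape[1]
    h, c = np.zeros(dim), np.zeros(dim)
    for e in embeddings:                      # encoder pass over all frames
        h, c = lstm_step(e, h, c, *enc)
    attractors = []
    zero_input = np.zeros(dim)                # assumption: zero decoder input
    for _ in range(max_speakers):
        h, c = lstm_step(zero_input, h, c, *dec)
        # Assumed stopping condition: existence score below the threshold.
        if sigmoid(w_stop @ h + b_stop) < threshold:
            break
        attractors.append(h)
    return np.array(attractors).reshape(-1, dim)

def diarize(embeddings, attractors):
    """Frame-by-speaker activities from attractor/embedding dot products."""
    return sigmoid(embeddings @ attractors.T)

rng = np.random.default_rng(0)
dim, frames = 8, 20
embeddings = rng.normal(size=(frames, dim))   # placeholder frame embeddings
enc, dec = init_lstm(dim, rng), init_lstm(dim, rng)
attractors = generate_attractors(embeddings, enc, dec,
                                 w_stop=np.zeros(dim), b_stop=2.0,
                                 max_speakers=3)
activities = diarize(embeddings, attractors)  # shape: (frames, speakers)
```

Because each speaker gets an independent activity value per frame, a frame can exceed a decision threshold for several attractors at once, which is how a sketch like this would represent overlapping speech. The number of columns in `activities` equals the number of attractors generated before the stopping condition fired.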
Taxonomy
Methods: End-to-End Neural Diarization · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
