Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation
Ming Cheng, Yuke Lin, Ming Li

TL;DR
This paper introduces a novel sequence-to-sequence neural diarization framework for both online and offline speaker diarization. It detects unknown speakers without prior enrollment, learns speaker representations jointly within the network, and reports high diarization accuracy in experiments.
Contribution
It presents a new diarization paradigm that jointly learns speaker embeddings within the network and handles unknown speakers without prior enrollment.
Findings
Achieves high diarization accuracy in experiments.
Handles unknown speakers without prior enrollment.
Operates effectively in online and offline modes.
Abstract
This paper proposes a novel Sequence-to-Sequence Neural Diarization (S2SND) framework to perform online and offline speaker diarization. It is developed from the sequence-to-sequence architecture of our previous target-speaker voice activity detection system and then evolves into a new diarization paradigm by addressing two critical problems. 1) Speaker Detection: The proposed approach can utilize partially given speaker embeddings to discover the unknown speaker and predict the target voice activities in the audio signal. It does not require a prior diarization system for speaker enrollment in advance. 2) Speaker Representation: The proposed approach can adopt the predicted voice activities as reference information to extract speaker embeddings from the audio signal simultaneously. The representation space of speaker embedding is jointly learned within the whole diarization network…
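The "speaker representation" step can be made concrete with a minimal numpy sketch. This is not the authors' implementation. It shows one common way to turn predicted voice activities into speaker embeddings: average each speaker's frame-level features, weighted by that speaker's activity probabilities. The function name `activity_weighted_embeddings`, the array shapes, and the final L2 normalization are illustrative assumptions. In S2SND the embedding space is learned jointly inside the diarization network rather than produced by a fixed pooling rule like this one.

```python
import numpy as np


def activity_weighted_embeddings(frame_feats: np.ndarray,
                                 activities: np.ndarray,
                                 eps: float = 1e-8) -> np.ndarray:
    """Hypothetical helper: pool frame features into one embedding per speaker.

    frame_feats: (T, D) frame-level features of an audio segment.
    activities:  (T, S) predicted voice-activity probabilities in [0, 1]
                 for S speakers.
    Returns:     (S, D) L2-normalized speaker embeddings.
    """
    # Weighted sum over time: each frame contributes to a speaker's
    # embedding in proportion to that speaker's predicted activity.
    weighted_sum = activities.T @ frame_feats          # (S, D)
    # Divide by total activity so speakers with more speech time
    # do not get larger-magnitude embeddings.
    total = activities.sum(axis=0)[:, None]            # (S, 1)
    emb = weighted_sum / (total + eps)
    # L2-normalize so embeddings can be compared by cosine similarity.
    return emb / (np.linalg.norm(emb, axis=1, keepdims=True) + eps)


# Toy example: 4 frames, 2-dimensional features, 2 speakers.
# Speaker 0 is active in frames 0-1, speaker 1 in frames 2-3.
frame_feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
activities = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
emb = activity_weighted_embeddings(frame_feats, activities)
print(np.round(emb, 6))  # approximately [[1, 0], [0, 1]]
```

Using soft activity probabilities rather than hard 0/1 decisions keeps the pooling differentiable, which is what makes it possible, in principle, to train embedding extraction and activity prediction together in one network.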
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: ADaptive gradient method with the OPTimal convergence rate
