Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis
Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

TL;DR
This paper introduces a neural diarization approach that models speaker overlaps explicitly through power set encoding (PSE), reducing diarization error in multi-party meeting analysis.
Contribution
It reformulates overlapped speaker diarization as a single-label prediction problem and proposes the SOND model to better handle speaker overlaps and dependencies.
Findings
Outperforms state-of-the-art target speaker voice activity detection methods.
Achieves a 6.30% relative reduction in diarization error.
Effectively models speaker overlaps and dependencies.
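For readers unfamiliar with the term, a relative reduction is conventionally measured against the baseline's error rate rather than as an absolute difference. The symbols below are placeholders for the baseline and proposed systems' diarization error rates, not values taken from the paper:

```latex
\text{relative reduction} = \frac{\mathrm{DER}_{\text{baseline}} - \mathrm{DER}_{\text{proposed}}}{\mathrm{DER}_{\text{baseline}}} \times 100\%
```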
Abstract
Recently, hybrid systems of clustering and neural diarization models have been successfully applied in multi-party meeting analysis. However, current models always treat overlapped speaker diarization as a multi-label classification problem, where speaker dependency and overlaps are not well considered. To overcome these disadvantages, we reformulate the overlapped speaker diarization task as a single-label prediction problem via the proposed power set encoding (PSE). Through this formulation, speaker dependency and overlaps can be explicitly modeled. To fully leverage this formulation, we further propose the speaker overlap-aware neural diarization (SOND) model, which consists of a context-independent (CI) scorer to model global speaker discriminability, a context-dependent (CD) scorer to model local discriminability, and a speaker combining network (SCN) to combine and reassign speaker…
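The shift from multi-label to single-label prediction can be illustrated with a minimal sketch. It assumes the simplest possible mapping, in which each speaker's 0/1 activity is one bit of an integer class index; the abstract does not state the paper's actual label ordering or whether it caps the number of simultaneous speakers, so the function names `pse_encode` and `pse_decode` and the bit layout are hypothetical.

```python
from typing import List


def pse_encode(active: List[int]) -> int:
    """Map a per-speaker 0/1 activity vector to a single class index.

    Assumption: speaker i contributes bit i of the index (plain binary
    encoding). This is an illustrative choice, not the paper's scheme.
    """
    index = 0
    for speaker, is_active in enumerate(active):
        if is_active not in (0, 1):
            raise ValueError("activity values must be 0 or 1")
        index |= is_active << speaker
    return index


def pse_decode(index: int, num_speakers: int) -> List[int]:
    """Recover the per-speaker activity vector from a class index."""
    if not 0 <= index < 2 ** num_speakers:
        raise ValueError("index out of range for this number of speakers")
    return [(index >> speaker) & 1 for speaker in range(num_speakers)]


# One frame with three enrolled speakers, where speakers 0 and 2 overlap.
frame_labels = [1, 0, 1]
class_index = pse_encode(frame_labels)
recovered = pse_decode(class_index, num_speakers=3)
```

Under this encoding, every combination of active speakers, including silence and every overlap pattern, becomes its own class, so a classifier with one softmax output per frame can represent overlaps directly. The cost is that the number of classes grows as 2^N with the number of speakers N.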
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
