Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Zhihao Du; Shiliang Zhang; Siqi Zheng; Zhijie Yan

arXiv:2211.10243 · cs.SD · November 21, 2022 · 1 citation


TL;DR

This paper recasts overlapped speaker diarization as a single-label prediction problem using power set encoding (PSE), so that speaker overlaps are modeled explicitly, and reports a 6.30% relative reduction in diarization error for multi-party meeting analysis.

Contribution

It reformulates overlapped speaker diarization as a single-label prediction problem and proposes the SOND model to better handle speaker overlaps and dependencies.

Findings

01 Outperforms state-of-the-art target speaker voice activity detection methods.

02 Achieves 6.30% relative reduction in diarization error.

03 Effectively models speaker overlaps and dependencies.
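To make the "relative reduction" in finding 02 concrete, here is a minimal sketch of the arithmetic. The two diarization error rates are made-up placeholders chosen only to illustrate the formula; they are not results from the paper, and the function name is hypothetical.

```python
def relative_reduction(baseline_der: float, new_der: float) -> float:
    """Return the relative error reduction (baseline - new) / baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical error rates in percent, NOT taken from the paper.
baseline_der = 5.000
new_der = 4.685

reduction = relative_reduction(baseline_der, new_der)
print(f"relative reduction: {reduction:.2%}")
```

A relative reduction is measured against the baseline error, so it is not the same as the absolute difference between the two error rates.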

Abstract

Recently, hybrid systems of clustering and neural diarization models have been successfully applied in multi-party meeting analysis. However, current models always treat overlapped speaker diarization as a multi-label classification problem, where speaker dependency and overlaps are not well considered. To overcome these disadvantages, we reformulate the overlapped speaker diarization task as a single-label prediction problem via the proposed power set encoding (PSE). Through this formulation, speaker dependency and overlaps can be explicitly modeled. To fully leverage this formulation, we further propose the speaker overlap-aware neural diarization (SOND) model, which consists of a context-independent (CI) scorer to model global speaker discriminability, a context-dependent (CD) scorer to model local discriminability, and a speaker combining network (SCN) to combine and reassign speaker…
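The move from multi-label to single-label prediction can be illustrated with a small sketch. It assumes the simplest reading of a power set encoding: every allowed combination of active speakers gets its own class index, with a cap on how many speakers may overlap. The cap (`max_overlap`), the ordering of classes, and all function names are assumptions made for illustration; the paper's exact encoding is not given in the abstract above.

```python
from itertools import combinations


def build_power_set(num_speakers: int, max_overlap: int) -> list:
    """List every speaker subset with at most max_overlap active speakers."""
    subsets = []
    for k in range(max_overlap + 1):
        subsets.extend(combinations(range(num_speakers), k))
    return subsets


def encode(frame: list, subset_to_class: dict) -> int:
    """Map a multi-label 0/1 activity vector to a single class index."""
    active = tuple(i for i, on in enumerate(frame) if on)
    return subset_to_class[active]


def decode(label: int, subsets: list, num_speakers: int) -> list:
    """Map a class index back to the 0/1 activity vector it stands for."""
    active = set(subsets[label])
    return [1 if i in active else 0 for i in range(num_speakers)]


# Assumed setup: 4 speakers, at most 2 talking at once.
subsets = build_power_set(num_speakers=4, max_overlap=2)
subset_to_class = {s: i for i, s in enumerate(subsets)}

# Each row is one frame of multi-label speaker activity.
frames = [
    [0, 0, 0, 0],  # silence
    [1, 0, 0, 0],  # speaker 0 alone
    [1, 0, 1, 0],  # speakers 0 and 2 overlap
    [0, 1, 0, 1],  # speakers 1 and 3 overlap
]
labels = [encode(f, subset_to_class) for f in frames]
print(labels)  # -> [0, 1, 6, 9]
```

Under this setup there are 11 classes (1 silence, 4 single-speaker, 6 two-speaker), so a model can predict one class per frame with a softmax instead of four independent sigmoid outputs, and an overlap such as "speakers 0 and 2" becomes a class in its own right.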

Code & Models

Repositories

alibaba-damo-academy/FunASR
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems