Neural Speaker Diarization with Speaker-Wise Chain Rule

Yusuke Fujita; Shinji Watanabe; Shota Horiguchi; Yawen Xue; Jing Shi; Kenji Nagamatsu

arXiv:2006.01796 · eess.AS · June 3, 2020 · 41 cites

TL;DR

This paper introduces a neural speaker diarization method that applies the probabilistic chain rule speaker by speaker, so it can handle a variable number of speakers, and it reports better accuracy than existing fixed-speaker end-to-end models.

Contribution

The paper proposes a speaker-wise conditional inference approach based on the probabilistic chain rule, enabling neural diarization to handle variable numbers of speakers effectively.
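
The factorization behind this approach can be written out as a short worked equation. This is only a sketch of the standard probabilistic chain rule applied speaker by speaker; the symbols are assumptions introduced here for illustration (X for the input audio features, y_s for the frame-wise speech activity of speaker s over T frames, S for the number of speakers) and are not taken from this page.

```latex
% Assumed notation: X = input features, y_s \in \{0,1\}^T = speech activity of speaker s,
% S = number of speakers. Each y_s is treated as a single random variable.
P(y_1, \dots, y_S \mid X) = \prod_{s=1}^{S} P(y_s \mid y_1, \dots, y_{s-1}, X)
```

Read left to right, each factor estimates one speaker's activity given the audio and the activities already estimated for earlier speakers, which is what allows decoding to proceed one speaker at a time.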

Findings

01 Outperforms state-of-the-art end-to-end diarization methods

02 Accurately handles a variable number of speakers

03 Reduces diarization error rate

Abstract

Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we solve this fixed-number-of-speakers issue with a novel speaker-wise conditional inference method based on the probabilistic chain rule. In the proposed method, each speaker's speech activity is regarded as a single random variable and is estimated sequentially, conditioned on the previously estimated speech activities of the other speakers. Similar to other sequence-to-sequence models, the proposed method produces a variable number of speakers with a stop sequence condition. We evaluated the proposed method on multi-speaker audio recordings of a variable number of speakers. Experimental results show that the proposed method can correctly produce diarization results…
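
The sequential estimation with a stop condition described in the abstract can be sketched as a small decoding loop. This is a minimal illustration, not the authors' implementation: `decode_speakers`, `toy_estimator`, and `max_speakers` are hypothetical names, the neural network is replaced by a toy stand-in that reads ground-truth labels, and treating an all-silent output as the stop signal is an assumption made for this example.

```python
from typing import Callable, List, Sequence, Set

Activity = List[int]  # one 0/1 flag per frame for a single speaker


def decode_speakers(
    frames: Sequence[Set[str]],
    estimate_next: Callable[[Sequence[Set[str]], List[Activity]], Activity],
    max_speakers: int = 8,
) -> List[Activity]:
    """Estimate speakers one at a time, each conditioned on earlier outputs."""
    decoded: List[Activity] = []
    for _ in range(max_speakers):
        activity = estimate_next(frames, decoded)
        # Assumed stop condition: an all-silent output means no speakers remain.
        if not any(activity):
            break
        decoded.append(activity)
    return decoded


def toy_estimator(frames: Sequence[Set[str]], previous: List[Activity]) -> Activity:
    """Hypothetical stand-in for the network: returns the next unexplained speaker.

    Each frame is the set of labels active in that frame. Two speakers with
    identical activity would be merged; that is acceptable for a toy.
    """
    labels: List[str] = []
    for frame in frames:
        for label in sorted(frame):
            if label not in labels:
                labels.append(label)
    for label in labels:
        activity = [1 if label in frame else 0 for frame in frames]
        if activity not in previous:
            return activity
    return [0] * len(frames)  # nothing left to explain: emit the stop pattern


frames = [{"A"}, {"A", "B"}, {"B"}, set(), {"C"}]
result = decode_speakers(frames, toy_estimator)
print(len(result), result)
```

The loop never needs the speaker count in advance: it keeps asking for one more speaker until the estimator returns silence, and `max_speakers` only guards against an estimator that never stops.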

Code & Models

Repositories

hitachi-speech/EEND
none · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing