Improving End-to-End Neural Diarization Using Conversational Summary Representations

Samuel J. Broughton; Lahiru Samarakoon

arXiv:2306.13863·cs.SD·June 27, 2023

Open Access

TL;DR

This paper improves end-to-end neural speaker diarization by replacing the zero-vector inputs to the EDA decoder with learned conversational summary representations, which raises diarization accuracy across multiple datasets.

Contribution

The study introduces learned conversational summary representations into EEND-EDA, improving speaker attractor generation and diarization performance.

Findings

01

Achieved a 1.90% absolute DER improvement over the baseline

02

Proposed three methods for initializing summary vectors

03

Investigated effects of varying input recording lengths
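
For context on the first finding: diarization error rate (DER) is conventionally computed as the share of reference speech time that is missed, falsely detected, or attributed to the wrong speaker. The symbols below are labels chosen here for illustration, not notation taken from the paper, and no baseline DER value is assumed.

```latex
\mathrm{DER} = \frac{T_{\text{miss}} + T_{\text{false alarm}} + T_{\text{confusion}}}{T_{\text{speech}}}
\qquad
\Delta_{\text{abs}} = \mathrm{DER}_{\text{baseline}} - \mathrm{DER}_{\text{proposed}} = 1.90 \text{ percentage points}
```

An absolute improvement is a difference in percentage points, as opposed to a relative improvement, which would divide that difference by the baseline DER.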

Abstract

Speaker diarization is the task of partitioning an audio recording by speaker identity. End-to-end neural diarization with encoder-decoder based attractor calculation (EEND-EDA) aims to solve this problem by directly outputting diarization results for a flexible number of speakers. Currently, the EDA module responsible for generating speaker-wise attractors is conditioned on zero vectors, which provide no relevant information to the network. In this work, we extend EEND-EDA by replacing the input zero vectors to the decoder with learned conversational summary representations. The updated EDA module sequentially generates speaker-wise attractors based on utterance-level information. We propose three methods to initialize the summary vector and conduct an investigation into varying input recording lengths. On a range of publicly available test sets, our model achieves an absolute DER…
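
The change described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it uses a toy LSTM cell with random, untrained weights, a fixed number of speakers (EEND-EDA itself decides when to stop generating attractors), and a mean-pooled summary vector as a stand-in, since the three initialization methods are not detailed on this page. All names (`LSTMCell`, `generate_attractors`, `mean_summary`) are hypothetical. It only shows where the decoder input is swapped from a zero vector to a summary vector.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Toy LSTM cell with random weights and no biases (illustration only)."""

    def __init__(self, dim, rng):
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)]
                   for _ in range(dim)]
            for gate in "ifog"
        }

    def step(self, x, h, c):
        xh = x + h  # list concatenation
        i = [sigmoid(z) for z in matvec(self.W["i"], xh)]
        f = [sigmoid(z) for z in matvec(self.W["f"], xh)]
        o = [sigmoid(z) for z in matvec(self.W["o"], xh)]
        g = [math.tanh(z) for z in matvec(self.W["g"], xh)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def generate_attractors(frames, encoder, decoder, decoder_input, num_speakers):
    """Encode the frame embeddings, then decode one attractor per step.

    decoder_input is fed to the decoder at every step: a zero vector in the
    baseline, a summary vector in the variant described in the abstract.
    """
    dim = len(frames[0])
    h, c = [0.0] * dim, [0.0] * dim
    for x in frames:
        h, c = encoder.step(x, h, c)
    attractors = []
    for _ in range(num_speakers):  # assumption: speaker count fixed in advance
        h, c = decoder.step(decoder_input, h, c)
        attractors.append(h)
    return attractors


def mean_summary(frames):
    """Assumed stand-in summary: the mean of the frame embeddings."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


rng = random.Random(0)
dim, num_frames = 4, 10
frames = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_frames)]
encoder, decoder = LSTMCell(dim, rng), LSTMCell(dim, rng)

# Baseline: the decoder is conditioned on zero vectors.
baseline = generate_attractors(frames, encoder, decoder, [0.0] * dim, 2)
# Variant: the decoder is conditioned on a conversation-level summary vector.
with_summary = generate_attractors(frames, encoder, decoder, mean_summary(frames), 2)
```

With a zero input, the decoder's only source of information is the state handed over by the encoder. Feeding a summary vector gives each decoding step an additional utterance-level signal, which is the motivation stated in the abstract.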

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing