Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

Shota Horiguchi; Yuki Takashima; Shinji Watanabe; Paola Garcia

arXiv:2210.03459 · eess.AS · October 10, 2022

Open Access

TL;DR

This paper introduces a bi-directional knowledge transfer method between single- and multi-channel neural diarization models, leading to mutual performance improvements in speaker diarization tasks.

Contribution

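It proposes an end-to-end neural diarization model that accepts both single- and multi-channel inputs, together with a training scheme that alternates between knowledge distillation from a multi-channel model to a single-channel model and finetuning of the distilled single-channel model into a multi-channel model, improving both.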
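The alternating schedule can be sketched as a simple loop. This is a hedged illustration, not the authors' code: it assumes an already trained multi-channel model to start from, and `alternate_transfer`, `distill_to_single`, and `finetune_on_multi` are hypothetical names for the two training stages, whose data, losses, and initialization are left to the caller.

```python
from typing import Callable, List, Tuple, TypeVar

M = TypeVar("M")  # any model type; the sketch only hands models between stages


def alternate_transfer(
    multi_model: M,
    distill_to_single: Callable[[M], M],
    finetune_on_multi: Callable[[M], M],
    num_rounds: int,
) -> Tuple[M, M, List[str]]:
    """Alternate (i) multi-to-single distillation and (ii) single-to-multi finetuning.

    distill_to_single(teacher): hypothetical stage that trains a single-channel
        student on the outputs of the given multi-channel teacher.
    finetune_on_multi(student): hypothetical stage that finetunes the distilled
        single-channel model on multi-channel data.
    Returns the last single-channel model, the last multi-channel model, and a log.
    """
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")
    log: List[str] = []
    for r in range(1, num_rounds + 1):
        # (i) the current multi-channel model acts as the teacher
        single_model = distill_to_single(multi_model)
        log.append(f"round {r}: distill multi -> single")
        # (ii) the distilled single-channel model initializes the next multi-channel model
        multi_model = finetune_on_multi(single_model)
        log.append(f"round {r}: finetune single -> multi")
    return single_model, multi_model, log
```

With string stand-ins for models, e.g. `alternate_transfer("M0", lambda m: f"S({m})", lambda m: f"M({m})", 2)`, the returned multi-channel model is `"M(S(M(S(M0))))"`, which makes the teacher/student hand-offs between rounds easy to trace.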

Findings

01

Mutual improvements in diarization accuracy for both single- and multi-channel models.

02

Effective bi-directional knowledge transfer enhances model performance.

03

Experimental validation on two-speaker data confirms the method's benefits.

Abstract

Because multi-channel speech processing achieves high performance, the outputs of a multi-channel model can be used as teacher labels when training a single-channel model with knowledge distillation. Conversely, single-channel speech data is also known to benefit multi-channel models, either by mixing it with multi-channel speech data during training or by using it for model pretraining. This paper focuses on speaker diarization and proposes to perform the above bi-directional knowledge transfer alternately. We first introduce an end-to-end neural diarization model that can handle both single- and multi-channel inputs. Using this model, we alternately conduct i) knowledge distillation from a multi-channel model to a single-channel model and ii) finetuning from the distilled single-channel model to a multi-channel model. Experimental results on two-speaker data show that the proposed…
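One way to use a multi-channel model's outputs as teacher labels is sketched below. It assumes, as is common for EEND-style models, that teacher and student both emit frame-wise speaker-activity posteriors, and that the student is trained with a permutation-invariant binary cross-entropy against the teacher's posteriors used as soft targets. The paper's exact distillation loss (for instance, whether teacher outputs are thresholded first) is not given here, and `pit_distillation_loss` is a hypothetical name.

```python
import itertools
import math
from typing import List

Posteriors = List[List[float]]  # T frames x S speakers, values in [0, 1]


def soft_bce(target: float, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a soft target in [0, 1] and a predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def pit_distillation_loss(student: Posteriors, teacher: Posteriors) -> float:
    """Permutation-invariant BCE between student and teacher speaker activities.

    The teacher's speaker order is arbitrary, so the loss is the minimum over all
    speaker permutations of the frame- and speaker-averaged soft BCE.
    """
    num_frames, num_speakers = len(student), len(student[0])
    best = math.inf
    for perm in itertools.permutations(range(num_speakers)):
        # student speaker s is matched to teacher speaker perm[s]
        total = sum(
            soft_bce(teacher[t][perm[s]], student[t][s])
            for t in range(num_frames)
            for s in range(num_speakers)
        )
        best = min(best, total / (num_frames * num_speakers))
    return best
```

Brute-force enumeration of speaker permutations costs S! loss evaluations, which is negligible for the two-speaker setting mentioned in the abstract; a student that reproduces the teacher's activities in swapped speaker order incurs the same loss as one in the original order.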

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: Knowledge Distillation