Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization
Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola Garcia

TL;DR
This paper introduces a bi-directional knowledge transfer method between single- and multi-channel neural diarization models, leading to mutual performance improvements in speaker diarization tasks.
Contribution
The paper proposes an end-to-end neural diarization model that accepts both single- and multi-channel inputs, together with a training scheme that alternates between knowledge distillation (multi-channel to single-channel) and finetuning (single-channel to multi-channel).
Findings
Mutual improvements in diarization accuracy for both single- and multi-channel models.
Effective bi-directional knowledge transfer enhances model performance.
Experimental validation on two-speaker data confirms the method's benefits.
Abstract
Due to the high performance of multi-channel speech processing, we can use the outputs from a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. Conversely, it is also known that single-channel speech data can benefit multi-channel models by mixing it with multi-channel speech data during training or by using it for model pretraining. This paper focuses on speaker diarization and proposes to conduct the above bi-directional knowledge transfer alternately. We first introduce an end-to-end neural diarization model that can handle both single- and multi-channel inputs. Using this model, we alternately conduct i) knowledge distillation from a multi-channel model to a single-channel model and ii) finetuning from the distilled single-channel model to a multi-channel model. Experimental results on two-speaker data show that the proposed…
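The distillation step (i) in the abstract can be illustrated with a minimal sketch. It assumes the teacher (multi-channel) and student (single-channel) models each output frame-wise, per-speaker speech-activity probabilities, and that the student is trained with binary cross-entropy against the teacher's soft outputs. The function names `bce` and `distillation_loss` and the toy values are hypothetical, the paper's exact loss is not given in this summary, and speaker-permutation handling is omitted.

```python
import math

def bce(target, prob, eps=1e-7):
    """Binary cross-entropy between a (possibly soft) target and a predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def distillation_loss(teacher_probs, student_probs):
    """Mean frame-wise BCE, using teacher posteriors as soft labels (assumed form).

    Both arguments are lists of frames; each frame is a list of per-speaker
    speech-activity probabilities in [0, 1].
    """
    total, count = 0.0, 0
    for t_frame, s_frame in zip(teacher_probs, student_probs):
        for t, s in zip(t_frame, s_frame):
            total += bce(t, s)
            count += 1
    return total / count

# Toy example (made-up values): 3 frames, 2 speakers.
teacher = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.95]]
close_student = [[0.85, 0.15], [0.75, 0.65], [0.15, 0.9]]
far_student = [[0.2, 0.8], [0.3, 0.2], [0.9, 0.1]]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In this sketch a student whose outputs track the teacher's receives a lower loss than one that disagrees with it. Under the alternating scheme described in the abstract, the distilled single-channel model would then initialize the multi-channel finetuning in step (ii).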
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
Methods: Knowledge Distillation
