SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Yue Li; Xinsheng Wang; Li Zhang; Lei Xie

arXiv:2406.08393·eess.AS·June 13, 2024

Open Access

TL;DR

This paper investigates self-supervised learning (SSL) features for speaker change detection (SCD). It proposes an SCD model, SCDNet, together with a contrastive learning method, and reports that WavLM outperforms the other SSL models examined.

Contribution

The paper introduces SCDNet, applies a learnable weighting over SSL model layers to analyze their intermediate representations, and proposes a contrastive learning method to improve speaker change detection performance.

Findings

01

WavLM outperforms other SSL models in SCD.

02

SCDNet effectively leverages SSL features for SCD.

03

Contrastive learning reduces overfitting in SCD models.

Abstract

Speaker Change Detection (SCD) aims to identify the boundaries between speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, this work further investigates self-supervised learning (SSL) features for SCD. Specifically, an SCD model named SCDNet is proposed. With this model, several state-of-the-art SSL models, including HuBERT, wav2vec 2.0, and WavLM, are investigated. To identify the most effective layers of the SSL models for SCD, a learnable weighting method is employed to analyze the effectiveness of intermediate representations. A fine-tuning-based approach is also implemented to further compare the characteristics of SSL models in the SCD task. Furthermore, a contrastive learning method is proposed to mitigate overfitting in the training of both the fine-tuning-based method and SCDNet.…
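
The learnable weighting over SSL layers mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so this assumes the common approach of one trainable scalar per layer, normalized with a softmax and used to sum the per-layer hidden states. The function names, tensor sizes, and the use of NumPy are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

def weighted_layer_sum(layer_feats, layer_logits):
    """Fuse per-layer SSL features into a single feature sequence.

    layer_feats:  array of shape (num_layers, num_frames, feat_dim)
    layer_logits: array of shape (num_layers,), the trainable scalars
    Returns the fused features (num_frames, feat_dim) and the
    normalized layer weights (num_layers,).
    """
    weights = softmax(layer_logits)
    # Contract the layer axis: fused = sum_l weights[l] * layer_feats[l]
    fused = np.tensordot(weights, layer_feats, axes=(0, 0))
    return fused, weights

# Hypothetical sizes, chosen only for the demo.
rng = np.random.default_rng(0)
num_layers, num_frames, feat_dim = 13, 50, 8
layer_feats = rng.standard_normal((num_layers, num_frames, feat_dim))

# Zero logits give uniform weights, i.e. a plain average of the layers.
layer_logits = np.zeros(num_layers)
fused, weights = weighted_layer_sum(layer_feats, layer_logits)
```

In a trained model the logits would be updated by gradient descent together with the downstream SCD network. Under this assumed formulation, inspecting the normalized weights afterwards indicates which layers contribute most to the task.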

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Contrastive Learning