Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity

You Jin Kim; Hee-Soo Heo; Jee-weon Jung; Youngki Kwon; Bong-Jin Lee; Joon Son Chung

arXiv:2110.03380·cs.SD·November 4, 2022·1 cites

Open Access

TL;DR

This paper introduces a new framework for reducing the dimensionality of speaker embeddings that effectively disentangles noise from speaker information, improving diarisation accuracy without system fusion.

Contribution

It proposes a novel disentanglement framework and utilizes speech activity vectors to enhance noise robustness in speaker embeddings.

Findings

01. Achieves state-of-the-art performance on four datasets

02. Effectively separates noise from speaker information

03. Improves diarisation accuracy without system fusion

Abstract

The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as noise, which adversely affects performance. Our previous work proposed an auto-encoder-based dimensionality reduction module to help remove this redundant information. However, that module does not explicitly separate such information and has also been found to be sensitive to hyper-parameter values. To overcome these issues, we make two contributions: (i) a novel dimensionality reduction framework that can disentangle spurious information from the speaker embeddings; (ii) the use of a speech activity vector to prevent the speaker code from representing the background noise. Through a range of experiments conducted on four datasets, our approach…
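To make the idea in the abstract concrete, the following is a minimal sketch of an auto-encoder that reduces an embedding into two separate codes, one for the speaker and one for noise, and reconstructs the input from both. It is an illustration only and not the authors' architecture: the layer sizes, the single linear layers, the `tanh` activations, and in particular the way the speech activity value gates the speaker code are all assumptions made for this example. The names `encode`, `decode`, and `reconstruction_loss` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration; none are taken from the paper.
EMB_DIM, SPK_DIM, NOISE_DIM = 256, 20, 10


def init_linear(in_dim, out_dim):
    """Return a small random weight matrix and a zero bias."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


W_spk, b_spk = init_linear(EMB_DIM, SPK_DIM)        # speaker-code head
W_noise, b_noise = init_linear(EMB_DIM, NOISE_DIM)  # noise-code head
W_dec, b_dec = init_linear(SPK_DIM + NOISE_DIM, EMB_DIM)


def encode(x, speech_activity):
    """Split each embedding into a speaker code and a noise code.

    Assumption: speech_activity is 1.0 for speech segments and 0.0 for
    non-speech segments, and it simply gates the speaker code so that
    non-speech inputs cannot be represented there.
    """
    spk = np.tanh(x @ W_spk + b_spk)
    noise = np.tanh(x @ W_noise + b_noise)
    spk = spk * speech_activity[:, None]
    return spk, noise


def decode(spk, noise):
    """Reconstruct the original embedding from both codes."""
    return np.concatenate([spk, noise], axis=1) @ W_dec + b_dec


def reconstruction_loss(x, speech_activity):
    """Mean squared error between the input and its reconstruction."""
    spk, noise = encode(x, speech_activity)
    return float(np.mean((decode(spk, noise) - x) ** 2))


# Four random stand-in embeddings: two speech segments, two non-speech.
x = rng.normal(size=(4, EMB_DIM))
activity = np.array([1.0, 1.0, 0.0, 0.0])
spk_code, noise_code = encode(x, activity)
loss = reconstruction_loss(x, activity)
```

In this sketch only `spk_code` would be passed on to clustering, while `noise_code` exists so the decoder can still reconstruct the input without forcing noise into the speaker code. Training (minimising the reconstruction loss by gradient descent) is omitted.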

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing