An Initialization Scheme for Meeting Separation with Spatial Mixture Models

Christoph Boeddeker, Tobias Cord-Landwehr, Thilo von Neumann, and Reinhold Haeb-Umbach

arXiv:2204.01338 · cs.SD · April 5, 2022 · 1 citation

TL;DR

This paper introduces a novel initialization method for spatial mixture models in meeting separation. It exploits the fact that in meetings often only one speaker is active at a time, which leads to improved downstream speech recognition accuracy.

Contribution

The paper proposes a new initialization scheme for spatial mixture models that enhances meeting separation performance by utilizing temporal speaker activity patterns.
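To make the contrast with a random start concrete, the following NumPy sketch assigns each short time segment entirely to a single class. It is a hypothetical illustration of an activity-informed initialization, not the scheme described in the paper: the function name `segment_onehot_init`, the fixed `segment_len`, and the cyclic assignment of segments to classes are all assumptions made for this example.

```python
import numpy as np


def segment_onehot_init(num_frames: int, num_classes: int, segment_len: int) -> np.ndarray:
    """Hypothetical activity-informed initialization (illustration only).

    Assumes that within a short segment mostly one source is active, so all
    frames of segment s are assigned to class s % num_classes with
    probability one. Returns class affiliations of shape
    (num_classes, num_frames), to be shared across all frequency bins.
    """
    if num_frames < 1 or num_classes < 1 or segment_len < 1:
        raise ValueError("all sizes must be positive")
    frames = np.arange(num_frames)
    # Class that "owns" each frame under the (assumed) cyclic assignment.
    owner = (frames // segment_len) % num_classes
    affiliations = np.zeros((num_classes, num_frames))
    # Advanced indexing: set exactly one entry per frame (column) to 1.
    affiliations[owner, frames] = 1.0
    return affiliations


# Six frames, two classes, segments of two frames each.
print(segment_onehot_init(6, 2, 2))
```

The idea behind such a hard, time-dependent start is that it breaks the symmetry between classes from the first iteration; the EM iterations of the mixture model would then be expected to refine these affiliations.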

Findings

01

Achieves a WER of 5.9% on LibriCSS, comparable to the best reported WER on this data, with the number of speakers as the only required prior knowledge.

02

Achieves a significantly lower WER than a random initialization of the class probabilities drawn from a Dirichlet distribution.

03

Provides a spatial diarization method based on the mixture model.
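The findings are stated in terms of word error rate. As a reminder of what that number measures, here is a minimal sketch of the standard WER: the word-level Levenshtein distance (substitutions, deletions, insertions) divided by the number of reference words. Meeting benchmarks may use variants of this metric for multi-speaker output, and which variant the reported 5.9% refers to is not stated here; the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```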

Abstract

Spatial mixture model (SMM) supported acoustic beamforming has been extensively used for the separation of simultaneously active speakers. However, it has hardly been considered for the separation of meeting data, which are characterized by long recordings and only partially overlapping speech. In this contribution, we show that the fact that often only a single speaker is active can be utilized for a clever initialization of an SMM that employs time-varying class priors. In experiments on LibriCSS, we show that the proposed initialization scheme achieves a significantly lower Word Error Rate (WER) on a downstream speech recognition task than a random initialization of the class probabilities by drawing from a Dirichlet distribution. With the only requirement being that the number of speakers is known, we obtain a WER of 5.9%, which is comparable to the best reported WER on this data…
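The abstract names two ingredients that are easy to sketch: the random baseline, in which class probabilities are drawn from a Dirichlet distribution, and class priors that vary over time. The NumPy sketch below shows both under stated assumptions: the probabilities are drawn independently per time-frequency bin with a symmetric concentration `alpha=1.0`, the array layout `(frequency, class, frame)` is chosen for convenience, and the priors `pi[k, t]` are assumed to be shared across frequency bins. Under that last assumption, maximizing the expected log-likelihood with respect to the priors gives `pi[k, t] = (1/F) * sum_f gamma[f, k, t]`. The function names are hypothetical, and this is not the paper's implementation.

```python
import numpy as np


def dirichlet_init(num_classes: int, num_frames: int, num_freqs: int,
                   alpha: float = 1.0, seed: int = 0) -> np.ndarray:
    """Random baseline (assumed form): affiliations drawn from a symmetric Dirichlet.

    Returns gamma of shape (num_freqs, num_classes, num_frames); for every
    time-frequency bin the probabilities over the classes sum to one.
    """
    rng = np.random.default_rng(seed)
    # Generator.dirichlet puts the class axis last: shape (F, T, K).
    gamma = rng.dirichlet(alpha * np.ones(num_classes), size=(num_freqs, num_frames))
    return np.transpose(gamma, (0, 2, 1))  # -> (F, K, T)


def update_time_varying_priors(gamma: np.ndarray) -> np.ndarray:
    """M-step for priors pi[k, t] that vary over time but are shared over frequency.

    Setting the derivative of the Lagrangian to zero gives
    pi[k, t] = (1/F) * sum_f gamma[f, k, t], i.e. the mean over frequency.
    """
    return gamma.mean(axis=0)  # shape (K, T)


gamma = dirichlet_init(num_classes=3, num_frames=10, num_freqs=4)
priors = update_time_varying_priors(gamma)
print(gamma.shape, priors.shape)
```

Because every bin's affiliations sum to one, their average over frequency also sums to one, so the updated priors form a valid distribution over classes in each frame.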

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing