Surrogate Source Model Learning for Determined Source Separation

Robin Scheibler; Masahito Togami

arXiv:2011.05540·eess.AS·November 12, 2020

Open Access

TL;DR

This paper introduces a method to learn surrogate functions for speech priors in blind source separation, improving separation quality and generalization across different speaker mixtures and AuxIVA variants.

Contribution

It proposes a novel approach to approximate surrogate functions directly, enabling the use of deep speech priors with AuxIVA without complex derivations.

Findings

01

Significant SDR improvement over baseline methods.

02

Learnt surrogate generalizes to more speakers and different AuxIVA updates.

03

Word error rate (WER) is reduced by up to 36%.

Abstract

We propose to learn surrogate functions of universal speech priors for determined blind speech separation. Deep speech priors are highly desirable due to their high modelling power, but they are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), since deriving the required surrogate function is not easy, and not always possible. Instead, we do away with exact majorization and directly approximate the surrogate. Taking advantage of iterative source steering (ISS) updates, we backpropagate the permutation-invariant separation loss through multiple iterations of AuxIVA. ISS lends itself well to this task due to its lower complexity and lack of matrix inversion. Experiments show large improvements in terms of scale-invariant signal-to-distortion ratio (SDR) and word error rate compared to baseline methods. Training is done on two speakers…
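To make the role of the surrogate concrete, here is a minimal NumPy sketch of one ISS sweep, written from the standard ISS update rule rather than from the authors' code. The names `laplace_weights` and `iss_iteration` are hypothetical, and the hand-derived Laplace weight stands in for the surrogate that the paper learns with a neural network. Training such a network would require rewriting these steps in an automatic-differentiation framework, which this sketch does not attempt.

```python
import numpy as np

def laplace_weights(Y, eps=1e-5):
    """Stand-in source model (assumption): spherical Laplace weights 1 / (2 r).

    Y has shape (n_src, n_freq, n_frames). A learned surrogate would replace
    this function with a network producing weights of a compatible shape.
    """
    # Per-source, per-frame magnitude across all frequencies
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=1, keepdims=True))
    return 1.0 / np.maximum(2.0 * r, eps)

def iss_iteration(Y, weight_fn=laplace_weights, eps=1e-12):
    """One sweep of iterative source steering over all sources.

    Each step is a rank-1 update Y <- Y - v * y_k, using only sums and
    divisions, with no matrix inversion.
    """
    n_src, n_freq, n_frames = Y.shape
    w = weight_fn(Y)  # weights are held fixed during the sweep
    for k in range(n_src):
        yk = Y[k]  # (n_freq, n_frames), current estimate of source k
        # Weighted correlation of every source with source k
        num = np.sum(w * Y * np.conj(yk)[None], axis=2)    # (n_src, n_freq)
        # Weighted power of source k under each source's weights
        den = np.sum(w * np.abs(yk)[None] ** 2, axis=2)    # (n_src, n_freq)
        v = num / np.maximum(den, eps)
        # Special case s == k: rescale so the weighted power becomes one
        v[k] = 1.0 - 1.0 / np.sqrt(np.maximum(den[k] / n_frames, eps))
        Y = Y - v[:, :, None] * yk[None]
    return Y

# Usage on random data standing in for an STFT of 3 channels
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 50)) + 1j * rng.standard_normal((3, 4, 50))
Y = iss_iteration(X)
```

Because every operation in the sweep is a sum, product, or division, gradients can flow through several stacked sweeps, which is consistent with the abstract's remark that ISS suits backpropagation.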

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Speech Recognition and Synthesis