Heterogeneous Target Speech Separation

Efthymios Tzinis; Gordon Wichern; Aswin Subramanian; Paris Smaragdis; Jonathan Le Roux

arXiv:2204.03594 · cs.SD · November 14, 2022 · 1 citation

Open Access

TL;DR

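This paper presents a heterogeneous target speech separation framework for single-channel audio. It conditions separation on non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location) and trains across datasets with large distribution shifts, improving generalization and robustness.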

Contribution

It introduces a heterogeneous separation paradigm that uses a variety of concepts as conditioning and learns cross-domain representations from multiple datasets, improving generalization and robustness over single-domain specialist models.

Findings

01

Models trained with heterogeneous conditions outperform single-domain models.

02

The approach facilitates generalization to new concepts on unseen out-of-domain data and can yield improvements over permutation invariant training with oracle source selection.

03

Heterogeneous training leads to more robust learning of new, harder discriminative concepts for source separation.

Abstract

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates the generalization to new concepts with unseen out-of-domain data while also performing substantially better than single-domain specialist models. Notably, such training leads to more robust learning of new harder source separation discriminative concepts and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior…
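
To make the idea of concept-based conditioning concrete, the following is a minimal sketch of how a training target might be derived from a query over non-mutually exclusive concepts. It is an illustration under stated assumptions, not the authors' implementation: the concept vocabulary `CONCEPT_VALUES`, the `Source` record, and the helpers `one_hot_condition` and `select_target` are all hypothetical names, and the choice of a concatenated one-hot vector as the conditioning signal is an assumption made here for simplicity.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical concept vocabulary (illustrative only, not taken from the paper).
CONCEPT_VALUES = {
    "energy": ["louder", "quieter"],
    "gender": ["female", "male"],
    "language": ["english", "french"],
}


@dataclass
class Source:
    """One speaker's waveform plus the metadata used to answer queries."""
    samples: List[float]
    gender: str
    language: str


def rms(x: List[float]) -> float:
    """Root-mean-square level of a waveform."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def one_hot_condition(concept: str, value: str) -> List[int]:
    """Concatenated one-hot vector over every (concept, value) pair.

    Assumption: the separation model would receive this vector as its
    conditioning input alongside the mixture.
    """
    return [
        1 if (c == concept and v == value) else 0
        for c, values in CONCEPT_VALUES.items()
        for v in values
    ]


def select_target(sources: List[Source], concept: str, value: str) -> List[float]:
    """Sum the waveforms of all sources that match the query.

    Concepts are not mutually exclusive, so zero, one, or several sources
    may match; with no match the target is silence.
    """
    if concept == "energy":
        ranked = sorted(sources, key=lambda s: rms(s.samples), reverse=True)
        chosen = [ranked[0]] if value == "louder" else [ranked[-1]]
    else:
        chosen = [s for s in sources if getattr(s, concept) == value]
    target = [0.0] * len(sources[0].samples)
    for s in chosen:
        for i, v in enumerate(s.samples):
            target[i] += v
    return target


# Two toy sources and their single-channel mixture.
a = Source([0.5, -0.5, 0.5, -0.5], gender="female", language="english")
b = Source([0.1, 0.1, -0.1, -0.1], gender="male", language="french")
mixture = [x + y for x, y in zip(a.samples, b.samples)]

# One training example: (mixture, condition vector) -> target waveform.
condition = one_hot_condition("language", "french")
target = select_target([a, b], "language", "french")
```

In this sketch the same mixture yields different targets depending on the query, which is what lets one dataset entry serve several conditioning concepts. A query that no source satisfies produces an all-zero target here; how such cases are handled in practice is a design choice this sketch does not settle.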

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing