Enhanced Reverberation as Supervision for Unsupervised Speech Separation

Kohei Saijo; Gordon Wichern; François G. Germain; Zexu Pan; Jonathan Le Roux

arXiv:2408.03438·eess.AS·August 8, 2024

TL;DR

This paper introduces ERAS (enhanced reverberation as supervision), an unsupervised speech separation method that trains stably in the determined source-channel case by using the loss term to alleviate the frequency-permutation problem, and that improves separation performance with an additional loss term based on pseudo-targets.

Contribution

ERAS extends reverberation as supervision to determined cases, introducing new loss strategies for stable training and improved speech separation performance.

Findings

01

Stable training achieved in determined source-channel conditions.

02

Enhanced separation performance with novel loss terms.

03

High stability demonstrated in experimental results.

Abstract

Reverberation as supervision (RAS) is a framework that allows for training monaural speech separation models from multi-channel mixtures in an unsupervised manner. In RAS, models are trained so that sources predicted from a mixture at an input channel can be mapped to reconstruct a mixture at a target channel. However, stable unsupervised training has so far only been achieved in over-determined source-channel conditions, leaving the key determined case unsolved. This work proposes enhanced RAS (ERAS) for solving this problem. Through qualitative analysis, we found that stable training can be achieved by leveraging the loss term to alleviate the frequency-permutation problem. Separation performance is also boosted by adding a novel loss term where separated signals mapped back to their own input mixture are used as pseudo-targets for the signals separated from other channels and mapped…
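The core RAS idea described in the abstract (map sources separated at an input channel so that they reconstruct the mixture at a target channel) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names `delay_matrix` and `ras_style_loss`, the time-domain FIR mapping, the joint least-squares filter fit, and the mean-squared-error objective are not taken from the paper, whose actual filter estimation and loss terms may differ.

```python
import numpy as np

def delay_matrix(signal, taps):
    """Stack causally delayed copies of `signal` as columns, shape (T, taps)."""
    T = len(signal)
    cols = [np.concatenate([np.zeros(d), signal[:T - d]]) for d in range(taps)]
    return np.stack(cols, axis=1)

def ras_style_loss(est_sources, target_mixture, taps=8):
    """Hypothetical RAS-style reconstruction loss (illustrative only).

    est_sources:    (K, T) sources separated from the input-channel mixture
    target_mixture: (T,)   mixture observed at a different (target) channel
    """
    # One FIR filter per estimated source, fitted jointly by least squares so
    # that the sum of filtered sources approximates the target-channel mixture.
    A = np.concatenate([delay_matrix(s, taps) for s in est_sources], axis=1)
    filters, *_ = np.linalg.lstsq(A, target_mixture, rcond=None)
    reconstruction = A @ filters
    # Mean squared reconstruction error at the target channel.
    return float(np.mean((target_mixture - reconstruction) ** 2))
```

In this toy setting the loss is close to zero when the estimated sources, after filtering, can explain the target-channel mixture, and it stays large when they cannot. A real training setup would compute the same kind of quantity inside an automatic-differentiation framework so that gradients reach the separation model.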

Code & Models

Repositories

merlresearch/reverberation-as-supervision
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing