Independence-based Joint Dereverberation and Separation with Neural Source Model

Kohei Saijo; Robin Scheibler

arXiv:2110.06545·eess.AS·April 4, 2022·Interspeech

TL;DR

This paper introduces a joint dereverberation and source separation method that places a neural source model inside an independence-based algorithm and trains it end-to-end with a permutation-invariant loss. It applies to any number of speakers, provided there are at least as many microphones as sources, and improves both speech quality metrics and word error rate (WER).

Contribution

It introduces a neural source model into time-decorrelation iterative source steering, an extension of independent vector analysis to joint dereverberation and separation, and trains it end-to-end with a permutation-invariant loss on the time-domain outputs.

Findings

01

Achieves high speech quality metrics and low WER, even for mixtures with a different number of speakers than seen in training.

02

Trained only on synthetic mixtures, the model greatly reduces WER on the recorded LibriCSS dataset without any modification.

03

Applies to any number of speakers, provided there are at least as many microphones as sources.

Abstract

We propose an independence-based joint dereverberation and separation method with a neural source model. We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation. The network is trained in an end-to-end manner with a permutation invariant loss on the time-domain separation output signals. Our proposed method can be applied in any situation with at least as many microphones as sources, regardless of their number. In experiments, we demonstrate that our method results in high performance in terms of both speech quality metrics and word error rate (WER), even for mixtures with a different number of speakers than training. Furthermore, the model, trained on synthetic mixtures, without any modifications, greatly reduces the WER on the recorded dataset LibriCSS.
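
The permutation-invariant training objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not name the per-source loss, so negative scale-invariant SDR is used here purely as an assumption, and the function names `si_sdr` and `pit_loss` are hypothetical. The sketch covers only the loss on time-domain outputs, not the dereverberation and separation algorithm itself.

```python
import itertools
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D time-domain signals."""
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))


def pit_loss(estimates, references):
    """Permutation-invariant negative SI-SDR (assumed loss, for illustration).

    estimates, references: arrays of shape (n_sources, n_samples).
    Returns (loss, best_perm), where best_perm[i] is the index of the
    estimate matched to reference i.
    """
    n_src = references.shape[0]
    best_loss, best_perm = np.inf, None
    # Exhaustive search over all output-to-reference assignments.
    for perm in itertools.permutations(range(n_src)):
        loss = -np.mean(
            [si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)]
        )
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the estimates are the references in swapped order plus small noise.
rng = np.random.default_rng(0)
refs = rng.standard_normal((2, 16000))
ests = refs[::-1] + 0.01 * rng.standard_normal((2, 16000))
loss, perm = pit_loss(ests, refs)
```

Because the loss is minimized over all assignments, the order in which the separated signals come out does not affect the training signal. The exhaustive search grows factorially with the number of sources, which is acceptable for the small speaker counts typical of this setting.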

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques