Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning

Rohith Aralikatti; Anton Ratnarajah; Zhenyu Tang; Dinesh Manocha

arXiv:2107.09177 · eess.AS · July 21, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a multi-stage training and curriculum learning approach, combined with realistic room impulse response simulation, to significantly enhance reverberant speech separation performance.

Contribution

It proposes a novel combination of training strategies and a geometric acoustic simulator to improve speech separation in reverberant environments.

Findings

01. Significant improvement over prior techniques based on the image source method.
02. Enhanced separation performance when synthetic RIRs are mixed with a small number of real RIRs during training.
03. Effective across several real room configurations from the VOiCES dataset.

Abstract

We present a novel approach that improves the performance of reverberant speech separation. Our approach is based on an accurate geometric acoustic simulator (GAS) which generates realistic room impulse responses (RIRs) by modeling both specular and diffuse reflections. We also propose three training methods (pre-training, multi-stage training, and curriculum learning) that significantly improve separation quality in the presence of reverberation. In addition, we demonstrate that mixing the synthetic RIRs with a small number of real RIRs during training enhances separation performance. We evaluate our approach on reverberant mixtures generated from real, recorded data (in several different room configurations) from the VOiCES dataset. Our novel approach (curriculum learning+pre-training+multi-stage training) results in a significant relative improvement over prior techniques based on image…
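As a rough illustration of the data side of the abstract, the following minimal sketch shows one way a reverberant two-speaker mixture could be built from dry speech and RIRs, with each RIR drawn from a pool that mixes synthetic and real responses. It is not the authors' pipeline: the function names `sample_rir` and `reverberant_mixture`, the `real_fraction` parameter, and the truncation to the shorter source are all assumptions made for this example, and the abstract does not state the mixing ratio the authors used.

```python
import numpy as np


def sample_rir(synthetic_rirs, real_rirs, real_fraction, rng):
    """Pick one RIR, drawing from the real pool with probability real_fraction.

    real_fraction is a hypothetical knob for this sketch, not a value
    taken from the paper.
    """
    use_real = len(real_rirs) > 0 and rng.random() < real_fraction
    pool = real_rirs if use_real else synthetic_rirs
    return pool[rng.integers(len(pool))]


def reverberant_mixture(speech_a, speech_b, rir_a, rir_b):
    """Convolve each dry source with its RIR and sum the results.

    Both reverberant sources are truncated to the length of the shorter
    dry signal (an assumption of this sketch) so they can be added.
    Returns the mixture and the two reverberant sources.
    """
    n = min(len(speech_a), len(speech_b))
    wet_a = np.convolve(speech_a, rir_a)[:n]
    wet_b = np.convolve(speech_b, rir_b)[:n]
    return wet_a + wet_b, wet_a, wet_b
```

A training loop would typically call `sample_rir` once per speaker and pass the results to `reverberant_mixture`; whether the separation targets are the reverberant or the dry sources is a design choice the abstract does not specify.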

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing