Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning
Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

TL;DR
This paper introduces a multi-stage training and curriculum learning approach, combined with realistic room impulse response simulation, to significantly enhance reverberant speech separation performance.
Contribution
It proposes a novel combination of training strategies and a geometric acoustic simulator to improve speech separation in reverberant environments.
Findings
Significant relative improvement over prior techniques based on the image source method.
Mixing synthetic RIRs with a small number of real RIRs during training further improves separation performance.
Effective across several real room configurations from the VOiCES dataset.
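To make the RIR-mixing finding concrete, here is a minimal sketch of how a reverberant two-speaker mixture might be built from a pool that combines synthetic and real RIRs. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the `real_fraction` parameter, and its default of 0.1 are hypothetical, and signals are plain Python lists to keep the example self-contained.

```python
import random


def convolve(signal, rir):
    """Full discrete convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def sample_rir(synthetic_rirs, real_rirs, real_fraction, rng):
    """Pick a real RIR with probability real_fraction, otherwise a synthetic one.

    real_fraction is a hypothetical knob; the source only says that a
    "small number" of real RIRs is mixed in during training.
    """
    use_real = bool(real_rirs) and rng.random() < real_fraction
    return rng.choice(real_rirs if use_real else synthetic_rirs)


def reverberant_mixture(src_a, src_b, synthetic_rirs, real_rirs,
                        real_fraction=0.1, rng=None):
    """Reverberate two dry sources with sampled RIRs and sum them."""
    rng = rng or random.Random(0)
    rev_a = convolve(src_a, sample_rir(synthetic_rirs, real_rirs, real_fraction, rng))
    rev_b = convolve(src_b, sample_rir(synthetic_rirs, real_rirs, real_fraction, rng))
    # Zero-pad the shorter reverberant source so both have equal length.
    n = max(len(rev_a), len(rev_b))
    rev_a = rev_a + [0.0] * (n - len(rev_a))
    rev_b = rev_b + [0.0] * (n - len(rev_b))
    mixture = [a + b for a, b in zip(rev_a, rev_b)]
    return mixture, rev_a, rev_b
```

In practice the convolution would be done with an FFT-based routine on audio arrays; the nested loop is used here only to avoid external dependencies.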
Abstract
We present a novel approach that improves the performance of reverberant speech separation. Our approach is based on an accurate geometric acoustic simulator (GAS), which generates realistic room impulse responses (RIRs) by modeling both specular and diffuse reflections. We also propose three training methods (pre-training, multi-stage training, and curriculum learning) that significantly improve separation quality in the presence of reverberation. We further demonstrate that mixing the synthetic RIRs with a small number of real RIRs during training enhances separation performance. We evaluate our approach on reverberant mixtures generated from real, recorded data (in several different room configurations) from the VOiCES dataset. Our novel approach (curriculum learning + pre-training + multi-stage training) results in a significant relative improvement over prior techniques based on image…
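The curriculum learning idea mentioned in the abstract can be sketched as follows. The abstract does not say how example difficulty is measured or how stages are scheduled, so this sketch assumes a hypothetical per-example difficulty score (here a reverberation time stored under the made-up key `rt60`) and a cumulative schedule in which each stage adds harder examples to the ones already seen.

```python
import math


def curriculum_stages(examples, difficulty, num_stages):
    """Split examples into cumulative stages ordered from easy to hard.

    difficulty is a caller-supplied scoring function (an assumption of this
    sketch). Stage k contains the easiest ceil(k / num_stages) share of the
    data, so the final stage covers the full training set.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical examples: each carries an assumed reverberation time in seconds.
examples = [
    {"id": "a", "rt60": 0.8},
    {"id": "b", "rt60": 0.2},
    {"id": "c", "rt60": 0.5},
    {"id": "d", "rt60": 1.1},
]
stages = curriculum_stages(examples, lambda ex: ex["rt60"], num_stages=2)
stage_ids = [[ex["id"] for ex in stage] for stage in stages]
```

A training loop would then iterate over `stages` in order, training on each stage's examples before moving to the next.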
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing
