Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems
Rawad Melhem, Assef Jafar, Oumayma Al Dakkak

TL;DR
This paper introduces a new method for creating realistic training datasets for AI-based speaker separation, significantly improving model performance in real-world noisy and echoic conditions.
Contribution
The paper presents a novel approach to constructing realistic training datasets that better represent real-world audio complexities for speaker separation models.
Findings
1.65 dB improvement in SI-SDR with the new dataset
Enhanced speaker separation accuracy in real-world conditions
Demonstrated that realistic training datasets are more effective than synthetic ones
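For readers unfamiliar with the metric behind the 1.65 dB figure, the following is a minimal sketch of the commonly used definition of SI-SDR (scale-invariant signal-to-distortion ratio). It is not taken from the paper: the function name `si_sdr`, the `eps` stabilizer, and the zero-mean normalization step are assumptions made for illustration, and the authors' evaluation code may differ.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between a separated signal and its ground truth.

    Illustrative helper (hypothetical name); assumes 1-D signals of equal length.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove DC offset so the projection is not biased by a constant shift.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate that is not explained by the target is distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)

# Usage: a reference signal corrupted by noise at one tenth of its amplitude.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
estimate = reference + 0.1 * rng.standard_normal(16000)
print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged. This is why the metric suits separation models whose output gain is arbitrary.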
Abstract
This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions due to the presence of noise, echo, and other interferences. This is because neural models are typically trained on synthetic datasets consisting of mixed audio signals and their corresponding ground truths, which are generated using computer software and do not fully represent the complexities of real-world recording scenarios. The lack of realistic training sets for speaker separation remains a major hurdle, as obtaining individual sounds from mixed audio signals is a nontrivial task. To address this issue, we propose a novel method for constructing a realistic training set that includes mixture signals and corresponding ground truths for each speaker. We evaluate this…
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
