Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
Rawad Melhem, Assef Jafar, Oumayma Al Dakkak

TL;DR
This paper introduces a novel method for creating a realistic speech separation dataset with ground truths by recording two speakers simultaneously, enabling better training and benchmarking of deep learning models in real-world scenarios.
Contribution
The paper presents a new approach to generate realistic speech separation datasets with ground truths, addressing the challenge of obtaining accurate source signals in natural environments.
Findings
Improved SI-SDR by 1.65 dB using the new dataset
Enhanced PESQ score by approximately 0.5
Increased model stability across various microphone-speaker distances
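For readers unfamiliar with the first metric above, the following is a minimal sketch of how SI-SDR (scale-invariant signal-to-distortion ratio) is commonly computed, assuming the standard definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function name `si_sdr`, the `eps` parameter, and the toy signals are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length.

    Hypothetical helper for illustration; not taken from the paper.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate that is not the scaled target is distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)

# Toy example: a random "clean" signal plus a small amount of noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
estimate = reference + 0.1 * rng.standard_normal(16000)

score = si_sdr(estimate, reference)
# Rescaling the estimate should leave the score essentially unchanged.
rescaled_score = si_sdr(3.0 * estimate, reference)
print(f"SI-SDR: {score:.2f} dB, after rescaling: {rescaled_score:.2f} dB")
```

Because the metric is on a decibel scale, a gain such as the 1.65 dB reported above corresponds to a multiplicative change in the target-to-distortion energy ratio rather than an additive one.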
Abstract
Speech separation is important in real-world applications such as human-machine interaction, hearing aid devices, and automatic meeting transcription. In recent years, deep learning has brought significant progress toward a solution. Much attention has been drawn to supervised learning methods that use synthetic mixture datasets, even though such datasets are not representative of real-world mixtures. The difficulty of building a realistic dataset has led researchers to use unsupervised learning methods, which can handle realistic mixtures directly. However, the results of unsupervised learning methods are still unconvincing. In this paper, a method is introduced to create a realistic dataset with ground truth sources for speech separation. The main challenge in designing a realistic dataset is the unavailability of ground truths for the speakers' signals. To address…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
