Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems

Rawad Melhem; Assef Jafar; Oumayma Al Dakkak

arXiv:2411.08375·cs.SD·November 14, 2024

Open Access

TL;DR

This paper introduces a new method for creating realistic training datasets for AI-based speaker separation, significantly improving model performance in real-world noisy and echoic conditions.

Contribution

The paper presents a novel approach to constructing realistic training datasets that better represent real-world audio complexities for speaker separation models.

Findings

1. 1.65 dB improvement in SI-SDR with the new dataset
2. Enhanced speaker separation accuracy in real-world conditions
3. Demonstrated the effectiveness of realistic datasets over synthetic ones
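
For readers unfamiliar with the metric in the first finding, SI-SDR (scale-invariant signal-to-distortion ratio) can be computed roughly as in the sketch below. It follows the commonly used definition of the metric and is not code from the paper; the function names `si_sdr` and `si_sdr_gain`, the mean-removal step, and the `eps` stabilizer are assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB.

    Illustrative sketch of the common definition; `eps` is an assumed
    stabilizer that avoids division by zero and log of zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove any DC offset so the projection is not biased by it.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything left over counts as distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_gain(baseline_estimate, new_estimate, reference):
    """Difference in SI-SDR (dB) between two estimates of the same source."""
    return si_sdr(new_estimate, reference) - si_sdr(baseline_estimate, reference)
```

Under this definition, a figure such as the reported 1.65 dB would be read as the SI-SDR of one system's output minus that of another, both measured against the same ground-truth speaker signal. Because the estimate is projected onto the reference first, rescaling the separated output does not change the score.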

Abstract

This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions due to the presence of noise, echo, and other interferences. This is because neural models are typically trained on synthetic datasets consisting of mixed audio signals and their corresponding ground truths, which are generated using computer software and do not fully represent the complexities of real-world recording scenarios. The lack of realistic training sets for speaker separation remains a major hurdle, as obtaining individual sounds from mixed audio signals is a nontrivial task. To address this issue, we propose a novel method for constructing a realistic training set that includes mixture signals and corresponding ground truths for each speaker. We evaluate this…

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training