Supervised Speech Separation Based on Deep Learning: An Overview
DeLiang Wang, Jitong Chen

TL;DR
This paper reviews recent advances in deep learning-based supervised speech separation, covering the methods, open challenges, and progress in separating target speech from background interference with neural networks.
Contribution
It provides a comprehensive overview of supervised speech separation techniques, focusing on deep learning methods, training targets, features, and generalization issues.
Findings
Deep learning has significantly improved speech separation performance.
Various monaural and multi-microphone algorithms have been developed.
Generalization remains a key challenge in supervised speech separation.
Abstract
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then we discuss three main components of supervised…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
