Single-Microphone Speech Enhancement and Separation Using Deep Learning

Morten Kolbæk

arXiv:1808.10620 · cs.SD · December 5, 2018

Open Access

TL;DR

This paper explores deep learning techniques for single-microphone speech enhancement and separation, demonstrating improved generalizability, state-of-the-art separation results, and effective joint enhancement without prior noise or speaker information.

Contribution

It introduces utterance-level Permutation Invariant Training (uPIT), a deep learning algorithm for multi-talker speech separation, and provides insights into designing training data so that enhancement algorithms generalize better.
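As a rough illustration of the idea behind uPIT, the sketch below computes a permutation-invariant loss at the utterance level: every assignment of model outputs to target speakers is scored over the whole utterance, and the single best assignment is kept for all frames. This is a minimal sketch under stated assumptions, not the thesis implementation. The function names `utterance_mse` and `upit_loss`, the plain mean-squared-error criterion, and the list-of-frames data layout are all hypothetical choices made for illustration, and the neural network that would produce the estimates is omitted.

```python
from itertools import permutations


def utterance_mse(estimate, target):
    """Mean squared error between two utterances given as lists of frames.

    Each frame is a list of floats (e.g. spectral magnitudes); this layout
    is an assumption made for the sketch.
    """
    total, count = 0.0, 0
    for est_frame, tgt_frame in zip(estimate, target):
        for e, t in zip(est_frame, tgt_frame):
            total += (e - t) ** 2
            count += 1
    return total / count


def upit_loss(estimates, targets):
    """Return (loss, permutation) for one utterance.

    permutation[i] is the index of the target assigned to output stream i.
    The assignment is chosen once per utterance, not per frame.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the utterance-level error over all output streams.
        loss = sum(
            utterance_mse(estimates[i], targets[perm[i]]) for i in range(n)
        ) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy usage: two speakers, two frames, two features per frame.
targets = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
estimates = [[[0.1, 0.9], [0.0, 1.0]], [[0.9, 0.1], [1.0, 0.0]]]
loss, perm = upit_loss(estimates, targets)
print(loss, perm)
```

In the toy usage the outputs are deliberately listed in the opposite order to the targets, so the swapped assignment yields the lower loss. Enumerating all permutations grows factorially with the number of speakers, which is acceptable for the two- or three-talker cases this sketch targets.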

Findings

01. uPIT achieves state-of-the-art multi-talker separation results.

02. Deep learning enhancement algorithms can be optimized for speech intelligibility.

03. Carefully designed training data improves generalizability of speech enhancement models.

Abstract

The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable, especially for applications involving mobile communication devices and hearing assistive devices. Thanks to the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome. In this PhD thesis, we study and develop deep learning-based techniques for two sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically,…

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies