Practical applicability of deep neural networks for overlapping speaker separation

Pieter Appeltans; Jeroen Zegers; Hugo Van hamme

arXiv:1912.09261·cs.LG·December 20, 2019

Open Access

TL;DR

This paper evaluates how well two deep-learning methods for separating overlapping speakers, deep clustering and deep attractor networks, hold up in realistic conditions: across multiple languages and in noisy environments. It highlights both their robustness and their limitations.

Contribution

It provides a comprehensive analysis of the performance of deep speaker separation methods in multilingual and noisy scenarios, with proposed modifications for improved noise robustness.

Findings

01. Methods work across multiple languages with minimal performance loss.

02. Performance degrades in noisy environments but can be improved with modifications.

03. Deep clustering and deep attractor networks are effective for overlapping speaker separation.

Abstract

This paper examines the applicability in realistic scenarios of two deep-learning-based solutions to the overlapping speaker separation problem. First, we present experiments showing that these methods are applicable to a broad range of languages. Further experimentation indicates limited performance loss for untrained languages when these share common features with the trained language(s). Second, we investigate how the methods deal with realistic background noise and propose some modifications to better cope with these disturbances. The deep learning methods examined are deep clustering and deep attractor networks.
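The abstract names deep clustering without describing how it produces separated signals. As a rough illustration only, the sketch below shows the inference step as deep clustering is commonly described: a network maps every time-frequency bin of the mixture spectrogram to an embedding vector, the embeddings are clustered with k-means, and each cluster becomes a binary mask for one speaker. This is not the authors' implementation. The function names (`kmeans`, `masks_from_embeddings`) are hypothetical, and the embeddings here are synthetic stand-ins rather than the output of a trained network.

```python
import numpy as np


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster index per row of `points`."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (N, k).
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def masks_from_embeddings(embeddings, n_speakers, shape):
    """Cluster per-bin embeddings and return one binary mask per speaker."""
    labels = kmeans(embeddings, n_speakers)
    return np.stack(
        [(labels == s).reshape(shape) for s in range(n_speakers)]
    ).astype(float)


# Toy data: T time frames, F frequency bins, D embedding dimensions.
T, F, D = 4, 5, 3
rng = np.random.default_rng(0)

# Hypothetical ground truth: which speaker dominates each time-frequency bin.
true_owner = np.arange(T * F) % 2

# Synthetic embeddings standing in for network output: one prototype per
# speaker plus small noise, normalised to unit length.
prototypes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
embeddings = prototypes[true_owner] + 0.05 * rng.standard_normal((T * F, D))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

mixture_mag = rng.random((T, F))              # stand-in mixture magnitude spectrogram
masks = masks_from_embeddings(embeddings, 2, (T, F))
separated = masks * mixture_mag               # shape (2, T, F): one estimate per speaker
```

Because the masks are binary and mutually exclusive, the per-speaker estimates sum back to the mixture magnitude. In a real system the embeddings would come from a trained network and the masked spectrograms would be inverted to waveforms using the mixture phase.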

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing