On the Use of Deep Mask Estimation Module for Neural Source Separation Systems
Kai Li, Xiaolin Hu, Yi Luo

TL;DR
This paper investigates deep mask estimation modules in neural source separation systems, showing empirically that they efficiently approximate the overseparation-grouping paradigm built on conventional shallow mask estimation layers, and that they improve performance over shallow modules.
Contribution
The paper provides an analysis connecting deep mask estimation modules to the overseparation-grouping paradigm, highlighting their efficiency in neural source separation.
Findings
Deep mask estimation modules outperform shallow ones.
Deep modules approximate the overseparation-grouping paradigm.
Empirical validation shows improved source separation performance.
Abstract
Most recent neural source separation systems rely on a masking-based pipeline in which a set of multiplicative masks is estimated from, and applied to, a signal representation of the input mixture. In almost all network architectures, these masks are estimated by a single layer followed by an optional nonlinear activation function. However, recent literature has investigated the use of a deep mask estimation module and observed a performance improvement over a shallow mask estimation module. In this paper, we analyze the role of such a deeper mask estimation module by connecting it to a recently proposed unsupervised source separation method, and empirically show that the deep mask estimation module is an efficient approximation of the so-called overseparation-grouping paradigm with conventional shallow mask estimation layers.
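To make the three ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It contrasts a shallow mask estimator (one linear layer plus a sigmoid), a deeper one (an extra hidden layer), and an overseparation-grouping variant in which a shallow layer estimates more masks than there are sources and the outputs are then summed into groups. All names, shapes, random weights, and the fixed hard `assignment` matrix are assumptions chosen for illustration; the abstract does not specify how grouping is performed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shallow_mask_estimator(h, W, n_masks):
    """Single linear layer + sigmoid: features h (T, H) -> masks (n_masks, T, F)."""
    T = h.shape[0]
    logits = h @ W                                   # (T, n_masks * F)
    masks = sigmoid(logits).reshape(T, n_masks, -1)  # (T, n_masks, F)
    return masks.transpose(1, 0, 2)                  # (n_masks, T, F)

def deep_mask_estimator(h, W1, W2, n_masks):
    """Illustrative 'deep' variant: one hidden ReLU layer before the output layer."""
    hidden = np.maximum(h @ W1, 0.0)
    return shallow_mask_estimator(hidden, W2, n_masks)

def overseparate_then_group(h, x, W, assignment):
    """Estimate K > C masks with a shallow layer, then sum the K outputs into C sources.

    assignment is a hypothetical (C, K) matrix mapping overseparated outputs to sources.
    """
    K = assignment.shape[1]
    masks = shallow_mask_estimator(h, W, K)          # (K, T, F)
    outputs = masks * x[None]                        # masks applied to the mixture
    return np.einsum("ck,ktf->ctf", assignment, outputs)

rng = np.random.default_rng(0)
T, F, H, C, K = 5, 4, 8, 2, 6      # frames, feature bins, hidden size, sources, overseparated outputs
x = rng.standard_normal((T, F))    # mixture representation
h = rng.standard_normal((T, H))    # separator features fed to the mask estimator

shallow = shallow_mask_estimator(h, rng.standard_normal((H, C * F)), C) * x[None]
deep = deep_mask_estimator(
    h, rng.standard_normal((H, H)), rng.standard_normal((H, C * F)), C
) * x[None]

# Hard grouping assumed here: outputs 0-2 form source 0, outputs 3-5 form source 1.
assignment = np.zeros((C, K))
assignment[0, :3] = 1.0
assignment[1, 3:] = 1.0
grouped = overseparate_then_group(h, x, rng.standard_normal((H, K * F)), assignment)
```

In this sketch all three paths return an array of shape `(C, T, F)`, one masked representation per source. The overseparation-grouping path needs a larger output layer (K masks instead of C) plus a grouping step, which is the extra cost that a deep mask estimation module is argued to approximate more efficiently.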
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Underwater Acoustics Research
