TL;DR
This paper introduces a deep recurrent NMF architecture that unfolds iterative soft-thresholding for speech separation, offering interpretability, faster inference, and improved generalization with limited data.
Contribution
The paper presents a novel deep recurrent NMF model that combines interpretability and efficiency, outperforming traditional NMF and LSTM in limited data scenarios.
Findings
DR-NMF outperforms sparse NMF and LSTM with limited training data.
DR-NMF achieves faster inference than traditional NMF.
DR-NMF achieves competitive separation performance with large training data.
Abstract
In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network…
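As a rough illustration of what "unfolding the iterations" of ISTA means, the sketch below treats each ISTA update for nonnegative sparse coding as one network layer, and warm-starts every frame from the previous frame's coefficients to give a recurrent flavor. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fixed step size 1/L, the shared dictionary `W` across layers, and the warm-start rule are all illustrative choices, and no training of the weights is shown.

```python
import numpy as np


def sparse_nmf_objective(X, W, H, lam):
    """0.5 * ||X - W H||_F^2 + lam * sum(H), for nonnegative H."""
    return 0.5 * np.sum((X - W @ H) ** 2) + lam * np.sum(H)


def unfolded_ista_nmf(X, W, lam=0.1, n_layers=50):
    """Hypothetical helper: infer nonnegative sparse activations H for X ~ W H.

    X: (F, T) nonnegative magnitude spectrogram.
    W: (F, R) nonnegative dictionary, shared by all layers (an assumption).
    Each of the n_layers iterations plays the role of one unfolded layer.
    """
    # Lipschitz constant of the gradient of the quadratic term:
    # the squared spectral norm of W.
    lipschitz = np.linalg.norm(W, 2) ** 2
    step = 1.0 / lipschitz

    n_atoms = W.shape[1]
    n_frames = X.shape[1]
    H = np.zeros((n_atoms, n_frames))

    h = np.zeros(n_atoms)  # state carried across frames (warm start)
    for t in range(n_frames):
        x = X[:, t]
        for _ in range(n_layers):
            grad = W.T @ (W @ h - x)
            # Gradient step followed by the nonnegative soft threshold,
            # which has the same form as a shifted ReLU.
            h = np.maximum(h - step * (grad + lam), 0.0)
        H[:, t] = h
    return H
```

In this sketch the only parameters are `W`, `lam`, and the step size, which is one way to read the abstract's claim that the network weights correspond to NMF model parameters. A trained version would presumably untie and learn these per layer, but that is not shown here.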
Taxonomy
Methods: Interpretability · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
