Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding

Scott Wisdom; Thomas Powers; James Pitton; Les Atlas

arXiv:1709.07124·cs.SD·September 22, 2017

TL;DR

This paper introduces a deep recurrent NMF architecture that unfolds iterative soft-thresholding for speech separation, offering interpretability, faster inference, and improved generalization with limited data.

Contribution

The paper presents a novel deep recurrent NMF model that combines interpretability and efficiency, outperforming sparse NMF and LSTM networks when training data is limited.

Findings

1. DR-NMF outperforms sparse NMF and LSTM with limited training data.

2. DR-NMF achieves faster inference than traditional NMF.

3. DR-NMF achieves competitive separation performance with large training data.

Abstract

In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF network has three distinct advantages. First, DR-NMF provides better interpretability than other deep architectures, since the weights correspond to NMF model parameters, even after training. This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization. Second, like many deep networks, DR-NMF is an order of magnitude faster at test time than NMF, since computation of the network…
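The unfolding idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `unfolded_ista_nmf`, the parameters `lam` and `num_layers`, the fixed step size, and the warm start from the previous frame are all assumptions made for illustration. Each pass of the inner loop is one ISTA iteration for the sparse nonnegative coding problem min over h >= 0 of 0.5 * ||x - W h||^2 + lam * ||h||_1, and a fixed number of such iterations plays the role of the layers of an unfolded network. In a trained network, the dictionary and thresholds would be learned per layer; here they are shared and fixed.

```python
import numpy as np

def unfolded_ista_nmf(X, W, lam=0.1, num_layers=50):
    """Illustrative sketch: infer nonnegative sparse coefficients H with X ~= W @ H.

    X: (F, T) nonnegative magnitude spectrogram.
    W: (F, R) nonnegative dictionary (NMF basis), held fixed here.
    lam: L1 sparsity weight (hypothetical default).
    num_layers: number of unfolded ISTA iterations per frame (hypothetical default).
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the quadratic term (squared spectral norm of W).
    step = 1.0 / (np.linalg.norm(W, 2) ** 2)
    num_atoms, num_frames = W.shape[1], X.shape[1]
    H = np.zeros((num_atoms, num_frames))
    h = np.zeros(num_atoms)  # the first frame starts from zero
    for t in range(num_frames):
        x = X[:, t]
        # Assumption: warm-start each frame from the previous frame's
        # coefficients, which gives the recurrence across time.
        for _ in range(num_layers):
            grad = W.T @ (W @ h - x)
            # Nonnegative soft-thresholding: gradient step, shift by the
            # threshold step * lam, then clip at zero (a ReLU).
            h = np.maximum(h - step * (grad + lam), 0.0)
        H[:, t] = h
    return H
```

Because the nonnegative soft-threshold is a shifted ReLU, each iteration has the form of a standard network layer, which is why the weights of such a network can be read as NMF model parameters.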

Code & Models

Repositories

stwisdom/dr-nmf (Official)

Taxonomy

Methods: Interpretability · Sigmoid Activation · Tanh Activation · Long Short-Term Memory