Memory Time Span in LSTMs for Multi-Speaker Source Separation

Jeroen Zegers; Hugo Van hamme

arXiv:1808.08097 · cs.LG · August 27, 2018

TL;DR

This paper investigates the memory time span of LSTM networks in multi-speaker speech separation by leaking state variables and evaluating performance, revealing both long-term and short-term effects.

Contribution

It introduces a method to measure the relevant memory time span in LSTMs, specifically applied to multi-speaker source separation, highlighting different temporal effects.

Findings

01 · LSTM memory span includes long-term speaker characteristics.

02 · Short-term effects relate to formant tracking.

03 · Method can be applied to other tasks to estimate LSTM memory use.

Abstract

With deep learning approaches becoming state-of-the-art in many speech (as well as non-speech) related machine learning tasks, efforts are being made to look inside neural networks, which are often treated as black boxes. This paper analyzes how recurrent neural networks (RNNs) cope with temporal dependencies by determining the relevant memory time span in a long short-term memory (LSTM) cell. This is done by leaking the state variable with a controlled lifetime and evaluating the task performance. This technique can be used for any task to estimate the time span the LSTM exploits in that specific scenario. The focus in this paper is on the task of separating speakers from overlapping speech. We discern two effects: a long-term effect, probably due to speaker characterization, and a short-term effect, probably exploiting phone-size formant tracks.
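
To make the idea of "leaking the state variable with a controlled lifetime" concrete, here is a minimal sketch of a single-unit LSTM step with a leaky cell state. It is an illustration only, not the authors' implementation: the placement of the leak factor `alpha`, the exponential definition of the lifetime in `leak_factor`, the scalar weights, and all function names are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leak_factor(lifetime_frames):
    # Assumed definition: a stored value decays to 1/e of its size
    # after `lifetime_frames` steps when only the leak acts on it.
    return math.exp(-1.0 / lifetime_frames)


def leaky_lstm_step(x, h_prev, c_prev, w, alpha):
    """One step of a single-unit LSTM whose cell state is scaled by alpha.

    w maps a gate name ("i", "f", "o", "g") to (w_x, w_h, bias).
    alpha = 1.0 recovers the ordinary LSTM update.
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))      # input gate
    f = sigmoid(pre("f"))      # forget gate
    o = sigmoid(pre("o"))      # output gate
    g = math.tanh(pre("g"))    # candidate cell update
    # Assumed placement of the leak: the whole new cell state is scaled.
    c = alpha * (f * c_prev + i * g)
    h = o * math.tanh(c)
    return h, c


def run(inputs, w, alpha):
    """Run the cell over a sequence and return the cell state at each step."""
    h, c = 0.0, 0.0
    cells = []
    for x in inputs:
        h, c = leaky_lstm_step(x, h, c, w, alpha)
        cells.append(c)
    return cells


# Hypothetical hand-set weights: the forget gate stays open, and the
# input gate opens only when x is large, so an impulse is written once.
weights = {
    "i": (10.0, 0.0, -5.0),
    "f": (0.0, 0.0, 10.0),
    "o": (0.0, 0.0, 10.0),
    "g": (10.0, 0.0, 0.0),
}
impulse = [1.0] + [0.0] * 20

cells_no_leak = run(impulse, weights, alpha=1.0)
cells_leaky = run(impulse, weights, alpha=leak_factor(5))
```

In this toy setting the unleaked cell still holds most of the impulse after 20 frames, while the cell with a 5-frame lifetime has almost forgotten it. Sweeping the lifetime and re-evaluating task performance at each setting is the kind of measurement the abstract describes.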

Code & Models

Repositories

JeroenZegers/Nabu-MSSS (TensorFlow, official)

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory