Demystifying TasNet: A Dissecting Approach

Jens Heitkaemper; Darius Jakobeit; Christoph Boeddeker; Lukas Drude; Reinhold Haeb-Umbach

arXiv:1911.08895·cs.SD·February 6, 2020

TL;DR

This paper analyzes the components of TasNet by gradually replacing parts of a frequency domain separation system, revealing insights into its performance, advantages, and limitations in different acoustic environments.

Contribution

The paper introduces a dissecting approach to explain where TasNet's gains come from, links TasNet's si-SDR loss function to a logarithmic MSE criterion, and evaluates how well those gains generalize to reverberant conditions.

Findings

01

Some intermediate variants achieve SDR gains comparable to TasNet's while retaining the advantages of frequency domain processing.

02

The si-SDR loss function is related to a logarithmic mean square error (MSE) criterion, which helps explain TasNet's performance.

03

TasNet's gains are smaller under reverberant conditions.
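
To make the second finding more concrete, the following is a minimal sketch of the commonly used si-SDR definition next to a logarithmic MSE. It is not the paper's derivation: the function names `si_sdr` and `log_mse`, the `eps` guard, and the toy signals are assumptions chosen for illustration. The sketch only shows one special case in which the link is easy to see: when the estimation error is orthogonal to the target, si-SDR equals the target power in dB minus the logarithmic MSE.

```python
import math

def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sequences of floats."""
    # Project the estimate onto the target: alpha = <estimate, target> / <target, target>
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / (target_energy + eps)
    scaled = [alpha * t for t in target]
    signal = sum(s * s for s in scaled)
    error = sum((e - s) ** 2 for e, s in zip(estimate, scaled))
    # eps (an illustrative guard) avoids log(0) and division by zero
    return 10.0 * math.log10((signal + eps) / (error + eps))

def log_mse(estimate, target, eps=1e-12):
    """Logarithmic mean square error in dB (lower is better)."""
    mse = sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)
    return 10.0 * math.log10(mse + eps)

# Toy signals (hypothetical): the error is orthogonal to the target.
target = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]

target_power_db = 10.0 * math.log10(sum(t * t for t in target) / len(target))
sdr = si_sdr(estimate, target)
lmse = log_mse(estimate, target)

# Rescaling the estimate leaves si-SDR unchanged, but changes the log-MSE.
sdr_rescaled = si_sdr([2.0 * e for e in estimate], target)
```

In this special case `sdr` matches `target_power_db - lmse`, so maximizing si-SDR and minimizing the logarithmic MSE pull in the same direction; the two differ in general because si-SDR ignores the scale of the estimate.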

Abstract

In recent years time domain speech separation has excelled over frequency domain separation in single channel scenarios and noise-free environments. In this paper we dissect the gains of the time-domain audio separation network (TasNet) approach by gradually replacing components of an utterance-level permutation invariant training (u-PIT) based separation system in the frequency domain until the TasNet system is reached, thus blending components of frequency domain approaches with those of time domain approaches. Some of the intermediate variants achieve comparable signal-to-distortion ratio (SDR) gains to TasNet, but retain the advantage of frequency domain processing: compatibility with classic signal processing tools such as frequency-domain beamforming and the human interpretability of the masks. Furthermore, we show that the scale invariant signal-to-distortion ratio (si-SDR)…
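
The utterance-level permutation invariant training (u-PIT) objective mentioned in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `upit_loss`, `mse`, and the toy data are hypothetical names and values, and plain MSE stands in for whichever pairwise loss a given system variant uses. The idea is that a single speaker-to-output assignment is chosen for the whole utterance by taking the permutation with the lowest average loss.

```python
import itertools

def mse(a, b):
    """Mean square error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def upit_loss(estimates, targets, pair_loss=mse):
    """Return (lowest mean loss, best permutation) over a whole utterance.

    best_perm[i] is the index of the target assigned to estimate i.
    """
    n = len(targets)
    best_loss, best_perm = None, None
    for perm in itertools.permutations(range(n)):
        # One permutation is scored on the full utterance, not per frame.
        loss = sum(pair_loss(estimates[i], targets[perm[i]]) for i in range(n)) / n
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy example (hypothetical data): the network outputs the speakers in swapped order.
targets = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
estimates = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
loss, perm = upit_loss(estimates, targets)
```

Exhaustive search over permutations grows factorially with the number of speakers, which is acceptable for the two- or three-speaker mixtures typical of this task.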


Taxonomy

Methods, Interpretability