TasNet: time-domain audio separation network for real-time, single-channel speech separation

Yi Luo; Nima Mesgarani

arXiv:1711.00541 · cs.SD · April 19, 2018 · 37 cites

TL;DR

TasNet is a time-domain neural network for real-time, single-channel speech separation. It outperforms frequency-domain methods while reducing latency and computational cost, which makes it suitable for low-power applications.

Contribution

The paper presents TasNet, a novel approach that models the mixture signal directly in the time domain, eliminating the need for frequency decomposition and improving real-time speech separation performance.

Findings

1. Outperforms state-of-the-art speech separation algorithms
2. Reduces computational cost significantly
3. Minimizes latency for real-time applications

Abstract

Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition results in inherent problems such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the…
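To make the encoder-decoder idea in the abstract concrete, here is a minimal NumPy sketch of the data flow: segment the waveform, compute nonnegative encoder outputs, apply one mask per source, and decode back to waveforms. It is an illustration under stated assumptions, not the authors' implementation. The segment length, basis size, ReLU encoder, softmax masks, single-matrix mask estimator, and random untrained weights are all hypothetical choices that the text above does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
SEG_LEN = 40   # samples per waveform segment
N_BASIS = 64   # dimension of the encoder output
N_SRC = 2      # number of speakers to separate

# Random, untrained weights standing in for learned parameters.
W_enc = rng.standard_normal((SEG_LEN, N_BASIS)) * 0.1
W_mask = rng.standard_normal((N_BASIS, N_SRC * N_BASIS)) * 0.1
W_dec = rng.standard_normal((N_BASIS, SEG_LEN)) * 0.1


def segment(x, seg_len):
    """Split a 1-D waveform into non-overlapping segments, zero-padding the tail."""
    n_seg = -(-len(x) // seg_len)  # ceiling division
    padded = np.zeros(n_seg * seg_len)
    padded[:len(x)] = x
    return padded.reshape(n_seg, seg_len)


def encode(segments):
    """Nonnegative encoder output: linear projection followed by ReLU (assumed)."""
    return np.maximum(segments @ W_enc, 0.0)


def estimate_masks(weights):
    """One mask per source; softmax over sources so masks sum to 1 (assumed)."""
    logits = (weights @ W_mask).reshape(len(weights), N_SRC, N_BASIS)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def decode(weights):
    """Linear decoder mapping encoder-space weights back to waveform segments."""
    return weights @ W_dec


def separate(mixture):
    """Return an array of shape (N_SRC, len(mixture)) with one waveform per source."""
    segs = segment(mixture, SEG_LEN)
    w = encode(segs)                    # (n_seg, N_BASIS), all entries >= 0
    masks = estimate_masks(w)           # (n_seg, N_SRC, N_BASIS)
    src_w = masks * w[:, None, :]       # masked encoder outputs per source
    out = decode(src_w)                 # (n_seg, N_SRC, SEG_LEN)
    return out.transpose(1, 0, 2).reshape(N_SRC, -1)[:, :len(mixture)]


mixture = rng.standard_normal(1000)     # stand-in for a two-speaker mixture
estimates = separate(mixture)
print(estimates.shape)                  # (2, 1000)
```

Because the masks in this sketch sum to one across sources and the decoder is linear, the separated waveforms add up to the decoded, unmasked mixture. No short-time Fourier transform appears anywhere in the pipeline, so the phase/magnitude decoupling issue described above does not arise in this formulation.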

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing