Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Yi Luo; Nima Mesgarani

arXiv:1809.07454 · cs.SD · May 16, 2019

TL;DR

Conv-TasNet is an end-to-end, time-domain speech separation network that outperforms earlier time-frequency masking methods, and even ideal time-frequency magnitude masks, while offering lower latency and a smaller model size, which makes it suitable for real-time applications.

Contribution

The paper introduces Conv-TasNet, a fully-convolutional time-domain audio separation network that separates speakers more accurately than previous time-frequency methods and ideal time-frequency magnitude masks, with a smaller model and shorter latency.

Findings

01. Outperforms previous time-frequency masking methods in separation quality.

02. Surpasses several ideal time-frequency magnitude masks in two-speaker separation, in both objective and subjective evaluations.

03. Has a smaller model size and shorter latency, enabling real-time processing.
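
To make the "ideal time-frequency magnitude mask" baseline in the findings concrete, here is a minimal NumPy sketch of one common such mask, the ideal ratio mask. It is an illustration under stated assumptions, not the paper's evaluation code: the helper names `stft` and `ideal_ratio_masks`, the window and hop sizes, and the random placeholder signals are all hypothetical. The mask is an oracle, since it is computed from the clean sources, and it only scales the mixture magnitude while reusing the mixture phase.

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Short-time Fourier transform with a Hann window (sizes are illustrative)."""
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1)  # shape: (n_frames, win // 2 + 1)

def ideal_ratio_masks(sources, eps=1e-8):
    """Oracle masks: each source's magnitude divided by the sum of all magnitudes."""
    mags = np.stack([np.abs(stft(s)) for s in sources])
    return mags / (mags.sum(axis=0, keepdims=True) + eps)

# Placeholder "speakers": random noise stands in for real speech here.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(4000)
s2 = rng.standard_normal(4000)

mix_spec = stft(s1 + s2)                # complex spectrogram of the mixture
masks = ideal_ratio_masks([s1, s2])     # one real-valued mask per source
est_specs = masks * mix_spec            # scaled magnitude, mixture phase kept
print(masks.shape)
```

Because the masks are real and non-negative, each estimate inherits the mixture phase unchanged, which is the magnitude and phase decoupling that time-domain approaches aim to avoid.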

Abstract

Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is…
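
The linear encoder mentioned in the abstract can be sketched as a projection of short, overlapping waveform frames onto a set of basis vectors. The following NumPy example is a minimal sketch under assumptions: the function names `frame_signal` and `linear_encoder`, the frame length of 16 samples, the hop of 8, and the 64 basis vectors are illustrative choices rather than values taken from this page, and the random basis stands in for weights that would be learned during training. The abstract is truncated above, so the sketch covers only the encoder step.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Split a 1-D waveform into overlapping frames of length `win`."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape: (n_frames, win)

def linear_encoder(x, basis, hop):
    """Project every frame onto the rows of `basis`, shaped (n_basis, win)."""
    frames = frame_signal(x, basis.shape[1], hop)
    return frames @ basis.T  # shape: (n_frames, n_basis)

rng = np.random.default_rng(0)
mixture = rng.standard_normal(8000)     # placeholder waveform, not real speech
basis = rng.standard_normal((64, 16))   # hypothetical weights; learned in practice

rep = linear_encoder(mixture, basis, hop=8)
print(rep.shape)
```

Framing followed by a matrix product is equivalent to a strided 1-D convolution without bias, so the same step could be written as a single convolutional layer in a deep learning framework.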

Taxonomy

Methods: Convolutional time-domain audio separation network