Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output
Hangting Chen, Yang Yi, Dang Feng, Pengyuan Zhang

TL;DR
This paper introduces Beam-Guided TasNet, an iterative multi-channel speech separation framework that enhances separation performance by integrating neural network-based separation with beamforming in a cyclic, mutually reinforcing manner.
Contribution
The paper proposes a cyclic framework with multi-channel input and multi-channel output, so that the separation network and the beamformer can refine each other over several iterations. The result improves on the baseline Beam-TasNet and approaches oracle MVDR performance.
Findings
Achieved SDR of 21.5 dB on spatialized WSJ0-2MIX
Exceeded baseline Beam-TasNet by 4.1 dB SDR
Narrowed performance gap with oracle MVDR to 2 dB
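For readers unfamiliar with the metric, the sketch below shows a simplified signal-to-distortion ratio in decibels. It is only an illustration: the function name `sdr_db` is hypothetical, and the paper's reported numbers may come from a different SDR variant (for example one that allows a short distortion filter), so values from this sketch are not directly comparable to the figures above.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR: ratio of reference energy to residual energy, in dB.

    Assumption: no scaling or filtering is applied before comparison,
    unlike some evaluation toolkits.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = reference - estimate
    # eps guards against division by zero for a perfect estimate.
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(residual**2) + eps))
```

Because the scale is logarithmic, a gain of a few dB corresponds to a large reduction in residual energy: a 10 dB gain means ten times less residual energy relative to the reference.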
Abstract
The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). The classic multi-channel speech processing framework combines signal estimation with beamforming. For example, Beam-TasNet links a multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of a data-driven network and boosts the performance of beamforming with an accurate estimate of the speech statistics. Such an integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", where MC-Conv-TasNet and MVDR can interact and promote each other more compactly under a directed cyclic…
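The abstract describes MVDR beamforming driven by speech statistics estimated from a network's output. A minimal sketch of that beamforming step is given below, assuming the common reference-channel formulation w(f) = Φ_N(f)⁻¹ Φ_S(f) u / tr(Φ_N(f)⁻¹ Φ_S(f)), where Φ_S and Φ_N are the spatial covariance matrices of the target and the interference and u selects the reference microphone. The function names, the `(channels, frames, freqs)` array layout, and the diagonal loading constant are assumptions made for illustration; this is not the authors' implementation and it omits the separation network and the iteration loop.

```python
import numpy as np

def spatial_covariance(X):
    """X: complex STFT of shape (channels, frames, freqs).

    Returns per-frequency covariance matrices of shape (freqs, channels, channels).
    """
    # Average the outer product x x^H over time frames for each frequency bin.
    return np.einsum("ctf,dtf->fcd", X, X.conj()) / X.shape[1]

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-8):
    """Reference-channel MVDR weights, shape (freqs, channels)."""
    n_ch = phi_s.shape[-1]
    # Small diagonal loading keeps the interference covariance invertible.
    phi_n = phi_n + eps * np.eye(n_ch)
    # Solve phi_n @ A = phi_s for each frequency, i.e. A = phi_n^{-1} phi_s.
    numer = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numer, axis1=-2, axis2=-1)[:, None]
    # Pick the column belonging to the reference microphone and normalize.
    return numer[:, :, ref] / (trace + eps)

def apply_beamformer(w, X):
    """Compute y(t, f) = w(f)^H x(t, f); returns shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", w.conj(), X)
```

In a Beam-TasNet-style pipeline, `phi_s` and `phi_n` would be computed from the STFT of the network's estimated source images rather than from ground truth. The "distortionless" constraint means a target with a fixed spatial signature passes through unchanged at the reference microphone while the interference power is minimized.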
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Music and Audio Processing
