TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement
Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, and DeLiang Wang

TL;DR
This paper introduces TPARN, a novel multichannel speech enhancement model that extends dual-path RNNs with a third spatial path, achieving superior performance in time-domain speech enhancement.
Contribution
The paper proposes TPARN, a triple-path attentive recurrent network that incorporates spatial context for multichannel speech enhancement, extending existing dual-path models with an additional spatial dimension.
Findings
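In the reported experiments, TPARN outperforms existing state-of-the-art approaches to multichannel speech enhancement.
TPARN captures temporal context with a dual-path ARN applied to each channel independently, and spatial context with an additional ARN along the spatial dimension.
Its multiple-input, multiple-output design enhances all input channels simultaneously.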
Abstract
In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced along the spatial dimension for spatial context aggregation. TPARN is designed as a multiple-input and multiple-output architecture to enhance all input channels simultaneously. Experimental results demonstrate the superiority of TPARN over existing state-of-the-art approaches.
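The processing order described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the input layout `[channel][chunk][frame]`, the reading of "dual-path" as an intra-chunk pass followed by an inter-chunk pass, and the names `running_mean`, `apply_along`, and `triple_path_block` are all assumptions made for illustration. In particular, `running_mean` is only a placeholder for an ARN (an RNN augmented with self-attention); it is there to show which axis each path operates on.

```python
from typing import Callable, List

Seq = List[float]
Tensor3 = List[List[List[float]]]  # assumed layout: [channel][chunk][frame]


def running_mean(seq: Seq) -> Seq:
    """Placeholder sequence model (NOT an ARN): running mean over the sequence."""
    out: Seq = []
    total = 0.0
    for i, value in enumerate(seq, start=1):
        total += value
        out.append(total / i)
    return out


def apply_along(x: Tensor3, axis: int, fn: Callable[[Seq], Seq]) -> Tensor3:
    """Apply a sequence-to-sequence function along one axis of a 3-D nested list."""
    n_ch, n_chunk, n_frame = len(x), len(x[0]), len(x[0][0])
    y = [[[0.0] * n_frame for _ in range(n_chunk)] for _ in range(n_ch)]
    if axis == 2:  # along frames inside each chunk (intra-chunk path)
        for c in range(n_ch):
            for k in range(n_chunk):
                y[c][k] = fn(x[c][k])
    elif axis == 1:  # along chunks at a fixed frame index (inter-chunk path)
        for c in range(n_ch):
            for t in range(n_frame):
                col = fn([x[c][k][t] for k in range(n_chunk)])
                for k in range(n_chunk):
                    y[c][k][t] = col[k]
    elif axis == 0:  # along channels at a fixed chunk and frame (spatial path)
        for k in range(n_chunk):
            for t in range(n_frame):
                col = fn([x[c][k][t] for c in range(n_ch)])
                for c in range(n_ch):
                    y[c][k][t] = col[c]
    else:
        raise ValueError("axis must be 0, 1, or 2")
    return y


def triple_path_block(x: Tensor3, fn: Callable[[Seq], Seq] = running_mean) -> Tensor3:
    """One hypothetical triple-path block: intra-chunk, inter-chunk, then spatial."""
    x = apply_along(x, 2, fn)  # path 1: each channel processed independently
    x = apply_along(x, 1, fn)  # path 2: each channel processed independently
    x = apply_along(x, 0, fn)  # path 3: context aggregated across channels
    return x
```

Calling `triple_path_block` on an input with two channels, one chunk, and two frames returns a nested list with the same dimensions, which mirrors the multiple-input, multiple-output design: every input channel has a corresponding output channel. In a real model each placeholder call would be replaced by a learned sequence model operating on feature vectors rather than scalars.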
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
