TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

Ashutosh Pandey; Buye Xu; Anurag Kumar; Jacob Donley; Paul Calamia; DeLiang Wang

arXiv:2110.10757 · cs.SD · April 7, 2022

Open Access

TL;DR

This paper introduces TPARN, a multichannel speech enhancement model that extends a single-channel dual-path network with a third path along the spatial dimension. The authors report that it outperforms existing state-of-the-art approaches in time-domain speech enhancement.

Contribution

The paper proposes TPARN, a triple-path attentive recurrent network that incorporates spatial context for multichannel speech enhancement, extending existing dual-path models with an additional spatial dimension.

Findings

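01. TPARN outperforms existing state-of-the-art approaches to multichannel speech enhancement.

02. TPARN captures temporal context with a dual-path ARN applied to each channel and spatial context with an ARN applied along the spatial dimension.

03. Experimental results support TPARN's advantage over the compared approaches.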

Abstract

In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced along the spatial dimension for spatial context aggregation. TPARN is designed as a multiple-input and multiple-output architecture to enhance all input channels simultaneously. Experimental results demonstrate the superiority of TPARN over existing state-of-the-art approaches.
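
The ordering of the three paths described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a hypothetical tensor layout of (channels, chunks, frames, features), and `toy_sequence_block` is a stand-in that uses plain self-attention with a residual connection, whereas the paper's ARN is an RNN augmented with self-attention. All function names here are illustrative.

```python
import numpy as np

def toy_sequence_block(x):
    """Stand-in for an ARN (assumption): self-attention plus a residual.
    x has shape (batch, seq_len, feat); the output has the same shape."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ x

def apply_along(x, axis, block):
    """Run `block` along one axis, folding the other non-feature axes into batch."""
    moved = np.moveaxis(x, axis, -2)            # (..., seq, feat)
    lead = moved.shape[:-2]
    flat = moved.reshape(-1, *moved.shape[-2:])  # (batch, seq, feat)
    out = block(flat).reshape(*lead, *moved.shape[-2:])
    return np.moveaxis(out, -2, axis)

def dual_path_block(x):
    """Dual-path stage: each channel is processed independently."""
    x = apply_along(x, 2, toy_sequence_block)  # intra-chunk path (frames)
    x = apply_along(x, 1, toy_sequence_block)  # inter-chunk path (chunks)
    return x

def triple_path_block(x):
    """Dual-path stage followed by a third path along the spatial dimension."""
    x = dual_path_block(x)
    return apply_along(x, 0, toy_sequence_block)  # spatial path (channels)

# Hypothetical sizes: 4 channels, 3 chunks, 5 frames per chunk, 8 features.
x = np.random.default_rng(0).standard_normal((4, 3, 5, 8))
y = triple_path_block(x)
print(y.shape)  # same shape as the input: one output per input channel
```

The output keeps one feature map per input channel, which mirrors the multiple-input, multiple-output design: channels are mixed only in the spatial path, and every channel is enhanced at once.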

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques