Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

Ashutosh Pandey; Buye Xu; Anurag Kumar; Jacob Donley; Paul Calamia; DeLiang Wang

arXiv:2110.11844·cs.SD·July 6, 2022

Open Access

TL;DR

This paper introduces a novel time-domain triple-path neural network that enhances speech in ad-hoc microphone arrays by using self-attention for spatial processing and a dual-path network for temporal processing, handling unknown microphone configurations.

Contribution

The paper presents a new triple-path network architecture that effectively processes ad-hoc arrays with unknown microphone order and placement, improving multichannel speech enhancement.

Findings

1. The proposed network outperforms existing methods in speech enhancement tasks.

2. It effectively utilizes multichannel information from distant microphones.

3. The approach demonstrates robustness to unknown microphone configurations.

Abstract

Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel triple-path network for ad-hoc array processing in the time domain. The key idea in the network design is to divide the overall processing into spatial processing and temporal processing and use self-attention for spatial processing. Using self-attention for spatial processing makes the network invariant to the order and the number of microphones. The temporal processing is done independently for all channels using a recently proposed dual-path attentive recurrent network. The proposed network is a multiple-input multiple-output architecture that can simultaneously enhance signals at all microphones. Experimental results demonstrate the excellent…
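
The claim that self-attention across microphones removes dependence on their order and number can be checked with a small sketch. This is not the authors' network: it is a single-head scaled dot-product attention in NumPy, applied along the microphone axis with no positional encoding. The function name `spatial_self_attention`, the feature size, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def spatial_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the microphone axis.

    x has shape (mics, features): one feature vector per microphone.
    No positional encoding is used, so microphones form an unordered set.
    Illustrative only; not the architecture from the paper.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (mics, mics)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over microphones
    return weights @ v                              # (mics, features)

rng = np.random.default_rng(0)
feat = 8  # hypothetical feature size
w_q, w_k, w_v = (rng.standard_normal((feat, feat)) for _ in range(3))

# Four microphones, then the same four in a shuffled order.
x = rng.standard_normal((4, feat))
perm = rng.permutation(4)
out = spatial_self_attention(x, w_q, w_k, w_v)
out_perm = spatial_self_attention(x[perm], w_q, w_k, w_v)

# Shuffling the inputs only shuffles the outputs in the same way, so each
# microphone receives the same result regardless of where it sits in the list.
order_equivariant = np.allclose(out[perm], out_perm)

# The same weights apply unchanged to a different number of microphones.
x6 = rng.standard_normal((6, feat))
out6 = spatial_self_attention(x6, w_q, w_k, w_v)
```

Because the weight matrices act on the feature dimension only, nothing in the layer is tied to a microphone index or count. In a multiple-input multiple-output setting this means the output for a given microphone does not change when the channel ordering does.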

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis