ARiSE: Auto-Regressive Multi-Channel Speech Enhancement
Pengjie Shen, Xueliang Zhang, Zhong-Qiu Wang

TL;DR
ARiSE adds auto-regressive connections to frame-online multi-channel speech enhancement: the target speech estimated at previous frames is fed back as extra input for the current frame. A parallel training mechanism offsets the slow training that naive auto-regression would cause, and evaluations in noisy-reverberant conditions show that the approach is effective.
Contribution
The paper presents an auto-regressive approach to multi-channel speech enhancement that feeds back previous-frame estimates, both directly and through a beamformer computed from them, together with a parallel training mechanism that speeds up training.
Findings
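The proposed algorithms are effective in noisy-reverberant conditions
Auto-regressive connections improve existing DNN-based frame-online multi-channel speech enhancement models
Parallel training speeds up the otherwise slow auto-regressive training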
Abstract
We propose ARiSE, an auto-regressive algorithm for multi-channel speech enhancement. ARiSE improves existing deep neural network (DNN) based frame-online multi-channel speech enhancement models by introducing auto-regressive connections, where the estimated target speech at previous frames is leveraged as extra input features to help the DNN estimate the target speech at the current frame. The extra input features can be derived from (a) the estimated target speech in previous frames; and (b) a beamformed mixture with the beamformer computed based on the previous estimated target speech. On the other hand, naively training the DNN in an auto-regressive manner is very slow. To deal with this, we propose a parallel training mechanism to speed up the training. Evaluation results in noisy-reverberant conditions show the effectiveness and potential of the proposed algorithms.
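The auto-regressive connection described in the abstract can be sketched as a frame-by-frame loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_dnn` is a hypothetical placeholder for the enhancement DNN, and `beamform_frame` assumes a regularised least-squares (Wiener-style) filter, which the abstract does not specify. The input is assumed to be a complex STFT of shape (frames, channels, frequencies).

```python
import numpy as np

def toy_dnn(features):
    # Hypothetical placeholder for the enhancement DNN: (C + 2, F) -> (F,).
    return features.mean(axis=0)

def beamform_frame(mix_frame, mix_hist, est_hist, eps=1e-6):
    """Filter the current frame with a beamformer fitted on past frames only.

    Assumed form: per-frequency least-squares filter w minimising
    sum_t |est_hist[t] - w^H mix_hist[t]|^2, with diagonal loading eps.
    mix_frame: (C, F), mix_hist: (T, C, F), est_hist: (T, F), all complex.
    """
    C = mix_frame.shape[0]
    # Spatial covariance of the past mixture, shape (F, C, C).
    R = np.einsum('tcf,tdf->fcd', mix_hist, mix_hist.conj())
    # Cross-correlation between past mixture and past estimates, shape (F, C).
    r = np.einsum('tcf,tf->fc', mix_hist, est_hist.conj())
    w = np.linalg.solve(R + eps * np.eye(C), r[..., None])[..., 0]  # (F, C)
    return np.einsum('fc,cf->f', w.conj(), mix_frame)

def autoregressive_enhance(mix, dnn=toy_dnn):
    """Frame-online loop: each frame sees the mixture plus two fed-back features."""
    T, C, F = mix.shape
    est = np.zeros((T, F), dtype=complex)
    for t in range(T):
        if t == 0:
            # No history yet, so both auto-regressive features are zero.
            prev_est = np.zeros(F, dtype=complex)
            beamformed = np.zeros(F, dtype=complex)
        else:
            prev_est = est[t - 1]                                  # feature (a)
            beamformed = beamform_frame(mix[t], mix[:t], est[:t])  # feature (b)
        features = np.concatenate([mix[t], prev_est[None], beamformed[None]], axis=0)
        est[t] = dnn(features)
    return est

# Usage on random data standing in for a 2-microphone STFT.
rng = np.random.default_rng(0)
demo_mix = rng.standard_normal((8, 2, 4)) + 1j * rng.standard_normal((8, 2, 4))
demo_est = autoregressive_enhance(demo_mix)
```

The loop makes the training cost visible: frame t cannot be processed until frame t - 1 has been estimated, so the frames of an utterance cannot be batched naively. That sequential dependency is what the proposed parallel training mechanism is meant to relieve.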
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation
