Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization

Ashutosh Pandey; DeLiang Wang

arXiv:2105.12831 · cs.SD · April 14, 2022

TL;DR

This paper introduces a self-attending RNN (ARN) for time-domain speech enhancement that improves cross-corpus generalization, particularly in low-SNR conditions, where it outperforms RNN and dual-path ARN baselines.

Contribution

The study proposes a self-attending recurrent neural network, also called an attentive recurrent network (ARN), that improves cross-corpus speech enhancement. It also compares two major approaches, complex spectral mapping and time-domain enhancement, and finds that they perform similarly with ARN.

Findings

01. ARN outperforms RNNs and dual-path ARNs in low SNR conditions

02. Complex spectral mapping and time-domain enhancement yield similar results with ARN

03. A challenging test subset is provided for future benchmarking

Abstract

Deep neural networks (DNNs) represent the mainstream methodology for supervised speech enhancement, primarily due to their capability to model complex functions using hierarchical representations. However, a recent study revealed that DNNs trained on a single corpus fail to generalize to untrained corpora, especially in low signal-to-noise ratio (SNR) conditions. Developing a noise-, speaker-, and corpus-independent speech enhancement algorithm is essential for real-world applications. In this study, we propose a self-attending recurrent neural network, or attentive recurrent network (ARN), for time-domain speech enhancement to improve cross-corpus generalization. ARN comprises recurrent neural networks (RNNs) augmented with self-attention blocks and feedforward blocks. We evaluate ARN on different corpora with nonstationary noises in low SNR conditions. Experimental results…
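To make the phrase "RNNs augmented with self-attention blocks and feedforward blocks" concrete, here is a minimal NumPy sketch of one ARN-like block. It is an illustration only, not the authors' implementation: the block ordering (RNN, then attention, then feedforward), the layer normalization, the residual connections, the plain tanh recurrence, the single attention head, the non-causal attention, and all names and sizes (`arn_like_block`, `init_params`, 16 features, 50 frames) are assumptions chosen for brevity.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each frame (last axis) to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def simple_rnn(x, w_in, w_rec):
    # Plain tanh recurrence over time (assumed); x has shape (frames, dim).
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for frame in x:
        h = np.tanh(frame @ w_in + h @ w_rec)
        outputs.append(h)
    return np.stack(outputs)


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product attention over all frames (non-causal).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def feedforward(x, w1, w2):
    # Two-layer position-wise network with a ReLU in between.
    return np.maximum(x @ w1, 0.0) @ w2


def arn_like_block(x, p):
    # Assumed ordering: RNN, then attention and feedforward with residuals.
    h = simple_rnn(layer_norm(x), p["w_in"], p["w_rec"])
    h = h + self_attention(layer_norm(h), p["w_q"], p["w_k"], p["w_v"])
    return h + feedforward(layer_norm(h), p["w1"], p["w2"])


def init_params(dim, rng):
    # Hypothetical random initialization; sizes are illustrative only.
    shapes = {
        "w_in": (dim, dim), "w_rec": (dim, dim),
        "w_q": (dim, dim), "w_k": (dim, dim), "w_v": (dim, dim),
        "w1": (dim, 4 * dim), "w2": (4 * dim, dim),
    }
    return {name: rng.normal(scale=0.1, size=shape)
            for name, shape in shapes.items()}


rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16))  # 50 frames of 16 features each
out = arn_like_block(frames, init_params(16, rng))
print(out.shape)  # (50, 16)
```

The block maps a sequence of frames to a sequence of the same shape, so several such blocks could be stacked. The recurrence processes frames in order, while the attention step lets every frame draw on every other frame in the utterance.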
