A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement

Kai Zhen; Mi Suk Lee; Minje Kim

arXiv:1908.06468·cs.SD·February 10, 2020·1 cites

Open Access

TL;DR

This paper introduces DCCRN, a densely connected convolutional and recurrent network that aggregates temporal context in two stages for end-to-end, time-domain speech enhancement. It outperforms competing convolutional baselines on STOI and PESQ while using only 1.38 million parameters.

Contribution

The paper proposes a densely connected hybrid of convolutional and recurrent components that performs dual-staged temporal context aggregation for end-to-end speech enhancement.

Findings

01. DCCRN outperforms competing convolutional baselines in STOI and PESQ scores at three SNR levels.

02. The model is computationally efficient, with only 1.38 million parameters.

03. It generalizes reasonably well to unseen noise types.

Abstract

In speech enhancement, an end-to-end deep neural network converts a noisy speech signal to clean speech directly in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. With the dense connectivity and cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines with an average STOI improvement of 0.23 and PESQ of 1.38 at three SNR levels. The proposed method is computationally efficient with only 1.38 million parameters. The generalizability performance on the unseen noise types is still decent considering its low…
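To make the context-aggregation trade-off concrete, the following is a minimal sketch of how a convolutional stage might widen its temporal context cheaply. It assumes DenseNet-style concatenation and doubling dilation rates, neither of which is specified in the text above; the function names, kernel size, dilation schedule, and growth rate are all hypothetical and chosen only for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Input samples seen by one output sample after stacking
    stride-1 1-D convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dense_input_channels(in_channels, growth, num_layers):
    """Input channel count of each layer when every layer's output
    (`growth` channels) is concatenated onto the running feature map."""
    return [in_channels + i * growth for i in range(num_layers)]


# Hypothetical settings for illustration only (not taken from the paper).
KERNEL_SIZE = 3
DILATIONS = [1, 2, 4, 8]

context = receptive_field(KERNEL_SIZE, DILATIONS)
channels = dense_input_channels(in_channels=1, growth=16, num_layers=4)
print(context)   # 31
print(channels)  # [1, 17, 33, 49]
```

Under these assumed settings, four layers cover 31 input samples while the layer count, and hence the parameter count, grows only linearly. A recurrent stage placed after such a block could then carry context beyond that fixed window, which is one way to read "dual-staged" aggregation.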

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Advanced Adaptive Filtering Techniques