DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Yanxin Hu; Yun Liu; Shubo Lv; Mengtao Xing; Shimin Zhang; Yihui Fu; Jian Wu; Bihong Zhang; Lei Xie

arXiv:2008.00264 · eess.AS · September 24, 2020 · 61 cites

TL;DR

This paper introduces DCCRN, a deep neural network that models complex-valued spectrograms for phase-aware speech enhancement and ranked first in the real-time track of the Interspeech 2020 DNS challenge.

Contribution

The paper proposes a complex-valued convolutional recurrent network that simulates complex-valued operations, improving speech enhancement performance over previous real-valued networks.
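As a rough illustration of what "simulating complex operations" can mean, the sketch below builds a complex 1-D convolution from four real-valued convolutions, using the identity (x_r + j x_i) * (w_r + j w_i) = (x_r * w_r - x_i * w_i) + j (x_r * w_i + x_i * w_r). The function name `complex_conv1d`, the 1-D setting, and the use of NumPy are assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np


def complex_conv1d(x_real, x_imag, w_real, w_imag):
    """Complex convolution assembled from four real-valued convolutions.

    Hypothetical helper for illustration only: the real and imaginary
    parts of the input and the kernel are kept as separate real arrays.
    """
    out_real = (np.convolve(x_real, w_real, mode="valid")
                - np.convolve(x_imag, w_imag, mode="valid"))
    out_imag = (np.convolve(x_real, w_imag, mode="valid")
                + np.convolve(x_imag, w_real, mode="valid"))
    return out_real, out_imag


# Toy input and kernel (random values, not real speech features).
rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

out_real, out_imag = complex_conv1d(x.real, x.imag, w.real, w.imag)

# Cross-check against NumPy's native complex convolution.
reference = np.convolve(x, w, mode="valid")
matches_reference = np.allclose(out_real + 1j * out_imag, reference)
```

Keeping the real and imaginary parts as separate real arrays is what lets a framework with only real-valued layers carry out complex arithmetic; `matches_reference` checks the split computation against a direct complex convolution.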

Findings

01

DCCRN outperforms previous models on objective and subjective metrics.

02

DCCRN ranked first in the real-time track and second in the non-real-time track of the Interspeech 2020 DNS challenge in terms of Mean Opinion Score (MOS).

03

The model achieves high performance with only 3.7 million parameters.

Abstract

Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or speech spectrum, via a naive convolution neural network (CNN) or recurrent neural network (RNN). Some recent studies use complex-valued spectrogram as a training target but train in a real-valued network, predicting the magnitude and phase component or real and imaginary part, respectively. Particularly, convolution recurrent network (CRN) integrates a convolutional encoder-decoder (CED) structure and long short-term memory (LSTM), which has been proven to be helpful for complex targets. In order to train the complex target more effectively, in this paper, we design a new network structure simulating the complex-valued operation, called Deep Complex Convolution Recurrent Network (DCCRN),…
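To make the TF-mask idea in the abstract concrete, here is a minimal sketch contrasting a real-valued magnitude mask, which leaves the noisy phase untouched, with a complex-valued mask, which can change both magnitude and phase. The function names, the toy spectrogram shape, and the "ideal" masks computed from a known clean signal are all assumptions for illustration; the excerpt above does not specify which mask DCCRN estimates.

```python
import numpy as np


def apply_magnitude_mask(noisy_spec, mask):
    """Scale each TF bin by a real-valued mask; the noisy phase is kept."""
    return mask * noisy_spec


def apply_complex_mask(noisy_spec, mask_real, mask_imag):
    """Multiply each TF bin by a complex mask; magnitude and phase both change."""
    return (mask_real + 1j * mask_imag) * noisy_spec


# Toy complex spectrograms with shape (frequency bins, frames).
rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
noise = 0.5 * (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3)))
noisy = clean + noise

# Oracle masks, computable only because the clean signal is known here.
magnitude_mask = np.abs(clean) / np.abs(noisy)
complex_mask = clean / noisy

magnitude_estimate = apply_magnitude_mask(noisy, magnitude_mask)
complex_estimate = apply_complex_mask(noisy, complex_mask.real, complex_mask.imag)

# The magnitude mask fixes magnitudes but inherits the noisy phase.
phase_unchanged = np.allclose(np.angle(magnitude_estimate), np.angle(noisy))
# The complex mask can recover the clean spectrogram exactly in this oracle case.
complex_recovers_clean = np.allclose(complex_estimate, clean)
```

In practice a network would predict the mask from the noisy input alone; the oracle masks here only show why a phase-aware target can, in principle, recover more than a magnitude-only one.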

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis