DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie

TL;DR
This paper introduces DCCRN, a deep neural network that models complex-valued spectrograms for phase-aware speech enhancement and ranked first in the real-time track of the Interspeech 2020 DNS Challenge.
Contribution
The paper proposes a new complex-valued convolution-recurrent network that simulates complex operations, improving speech enhancement performance over previous real-valued models.
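To illustrate what "simulating complex operations" can mean, the sketch below builds a complex-valued convolution out of four real-valued convolutions, following the ordinary complex multiplication rule (a + jb)(c + jd) = (ac - bd) + j(ad + bc). This is a minimal NumPy toy, not the authors' implementation: the function name `complex_conv1d`, the 1-D signal, and the random kernel are assumptions chosen for illustration, whereas the actual model applies learned kernels to spectrograms.

```python
import numpy as np


def complex_conv1d(x_real, x_imag, w_real, w_imag):
    """Complex convolution computed with four real-valued convolutions.

    Hypothetical helper for illustration only.
    Real part:      x_real * w_real - x_imag * w_imag
    Imaginary part: x_real * w_imag + x_imag * w_real
    (here * denotes convolution)
    """
    out_real = (np.convolve(x_real, w_real, mode="valid")
                - np.convolve(x_imag, w_imag, mode="valid"))
    out_imag = (np.convolve(x_real, w_imag, mode="valid")
                + np.convolve(x_imag, w_real, mode="valid"))
    return out_real, out_imag


# Toy input and kernel (assumed sizes, random values).
rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

out_real, out_imag = complex_conv1d(x.real, x.imag, w.real, w.imag)

# Cross-check against NumPy's native complex convolution.
reference = np.convolve(x, w, mode="valid")
matches_reference = np.allclose(out_real + 1j * out_imag, reference)
print("matches native complex convolution:", matches_reference)
```

Keeping the real and imaginary parts as separate real-valued arrays is what lets a standard real-valued deep learning stack carry complex arithmetic: each complex layer stores two real kernels and combines their outputs as shown.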
Findings
DCCRN outperforms previous models on objective and subjective metrics.
DCCRN ranked first in the real-time track and second in the non-real-time track of the Interspeech 2020 DNS Challenge.
The model achieves high performance with only 3.7 million parameters.
Abstract
Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or speech spectrum, via a naive convolution neural network (CNN) or recurrent neural network (RNN). Some recent studies use complex-valued spectrogram as a training target but train in a real-valued network, predicting the magnitude and phase component or real and imaginary part, respectively. Particularly, convolution recurrent network (CRN) integrates a convolutional encoder-decoder (CED) structure and long short-term memory (LSTM), which has been proven to be helpful for complex targets. In order to train the complex target more effectively, in this paper, we design a new network structure simulating the complex-valued operation, called Deep Complex Convolution Recurrent Network (DCCRN),…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis
