Phase-aware Speech Enhancement with Deep Complex U-Net
Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, and Kyogu Lee

TL;DR
This paper introduces a phase-aware speech enhancement model using a deep complex U-Net with a novel masking method and loss function, achieving state-of-the-art results in speech quality improvement.
Contribution
The paper presents a novel deep complex U-Net architecture, a polar coordinate-wise masking method, and a weighted SDR loss for improved phase-aware speech enhancement.
Findings
Achieves state-of-the-art speech enhancement performance on a mixture of the Voice Bank corpus and DEMAND.
Outperforms previous methods significantly.
All proposed components are empirically validated.
Abstract
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND…
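The second and third ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The formulas follow the common reading of the paper and are assumptions here, since the abstract does not spell them out: the mask squashes the magnitude of a raw complex output with tanh while keeping its phase, and the wSDR loss is an energy-weighted sum of negative cosine similarities for the speech and noise estimates. The function names, the `eps` stabilizer, and the toy signals are hypothetical.

```python
import numpy as np


def bounded_polar_mask(o, eps=1e-8):
    """Turn an unbounded complex output `o` into a mask inside the unit disc.

    Assumed form: magnitude = tanh(|o|), phase = o / |o|.
    """
    mag = np.abs(o)
    phase = o / (mag + eps)       # unit-modulus phase factor (0 where o == 0)
    return np.tanh(mag) * phase   # magnitude squashed into [0, 1)


def neg_cosine(a, b, eps=1e-8):
    """Negative cosine similarity between two real 1-D signals."""
    return -np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)


def wsdr_loss(x, y, y_hat, eps=1e-8):
    """Weighted SDR loss (assumed form) for noisy x, clean y, estimate y_hat."""
    z = x - y                     # true noise
    z_hat = x - y_hat             # noise implied by the estimate
    e_y, e_z = np.dot(y, y), np.dot(z, z)
    alpha = e_y / (e_y + e_z + eps)   # energy share of the clean speech
    return alpha * neg_cosine(y, y_hat) + (1.0 - alpha) * neg_cosine(z, z_hat)


# Toy usage: a sine "speech" signal plus Gaussian noise (illustrative data only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1600)
clean = np.sin(2.0 * np.pi * 5.0 * t)
noisy = clean + 0.3 * rng.standard_normal(t.shape)

perfect = wsdr_loss(noisy, clean, clean)   # estimate equals the clean signal
no_op = wsdr_loss(noisy, clean, noisy)     # estimate is just the noisy input
```

Under these assumptions the loss is bounded in [-1, 1], and a perfect estimate drives it to its minimum, which is why it can stand in for an SDR-style evaluation measure during training. The noise term keeps the loss informative for segments where the clean speech has little energy.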
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net
