Deep Interaction between Masking and Mapping Targets for Single-Channel Speech Enhancement
Lu Zhang, Mingjiang Wang, Zehua Zhang, Xuyi Zhuang

TL;DR
This paper introduces a multi-branch dilated convolutional network that enhances both magnitude and phase of noisy speech simultaneously, improving speech quality and intelligibility with efficient computation.
Contribution
It proposes a novel multi-objective learning framework with IRM-based feature attention and multi-scale dilated convolutions for improved speech enhancement.
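For readers unfamiliar with the IRM, the following minimal sketch shows the commonly used definition of the ideal ratio mask and one way a mask could act as a feature attention factor. The function names, the small `eps` stabilizer, and the element-wise gating are assumptions made for illustration; the summary does not specify how the paper computes its attention factors.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Common IRM definition per T-F bin: sqrt(|S|^2 / (|S|^2 + |N|^2))."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    # eps is an assumed stabilizer that avoids 0/0 in silent bins
    return np.sqrt(s2 / (s2 + n2 + eps))

def gate_features(features, mask_estimate):
    """Hypothetical attention step: scale features element-wise by a mask in [0, 1]."""
    return features * mask_estimate

# Toy 2x2 magnitude spectrograms (frames x frequency bins)
speech = np.array([[1.0, 0.0], [3.0, 2.0]])
noise = np.array([[0.0, 1.0], [4.0, 2.0]])

irm = ideal_ratio_mask(speech, noise)
gated = gate_features(np.ones_like(irm), irm)
```

Bins dominated by speech receive a mask value near 1 and bins dominated by noise a value near 0, so gating with the mask suppresses features from noisy regions.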
Findings
Outperforms state-of-the-art models in speech quality and intelligibility
Achieves better performance with less computation
Effectively models temporal information for speech enhancement
Abstract
The most recent deep neural network (DNN) models exhibit impressive denoising performance in the time-frequency (T-F) magnitude domain. However, the phase is also a critical component of the speech signal that is easily overlooked. In this paper, we propose a multi-branch dilated convolutional network (DCN) to simultaneously enhance the magnitude and phase of noisy speech. A causal and robust monaural speech enhancement system is achieved based on the multi-objective learning framework of the complex spectrum and the ideal ratio mask (IRM) targets. In the process of joint learning, the intermediate estimation of IRM targets is used as a way of generating feature attention factors to realize the information interaction between the two targets. Moreover, the proposed multi-scale dilated convolution enables the DCN model to have a more efficient temporal modeling capability. Experimental…
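The causal, multi-scale dilated convolution mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: the kernel, the dilation rates (1, 2, 4), and the choice to sum the parallel branches are all assumptions used only to show how dilation widens the temporal context while each output frame depends on current and past frames alone.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i*dilation], with zeros before t = 0."""
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding keeps the filter causal
    y = np.zeros(len(x))
    for i, w in enumerate(kernel):
        # tap i looks i*dilation frames into the past
        start = pad - i * dilation
        y += w * xp[start:start + len(x)]
    return y

def multi_scale_branch(x, kernel, dilations=(1, 2, 4)):
    """Hypothetical multi-scale block: sum parallel branches with different dilations."""
    return sum(causal_dilated_conv1d(x, kernel, d) for d in dilations)

# Unit impulse at frame 3: the response shows which past frames each output sees
x = np.zeros(8)
x[3] = 1.0
y = causal_dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2)
ms = multi_scale_branch(x, np.array([1.0, 1.0]))
```

With a two-tap kernel and dilation 2, the impulse at frame 3 appears at frames 3 and 5 and never earlier, which is the causality property. Summing branches with several dilation rates lets a single layer cover short and long time spans at once.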
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Convolution · Dilated Convolution
