Speech Enhancement with Intelligent Neural Homomorphic Synthesis
Shulin He, Wei Rao, Jinjiang Liu, Jun Chen, Yukai Ju, Xueliang Zhang, Yannan Wang, Shidong Shang

TL;DR
This paper introduces a neural source filter network for speech enhancement that leverages homomorphic signal processing and cepstral analysis to better separate speech components, resulting in improved signal quality.
Contribution
It presents a neural source filter approach that combines traditional signal processing with neural networks and reports better speech enhancement results than the FullSubNet baseline.
Findings
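SI-SNR improved by 1.363 dB over FullSubNet
Uses an attentive recurrent network (ARN) to predict a ratio mask in place of liftering, and two convolutional attentive recurrent networks (CARN) to predict the clean excitation and vocal tract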
Achieves better speech quality through neural homomorphic synthesis
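For readers unfamiliar with the metric quoted above, here is a minimal sketch of scale-invariant signal-to-noise ratio (SI-SNR) as it is commonly defined. The function name `si_snr`, the zero-mean step, and the `eps` stabilizer are assumptions made for illustration; the paper's exact evaluation code is not given in this summary.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a clean target.

    Assumption: both inputs are 1-D signals of equal length, and both are
    zero-meaned first, as in the commonly used definition of the metric.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to get the "signal" part.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target

    # Whatever is left after the projection is counted as noise.
    residual = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.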
Abstract
Most neural network speech enhancement models ignore mathematical models of speech production and directly map Fourier transform spectra or waveforms. In this work, we propose a neural source filter network for speech enhancement. Specifically, we use homomorphic signal processing and cepstral analysis to obtain the excitation and vocal tract of noisy speech. Unlike traditional signal processing, we replace the liftering separation function with a ratio mask predicted by an attentive recurrent network (ARN). Two convolutional attentive recurrent networks (CARN) are then used to predict the excitation and vocal tract of clean speech, respectively. The system's output is synthesized from the estimated excitation and vocal tract. Experiments show that our proposed method performs better, with SI-SNR improving by 1.363 dB compared to FullSubNet.
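The classical homomorphic decomposition that the abstract builds on can be sketched as follows. This is only the traditional signal-processing baseline with a fixed low-quefrency lifter, which is the separation function the paper says it replaces with an ARN-predicted ratio mask; the networks themselves are not reproduced here. The function name `cepstral_split` and the parameters `n_lifter` and `eps` are hypothetical choices for illustration.

```python
import numpy as np

def cepstral_split(frame, n_lifter=30, eps=1e-8):
    """Split one frame's log-magnitude spectrum into a smooth envelope
    (vocal tract) and the remaining fine structure (excitation).

    Assumption: `frame` is a 1-D, already windowed time-domain frame.
    """
    frame = np.asarray(frame, dtype=np.float64)
    n_fft = len(frame)

    # Log-magnitude spectrum: the source-filter product becomes a sum.
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + eps)

    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Fixed binary lifter keeping low quefrencies (and their mirror image).
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0

    # Back to the frequency domain: smooth spectral envelope.
    log_envelope = np.fft.rfft(cepstrum * lifter).real

    # The remainder carries the pitch harmonics and other fine structure.
    log_excitation = log_mag - log_envelope
    return log_envelope, log_excitation
```

Adding the two returned log spectra and exponentiating recovers the original magnitude spectrum (up to `eps`), which mirrors the synthesis step in which the output is rebuilt from an excitation and a vocal tract estimate.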
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
