Speech Enhancement with Intelligent Neural Homomorphic Synthesis

Shulin He; Wei Rao; Jinjiang Liu; Jun Chen; Yukai Ju; Xueliang Zhang; Yannan Wang; Shidong Shang

arXiv:2210.15853·cs.SD·October 31, 2022

TL;DR

This paper introduces a neural source filter network for speech enhancement that leverages homomorphic signal processing and cepstral analysis to better separate speech components, resulting in improved signal quality.

Contribution

The paper presents a neural source filter approach that combines traditional homomorphic signal processing with neural networks, and reports a 1.363 dB SI-SNR improvement over FullSubNet.

Findings

01

SI-SNR improved by 1.363 dB over FullSubNet

02

Uses attentive recurrent networks for excitation and vocal tract prediction

03

Achieves better speech quality through neural homomorphic synthesis
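The headline finding is stated in SI-SNR (scale-invariant signal-to-noise ratio). As a reading aid, here is a minimal sketch of the commonly used definition of that metric. The function name `si_snr` and the `eps` stabilizer are illustrative assumptions, not taken from the paper, and the paper's exact evaluation code may differ.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (hypothetical helper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Whatever is left after the projection counts as distortion.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.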

Abstract

Most neural network speech enhancement models ignore mathematical models of speech production and instead map Fourier transform spectra or waveforms directly. In this work, we propose a neural source filter network for speech enhancement. Specifically, we use homomorphic signal processing and cepstral analysis to obtain the excitation and vocal tract components of noisy speech. Unlike traditional signal processing, we replace the liftering separation function with a ratio mask predicted by an attentive recurrent network (ARN). Two convolutional attentive recurrent networks (CARNs) then predict the excitation and the vocal tract of the clean speech, respectively. The system's output is synthesized from the estimated excitation and vocal tract. Experiments show that the proposed method outperforms FullSubNet, improving SI-SNR by 1.363 dB.
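The homomorphic analysis described in the abstract can be illustrated with the traditional fixed-lifter version that the paper's learned mask replaces. The sketch below is not the authors' implementation: the function name `cepstral_split`, the FFT size, and the quefrency cutoff are assumptions chosen for illustration, and a hard low-quefrency lifter stands in for the ARN-predicted ratio mask.

```python
import numpy as np


def cepstral_split(frame, n_fft=512, cutoff=30):
    """Split one frame's log-magnitude spectrum into vocal tract and excitation parts.

    Hypothetical helper: `cutoff` is the number of low-quefrency bins
    assigned to the vocal tract (spectral envelope).
    """
    spectrum = np.fft.rfft(frame, n=n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-8)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Symmetric low-quefrency lifter (1 near quefrency 0, 0 elsewhere).
    # In the paper this fixed function is replaced by a learned ratio mask.
    q = np.arange(n_fft)
    lifter = (np.minimum(q, n_fft - q) < cutoff).astype(np.float64)

    # Back to the log-spectral domain: a smooth envelope and the fine structure.
    vocal_log = np.fft.rfft(cepstrum * lifter).real
    excitation_log = np.fft.rfft(cepstrum * (1.0 - lifter)).real
    return log_mag, vocal_log, excitation_log
```

Since the two liftered cepstra sum to the full cepstrum, `vocal_log + excitation_log` reproduces `log_mag`, so exponentiating the sum recovers the magnitude spectrum. This is the sense in which an output can be synthesized from an estimated excitation and vocal tract.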

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development