Convolutional-Recurrent Neural Networks for Speech Enhancement

Han Zhao; Shuayb Zarar; Ivan Tashev; Chin-Hui Lee

arXiv:1805.00579·cs.SD·May 3, 2018

TL;DR

This paper introduces an end-to-end convolutional-recurrent neural network for speech enhancement. The model is purely data-driven, exploits local structure in both frequency and time, and outperforms existing MLP-based methods on synthetic data, improving PESQ by up to 0.6 on seen noise and 0.64 on unseen noise.

Contribution

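The paper presents an end-to-end convolutional-recurrent architecture for speech enhancement. It exploits local structure in both the frequency and temporal domains and builds prior knowledge of speech signals into the model structure, which makes the model more data-efficient and improves generalization to both seen and unseen noise.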

Findings

01 Outperforms existing methods on PESQ, by up to 0.6 on seen noise and 0.64 on unseen noise

02 Improves speech quality on both seen and unseen noise

03 Results are based on experiments with synthetic data

Abstract

We propose an end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven and does not make any assumptions about the type or the stationarity of the noise. In contrast to existing methods that use multilayer perceptrons (MLPs), we employ both convolutional and recurrent neural network architectures. Thus, our approach allows us to exploit local structures in both the frequency and temporal domains. By incorporating prior knowledge of speech signals into the design of model structures, we build a model that is more data-efficient and achieves better generalization on both seen and unseen noise. Based on experiments with synthetic data, we demonstrate that our model outperforms existing methods, improving PESQ by up to 0.6 on seen noise and 0.64 on unseen noise.
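
The data flow described in the abstract (convolution to capture local structure along frequency, recurrence to capture structure along time, then a prediction of the clean spectrum) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give layer sizes or the recurrent cell type, so the plain tanh recurrence, the random untrained weights, and all names (`conv_over_frequency`, `simple_rnn`, `init_params`, `enhance`) and default sizes are assumptions chosen only to show the shapes involved.

```python
import numpy as np


def conv_over_frequency(spec, kernels):
    """Slide each kernel along the frequency axis of every frame.

    spec:    (T, F) magnitude spectrogram, T frames by F frequency bins.
    kernels: (K, W) K kernels, each spanning W adjacent bins.
    Returns (T, (F - W + 1) * K) ReLU feature vectors, one per frame.
    """
    num_frames, num_bins = spec.shape
    _, width = kernels.shape
    out_bins = num_bins - width + 1
    # Stack shifted copies so windows[t, f] holds bins f .. f + W - 1.
    windows = np.stack([spec[:, i:i + out_bins] for i in range(width)], axis=-1)
    features = np.maximum(windows @ kernels.T, 0.0)  # (T, out_bins, K)
    return features.reshape(num_frames, -1)


def simple_rnn(inputs, w_in, w_rec, bias):
    """Plain tanh recurrence over time (an assumed stand-in cell)."""
    hidden = np.zeros(w_rec.shape[0])
    outputs = np.empty((inputs.shape[0], w_rec.shape[0]))
    for t in range(inputs.shape[0]):
        hidden = np.tanh(inputs[t] @ w_in + hidden @ w_rec + bias)
        outputs[t] = hidden
    return outputs


def init_params(num_bins, num_kernels=4, width=5, hidden=16, seed=0):
    """Random, untrained weights; every size here is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    conv_dim = (num_bins - width + 1) * num_kernels
    return {
        "kernels": 0.1 * rng.standard_normal((num_kernels, width)),
        "w_in": 0.1 * rng.standard_normal((conv_dim, hidden)),
        "w_rec": 0.1 * rng.standard_normal((hidden, hidden)),
        "bias": np.zeros(hidden),
        "w_out": 0.1 * rng.standard_normal((hidden, num_bins)),
        "b_out": np.zeros(num_bins),
    }


def enhance(noisy_spec, params):
    """Noisy spectrogram (T, F) -> predicted clean spectrogram (T, F)."""
    features = conv_over_frequency(noisy_spec, params["kernels"])
    hidden = simple_rnn(features, params["w_in"], params["w_rec"], params["bias"])
    return hidden @ params["w_out"] + params["b_out"]


# Usage: 20 frames of a 64-bin spectrogram filled with synthetic values.
noisy = np.abs(np.random.default_rng(1).standard_normal((20, 64)))
enhanced = enhance(noisy, init_params(num_bins=64))
print(enhanced.shape)  # (20, 64)
```

With trained weights, the output would be compared against the clean spectrogram under a regression loss. Here the weights are random, so only the shapes and the order of operations are meaningful.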
