RHR-Net: A Residual Hourglass Recurrent Neural Network for Speech Enhancement

Jalal Abdulbaqi, Yue Gu, and Ivan Marsic

arXiv:1904.07294·eess.AS·April 17, 2019·5 cites

TL;DR

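This paper introduces RHR-Net, a fully recurrent, hourglass-shaped neural network with residual connections for single-channel speech enhancement. It operates directly on waveforms, captures long-range temporal dependencies efficiently, and is reported to outperform state-of-the-art approaches on six evaluation metrics.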

Contribution

The paper presents an end-to-end recurrent hourglass architecture with residual connections for waveform-based speech enhancement. It targets two limitations of earlier convolutional waveform models: memory-intensive dilated convolution and aliasing artifacts from upsampling.

Findings

01 Outperforms state-of-the-art approaches on six evaluation metrics

02 Efficiently captures long-range temporal dependencies

03 Reduces feature resolution without information loss

Abstract

Most current speech enhancement models use spectrogram features that require an expensive transformation and result in phase information loss. Previous work has overcome these issues by using convolutional networks to learn long-range temporal correlations across high-resolution waveforms. These models, however, are limited by memory-intensive dilated convolution and aliasing artifacts from upsampling. We introduce an end-to-end fully-recurrent hourglass-shaped neural network architecture with residual connections for waveform-based single-channel speech enhancement. Our model can efficiently capture long-range temporal dependencies by reducing the feature resolution without information loss. Experimental results show that our model outperforms state-of-the-art approaches on six evaluation metrics.
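
The abstract does not say how resolution is reduced "without information loss". One possible reading, offered here only as an illustrative sketch, is that adjacent time steps are folded into the feature dimension, so the sequence gets shorter while every sample is retained. The helper names `fold_time` and `unfold_time` and the use of NumPy are assumptions for this example, not taken from the paper.

```python
import numpy as np


def fold_time(x: np.ndarray, factor: int) -> np.ndarray:
    """Shorten the time axis by `factor`, stacking adjacent steps as features.

    x has shape (steps, features). No values are dropped, only rearranged.
    (Hypothetical helper; the paper's actual mechanism may differ.)
    """
    steps, feats = x.shape
    if steps % factor != 0:
        raise ValueError("number of steps must be divisible by factor")
    return x.reshape(steps // factor, feats * factor)


def unfold_time(y: np.ndarray, factor: int) -> np.ndarray:
    """Inverse of fold_time: restore the original time resolution."""
    steps, feats = y.shape
    if feats % factor != 0:
        raise ValueError("number of features must be divisible by factor")
    return y.reshape(steps * factor, feats // factor)


# A toy 16-sample, single-channel "waveform".
waveform = np.arange(16, dtype=np.float32).reshape(16, 1)

folded = fold_time(waveform, 2)      # 8 steps, 2 features per step
restored = unfold_time(folded, 2)    # back to 16 steps, 1 feature

# The round trip is exact, unlike pooling or strided subsampling.
assert np.array_equal(restored, waveform)
```

In a layout like this, a recurrent layer placed after `fold_time` would see half as many steps per unit of audio, which is one way a recurrent model could cover longer time spans at lower cost. The mirrored `unfold_time` restores the resolution by rearranging values rather than interpolating them.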

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Speech Recognition and Synthesis

Methods: Dilated Convolution · Convolution