Real-time speech enhancement using equilibriated RNN

Daiki Takeuchi; Kohei Yatabe; Yuma Koizumi; Yasuhiro Oikawa; Noboru Harada

arXiv:2002.05843·eess.AS·February 17, 2020·5 cites

TL;DR

This paper introduces a causal equilibriated RNN for real-time speech enhancement, achieving performance comparable to LSTM-based models with fewer parameters and lower computational cost.

Contribution

The paper presents an equilibriated RNN structure that avoids vanishing/exploding gradients without increasing the number of parameters, making it suitable for real-time speech enhancement.

Findings

1. Achieved similar speech enhancement performance to LSTM with fewer parameters.

2. Demonstrated effectiveness of ERNN in real-time, causal speech processing.

3. Reduced computational complexity compared to traditional LSTM models.

Abstract

We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNNs have been widely used for estimating a time-frequency (T-F) mask that enhances a speech signal. One popular DNN structure for this purpose is the recurrent neural network (RNN), owing to its ability to model time-sequential data such as speech effectively. In particular, long short-term memory (LSTM) is often used to alleviate the vanishing/exploding gradient problem, which makes training an RNN difficult. However, LSTM mitigates this training difficulty at the price of a larger number of parameters, which requires more computational resources. For real-time speech enhancement, it is preferable to use a smaller network without losing performance. In this paper, we propose to use the equilibriated recurrent neural network (ERNN) for avoiding the…
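As a rough illustration of the T-F masking idea mentioned in the abstract, the sketch below applies a real-valued mask to a complex spectrogram one frame at a time, so that each output frame depends only on the current and past frames. It is not the authors' method: `apply_tf_mask`, `enhance_causally`, and `toy_mask_estimator` are hypothetical names, and the toy estimator is a simple running-average gain standing in for a trained network such as an LSTM or ERNN.

```python
import numpy as np


def apply_tf_mask(noisy_spec, mask):
    """Scale each complex T-F bin by a real gain clipped to [0, 1]."""
    mask = np.clip(mask, 0.0, 1.0)
    return mask * noisy_spec


def enhance_causally(noisy_spec, estimate_mask_frame):
    """Enhance a (frames, bins) spectrogram frame by frame.

    The mask for frame t is computed from frame t and a state carried
    over from earlier frames only, so no future frame is ever used.
    """
    enhanced = np.empty_like(noisy_spec)
    state = None
    for t in range(noisy_spec.shape[0]):
        mask_t, state = estimate_mask_frame(np.abs(noisy_spec[t]), state)
        enhanced[t] = apply_tf_mask(noisy_spec[t], mask_t)
    return enhanced


def toy_mask_estimator(magnitude, state):
    """Hypothetical stand-in for a trained recurrent network (assumption).

    Tracks a running average of the magnitude as a crude noise estimate
    and turns it into a subtraction-style gain per frequency bin.
    """
    noise = magnitude if state is None else 0.9 * state + 0.1 * magnitude
    gain = np.maximum(1.0 - noise / (magnitude + 1e-8), 0.0)
    return gain, noise


# Example: 5 frames, 4 frequency bins of random complex "STFT" data.
rng = np.random.default_rng(0)
spec = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
enhanced = enhance_causally(spec, toy_mask_estimator)
```

In a real system the toy estimator would be replaced by the trained recurrent model, and the enhanced spectrogram would be converted back to a waveform with an inverse STFT.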

Code & Models

Repositories

dtake1336/ERNN-for-speech-enhancement (Official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing