Real-time speech enhancement using equilibriated RNN
Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

TL;DR
This paper introduces a causal equilibriated RNN (ERNN) for real-time speech enhancement. It achieves performance comparable to LSTM-based models with fewer parameters and lower computational cost.
Contribution
The paper applies the equilibriated RNN (ERNN) structure to causal, real-time speech enhancement. ERNN avoids vanishing/exploding gradients without increasing the number of parameters.
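A minimal NumPy sketch of an ERNN-style recurrent step makes the idea concrete. It assumes a simplified implicit update, h_k = h_{k-1} + eta * (tanh(U(h_k + h_{k-1}) + W x_k + b) - gamma * h_{k-1}), which is solved approximately by a few fixed-point iterations. This form, tanh as the nonlinearity, and the values of `eta`, `gamma` and `n_iter` are illustrative assumptions. They may differ from the exact ERNN formulation used in the paper. In this sketch `eta` and `gamma` are fixed hyperparameters. The only trainable arrays are `U`, `W` and `b`, the same set a vanilla RNN uses.

```python
import numpy as np

def ernn_step(x, h_prev, U, W, b, eta=0.1, gamma=1.0, n_iter=5):
    """One ERNN-style step (hypothetical simplified form, not the paper's exact cell).

    Approximately solves the implicit equation
        h = h_prev + eta * (tanh(U @ (h + h_prev) + W @ x + b) - gamma * h_prev)
    by fixed-point iteration, starting from h = h_prev.
    """
    drive = W @ x + b  # input term stays fixed during the iterations
    h = h_prev.copy()
    for _ in range(n_iter):
        h = h_prev + eta * (np.tanh(U @ (h + h_prev) + drive) - gamma * h_prev)
    return h

def ernn_forward(xs, U, W, b, **kwargs):
    """Run the cell frame by frame; each output uses only past and current frames."""
    h = np.zeros(U.shape[0])
    outputs = []
    for x in xs:
        h = ernn_step(x, h, U, W, b, **kwargs)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
n_in, n_hidden, n_frames = 4, 8, 10
U = 0.1 * rng.standard_normal((n_hidden, n_hidden))  # recurrent weights
W = 0.1 * rng.standard_normal((n_hidden, n_in))      # input weights
b = np.zeros(n_hidden)                               # bias
xs = rng.standard_normal((n_frames, n_in))           # toy input features, one row per frame
H = ernn_forward(xs, U, W, b)
print(H.shape)  # (10, 8)
```

Each hidden state is computed from the current frame and the previous state only. The loop therefore runs with no look-ahead, which is the property a causal, real-time system needs.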
Findings
Achieved speech enhancement performance similar to that of LSTM with fewer parameters.
Demonstrated the effectiveness of ERNN for real-time, causal speech processing.
Reduced computational complexity compared with conventional LSTM models.
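Simple arithmetic makes the parameter argument concrete. The sketch below counts the weights of one recurrent layer. It assumes one bias vector per gate, ignores the output layer, and assumes the ERNN cell uses the same `U`, `W` and `b` as a vanilla RNN of the same width. The layer sizes (257 inputs, 256 hidden units) are hypothetical and not taken from the paper. An LSTM has four gated affine maps where a vanilla RNN cell has one, so the LSTM layer has four times as many weights.

```python
def rnn_params(n_in: int, n_hidden: int) -> int:
    """Vanilla RNN / ERNN-style cell: W (n_hidden x n_in), U (n_hidden x n_hidden), b (n_hidden)."""
    return n_hidden * n_in + n_hidden * n_hidden + n_hidden

def lstm_params(n_in: int, n_hidden: int) -> int:
    """LSTM: four gates (input, forget, cell, output), each with its own W, U and one bias."""
    return 4 * rnn_params(n_in, n_hidden)

n_in, n_hidden = 257, 256  # hypothetical sizes, e.g. 257 bins from a 512-point STFT
print(rnn_params(n_in, n_hidden))   # 131584
print(lstm_params(n_in, n_hidden))  # 526336
```

PyTorch's `nn.RNN` and `nn.LSTM` each keep two bias vectors per affine map. This adds `n_hidden` per map to both counts but leaves the 4:1 ratio intact. The count covers weights only and does not model per-frame compute.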
Abstract
We propose a speech enhancement method using a causal deep neural network (DNN) for real-time applications. DNNs have been widely used for estimating a time-frequency (T-F) mask that enhances a speech signal. One popular DNN structure for this task is the recurrent neural network (RNN), owing to its capability of effectively modelling time-sequential data such as speech. In particular, the long short-term memory (LSTM) is often used to alleviate the vanishing/exploding gradient problem, which makes the training of an RNN difficult. However, LSTM mitigates this difficulty at the price of an increased number of parameters, which requires more computational resources. For real-time speech enhancement, it is preferable to use a smaller network without losing performance. In this paper, we propose to use the equilibriated recurrent neural network (ERNN) for avoiding the…
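A toy example illustrates the T-F masking step described in the abstract. The sketch below computes an STFT with NumPy, forms a noisy mixture, and multiplies the noisy STFT by a real-valued mask. There is no trained network here, so an oracle ideal ratio mask computed from the known clean and noise signals stands in for the DNN's estimate. The signal (a 440 Hz tone), the noise level and the STFT settings are all illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Frame-wise STFT with a periodic Hann window -> shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def oracle_ratio_mask(clean_spec, noise_spec, eps=1e-12):
    """Ideal ratio mask |S| / sqrt(|S|^2 + |N|^2), with values in [0, 1].

    Stand-in for the DNN's estimate; a real system only sees the noisy input.
    """
    s2, n2 = np.abs(clean_spec) ** 2, np.abs(noise_spec) ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)                 # toy "speech": a 440 Hz tone
noise = 0.7 * np.random.default_rng(0).standard_normal(fs)
S, N = stft(clean), stft(noise)
Y = S + N                                            # STFT is linear: noisy = clean + noise
M = oracle_ratio_mask(S, N)
enhanced = M * Y                                     # element-wise, so each frame uses only itself
noisy_err = np.sum(np.abs(Y - S) ** 2)
enhanced_err = np.sum(np.abs(enhanced - S) ** 2)
print(enhanced_err < noisy_err)  # True
```

In a real causal system, the mask for frame t would be predicted from noisy frames up to t only, for example by an LSTM or ERNN. The enhanced STFT would then be inverted by overlap-add.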
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
