Phase Aware Speech Enhancement using Realisation of Complex-valued LSTM
Raktim Gautam Goswami, Sivaganesh Andhavarapu, K Sri Rama Murty

TL;DR
This paper introduces a complex-valued LSTM network for speech enhancement that effectively models phase information, leading to improved speech quality over traditional magnitude-based methods.
Contribution
It proposes a realisation of a complex-valued LSTM (RCLSTM) for complex ratio mask (CRM) estimation, capturing both sequential dependencies and phase information in speech enhancement.
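As a rough illustration of what it can mean to realise complex-valued processing with real-valued arithmetic, the sketch below computes a complex matrix-vector product from separate real and imaginary parts and checks it against Python's built-in complex type. This summary does not give the RCLSTM's gate equations, so this shows only the underlying complex arithmetic, not the authors' layer; the function names and all numeric values are hypothetical.

```python
def real_matvec(W, x):
    """Plain real-valued matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def complex_matvec_from_real_parts(W_re, W_im, x_re, x_im):
    """Compute (W_re + j*W_im) @ (x_re + j*x_im) using four real products.

    real part: W_re x_re - W_im x_im
    imag part: W_re x_im + W_im x_re
    """
    rr = real_matvec(W_re, x_re)
    ii = real_matvec(W_im, x_im)
    ri = real_matvec(W_re, x_im)
    ir = real_matvec(W_im, x_re)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im


# Hypothetical weights and inputs, chosen only for illustration.
W_re = [[0.5, -1.0], [2.0, 0.25]]
W_im = [[1.0, 0.0], [-0.5, 1.5]]
x_re = [1.0, -2.0]
x_im = [0.5, 3.0]

out_re, out_im = complex_matvec_from_real_parts(W_re, W_im, x_re, x_im)

# Reference result using Python's built-in complex numbers.
W = [[complex(a, b) for a, b in zip(r_re, r_im)] for r_re, r_im in zip(W_re, W_im)]
x = [complex(a, b) for a, b in zip(x_re, x_im)]
reference = [sum(w * v for w, v in zip(row, x)) for row in W]

max_diff = max(abs(complex(a, b) - r) for a, b, r in zip(out_re, out_im, reference))
print(max_diff)
```

Because the real and imaginary outputs share the same two weight matrices, the coupling between the two parts is built into the computation rather than left for two independent real-valued networks to learn.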
Findings
RCLSTM outperforms real-valued masking methods in objective measures.
The method improves perceptual evaluation of speech quality (PESQ) scores by over 4.3%.
Effective preservation of phase information enhances speech quality.
Abstract
Most deep-learning-based speech enhancement (SE) methods estimate the magnitude spectrum of the clean speech signal from the observed noisy speech signal, either by magnitude spectral masking or by regression. These methods reuse the noisy phase when synthesizing the time-domain waveform from the estimated magnitude spectrum. However, recent works have highlighted the importance of phase in SE. One earlier attempt estimated the complex ratio mask (CRM), which takes phase into account, using a complex-valued feed-forward neural network (FFNN). But FFNNs cannot capture the sequential information essential for phase estimation. In this work, we propose a realisation of complex-valued long short-term memory (RCLSTM) network to estimate the CRM using sequential information along time. The proposed RCLSTM is designed to process the complex-valued sequences…
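The difference between magnitude masking and complex ratio masking can be sketched on a few toy time-frequency bins. The example below uses oracle (ideal) masks computed from known clean and noisy values, not a trained network, and every number and function name in it is a hypothetical chosen for illustration. It shows that a real-valued magnitude mask recovers the clean magnitude but keeps the noisy phase, while a complex mask can recover the clean bin exactly.

```python
def ideal_magnitude_mask(clean, noisy):
    """Real-valued mask |S| / |Y| for each time-frequency bin."""
    return [abs(s) / abs(y) for s, y in zip(clean, noisy)]


def ideal_complex_ratio_mask(clean, noisy):
    """Complex-valued mask M = S / Y for each bin, so that M * Y equals S."""
    return [s / y for s, y in zip(clean, noisy)]


def apply_mask(mask, noisy):
    """Element-wise product of a mask with the noisy STFT bins."""
    return [m * y for m, y in zip(mask, noisy)]


def max_error(estimate, target):
    """Largest absolute deviation between estimated and target bins."""
    return max(abs(e - t) for e, t in zip(estimate, target))


# Toy STFT bins (hypothetical values, not data from the paper).
clean = [1.0 + 0.5j, -0.3 + 0.8j, 0.6 - 0.2j]
noise = [0.2 - 0.4j, 0.5 + 0.1j, -0.3 + 0.3j]
noisy = [s + n for s, n in zip(clean, noise)]

# Magnitude masking: correct magnitude, but the phase is still the noisy one.
mag_est = apply_mask(ideal_magnitude_mask(clean, noisy), noisy)
# Complex ratio masking: magnitude and phase are both corrected.
crm_est = apply_mask(ideal_complex_ratio_mask(clean, noisy), noisy)

mag_error = max_error(mag_est, clean)
crm_error = max_error(crm_est, clean)
print("magnitude-mask error:", mag_error)
print("complex-mask error:", crm_error)
```

In practice the mask is predicted by a model rather than computed from the clean signal, so the gap between the two approaches depends on how well the real and imaginary parts of the mask can be estimated.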
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
