Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR
Fu-An Chao, Jeih-weih Hung, Berlin Chen

TL;DR
This paper introduces a cross-domain speech enhancement (SE) model with a bi-projection fusion (BPF) mechanism that uses phase information in both the time and frequency domains to improve the noise robustness of automatic speech recognition (ASR).
Contribution
The paper proposes a cross-domain SE model with a bi-projection fusion (BPF) module and reports better speech enhancement and noise-robust ASR performance than existing methods.
Findings
Outperforms current top SE methods in enhancement quality.
Improves ASR accuracy on noisy speech datasets.
Effective on both seen and unseen noise conditions.
Abstract
In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper continues these lines of research and explores two effective SE methods that consider phase information in the time domain and the frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method in relation to a few current top-of-the-line…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
