Raw Waveform-based Speech Enhancement by Fully Convolutional Networks
Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

TL;DR
This paper introduces a fully convolutional network (FCN) for end-to-end raw waveform speech enhancement, outperforming spectrum-based methods and significantly reducing model size.
Contribution
The study presents a novel FCN architecture that effectively restores high-frequency speech components directly from raw waveforms, with fewer parameters than traditional DNN and CNN models.
Findings
FCN outperforms DNN and CNN on speech intelligibility and quality metrics.
The FCN model has only 0.2% of the parameters of the DNN and CNN models.
FCN effectively preserves the local temporal structures of speech signals.
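To make the parameter-count finding concrete, the following sketch counts weights and biases for a fully connected stack and for a stack of 1-D convolutions. The layer widths, channel counts, kernel size, and frame length are hypothetical values chosen for illustration; they are not the configurations reported in the paper, so the resulting ratio only shows the order of magnitude that such a gap can reach.

```python
def dense_params(sizes):
    """Weights plus biases of a fully connected stack with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def conv1d_params(channels, kernel_size):
    """Weights plus biases of a 1-D conv stack; `channels` lists the channel count per layer boundary."""
    return sum(
        c_in * c_out * kernel_size + c_out
        for c_in, c_out in zip(channels, channels[1:])
    )


# Hypothetical sizes for illustration only (not the paper's configuration).
frame_len = 512
dnn_count = dense_params([frame_len, 1024, 1024, 1024, frame_len])
fcn_count = conv1d_params([1, 15, 15, 15, 1], kernel_size=11)
ratio = fcn_count / dnn_count

print(f"dense: {dnn_count} parameters")
print(f"conv:  {fcn_count} parameters")
print(f"conv / dense = {ratio:.4%}")
```

The dense count grows with the product of adjacent layer widths, which are tied to the frame length, whereas the convolutional count depends only on kernel size and channel counts.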
Abstract
This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most existing denoising methods that process the magnitude spectrum (e.g., log power spectrum (LPS)) only. Because the fully connected layers, which are involved in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly with high frequency components, we employed fully convolutional layers to model the waveform. More specifically, FCN consists of only convolutional layers and thus the local temporal structures of speech signals can be efficiently and effectively preserved with relatively few weights. Experimental results show that DNN- and CNN-based models…
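The waveform-in, waveform-out idea can be sketched with a toy network built only from 1-D convolutions. This is a minimal illustration under several assumptions, not the authors' architecture: it uses a single channel, three layers with random kernels of length 5, zero padding, and a leaky ReLU nonlinearity, all of which are hypothetical choices. It shows two properties the abstract relies on: the same small set of weights handles inputs of any length, and each output sample depends only on a local window of input samples.

```python
import random


def conv1d_same(x, kernel, bias=0.0):
    """Single-channel 1-D convolution (cross-correlation form) with zero
    padding so the output has the same length as the input.
    Assumes an odd kernel length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k)) + bias
        for i in range(len(x))
    ]


def leaky_relu(values, slope=0.1):
    """Element-wise leaky ReLU (an assumed activation, for illustration)."""
    return [v if v > 0 else slope * v for v in values]


def tiny_fcn(x, kernels):
    """Apply the conv layers in sequence, with a nonlinearity after every
    layer except the last. There is no fully connected layer anywhere."""
    h = x
    for idx, kernel in enumerate(kernels):
        h = conv1d_same(h, kernel)
        if idx < len(kernels) - 1:
            h = leaky_relu(h)
    return h


random.seed(0)
# Hypothetical untrained kernels: 3 layers, 5 taps each.
kernels = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
short_wave = [random.uniform(-1, 1) for _ in range(100)]
long_wave = [random.uniform(-1, 1) for _ in range(1000)]

out_short = tiny_fcn(short_wave, kernels)
out_long = tiny_fcn(long_wave, kernels)
n_weights = sum(len(k) for k in kernels)

print(len(out_short), len(out_long), n_weights)
```

With three layers of 5-tap kernels, each output sample sees at most 6 input samples on either side, so perturbing one input sample leaves outputs outside that window unchanged. A fully connected layer, by contrast, mixes every input sample into every output sample.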
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
Methods: Max Pooling · Convolution · Fully Convolutional Network
