TL;DR
This paper introduces SE-FFTNet, a shallow, non-causal FFTNet architecture for speech enhancement in the waveform domain. By exploiting the long-term correlated structure of speech, it outperforms WaveNet and matches SEGAN in quality while using fewer parameters.
Contribution
The paper proposes a non-causal, shallow FFTNet architecture with an initial wide dilation pattern for speech enhancement, reducing the parameter count while matching or improving on the quality of existing models.
Findings
SE-FFTNet has 32% fewer parameters than WaveNet.
SE-FFTNet outperforms WaveNet in subjective and objective quality metrics.
SE-FFTNet matches SEGAN's performance with significantly fewer parameters.
Abstract
In this paper, we suggest a new parallel, non-causal, and shallow waveform-domain architecture for speech enhancement based on FFTNet, a neural network for generating high-quality audio waveforms. In contrast to other waveform-based approaches such as WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long-term correlated structure of speech in the time domain, where noise is usually highly uncorrelated, and it is therefore suitable for waveform-domain speech enhancement. To further strengthen this feature of FFTNet, we suggest a non-causal FFTNet architecture, in which the present sample in each layer is estimated from the past and future samples of the previous layer. By suggesting a shallow network and applying non-causality within certain limits, the suggested FFTNet for speech enhancement (SE-FFTNet) uses far fewer parameters compared…
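The non-causal layering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single scalar channel, zero padding at the edges, and hypothetical fixed weights (`w_past`, `w_future`) and dilations `(4, 2, 1)` chosen only for illustration, whereas the actual network learns per-channel weights. It shows how each output sample combines one past and one future sample of the previous layer, with the widest dilation applied first.

```python
def noncausal_layer(x, dilation, w_past, w_future, bias=0.0):
    """One scalar-channel layer (illustrative only):
    y[t] = relu(w_past * x[t - d] + w_future * x[t + d] + bias),
    with zero padding outside the signal."""
    n = len(x)
    out = []
    for t in range(n):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        future = x[t + dilation] if t + dilation < n else 0.0
        out.append(max(0.0, w_past * past + w_future * future + bias))
    return out


def shallow_stack(x, dilations=(4, 2, 1), w_past=0.5, w_future=0.5):
    """Apply a few layers, widest dilation first (hypothetical values)."""
    for d in dilations:
        x = noncausal_layer(x, d, w_past, w_future)
    return x


def context_span(dilations):
    """Number of input positions spanned by one output sample:
    the sum of the dilations on each side, plus the centre position."""
    return 1 + 2 * sum(dilations)


if __name__ == "__main__":
    # An impulse at the last position still influences the centre output,
    # which a causal stack of the same depth could not do.
    impulse = [0.0] * 14 + [1.0]
    print(shallow_stack(impulse)[7])
    print(context_span((4, 2, 1)))
```

Under these assumptions, three layers are enough for the centre output to see seven samples into both the past and the future, which is the sense in which a shallow, non-causal stack with a wide initial dilation can cover a long context.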
Taxonomy
Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet
