TL;DR
This paper introduces HGCN, a speech enhancement model that predicts the locations of harmonics partially masked by noise and compensates for them, outperforming a number of advanced existing methods.
Contribution
The paper proposes a harmonic gated compensation network (HGCN) that uses a high-resolution harmonic integral spectrum and a harmonic gating mechanism to improve speech enhancement.
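This page does not define the harmonic integral spectrum. As a rough illustration only, the sketch below uses classic harmonic summation. For each candidate fundamental frequency F0 on a grid finer than the FFT bin spacing, it sums the magnitudes at the first few multiples k·F0 and reads off-bin values by linear interpolation. The function name, the 60 to 420 Hz F0 range, the 1 Hz grid step, and the harmonic count are all assumptions. This is not the paper's exact formulation.

```python
import numpy as np

def harmonic_integral_spectrum(mag, sr, n_fft, f0_min=60.0, f0_max=420.0,
                               f0_step=1.0, n_harmonics=10):
    """Harmonic-summation score for a grid of candidate F0 values (hypothetical).

    mag: magnitude spectrogram of shape (frames, n_fft // 2 + 1).
    Returns (f0_grid, score), where score has shape (frames, len(f0_grid)).
    """
    bin_hz = sr / n_fft                      # spacing of FFT bins in Hz
    n_bins = mag.shape[1]
    f0_grid = np.arange(f0_min, f0_max + 1e-9, f0_step)
    k = np.arange(1, n_harmonics + 1)
    # Fractional bin index of each harmonic k * F0: shape (grid, harmonics).
    frac = np.outer(f0_grid, k) / bin_hz
    below_nyquist = frac <= n_bins - 1
    frac = np.clip(frac, 0, n_bins - 1)
    lo = np.floor(frac).astype(int)
    hi = np.minimum(lo + 1, n_bins - 1)
    w = frac - lo
    # Linear interpolation between neighboring bins gives sub-bin resolution,
    # so the F0 grid can be finer than bin_hz. Shape: (frames, grid, harmonics).
    vals = (1.0 - w) * mag[:, lo] + w * mag[:, hi]
    vals = vals * below_nyquist              # ignore harmonics above Nyquist
    return f0_grid, vals.sum(axis=2)
```

For a multi-frame spectrogram, `grid[score.argmax(axis=1)]` gives a per-frame F0 track, from which candidate harmonic locations k·F0 follow. Plain harmonic summation is prone to octave errors, so practical pitch trackers usually add weighting or post-processing on top of a score like this.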
Findings
HGCN outperforms existing approaches in speech enhancement tasks.
The high-resolution harmonic spectrum improves harmonic location prediction.
The gating mechanism effectively refines enhancement results.
Abstract
Mask processing in the time-frequency (T-F) domain with neural networks has become one of the mainstream approaches to single-channel speech enhancement. However, most models struggle when harmonics are partially masked by noise. To tackle this challenge, we propose a harmonic gated compensation network (HGCN). We design a high-resolution harmonic integral spectrum to improve the accuracy of harmonic location prediction. We then add voice activity detection (VAD) and voiced region detection (VRD) to the convolutional recurrent network (CRN) to filter the predicted harmonic locations. Finally, a harmonic gating mechanism guides the compensation model in adjusting the coarse results from the CRN to obtain refined enhancement results. Our experiments show that HGCN achieves substantial gains over a number of advanced approaches in the community.
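The abstract describes a pipeline in which harmonic locations are filtered by VAD and VRD and then used to gate a compensation step applied to the CRN's coarse output. The NumPy sketch below illustrates only that data flow, under several assumptions. Harmonic locations are the FFT bin nearest each multiple k·F0 of a per-frame F0 estimate. VAD and VRD are frame-level probabilities thresholded at 0.5. The gate is a sigmoid, restricted to the retained harmonic bins, that scales an additive compensation term. All function names are hypothetical, and the paper's actual layers and gating formula are not reproduced here.

```python
import numpy as np

def harmonic_locations(f0, sr, n_fft, n_bins, n_harmonics=10):
    """Binary map (frames, n_bins): 1 at the bin nearest each k * F0 (hypothetical)."""
    loc = np.zeros((len(f0), n_bins))
    bin_hz = sr / n_fft
    for t, f in enumerate(f0):
        for k in range(1, n_harmonics + 1):
            b = int(round(k * f / bin_hz))
            if b < n_bins:
                loc[t, b] = 1.0
    return loc

def filter_locations(loc, vad, vrd, threshold=0.5):
    """Keep harmonic locations only in frames flagged as speech (VAD) and voiced (VRD)."""
    keep = (vad > threshold) & (vrd > threshold)
    return loc * keep[:, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_refinement(coarse, compensation, gate_logits, loc):
    """Assumed form: refined = coarse + gate * compensation, where the gate is
    a sigmoid that is forced to zero outside the retained harmonic bins."""
    gate = sigmoid(gate_logits) * loc
    return coarse + gate * compensation
```

Multiplying the gate by the filtered location map means the compensation term can only change T-F bins that lie on detected harmonics in voiced speech frames. The coarse estimate passes through unchanged everywhere else.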
Taxonomy
Methods: Convolutional Recurrent Network (CRN)
