Coarse-to-fine Optimization for Speech Enhancement
Jian Yao, Ahmad Al-Dahle

TL;DR
This paper introduces a coarse-to-fine optimization strategy for speech enhancement that improves speech quality by progressively refining the similarity constraints, achieving state-of-the-art results with both discriminative and generative models.
Contribution
The paper proposes a coarse-to-fine optimization approach for the cosine similarity loss and applies it to GANs with a dynamic perceptual loss for improved speech enhancement.
Findings
Enhanced speech quality with coarse-to-fine optimization.
Improved results using dynamic perceptual loss in GANs.
Achieved state-of-the-art performance in speech enhancement.
Abstract
In this paper, we propose coarse-to-fine optimization for the task of speech enhancement. Cosine similarity loss [1] has proven to be an effective metric for measuring the similarity of speech signals. However, in a high-dimensional space, enhanced speech signals with the same cosine similarity loss can still vary widely, so a deep neural network trained with this loss might not be able to predict enhanced speech of good quality. Our coarse-to-fine strategy optimizes the cosine similarity loss at different granularities, so that progressively more constraints are added to the prediction, from high dimension to relatively low dimension. In this way, the enhanced speech better resembles the clean speech. Experimental results show the effectiveness of our proposed coarse-to-fine optimization in both discriminative models and generative models. Moreover, we apply the coarse-to-fine strategy to the…
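To make the idea of optimizing the cosine similarity loss at several granularities concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how granularities are defined or how per-level losses are combined, so the halving of segment length at each level, the equal weighting of levels, and the names `cosine_loss`, `coarse_to_fine_cosine_loss`, and `levels` are all hypothetical choices made for this example.

```python
import numpy as np


def cosine_loss(pred, target, eps=1e-8):
    """Negative cosine similarity between two 1-D signals (lower is better)."""
    num = np.dot(pred, target)
    den = np.linalg.norm(pred) * np.linalg.norm(target) + eps
    return -num / den


def coarse_to_fine_cosine_loss(pred, target, levels=3, eps=1e-8):
    """Average the negative cosine similarity over several granularities.

    ASSUMPTION (not from the paper): level k splits both signals into
    2**k equal-length segments, and all levels are weighted equally.
    Level 0 is the ordinary whole-signal cosine loss; deeper levels
    constrain shorter, lower-dimensional segments.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    total = 0.0
    for k in range(levels):
        n_seg = 2 ** k
        seg_len = len(pred) // n_seg
        usable = seg_len * n_seg  # drop any trailing samples that do not fit
        p = pred[:usable].reshape(n_seg, seg_len)
        t = target[:usable].reshape(n_seg, seg_len)
        num = np.sum(p * t, axis=1)
        den = np.linalg.norm(p, axis=1) * np.linalg.norm(t, axis=1) + eps
        total += np.mean(-num / den)  # mean loss over segments at this level
    return total / levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # stand-in for one second of clean speech
    degraded = clean.copy()
    degraded[len(degraded) // 2:] = 0.0  # second half of the estimate is lost
    print("whole-signal loss:", cosine_loss(degraded, clean))
    print("multi-level loss: ", coarse_to_fine_cosine_loss(degraded, clean))
```

In an actual training setup the same computation would be written with a differentiable framework so the loss can be backpropagated; NumPy is used here only to keep the sketch self-contained.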
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Digital Media Forensic Detection
