GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
Teysir Baoueb, Xiaoyu Bie, Mathieu Fontaine, Gaël Richard

TL;DR
This paper introduces GLA-Grad++, an enhanced diffusion-based speech synthesis model that improves audio quality and stability, especially for out-of-domain inputs, by optimizing the application of the Griffin-Lim algorithm during vocoding.
Contribution
GLA-Grad++ computes the Griffin-Lim correction term only once, with a single application of the Griffin-Lim algorithm (GLA), rather than applying it repeatedly during the reverse diffusion process. This accelerates generation while maintaining quality and robustness.
Findings
Outperforms baseline models in quality and stability
Shows improved performance in out-of-domain scenarios
Accelerates generation process without quality loss
Abstract
Recent advances in diffusion models have positioned them as powerful generative frameworks for speech synthesis, demonstrating substantial improvements in audio quality and stability. Nevertheless, their effectiveness in vocoders conditioned on mel spectrograms remains constrained, particularly when the conditioning diverges from the training distribution. The recently proposed GLA-Grad model introduced a phase-aware extension to the WaveGrad vocoder that integrated the Griffin-Lim algorithm (GLA) into the reverse process to reduce inconsistencies between the generated signals and the conditioning mel spectrogram. In this paper, we further improve GLA-Grad through an innovative choice in how to apply the correction. In particular, we compute the correction term only once, with a single application of GLA, to accelerate the generation process. Experimental results demonstrate that our method…
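For readers unfamiliar with GLA, the following is a minimal sketch of one Griffin-Lim projection step, not the paper's implementation. It assumes a linear-magnitude STFT target rather than a mel spectrogram, and the function names (gla_projection, magnitude_error), the window length, and the hop size are illustrative choices. The step keeps the phase of the current signal estimate, imposes the target magnitude, and inverts back to the time domain.

```python
import numpy as np
from scipy.signal import stft, istft

NPERSEG = 256   # window length (illustrative assumption)
NOVERLAP = 192  # 75% overlap with the default Hann window


def gla_projection(x, target_mag):
    """One Griffin-Lim step: keep the phase of x, impose target_mag."""
    _, _, X = stft(x, nperseg=NPERSEG, noverlap=NOVERLAP)
    phase = np.exp(1j * np.angle(X))
    _, x_new = istft(target_mag * phase, nperseg=NPERSEG, noverlap=NOVERLAP)
    return x_new[: len(x)]


def magnitude_error(x, target_mag):
    """Distance between the STFT magnitude of x and the target magnitude."""
    _, _, X = stft(x, nperseg=NPERSEG, noverlap=NOVERLAP)
    return float(np.linalg.norm(np.abs(X) - target_mag))


# Toy demo: the target magnitude comes from a 440 Hz tone,
# and the starting estimate is white noise.
rng = np.random.default_rng(0)
t = np.arange(4096) / 16000.0
reference = np.sin(2 * np.pi * 440.0 * t)
_, _, R = stft(reference, nperseg=NPERSEG, noverlap=NOVERLAP)
target = np.abs(R)

x0 = rng.standard_normal(4096)
x1 = gla_projection(x0, target)  # a single application of the projection
print(magnitude_error(x0, target), magnitude_error(x1, target))
```

In classical GLA this projection is iterated many times. According to the abstract, GLA-Grad++ instead computes its correction term from a single application of GLA; how that term enters the reverse diffusion updates is not described in this summary and is not modeled here.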
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music Technology and Sound Studies
