SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Yuma Koizumi; Heiga Zen; Kohei Yatabe; Nanxin Chen; Michiel Bacchiani

arXiv:2203.16749·eess.AS·August 8, 2022

TL;DR

SpecGrad introduces adaptive noise spectral shaping for diffusion-based neural vocoders. It improves sound quality, especially in the high-frequency bands, while keeping the computational cost almost the same as conventional DDPM-based vocoders.

Contribution

The paper proposes adaptive noise spectral shaping: the diffusion noise is passed through a time-varying filter so that its spectral envelope becomes close to the conditioning log-mel spectrogram.

Findings

01. Higher-fidelity speech synthesis compared to conventional DDPM vocoders

02. Effective in analysis-synthesis and speech enhancement scenarios

03. Maintains computational efficiency similar to existing models

Abstract

Neural vocoders based on denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to the given acoustic features. In this study, we propose SpecGrad, which adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality, especially in the high-frequency bands. The filtering is performed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveforms than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
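The time-frequency filtering described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `shape_noise`, the Hann window, the frame and hop sizes, and the overlap-add normalization are all assumptions made for illustration, and the step that derives the per-frame envelope from a log-mel spectrogram is left out (the envelope is simply passed in as an array).

```python
import numpy as np


def shape_noise(envelope, frame_len=256, hop=64, seed=0):
    """Filter white Gaussian noise frame by frame in the STFT domain.

    envelope: non-negative array of shape (num_frames, frame_len // 2 + 1),
        one magnitude weight per frame and frequency bin. This is a
        hypothetical stand-in for an envelope derived from the conditioning
        log-mel spectrogram.
    Returns the fully overlapped part of the shaped noise; the first and
    last (frame_len - hop) samples are trimmed to avoid edge artifacts.
    """
    envelope = np.asarray(envelope, dtype=float)
    num_frames, num_bins = envelope.shape
    if num_bins != frame_len // 2 + 1:
        raise ValueError("envelope must have frame_len // 2 + 1 bins")
    pad = frame_len - hop
    length = hop * (num_frames - 1) + frame_len
    if length <= 2 * pad:
        raise ValueError("too few frames for the chosen frame_len and hop")

    rng = np.random.default_rng(seed)
    white = rng.standard_normal(length)
    window = np.hanning(frame_len)

    out = np.zeros(length)
    norm = np.zeros(length)
    for m in range(num_frames):
        sl = slice(m * hop, m * hop + frame_len)
        # Analysis: windowed frame to one-sided spectrum.
        spec = np.fft.rfft(white[sl] * window)
        # Time-varying filter: per-bin multiplication for this frame.
        shaped = np.fft.irfft(spec * envelope[m], n=frame_len)
        # Synthesis: windowed overlap-add.
        out[sl] += shaped * window
        norm[sl] += window ** 2

    # Divide by the summed squared window so a flat envelope is an identity.
    return out[pad:length - pad] / norm[pad:length - pad]


# Example: suppress the upper bins for every frame (a crude low-pass shape).
env = np.full((40, 129), 0.01)
env[:, :32] = 1.0
shaped_noise = shape_noise(env)
```

With an all-ones envelope the function returns the underlying white noise unchanged, which is a quick sanity check that the analysis and synthesis steps are consistent. Because the filter is a per-bin multiplication on frames that are computed anyway, the extra cost is small compared with running the network, which fits the abstract's remark that the cost stays almost the same.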

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Diffusion