Restoring degraded speech via a modified diffusion model

Jianwei Zhang; Suren Jayasuriya; Visar Berisha

arXiv:2104.11347·cs.SD·September 3, 2021

TL;DR

The paper adapts DiffWave, a diffusion-based vocoder, to restore speech degraded by deterministic operations such as compression, clipping, and downsampling, and reports improved perceptual metrics and subjective quality scores.

Contribution

The paper modifies the DiffWave model by replacing its mel-spectrum upsampler with a deep CNN upsampler, which is trained to alter the degraded speech mel-spectrum so that it matches that of the original speech.
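To make concrete where the upsampler sits, the generic DiffWave-style noise-prediction objective can be written with the conditioning term made explicit. This is a sketch in the usual diffusion-model notation, not a formula taken from this page: the symbols $x_0$ (original waveform), $\tilde{m}$ (mel-spectrum of the degraded speech), $U_\phi$ (the CNN upsampler), $\epsilon_\theta$ (the noise-prediction network), and $\bar{\alpha}_t$ (the cumulative noise schedule) are all assumptions, and the paper's actual training procedure may differ.

```latex
\[
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{x_0,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
    \left\|
      \epsilon - \epsilon_\theta\!\left(
        \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\;
        t,\;
        U_\phi(\tilde{m})
      \right)
    \right\|_2^2
\]
```

Read this way, the waveform being denoised is always the original speech, while the only information about the input that reaches the network is the upsampled degraded mel-spectrum $U_\phi(\tilde{m})$.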

Findings

01. Improved speech quality on LPC-10 and AMR-NB compressed speech.

02. Enhanced perceptual metrics and subjective quality scores.

03. Better performance in out-of-corpus evaluations.

Abstract

There are many deterministic mathematical operations (e.g. compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave model, that aims to restore the original speech signal. DiffWave, a recently published diffusion-based vocoder, has shown state-of-the-art synthesized speech quality and relatively shorter waveform generation times, with only a small set of parameters. We replace the mel-spectrum upsampler in DiffWave with a deep CNN upsampler, which is trained to alter the degraded speech mel-spectrum to match that of the original speech. The model is trained using the original speech waveform, but conditioned on the degraded speech mel-spectrum. Post-training, only the degraded mel-spectrum is used as input and the model generates an estimate of the original speech. Our…
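The deterministic degradations named in the abstract, and the way a clean waveform is paired with its degraded version for training, can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names `clip`, `downsample`, and `noisy_sample` are hypothetical, the decimation is naive (no anti-alias filter), and the mel-spectrum extraction and the network itself are omitted. The forward noising step uses the standard diffusion formula rather than anything specific to this paper.

```python
import math
import random


def clip(signal, threshold):
    """Hard-clip every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in signal]


def downsample(signal, factor):
    """Naive decimation: keep every `factor`-th sample (no anti-alias filter)."""
    return signal[::factor]


def noisy_sample(x0, alpha_bar, rng):
    """Standard forward diffusion step:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1).
    Returns the noised signal and the noise that was added."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps


# Toy "original speech": 0.1 s of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
clean = [0.8 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]

# Two deterministic degradations of the same clean signal.
clipped = clip(clean, 0.3)
decimated = downsample(clean, 2)

# A training example pairs the noised ORIGINAL waveform (diffusion target)
# with the DEGRADED signal (source of the conditioning features).
rng = random.Random(0)
x_t, eps = noisy_sample(clean, alpha_bar=0.5, rng=rng)
training_example = {"noised_original": x_t, "noise": eps, "degraded": clipped}
```

At inference time only the degraded signal would be available, which matches the abstract's statement that the degraded mel-spectrum is the sole input after training.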
