Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation

Yang Ai, Haoyu Li, Xin Wang, Junichi Yamagishi, Zhenhua Ling

arXiv:2011.03955 · cs.SD · November 10, 2020

Open Access

TL;DR

This paper introduces a hierarchical neural vocoder that converts noisy and reverberant acoustic features into clean speech waveforms. It outperforms the original HiNet vocoder and performs competitively with advanced speech enhancement methods.

Contribution

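The paper proposes DNR-HiNet, a vocoder with a modified amplitude spectrum predictor. The predictor estimates the noisy and reverberant log amplitude spectra, the noise log amplitude spectra, and the room impulse response, uses them for initial denoising and dereverberation, and then refines the result with another neural network into clean spectra, improving speech quality from degraded inputs.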

Findings

01 Outperforms the original HiNet vocoder on denoising and dereverberation tasks.

02 Achieves results competitive with advanced speech enhancement methods.

03 Incorporates bandwidth extension and frequency resolution extension models to further improve the predicted clean spectra.

Abstract

This paper presents a denoising and dereverberation hierarchical neural vocoder (DNR-HiNet) that converts noisy and reverberant acoustic features into a clean speech waveform. We implement it mainly by modifying the amplitude spectrum predictor (ASP) in the original HiNet vocoder. This modified denoising and dereverberation ASP (DNR-ASP) predicts clean log amplitude spectra (LAS) from the input degraded acoustic features. To achieve this, the DNR-ASP first predicts the noisy and reverberant LAS, the noise LAS related to the noise information, and the room impulse response related to the reverberation information, and then performs initial denoising and dereverberation. The initially processed LAS are then enhanced by another neural network to obtain the final clean LAS. To further improve the quality of the generated clean LAS, we also introduce a bandwidth extension model and frequency resolution extension…
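
The abstract does not spell out how the predicted noise LAS and room impulse response (RIR) are combined for the initial denoising and dereverberation step. The sketch below shows one plausible reading under a loudly stated assumption that is ours, not necessarily the paper's: for a single frame, each frequency bin follows |Y| ≈ |S|·|H| + |N|, where Y is the noisy and reverberant spectrum, S the clean spectrum, N the noise, and H the RIR's frequency response (valid only if the RIR is shorter than the FFT size). Under that model, denoising is magnitude spectral subtraction and dereverberation is division by |H|. The function names `rir_magnitude_response` and `initial_dnr`, the natural-log LAS convention, and the `floor` parameter are hypothetical. The networks that predict these quantities, and the network that refines the result, are not modeled.

```python
import cmath
import math


def rir_magnitude_response(rir, n_fft):
    """|DFT| of an RIR at the n_fft // 2 + 1 non-negative frequency bins.

    Assumption: len(rir) <= n_fft, so the RIR fits in one zero-padded frame.
    """
    if len(rir) > n_fft:
        raise ValueError("RIR longer than the FFT size")
    bins = n_fft // 2 + 1
    return [
        abs(sum(h * cmath.exp(-2j * math.pi * k * n / n_fft) for n, h in enumerate(rir)))
        for k in range(bins)
    ]


def initial_dnr(noisy_las, noise_las, rir, n_fft, floor=1e-5):
    """Hypothetical initial denoising + dereverberation of one LAS frame.

    Assumed per-bin model (not taken from the paper): |Y| ~= |S| * |H| + |N|,
    hence |S| ~= max(|Y| - |N|, floor) / max(|H|, floor).
    LAS values are assumed to be natural logs of amplitudes.
    """
    h_mag = rir_magnitude_response(rir, n_fft)
    if not (len(noisy_las) == len(noise_las) == len(h_mag)):
        raise ValueError("spectra must have n_fft // 2 + 1 bins")
    clean_las = []
    for y, n, h in zip(noisy_las, noise_las, h_mag):
        denoised = max(math.exp(y) - math.exp(n), floor)  # magnitude spectral subtraction
        dereverbed = denoised / max(h, floor)             # divide out the RIR magnitude response
        clean_las.append(math.log(dereverbed))            # back to the log amplitude domain
    return clean_las


if __name__ == "__main__":
    # Toy frame: n_fft = 8 gives 5 non-negative frequency bins.
    clean = [1.0, 2.0, 0.5, 0.25, 0.1]
    noise = [0.1, 0.2, 0.05, 0.05, 0.01]
    rir = [1.0, 0.5]  # hypothetical two-tap RIR
    h = rir_magnitude_response(rir, 8)
    noisy = [math.log(s * hk + n) for s, hk, n in zip(clean, h, noise)]
    est = initial_dnr(noisy, [math.log(n) for n in noise], rir, n_fft=8)
    print([round(math.exp(v), 6) for v in est])  # -> [1.0, 2.0, 0.5, 0.25, 0.1]
```

Because the toy spectra are generated exactly under the assumed model, the sketch recovers the clean amplitudes. On real predicted spectra the subtraction can undershoot, which is why a floor is applied here. It is also consistent with the abstract's design, in which a further neural network enhances the initially processed LAS.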

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques