HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Jiaqi Su; Zeyu Jin; Adam Finkelstein

arXiv:2006.05694 · eess.AS · September 23, 2020 · 19 cites

TL;DR

This paper presents HiFi-GAN, a deep learning model that enhances degraded speech recordings by reducing noise and reverberation, achieving high perceptual quality and generalization across speakers and environments.

Contribution

Introduces HiFi-GAN, a novel end-to-end adversarial network utilizing multi-scale discriminators and deep feature matching for high-fidelity speech enhancement.

Findings

01 Outperforms state-of-the-art methods in objective metrics

02 Achieves superior subjective perceptual quality

03 Generalizes well to unseen speakers and environments

Abstract

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments.
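
The deep feature matching idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each discriminator exposes its intermediate activations for a clean reference and for the enhanced output, here flattened into plain lists of floats, and it uses a per-layer mean absolute difference averaged over layers. The function name `feature_matching_loss`, the L1 distance, and the equal layer weighting are illustrative assumptions; the abstract does not specify the exact formulation used in the paper.

```python
from typing import Sequence


def feature_matching_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
) -> float:
    """Average, over layers, of the mean absolute difference between
    discriminator activations for clean and enhanced audio.

    Hypothetical helper: each inner sequence is one layer's activations,
    flattened to 1-D. Equal weighting of layers is an assumption.
    """
    if len(real_feats) != len(fake_feats) or not real_feats:
        raise ValueError("need the same non-zero number of layers")

    total = 0.0
    for real_layer, fake_layer in zip(real_feats, fake_feats):
        if len(real_layer) != len(fake_layer) or not real_layer:
            raise ValueError("layer activations must have matching, non-zero size")
        # Mean absolute (L1) difference for this layer.
        layer_diff = sum(abs(r - f) for r, f in zip(real_layer, fake_layer))
        total += layer_diff / len(real_layer)

    # Average across layers so deeper discriminators do not dominate.
    return total / len(real_feats)


# Toy activations from a two-layer discriminator (made-up numbers).
clean = [[1.0, 2.0], [0.0, 0.0, 3.0, 1.0]]
enhanced = [[1.5, 2.0], [0.0, 1.0, 3.0, 1.0]]
print(feature_matching_loss(clean, enhanced))  # prints 0.25
```

In a multi-discriminator setup such as the one the abstract describes, a loss of this kind would typically be computed per discriminator (time domain and time-frequency domain) and summed into the generator objective; the loss is zero only when the enhanced audio reproduces the clean activations at every layer.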

Code & Models

Repositories

rishikksh20/hifigan-denoiser (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet