HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong, Jaehyeon Kim, Jaekyoung Bae

TL;DR
HiFi-GAN is a GAN-based speech synthesis model that generates high-fidelity audio efficiently and faster than real time, outperforming earlier GAN-based methods and approaching the quality of autoregressive and flow-based models.
Contribution
This work introduces HiFi-GAN, a GAN architecture that explicitly models the periodic patterns in speech signals to produce high-fidelity speech efficiently and faster than real time, and that generalizes to unseen speakers and to end-to-end speech synthesis.
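One way to picture what "modeling periodic speech signals" means in practice: the full paper's multi-period discriminator (a detail not stated in this summary) folds the 1D waveform of length T into a 2D grid of height T/p and width p, for periods p = 2, 3, 5, 7 and 11, so that 2D convolutions only compare samples spaced exactly p apart. The pure-Python sketch below illustrates that reshaping step only, not the discriminator network; the function names `period_reshape` and `column` are hypothetical, and reflection padding of the tail is an assumption about how T is made a multiple of p.

```python
from typing import List


def period_reshape(audio: List[float], period: int) -> List[List[float]]:
    """Fold a 1D waveform of length T into ceil(T / period) rows of `period` samples.

    Column j of the result holds samples j, j + period, j + 2 * period, ...,
    so a kernel sliding down a column only compares samples that are
    exactly `period` steps apart.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    n_pad = (-len(audio)) % period  # samples needed to reach a multiple of `period`
    if n_pad and n_pad >= len(audio):
        raise ValueError("audio too short to reflect-pad")
    padded = list(audio)
    if n_pad:
        # Reflect-pad the tail: mirror the samples just before the last one.
        padded += audio[-2:-2 - n_pad:-1]
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def column(rows: List[List[float]], j: int) -> List[float]:
    """Return column j, i.e. every `period`-th sample starting at index j."""
    return [row[j] for row in rows]


if __name__ == "__main__":
    wave = [float(t) for t in range(10)]
    rows = period_reshape(wave, 3)
    print(rows)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 8.0, 7.0]]
    print(column(rows, 0))  # [0.0, 3.0, 6.0, 9.0]
    # One 2D view per period; prints the number of rows in each view.
    print({p: len(period_reshape(wave, p)) for p in (2, 3, 5, 7, 11)})  # {2: 5, 3: 4, 5: 2, 7: 2, 11: 1}
```

Using several coprime periods means each view captures a different periodic structure, which is the intuition behind looking at the same waveform at multiple periods.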
Findings
Generates 22.05 kHz high-fidelity speech 167.9x faster than real time on a single V100 GPU.
Achieves a mean opinion score (MOS) close to that of human recordings on a single-speaker dataset.
Generalizes well to unseen speakers and end-to-end speech synthesis.
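To put the speed figure in concrete terms: "N times faster than real time" means one second of audio takes 1/N seconds to synthesize. The small check below uses the numbers reported above (22.05 kHz, 167.9x); the helper name `realtime_budget` is hypothetical, and the result ignores fixed overheads such as model loading or data transfer.

```python
def realtime_budget(sample_rate_hz: float, speedup: float, audio_seconds: float = 1.0):
    """Translate an 'N times faster than real time' figure into concrete numbers.

    Returns (wall-clock seconds needed to synthesize `audio_seconds` of audio,
    output samples produced per wall-clock second).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    wall_clock_s = audio_seconds / speedup
    samples_per_s = sample_rate_hz * speedup
    return wall_clock_s, samples_per_s


if __name__ == "__main__":
    secs, rate = realtime_budget(22_050, 167.9)
    print(f"{secs * 1000:.2f} ms per second of audio")  # 5.96 ms per second of audio
    print(f"{rate:,.0f} samples per second")  # 3,702,195 samples per second
```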
Abstract
Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) on a single-speaker dataset indicates that our proposed method achieves quality similar to human recordings while generating 22.05 kHz high-fidelity audio 167.9 times faster than real time on a single V100 GPU. We further show the generality of HiFi-GAN to the mel-spectrogram inversion…
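Mel-spectrogram inversion means mapping each spectrogram frame back to a fixed number of waveform samples (the STFT hop length). HiFi-GAN's generator does this with a stack of strided transposed convolutions, so the strides must multiply to the hop length. The sketch below checks that length arithmetic under assumed settings: a 256-sample hop, strides 8, 8, 2, 2 and kernel sizes twice the stride, modeled on the configuration released with the official code. None of these values appear in this summary, so treat them, and the helper names, as hypothetical placeholders.

```python
from math import prod
from typing import Sequence

# Assumed settings (placeholders, not taken from this summary).
HOP_LENGTH = 256
UPSAMPLE_RATES = (8, 8, 2, 2)
KERNEL_SIZES = (16, 16, 4, 4)


def conv_transpose_out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1D transposed convolution with dilation 1 and no output padding."""
    return (length - 1) * stride - 2 * padding + kernel


def generator_out_len(n_frames: int,
                      rates: Sequence[int] = UPSAMPLE_RATES,
                      kernels: Sequence[int] = KERNEL_SIZES) -> int:
    """Number of waveform samples produced from `n_frames` mel frames."""
    length = n_frames
    for stride, kernel in zip(rates, kernels):
        # For the even strides assumed here, kernel = 2 * stride with this
        # padding makes each stage exactly `stride` times longer.
        padding = (kernel - stride) // 2
        length = conv_transpose_out_len(length, kernel, stride, padding)
    return length


# The strides must multiply to the hop length so one frame becomes one hop of audio.
assert prod(UPSAMPLE_RATES) == HOP_LENGTH

if __name__ == "__main__":
    frames = 86  # roughly one second of 22.05 kHz audio at a 256-sample hop
    samples = generator_out_len(frames)
    print(samples, f"{samples / 22_050:.3f} s")  # 22016 0.998 s
```

Keeping the upsampling exact (no stray samples per stage) lets the generated waveform line up frame by frame with the input spectrogram.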
Code & Models
- 🤗 jaketae/hifigan-lj-v1 (model) · 11 downloads · ♡ 1
- 🤗 speechbrain/tts-hifigan-ljspeech (model) · 786 downloads · ♡ 39
- 🤗 nvidia/tts_hifigan (model) · 192 downloads · ♡ 38
- 🤗 speechbrain/tts-hifigan-libritts-16kHz (model) · 331 downloads · ♡ 8
- 🤗 speechbrain/tts-hifigan-libritts-22050Hz (model) · 34k downloads · ♡ 6
- 🤗 padmalcom/tts-hifigan-german (model) · 1 download · ♡ 3
- 🤗 infinisoft/tts (model) · ♡ 4
- 🤗 Bilgilice/bilgilice35 (model)
- 🤗 Pendrokar/xvasynth_lojban (model) · ♡ 1
- 🤗 Nick256/tts-hifigan-commonvoice-single-female (model) · 4 downloads
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: HiFi-GAN
