BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

TL;DR
BigVGAN is a large-scale, universal neural vocoder that combines a periodic activation function and anti-aliased representations with large-scale training to synthesize high-fidelity audio for out-of-distribution speakers, languages, and recording environments without fine-tuning.
Contribution
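The paper introduces BigVGAN, a GAN-based vocoder with new inductive biases for audio synthesis (periodic activations and anti-aliased representations), trained at up to 112M parameters, an unprecedented scale for GAN vocoders. This combination enables robust zero-shot audio synthesis in a range of out-of-distribution scenarios.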
Findings
- Achieves state-of-the-art zero-shot performance on diverse audio tasks.
- Successfully generalizes to unseen speakers, languages, and recording conditions.
- Maintains high audio quality at large scale without over-regularization.
Abstract
Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments. In this work, we present BigVGAN, a universal vocoder that generalizes well for various out-of-distribution scenarios without fine-tuning. We introduce a periodic activation function and anti-aliased representation into the GAN generator, which brings the desired inductive bias for audio synthesis and significantly improves audio quality. In addition, we train our GAN vocoder at the largest scale, up to 112M parameters, which is unprecedented in the literature. We identify and address the failure modes in large-scale GAN training for audio, while maintaining high-fidelity output without over-regularization. Our BigVGAN, trained…
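The abstract names a periodic activation and an anti-aliased representation without spelling them out. The sketch below illustrates both on a single channel in plain Python, assuming the periodic activation is the Snake function f(x) = x + (1/α) sin²(αx) (which, to our understanding, is what BigVGAN adopts, with α learned per channel) and that "anti-aliased" means evaluating the nonlinearity at 2× the sample rate between low-pass filters. The helper names, the fixed α, the 13-tap Hann-windowed sinc filter, and the 2× ratio are illustrative assumptions, not the paper's implementation.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation x + (1/alpha) * sin^2(alpha * x), applied elementwise.

    alpha is a fixed scalar here; the paper's version is assumed to learn it per channel.
    """
    inv = 1.0 / (alpha + 1e-9)  # small epsilon guards against alpha == 0
    return [v + inv * math.sin(alpha * v) ** 2 for v in x]


def lowpass_kernel(cutoff, num_taps=13):
    """Hann-windowed sinc low-pass FIR filter, normalized to unit DC gain.

    cutoff is a fraction of the Nyquist frequency (0 < cutoff <= 1).
    The Hann window is a simplifying assumption.
    """
    half = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - half
        # ideal low-pass impulse response: cutoff * sinc(cutoff * t)
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def convolve_same(x, h):
    """Zero-padded linear convolution, centered and trimmed to len(x)."""
    offset = (len(h) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + offset - k
            if 0 <= j < len(x):
                acc += hk * x[j]
        out.append(acc)
    return out


def anti_aliased_snake(x, alpha=1.0, ratio=2, num_taps=13):
    """Upsample -> Snake -> low-pass -> downsample (hypothetical helper)."""
    # Cutoff at 1/ratio of the upsampled Nyquist equals the original Nyquist.
    h = lowpass_kernel(1.0 / ratio, num_taps)
    # 1) Upsample by zero-stuffing; scaling by `ratio` restores the average level.
    up = []
    for v in x:
        up.append(v * ratio)
        up.extend([0.0] * (ratio - 1))
    up = convolve_same(up, h)  # interpolation filter
    # 2) Apply the periodic nonlinearity at the higher sample rate.
    act = snake(up, alpha)
    # 3) Remove content above the original Nyquist, then keep every ratio-th sample.
    act = convolve_same(act, h)
    return act[::ratio]
```

The design intent is that the harmonics created by sin² land in the extra bandwidth of the upsampled signal, where the second low-pass filter can remove most of them before decimation instead of letting them fold back into the audible band as aliasing.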
Code & Models
- 🤗 xihan123/so-vits-svc-5.0-nine (model) · ♡ 5
- 🤗 nvidia/bigvgan_v2_24khz_100band_256x (model) · 13k dl · ♡ 19
- 🤗 nvidia/bigvgan_v2_22khz_80band_256x (model) · 1.2M dl · ♡ 26
- 🤗 nvidia/bigvgan_v2_22khz_80band_fmax8k_256x (model) · 619 dl · ♡ 2
- 🤗 nvidia/bigvgan_v2_44khz_128band_256x (model) · 348 dl · ♡ 7
- 🤗 nvidia/bigvgan_v2_44khz_128band_512x (model) · 690k dl · ♡ 68
- 🤗 nvidia/bigvgan_22khz_80band (model) · 234 dl · ♡ 1
- 🤗 nvidia/bigvgan_24khz_100band (model) · 809 dl · ♡ 4
- 🤗 nvidia/bigvgan_base_22khz_80band (model) · 938 dl
- 🤗 nvidia/bigvgan_base_24khz_100band (model) · 86 dl
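The NVIDIA checkpoint names listed above appear to follow a regular pattern. The parser below is a sketch under these assumptions: the name encodes the sampling rate (22khz, 24khz, 44khz, taken here to mean 22050, 24000, and 44100 Hz), the number of mel bands, an optional mel upper cutoff (fmax8k read as 8 kHz), and the generator's total upsampling ratio (256x read as waveform samples per mel frame). The function, class, and field names are hypothetical and are not part of the BigVGAN code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from the rounded rate in a repo name to the exact rate in Hz.
SAMPLE_RATES = {"22khz": 22050, "24khz": 24000, "44khz": 44100}

# Optional "v2_"/"base_" prefix, then rate, mel bands, optional fmax, optional ratio.
PATTERN = re.compile(
    r"bigvgan_(?:v2_|base_)?(?P<sr>\d+khz)_(?P<mels>\d+)band"
    r"(?:_fmax(?P<fmax>\d+)k)?(?:_(?P<hop>\d+)x)?$"
)


@dataclass
class CheckpointInfo:
    sampling_rate: int
    num_mels: int
    upsampling_ratio: Optional[int]  # samples per mel frame, if encoded in the name
    fmax: Optional[int]              # mel upper cutoff in Hz, if encoded in the name

    def frames_per_second(self) -> Optional[float]:
        # Each mel frame expands to `upsampling_ratio` waveform samples.
        if self.upsampling_ratio is None:
            return None
        return self.sampling_rate / self.upsampling_ratio


def parse_checkpoint_name(repo_id: str) -> CheckpointInfo:
    """Parse a repo id such as 'nvidia/bigvgan_v2_24khz_100band_256x'."""
    name = repo_id.split("/")[-1]
    m = PATTERN.match(name)
    if m is None or m["sr"] not in SAMPLE_RATES:
        raise ValueError(f"unrecognized checkpoint name: {repo_id}")
    fmax = int(m["fmax"]) * 1000 if m["fmax"] else None
    hop = int(m["hop"]) if m["hop"] else None
    return CheckpointInfo(SAMPLE_RATES[m["sr"]], int(m["mels"]), hop, fmax)
```

For example, under these assumptions nvidia/bigvgan_v2_24khz_100band_256x yields 24000 / 256 = 93.75 mel frames per second. Names outside the pattern, such as the so-vits-svc entry, raise ValueError rather than being guessed at.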
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
