BigVGAN: A Universal Neural Vocoder with Large-Scale Training

Sang-gil Lee; Wei Ping; Boris Ginsburg; Bryan Catanzaro; Sungroh Yoon

arXiv:2206.04658 · cs.SD · February 17, 2023 · 46 citations

TL;DR

BigVGAN is a large-scale universal neural vocoder. It combines a periodic activation function and an anti-aliased representation in the GAN generator with training at up to 112M parameters, and it synthesizes high-fidelity audio in out-of-distribution scenarios, such as numerous speakers and varied recording environments, without fine-tuning.

Contribution

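The paper introduces BigVGAN, a GAN-based vocoder scaled up to 112M parameters, which the authors describe as unprecedented in the literature. Its generator adds two inductive biases for audio synthesis, a periodic activation function and an anti-aliased representation, and the resulting model generalizes zero-shot (without fine-tuning) to out-of-distribution scenarios.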

Findings

01. Achieves state-of-the-art zero-shot performance on diverse audio tasks.

02. Successfully generalizes to unseen speakers, languages, and recording conditions.

03. Maintains high audio quality without over-regularization.

Abstract

Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments. In this work, we present BigVGAN, a universal vocoder that generalizes well to various out-of-distribution scenarios without fine-tuning. We introduce a periodic activation function and an anti-aliased representation into the GAN generator, which bring the desired inductive bias for audio synthesis and significantly improve audio quality. In addition, we train our GAN vocoder at the largest scale, up to 112M parameters, which is unprecedented in the literature. We identify and address the failure modes in large-scale GAN training for audio, while maintaining high-fidelity output without over-regularization. Our BigVGAN, trained…
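
The abstract names a periodic activation function and an anti-aliased representation but does not give their form. The sketch below is one plausible reading, not the authors' implementation. It assumes the Snake activation, x + (1/alpha) * sin^2(alpha * x), as the periodic function, and it assumes anti-aliasing means applying that nonlinearity at twice the sample rate between low-pass filters. The function names, the Hann-windowed sinc filter, and the fixed scalar `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def snake(x, alpha=1.0):
    """Assumed periodic activation: x + (1/alpha) * sin^2(alpha * x)."""
    return x + np.sin(alpha * x) ** 2 / alpha

def lowpass_kernel(cutoff=0.25, half_width=8):
    """Hann-windowed sinc FIR low-pass (illustrative); cutoff in cycles per sample."""
    n = np.arange(-half_width, half_width + 1)
    kernel = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hanning(n.size)
    return kernel / kernel.sum()  # normalize to unit gain at DC

def anti_aliased_snake(x, alpha=1.0):
    """Upsample 2x, apply snake, low-pass, then downsample 2x."""
    kernel = lowpass_kernel()
    up = np.zeros(2 * x.size)
    up[::2] = x  # zero-stuffing
    # Interpolation filter; the factor 2 restores the amplitude lost to zero-stuffing.
    up = 2 * np.convolve(up, kernel, mode="same")
    y = snake(up, alpha)
    # Remove the high frequencies created by the nonlinearity before decimating.
    y = np.convolve(y, kernel, mode="same")
    return y[::2]

signal = np.sin(2 * np.pi * 0.05 * np.arange(64))
out = anti_aliased_snake(signal, alpha=2.0)
```

The low-pass step before decimation is the point of the design: a pointwise nonlinearity creates harmonics above the original Nyquist frequency, and filtering at the higher rate keeps them from folding back into the audible band. In a trained generator `alpha` would typically be a learned parameter rather than a constant.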

Code & Models

Videos

BigVGAN: A Universal Neural Vocoder with Large-Scale Training · SlidesLive

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research