Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition

Zhengxi Liu; Yanmin Qian

arXiv:2106.13419 · cs.SD · June 28, 2021 · 1 citation

TL;DR

Basis-MelGAN is a GAN-based neural vocoder that represents raw audio as a learned basis combined with predicted weights. This cuts computational cost from 17.74 to 7.95 GFLOPs while keeping audio quality comparable to existing GAN vocoders, making real-time synthesis more efficient.

Contribution

The paper proposes Basis-MelGAN, a neural vocoder that simplifies upsampling layers by predicting basis weights instead of raw audio, reducing computational cost.

Findings

01  Achieves audio quality comparable to existing GAN-based vocoders.

02  Reduces computational cost from 17.74 to 7.95 GFLOPs, a reduction of about 55%.

03  Maintains audio quality while significantly lowering computational complexity.

Abstract

Recent studies have shown that neural vocoders based on generative adversarial networks (GANs) can generate high-quality audio. While GAN-based neural vocoders have been shown to be computationally much more efficient than those based on autoregressive prediction, real-time generation of the highest-quality audio on CPU is still a very challenging task. One major source of computation in all GAN-based neural vocoders is the stacked upsampling layers, which are designed to match the length and temporal resolution of the output waveform. Meanwhile, the computational complexity of upsampling networks is closely correlated with the number of samples generated for each window. To reduce the computation of upsampling layers, we propose a new GAN-based neural vocoder called Basis-MelGAN, where the raw audio samples are decomposed with a learned basis and their associated…
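The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only assumes that each audio frame is a weighted sum of basis vectors and that frames are combined by overlap-add. The function name `synthesize_from_basis`, the frame length, the hop size, the number of bases, and the use of random arrays in place of a learned basis and generator-predicted weights are all hypothetical choices made for illustration.

```python
import numpy as np

def synthesize_from_basis(weights, basis, hop):
    """Rebuild a waveform from per-frame basis weights (illustrative only).

    weights: (num_frames, num_bases) array, one weight vector per frame
    basis:   (num_bases, frame_len) array, one basis vector per row
    hop:     hop size in samples between consecutive frames (assumed)
    """
    num_frames, num_bases = weights.shape
    assert basis.shape[0] == num_bases, "weights and basis disagree on num_bases"
    frame_len = basis.shape[1]

    # Each frame is a weighted sum of the basis vectors.
    frames = weights @ basis  # shape: (num_frames, frame_len)

    # Overlap-add the frames into a single waveform.
    out = np.zeros((num_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out

# Hypothetical sizes; random values stand in for learned and predicted ones.
rng = np.random.default_rng(0)
basis = rng.standard_normal((256, 64))     # 256 bases, 64 samples per frame
weights = rng.standard_normal((100, 256))  # 100 frames of weights
audio = synthesize_from_basis(weights, basis, hop=32)
print(audio.shape)  # (3232,)
```

In a setup like this, the network would only need to output one weight vector per frame rather than every audio sample, which is the intuition behind the reduced upsampling cost described above.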

Code & Models

Repositories

xcmyz/FastVocoder (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies