SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng

TL;DR
SnakeGAN is a novel GAN-based universal vocoder that leverages DDSP prior knowledge and periodic inductive biases to generate high-fidelity audio across diverse out-of-domain scenarios, including unseen speakers, styles, and musical pieces.
Contribution
It introduces a GAN-based vocoder whose generator combines Snake activations with anti-aliased representations and is conditioned on a coarse DDSP-generated signal, improving generalization to out-of-domain audio synthesis tasks.
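The Snake activation is commonly defined as snake_α(x) = x + (1/α)·sin²(αx): the identity plus a bounded periodic term, which biases the network toward periodic outputs. The sketch below pairs it with one common anti-aliasing recipe (upsample, apply the nonlinearity, low-pass filter, downsample). The 2x factor, the 21-tap Hann-windowed sinc filter, the fixed scalar α, and the helper names (`lowpass_taps`, `filter_same`, `anti_aliased_snake`) are assumptions for illustration; SnakeGAN's actual filters and parameterization (for example, a learnable per-channel α) may differ.

```python
import math
from typing import List


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1 / alpha) * sin^2(alpha * x), with alpha > 0."""
    return x + math.sin(alpha * x) ** 2 / alpha


def lowpass_taps(cutoff: float, num_taps: int = 21) -> List[float]:
    """Hann-windowed sinc low-pass FIR; `cutoff` is a fraction of Nyquist (0 < cutoff <= 1)."""
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - center
        # Ideal low-pass impulse response: cutoff * sinc(cutoff * t)
        arg = math.pi * cutoff * t
        h = cutoff if t == 0 else cutoff * math.sin(arg) / arg
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hann window
        taps.append(h * w)
    total = sum(taps)
    return [h / total for h in taps]  # normalize to unit gain at DC


def filter_same(x: List[float], taps: List[float]) -> List[float]:
    """Zero-padded, center-aligned convolution; output length equals len(x)."""
    half = len(taps) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(x):
                acc += h * x[j]
        out.append(acc)
    return out


def anti_aliased_snake(x: List[float], alpha: float = 1.0, factor: int = 2,
                       num_taps: int = 21) -> List[float]:
    """Hypothetical anti-aliased Snake: upsample, Snake, low-pass, downsample."""
    taps = lowpass_taps(1.0 / factor, num_taps)
    # 1) Zero-insertion upsampling; the gain `factor` restores the signal level
    #    after the interpolation filter removes the spectral images.
    up = []
    for v in x:
        up.append(v * factor)
        up.extend([0.0] * (factor - 1))
    up = filter_same(up, taps)
    # 2) Apply the periodic nonlinearity at the higher sample rate.
    act = [snake(v, alpha) for v in up]
    # 3) Low-pass back to the original band, then keep every `factor`-th sample.
    return filter_same(act, taps)[::factor]


y = anti_aliased_snake([math.sin(0.3 * n) for n in range(32)], alpha=1.0)
print(len(y))  # 32
```

Running the nonlinearity at a raised sample rate gives the extra harmonics that the sin² term creates room to sit below the new Nyquist frequency, so the final low-pass filter can remove them instead of letting them fold back as aliasing.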
Findings
Outperforms existing vocoders on both subjective and objective metrics
Successfully synthesizes high-fidelity audio for unseen speakers and styles
Effective in non-speech vocalization and musical audio synthesis
Abstract
Generative adversarial network (GAN)-based neural vocoders have been widely used in audio synthesis tasks due to their high generation quality, efficient inference, and small computation footprint. However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces. In this work, we propose SnakeGAN, a GAN-based universal vocoder, which can synthesize high-fidelity audio in various OOD scenarios. SnakeGAN takes a coarse-grained signal generated by a differentiable digital signal processing (DDSP) model as prior knowledge, aiming at recovering a high-fidelity waveform from a Mel-spectrogram. We introduce periodic nonlinearities through the Snake activation function and anti-aliased representation into the generator, which further brings the desired…
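The abstract states that SnakeGAN conditions on a coarse-grained signal produced by a DDSP model but does not describe that model's internals here. As a rough illustration only, the sketch below builds a hypothetical harmonic-plus-noise excitation from a frame-level F0 track. The function name `harmonic_plus_noise`, the fixed 1/k harmonic amplitudes, the white-noise component, and all parameter values are assumptions, not the paper's design.

```python
import math
import random
from typing import List


def harmonic_plus_noise(f0_frames: List[float], sr: int = 22050, hop: int = 256,
                        harmonic_gain: float = 0.1, noise_std: float = 0.003,
                        seed: int = 0) -> List[float]:
    """Hypothetical coarse excitation: a band-limited harmonic source plus noise.

    f0_frames holds one F0 value (Hz) per frame; 0 marks an unvoiced frame.
    Each frame's F0 is held constant for `hop` samples (no interpolation).
    """
    rng = random.Random(seed)
    out: List[float] = []
    phase = 0.0
    for f0 in f0_frames:
        for _ in range(hop):
            if f0 > 0:
                # Integrate instantaneous frequency into phase, wrapped to [0, 2*pi).
                phase = (phase + 2 * math.pi * f0 / sr) % (2 * math.pi)
                # Keep only harmonics at or below Nyquist to avoid aliasing.
                n_harm = int((sr / 2) // f0)
                # Fixed 1/k amplitudes (sawtooth-like); a trained DDSP model would
                # instead predict per-harmonic amplitudes from the input features.
                s = sum(math.sin(k * phase) / k for k in range(1, n_harm + 1))
                out.append(harmonic_gain * s + rng.gauss(0.0, noise_std))
            else:
                # Unvoiced frame: noise only (phase is left unchanged).
                out.append(rng.gauss(0.0, noise_std))
    return out


coarse = harmonic_plus_noise([200.0, 0.0, 200.0], sr=8000, hop=80)
print(len(coarse))  # 240
```

Even this crude signal already carries the correct pitch periodicity, which is presumably what makes a DDSP output useful as a prior: the GAN generator then refines the timbre and fine spectral detail rather than having to discover the periodic structure from the Mel-spectrogram alone.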
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
