WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Yi Wang, Yi Si

TL;DR
WOLONet is a novel lightweight neural vocoder inspired by Vision Outlooker, achieving high-quality speech synthesis with fewer parameters than state-of-the-art models like HiFiGAN and UnivNet.
Contribution
Introduces WOLONet, a lightweight neural vocoder with a novel dynamic convolutional block inspired by Vision Outlooker, improving quality and efficiency.
Findings
WOLONet outperforms SOTA vocoders in quality with fewer parameters.
Ablation study confirms the effectiveness of the novel design.
Subjective and objective evaluations show superior synthesis quality.
Abstract
Recently, GAN-based neural vocoders such as Parallel WaveGAN, MelGAN, HiFiGAN, and UnivNet have become popular because their lightweight, parallel structure enables real-time, high-fidelity waveform synthesis, even on a CPU. HiFiGAN and UnivNet are two SOTA vocoders. Despite their high quality, there is still room for improvement. In this paper, motivated by the structure of Vision Outlooker from computer vision, we adopt a similar idea and propose an effective and lightweight neural vocoder called WOLONet. In this network, we develop a novel lightweight block that uses a location-variable, channel-independent, and depthwise dynamic convolutional kernel with sinusoidally activated dynamic kernel weights. To demonstrate the effectiveness and generalizability of our method, we perform an ablation study to verify our novel design and make a subjective and objective…
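The block described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `dynamic_depthwise_conv1d`, the projection matrix `proj`, and the kernel size are hypothetical, and the sketch assumes one reading of the abstract's terms. "Location-variable" is taken to mean a separate kernel per time step, "channel-independent" to mean the same kernel weights are reused for every channel, "depthwise" to mean channels are never mixed when the kernel is applied, and "sinusoidally activated" to mean the predicted weights pass through a sine.

```python
import numpy as np

def dynamic_depthwise_conv1d(x, proj, kernel_size=3):
    """Hypothetical sketch of a location-variable depthwise dynamic convolution.

    x:    (channels, time) feature map
    proj: (kernel_size, channels) linear map that predicts kernel weights
    """
    channels, time = x.shape
    pad = kernel_size // 2
    # Predict one kernel per time step from the features at that step,
    # then apply a sine activation to the predicted weights (assumption).
    kernels = np.sin(proj @ x)                    # (kernel_size, time)
    x_pad = np.pad(x, ((0, 0), (pad, pad)))       # zero-pad the time axis
    out = np.zeros(x.shape, dtype=float)
    for k in range(kernel_size):
        # The same weight is broadcast over all channels at each time step,
        # and each channel is filtered separately (no channel mixing).
        out += kernels[k][None, :] * x_pad[:, k:k + time]
    return out

# Usage with random data: 4 channels, 10 time steps.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))
proj = rng.standard_normal((3, 4))
y = dynamic_depthwise_conv1d(x, proj)
```

Because the kernel is predicted from the input rather than stored as a fixed weight tensor, the only learned parameters in this sketch are those of `proj`, which is one plausible way such a block stays lightweight.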
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
Methods: 1x1 Convolution · Grouped Convolution · Weight Normalization · GAN Hinge Loss · Window-based Discriminator · Tanh Activation · Dilated Convolution · Phase Shuffle
