FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Manh Luong, Viet Anh Tran

TL;DR
FlowVocoder is a compact neural vocoder based on normalizing flows that achieves high-quality speech synthesis in real-time with a small memory footprint, suitable for edge devices.
Contribution
It introduces a novel flow-based neural vocoder that improves density estimation by using a mixture of CDFs for the bipartite transformation, enabling high-fidelity, real-time speech synthesis with fewer parameters than WaveFlow.
Findings
Achieves speech quality competitive with baseline methods.
Has a significantly smaller memory footprint than WaveFlow.
Suitable for real-time text-to-speech applications.
Abstract
Recently, autoregressive neural vocoders have provided remarkable performance in generating high-fidelity speech and have been able to produce synthetic speech in real time. However, although autoregressive neural vocoders such as WaveFlow can model waveform signals from mel-spectrograms, their parameter counts are too large for deployment on edge devices. NanoFlow, a state-of-the-art autoregressive neural vocoder, has a small number of parameters, but its performance is marginally lower than that of WaveFlow. Therefore, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is capable of generating high-fidelity audio in real time. Our proposed model improves the density estimation of flow blocks by utilizing a mixture of Cumulative Distribution Functions (CDFs) for bipartite transformation. Hence, the proposed…
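The abstract does not spell out the form of the mixture-CDF bipartite transformation, so the following is only a minimal NumPy sketch under an assumption: it uses a Flow++-style coupling in which one half of the input, x_b, is mapped through the CDF of a mixture of logistics, then through an inverse sigmoid, then scaled and shifted, while the other half, x_a, passes through unchanged and conditions the parameters. The conditioner here is a hypothetical single linear map standing in for whatever network (and mel-spectrogram conditioning) FlowVocoder actually uses; all function names are illustrative. Compared with a plain affine coupling (y = x·exp(log_a) + b), the extra monotone CDF step is what makes each flow block more expressive per parameter.

```python
import numpy as np

EPS = 1e-12  # clip to keep logit(c) finite


def _sigmoid(z):
    # Numerically stable logistic sigmoid via tanh (no overflow for large |z|).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def _mixture_weights(logit_pi):
    # Softmax over the last axis (the K mixture components).
    w = np.exp(logit_pi - logit_pi.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)


def mix_logistic_cdf_pdf(x, logit_pi, mu, log_s):
    """CDF and PDF of a K-component logistic mixture, elementwise in x.

    x has shape (..., D); each parameter array has shape (..., D, K)."""
    pi = _mixture_weights(logit_pi)
    inv_s = np.exp(-log_s)
    sig = _sigmoid((x[..., None] - mu) * inv_s)
    cdf = np.sum(pi * sig, axis=-1)
    pdf = np.sum(pi * sig * (1.0 - sig) * inv_s, axis=-1)
    return cdf, pdf


def _conditioner(xa, W, K):
    # HYPOTHETICAL stand-in for the parameter network: one linear map from
    # the untouched half x_a to (logit_pi, mu, log_s, log_a, b) per element.
    D = xa.shape[-1]
    h = (xa @ W).reshape(*xa.shape[:-1], D, 3 * K + 2)
    return (h[..., :K], h[..., K:2 * K], h[..., 2 * K:3 * K],
            h[..., 3 * K], h[..., 3 * K + 1])


def coupling_forward(x, W, K):
    """Bipartite coupling: x_a is copied; x_b is mapped to
    logit(MixLogisticCDF(x_b)) * exp(log_a) + b. Returns (y, log|det J|)."""
    xa, xb = np.split(x, 2, axis=-1)
    logit_pi, mu, log_s, log_a, b = _conditioner(xa, W, K)
    cdf, pdf = mix_logistic_cdf_pdf(xb, logit_pi, mu, log_s)
    c = np.clip(cdf, EPS, 1.0 - EPS)
    yb = (np.log(c) - np.log1p(-c)) * np.exp(log_a) + b
    # The Jacobian is triangular; its x_b block is diagonal with entries
    # exp(log_a) * pdf / (c * (1 - c)), so log|det J| sums their logs.
    logdet = np.sum(np.log(pdf) - np.log(c) - np.log1p(-c) + log_a, axis=-1)
    return np.concatenate([xa, yb], axis=-1), logdet


def coupling_inverse(y, W, K, n_iter=100):
    """Invert the coupling. The mixture CDF has no closed-form inverse,
    so x_b is recovered by elementwise bisection (the CDF is monotone)."""
    ya, yb = np.split(y, 2, axis=-1)
    logit_pi, mu, log_s, log_a, b = _conditioner(ya, W, K)
    target = _sigmoid((yb - b) * np.exp(-log_a))  # undo scale/shift and logit
    lo = np.full_like(yb, -1e3)
    hi = np.full_like(yb, 1e3)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        cdf = mix_logistic_cdf_pdf(mid, logit_pi, mu, log_s)[0]
        too_low = cdf < target
        lo = np.where(too_low, mid, lo)
        hi = np.where(too_low, hi, mid)
    return np.concatenate([ya, 0.5 * (lo + hi)], axis=-1)
```

As a usage example, with `D, K = 4, 3`, `W = 0.1 * rng.standard_normal((D, D * (3 * K + 2)))` and a batch `x` of shape `(8, 2 * D)`, `coupling_forward` returns the transformed batch and one log-determinant per sample, and `coupling_inverse` recovers `x` up to floating-point error. The bisection inverse reflects a real trade-off of CDF-based couplings: density evaluation stays cheap, but inversion is iterative unless the model is designed around it.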
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Normalizing Flows · Affine Coupling · Invertible 1x1 Convolution · WaveGlow
