Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee

TL;DR
This paper introduces audio dequantization techniques in flow-based neural vocoders, demonstrating that they enhance the quality of generated speech by reducing artifacts and improving harmonic structure, thus achieving higher fidelity audio.
Contribution
The paper is the first to adapt and evaluate data dequantization methods from image generation for high fidelity audio synthesis in flow-based neural vocoders.
Findings
Audio dequantization improves waveform quality.
Dequantization reduces digital artifacts in generated audio.
Dequantization enhances the harmonic structure of synthesized speech.
Abstract
In recent work, flow-based neural vocoders have shown significant improvement in real-time speech generation tasks. The sequence of invertible flow operations allows the model to convert samples from a simple distribution into audio samples. However, training a continuous density model on discrete audio data can degrade model performance due to the topological difference between the latent and actual distributions. To resolve this problem, we propose audio dequantization methods in a flow-based neural vocoder for high fidelity audio generation. Data dequantization is a well-known method in image generation but has not yet been studied in the audio domain. For this reason, we implement various audio dequantization methods in a flow-based neural vocoder and investigate their effect on the generated audio. We conduct various objective performance assessments and a subjective evaluation to show that audio…
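To make the idea of dequantization concrete, here is a minimal sketch of uniform dequantization, the simplest variant known from image modeling, applied to integer PCM samples. It is an illustration only: the abstract does not say which dequantization variants the authors implement, and the function names `uniform_dequantize` and `requantize`, the 16-bit signed PCM input, and the rescaling to [-1, 1) are all assumptions made for this example.

```python
import numpy as np


def uniform_dequantize(pcm, bits=16, rng=None):
    """Spread each integer PCM sample over its quantization bin.

    Hypothetical helper, not the paper's code. Adds noise u ~ U[0, 1)
    to every sample and rescales the result to the interval [-1, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits
    # Shift signed samples to the unsigned range [0, levels).
    x = pcm.astype(np.float64) + levels // 2
    # Uniform noise fills the gap between neighbouring integer levels,
    # so the data no longer sit on a discrete grid.
    u = rng.random(x.shape)
    y = (x + u) / levels          # continuous values in [0, 1)
    return 2.0 * y - 1.0          # continuous values in [-1, 1)


def requantize(y, bits=16):
    """Map continuous values in [-1, 1) back to signed integer samples."""
    levels = 2 ** bits
    x = np.floor((y + 1.0) / 2.0 * levels) - levels // 2
    x = np.clip(x, -(levels // 2), levels // 2 - 1)
    return x.astype(np.int32)


pcm = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
continuous = uniform_dequantize(pcm, rng=np.random.default_rng(0))
recovered = requantize(continuous)
```

Because the added noise stays inside one quantization bin, flooring the continuous values recovers the original samples, so the noise changes what the density model is trained on without discarding information from the signal.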
