FloWaveNet : A Generative Flow for Raw Audio

Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, and Sungroh Yoon

arXiv:1811.02155·cs.SD·May 21, 2019·30 cites

Open Access · 2 Repos

TL;DR

FloWaveNet is a flow-based generative model for raw audio that synthesizes in real time, trains in a single stage with a single maximum likelihood loss, and reaches audio quality comparable to two-stage models.

Contribution

The paper introduces FloWaveNet, a flow-based model for raw audio synthesis that trains in a single stage with one maximum likelihood loss, requiring neither auxiliary loss terms nor a two-stage pipeline with a teacher network.

Findings

01. Real-time raw audio synthesis achieved with FloWaveNet
02. Single-stage training with maximum likelihood loss
03. Comparable audio quality to two-stage models
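
For context, the maximum likelihood loss in the findings above can be written with the standard change-of-variables formula used by flow-based models in general. The symbols here ($f$ for the invertible audio-to-latent mapping, $p_Z$ for a simple prior such as a standard Gaussian) are notation assumed for illustration and are not taken from this page:

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|,
\qquad
\text{sampling: } z \sim p_Z, \quad x = f^{-1}(z).
```

Training maximizes $\log p_X(x)$ over the data directly, which is why no teacher network or auxiliary term is needed, and sampling is a single pass through $f^{-1}$.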

Abstract

Most modern text-to-speech architectures use a WaveNet vocoder for synthesizing high-fidelity waveform audio, but there have been limitations, such as high inference time, in its practical application due to its ancestral sampling scheme. The recently suggested Parallel WaveNet and ClariNet have achieved real-time audio synthesis capability by incorporating inverse autoregressive flow for parallel sampling. However, these approaches require a two-stage training pipeline with a well-trained teacher network and can only produce natural sound by using probability distillation along with auxiliary loss terms. We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any additional auxiliary terms, and it is inherently parallel due to the characteristics of generative…
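
To make the ideas of a single maximum likelihood loss and parallel sampling concrete, here is a minimal sketch of one generic affine coupling layer, a common building block of flow-based models. It is not the authors' architecture: the `conditioner` function is a hypothetical toy stand-in for a learned network, and the weights `w_s` and `w_t` are arbitrary illustrative values.

```python
import math
import random


def conditioner(x_a, w_s, w_t):
    """Hypothetical toy stand-in for a learned network.

    Predicts one shared log-scale and one shared shift from the
    untouched half of the signal.
    """
    h = sum(x_a) / len(x_a)
    log_s = math.tanh(w_s * h)  # bounded log-scale for numerical stability
    t = w_t * h
    return log_s, t


def coupling_forward(x, w_s=0.5, w_t=0.3):
    """Map audio x to latent z; also return log|det J| of the mapping."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = conditioner(x_a, w_s, w_t)
    z_b = [(v - t) * math.exp(-log_s) for v in x_b]
    log_det = -log_s * len(x_b)  # each transformed element contributes -log_s
    return x_a + z_b, log_det


def coupling_inverse(z, w_s=0.5, w_t=0.3):
    """Map latent z back to audio x.

    Every element of the second half is computed independently of the
    others, so there is no sample-by-sample autoregressive loop.
    """
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = conditioner(z_a, w_s, w_t)
    x_b = [v * math.exp(log_s) + t for v in z_b]
    return z_a + x_b


def log_likelihood(x):
    """Exact log-likelihood under a standard Gaussian prior on z."""
    z, log_det = coupling_forward(x)
    log_prior = sum(-0.5 * (v * v + math.log(2 * math.pi)) for v in z)
    return log_prior + log_det


def sample(n, seed=0):
    """Draw z from the prior and invert the flow to get a signal of length n."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return coupling_inverse(z)
```

The negative of `log_likelihood` would serve as the only training loss in this sketch, and `sample` shows why generation needs just one inverse pass. A real model would stack many such layers and alternate which half is transformed.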

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis