SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

Bohan Zhai; Tianren Gao; Flora Xue; Daniel Rothchild; Bichen Wu; Joseph E. Gonzalez; Kurt Keutzer

arXiv:2001.05685 · cs.SD · January 17, 2020 · 21 cites

TL;DR

SqueezeWave is a family of lightweight vocoders based on WaveGlow. It generates speech of similar quality to WaveGlow at a much lower computational cost, which makes it a candidate for real-time synthesis on edge devices.

Contribution

The paper presents SqueezeWave, a family of lightweight vocoders that keeps audio quality similar to WaveGlow while requiring 61x to 214x fewer MACs.

Findings

01. SqueezeWave reduces MACs by 61x to 214x compared to WaveGlow.

02. SqueezeWave achieves similar audio quality to WaveGlow.

03. The model is suitable for real-time on-device speech synthesis.

Abstract

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs. Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.
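
The 61x - 214x figure in the abstract is a ratio of multiply-accumulate operations (MACs). As a rough illustration of how such a ratio can be tallied, the sketch below counts MACs for a single 1D convolution layer and compares two configurations. The function names and every layer shape here are hypothetical and chosen only for illustration; they are not taken from WaveGlow or SqueezeWave, and the resulting ratio is not a result from the paper.

```python
def conv1d_macs(in_channels, out_channels, kernel_size, out_length, groups=1):
    """Count multiply-accumulates for one 1D convolution (bias ignored).

    Each output sample of each output channel needs one MAC per weight
    in its filter, i.e. (in_channels / groups) * kernel_size MACs.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    macs_per_output_sample = (in_channels // groups) * kernel_size
    return macs_per_output_sample * out_channels * out_length


def reduction_factor(baseline_macs, compact_macs):
    """How many times fewer MACs the compact layer needs."""
    return baseline_macs / compact_macs


# Hypothetical shapes (assumptions, not the paper's configuration):
# a wide layer over a long sequence vs. a narrower layer over a shorter one.
baseline = conv1d_macs(in_channels=256, out_channels=256, kernel_size=3, out_length=2000)
compact = conv1d_macs(in_channels=128, out_channels=128, kernel_size=3, out_length=250)

print(f"baseline MACs: {baseline:,}")
print(f"compact MACs:  {compact:,}")
print(f"reduction:     {reduction_factor(baseline, compact):.1f}x")
```

Because the count is a product of channel widths, kernel size, and output length, halving both channel counts and shortening the sequence by 8x compound multiplicatively in this toy case. A whole-model figure would sum such counts over every layer.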

Code & Models

Repositories

tianrengao/SqueezeWave
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and dialogue systems