Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems

Ravichander Vipperla; Sangjun Park; Kihyun Choo; Samin Ishtiaq; Kyoungbo Min; Sourav Bhattacharya; Abhinav Mehrotra; Alberto Gil C. P. Ramos; and Nicholas D. Lane

arXiv:2008.04574·eess.AS·August 12, 2020

TL;DR

This paper introduces two techniques, sample-bunching and bit-bunching, that reduce the computational complexity of the LPCNet vocoder for low-cost neural TTS systems and make deployment on mobile devices more efficient.

Contribution

The paper presents novel bunching techniques that enhance LPCNet's efficiency, making neural TTS more accessible on low-resource devices.

Findings

01

2.19x faster runtime on mobile devices

02

Less than 0.1 decrease in TTS MOS

03

Sample-bunching and bit-bunching both reduce LPCNet's computational complexity

Abstract

LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with a less than 0.1 decrease in TTS mean opinion score (MOS).
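The arithmetic behind the two bunching ideas can be illustrated with a small sketch. It is not the authors' implementation. It assumes an 8-bit (256-level) output index and a hypothetical split into high and low bit groups, neither of which the abstract specifies. All function names and the chosen bunch size and split point are illustrative only.

```python
import math
from typing import Tuple


def inferences_needed(num_samples: int, bunch_size: int) -> int:
    """Network inference steps needed when each step emits `bunch_size` samples."""
    return math.ceil(num_samples / bunch_size)


def split_bits(index: int, low_bits: int) -> Tuple[int, int]:
    """Split an output index into (high, low) bit groups (assumed scheme)."""
    return index >> low_bits, index & ((1 << low_bits) - 1)


def merge_bits(high: int, low: int, low_bits: int) -> int:
    """Recombine the two bit groups into the original index."""
    return (high << low_bits) | low


def output_units(total_bits: int, low_bits: int) -> int:
    """Output units needed when two smaller layers predict the high and low
    bit groups separately, instead of one layer over 2**total_bits classes."""
    return 2 ** (total_bits - low_bits) + 2 ** low_bits


# Sample-bunching: one second of 16 kHz audio with 2 samples per inference.
steps = inferences_needed(16000, 2)

# Bit-bunching: an 8-bit index split into 5 high bits and 3 low bits.
high, low = split_bits(182, 3)
units_split = output_units(8, 3)
units_full = 2 ** 8
```

In this sketch, bunching two samples halves the number of inference steps, and splitting the 8-bit index into 5 and 3 bits replaces 256 output units with 40. The actual bunch sizes and bit split used in the paper may differ.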
