Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet

Jean-Marc Valin; Umut Isik; Paris Smaragdis; Arvindh Krishnaswamy

arXiv:2202.11169 · eess.AS · February 24, 2022


TL;DR

This paper enhances LPCNet, a neural speech synthesis model, making it more efficient and capable of real-time operation on various devices by improving both algorithmic and computational aspects.

Contribution

It introduces algorithmic and computational improvements to LPCNet, enabling real-time neural speech synthesis on a wide range of devices.

Findings

1. 2.5x faster synthesis
2. Improved synthesis quality
3. Real-time synthesis on most existing phones and on some embedded devices

Abstract

Neural speech synthesis models can synthesize high-quality speech but typically require high computational complexity to do so. In previous work, we introduced LPCNet, which uses linear prediction to significantly reduce the complexity of neural synthesis. In this work, we further improve the efficiency of LPCNet, targeting both algorithmic and computational improvements, to make it usable on a wide variety of devices. We demonstrate an improvement in synthesis quality while operating 2.5x faster. The resulting open-source LPCNet algorithm can perform real-time neural synthesis on most existing phones and is even usable in some embedded devices.
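The complexity reduction credited to linear prediction comes from splitting each sample into a cheap prediction from past samples plus a residual excitation, so that the neural network only has to model the excitation. A minimal sketch of that decomposition follows, assuming the textbook autocorrelation method with the Levinson-Durbin recursion. The function names are hypothetical, and LPCNet itself derives its prediction coefficients from its frame-level acoustic features, so this illustrates the general technique rather than the paper's implementation.

```python
from typing import List, Tuple


def autocorrelation(x: List[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve for predictor coefficients a[1..order] (returned 0-indexed) such that
    x_hat[t] = sum_k a[k] * x[t - k]; also return the final prediction-error energy."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x: List[float], a: List[float]) -> List[float]:
    """Excitation e[t] = x[t] - prediction; samples before the frame are taken as zero."""
    e = []
    for t in range(len(x)):
        pred = sum(a[k - 1] * x[t - k] for k in range(1, len(a) + 1) if t >= k)
        e.append(x[t] - pred)
    return e


def lpc_synthesize(e: List[float], a: List[float]) -> List[float]:
    """Inverse of lpc_residual: x[t] = prediction from past outputs + e[t]."""
    x: List[float] = []
    for t in range(len(e)):
        pred = sum(a[k - 1] * x[t - k] for k in range(1, len(a) + 1) if t >= k)
        x.append(pred + e[t])
    return x


if __name__ == "__main__":
    # Toy AR(1) signal x[t] = 0.9 * x[t-1]: a first coefficient near 0.9 should capture it.
    frame = [0.9 ** t for t in range(200)]
    coeffs, energy = levinson_durbin(autocorrelation(frame, 2), 2)
    exc = lpc_residual(frame, coeffs)
    print("coefficients:", coeffs)
    print("max |excitation| after t=0:", max(abs(v) for v in exc[1:]))
```

For this toy signal almost all of the energy is explained by the predictor, leaving a near-zero excitation after the first sample. In the LPCNet design, the expensive neural network generates that excitation while the prediction term is computed by the inexpensive filter; the sketch omits the network entirely.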


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing