Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge

Sangjun Park; Kihyun Choo; Joohyung Lee; Anton V. Porov; Konstantin Osipov; June Sig Sung

arXiv:2203.14416·eess.AS·July 1, 2022

Open Access

TL;DR

This paper introduces Bunched LPCNet2, an efficient neural vocoder that delivers high-quality speech synthesis on both cloud and edge devices, optimizing for low complexity, small footprint, and real-time performance.

Contribution

It proposes an improved LPCNet architecture that uses a single logistic output distribution and a DualRate design, achieving high speech quality with a small model size and low computational cost.

Findings

01. Achieves 1.1 MB model size with satisfactory speech quality

02. Operates faster than real-time on Raspberry Pi 3B

03. Maintains high speech quality with reduced model footprint

Abstract

Text-to-Speech (TTS) services that run on edge devices have many advantages over cloud TTS, e.g., in latency and privacy. However, neural vocoders with low complexity and a small model footprint inevitably generate annoying sounds. This study proposes Bunched LPCNet2, an improved LPCNet architecture that provides highly efficient, high-quality synthesis for cloud servers and low-complexity synthesis for low-resource edge devices. A single logistic distribution achieves computational efficiency, and insightful tricks reduce the model footprint while maintaining speech quality. A DualRate architecture, which generates a lower sampling rate from a prosody model, is also proposed to reduce maintenance costs. The experiments demonstrate that Bunched LPCNet2 generates satisfactory speech quality with a model footprint of 1.1MB while operating faster than real-time on an RPi 3B. Our…
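
To make the abstract's "single logistic distribution" concrete, the following is a minimal sketch of how one sample can be drawn from a logistic distribution by inverting its CDF. It is an illustration only, not the authors' implementation: the function names `sample_logistic` and `to_int16`, the clamping constant, and the choice of a 16-bit output range are all assumptions made for this example. The general idea it shows is that a sampler of this kind needs just two parameters per draw (a location `mu` and a scale `s`) and one logarithm, with no need to normalize and search a table of probabilities.

```python
import math
import random


def sample_logistic(mu, s, rng=random):
    """Draw one sample from Logistic(mu, s) by inverting its CDF.

    Hypothetical helper for illustration; mu is the location and
    s > 0 is the scale of the distribution.
    """
    # Keep u strictly inside (0, 1) so the logit below stays finite.
    u = min(max(rng.random(), 1e-7), 1.0 - 1e-7)
    # Inverse CDF of the logistic distribution: mu + s * logit(u).
    return mu + s * math.log(u / (1.0 - u))


def to_int16(x):
    """Round and clip a real value to the signed 16-bit PCM range (assumed)."""
    return max(-32768, min(32767, int(round(x))))


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative parameters only; in a vocoder they would come from the network.
    samples = [to_int16(sample_logistic(100.0, 5.0, rng)) for _ in range(5)]
    print(samples)
```

In a real vocoder the two parameters would be predicted by the network at every step; here they are fixed constants so the sketch stays self-contained.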

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing