LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Jie Chen; Xingchen Song; Zhendong Peng; Binbin Zhang; Fuping Pan; Zhiyong Wu

arXiv:2308.16569·cs.SD·September 1, 2023

TL;DR

LightGrad is a lightweight diffusion probabilistic model designed for text-to-speech applications on edge devices, significantly reducing model size and inference latency while maintaining speech quality.

Contribution

The paper introduces LightGrad, a novel lightweight DPM with a fast sampling technique and streaming inference, optimized for resource-constrained TTS applications.

Findings

01

62.2% reduction in model parameters relative to Grad-TTS

02

65.7% reduction in inference latency relative to Grad-TTS

03

Maintains comparable speech quality with fewer denoising steps

Abstract

Recent advances in neural text-to-speech (TTS) models have brought thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be stably trained and are more parameter-efficient than other generative models. Because transmitting data between customers and the cloud introduces high latency and the risk of exposing private data, deploying TTS models on edge devices is preferred. Deploying DPMs on edge devices raises two practical problems. First, current DPMs are not lightweight enough for resource-constrained devices. Second, DPMs require many denoising steps at inference time, which increases latency. In this work, we present LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight U-Net diffusion decoder and a training-free fast…
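
To make the second problem concrete, here is a minimal sketch of why the number of denoising steps drives latency: an iterative sampler calls the decoder once per step, so decoder evaluations grow linearly with the step count. This is not LightGrad's sampler or decoder. The names `toy_denoiser`, `sample`, and `count_calls` are hypothetical, the decoder is replaced by the analytic score of a standard normal, and the update is a plain Euler step of a probability-flow ODE for a variance-preserving process with a constant noise rate of 1 (all of these are assumptions made for illustration).

```python
import random


def toy_denoiser(x, t):
    """Hypothetical stand-in for a U-Net decoder.

    Returns the score of a standard normal (-x) and counts how many
    times it is called. A real decoder would be a neural network.
    """
    toy_denoiser.calls += 1
    return [-v for v in x]


toy_denoiser.calls = 0


def sample(num_steps, dim=4, seed=0):
    """Integrate from t=1 down to t=0 with one denoiser call per step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from noise
    h = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * h
        score = toy_denoiser(x, t)
        # Euler step of dx/dt = -0.5 * x - 0.5 * score, taken backward in time.
        # Assumed toy dynamics, not the solver used in the paper.
        x = [v + 0.5 * h * (v + s) for v, s in zip(x, score)]
    return x


def count_calls(num_steps):
    """Return how many decoder evaluations one sampling run needs."""
    toy_denoiser.calls = 0
    sample(num_steps)
    return toy_denoiser.calls
```

In this toy setup, `count_calls(50)` performs 50 decoder evaluations and `count_calls(4)` performs 4, so a sampler that keeps quality at a small step count cuts the dominant inference cost roughly in proportion.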
