LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

TL;DR
LightGrad is a lightweight diffusion probabilistic model designed for text-to-speech applications on edge devices, significantly reducing model size and inference latency while maintaining speech quality.
Contribution
The paper introduces LightGrad, a novel lightweight DPM with a fast sampling technique and streaming inference, optimized for resource-constrained TTS applications.
Findings
62.2% reduction in model parameters compared with Grad-TTS
65.7% reduction in inference latency compared with Grad-TTS
Maintains comparable speech quality with fewer denoising steps
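For readers who want to sanity-check percentage figures like the ones in this list, here is a minimal sketch of how a relative reduction is computed from a baseline value and a reduced value. The function name `relative_reduction` and the numbers in the usage line are illustrative assumptions, not values taken from the paper.

```python
def relative_reduction(baseline: float, reduced: float) -> float:
    """Return the percentage by which `reduced` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - reduced) / baseline * 100.0


# Hypothetical values chosen only for illustration (not from the paper):
# cutting a baseline of 1000 units down to 378 units.
print(f"{relative_reduction(1000, 378):.1f}%")
```

The same formula applies to both parameter count and latency, provided both measurements are taken under the same conditions for the baseline and the reduced model.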
Abstract
Recent advances in neural text-to-speech (TTS) models have brought thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be trained stably and are more parameter-efficient than other generative models. Because transmitting data between customers and the cloud introduces high latency and risks exposing private data, deploying TTS models on edge devices is preferred. Deploying DPMs on edge devices raises two practical problems. First, current DPMs are not lightweight enough for resource-constrained devices. Second, DPMs require many denoising steps at inference time, which increases latency. In this work, we present LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight U-Net diffusion decoder and a training-free fast…
