Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent Achieves 4x Lower Cost Than NVIDIA L40S
Ranjith M. S., Akshat Mandloi, and Sudarshan Kamath

TL;DR
Lightning V2 is a hardware-optimized, low-precision TTS model that cuts inference cost roughly 4x relative to the NVIDIA L40S at similar throughput, without sacrificing audio quality.
Contribution
The paper introduces Lightning V2, a TTS model co-optimized with Tenstorrent hardware that preserves high audio fidelity under low-precision computation while substantially lowering inference cost.
Findings
Over 95% LoFi computational fidelity achieved
More than 80% BlockFloat8 deployment without quality loss
Approximately 4x lower cost than NVIDIA L40S at similar throughput
Abstract
Text-to-Speech (TTS) models are significantly more numerically fragile than Large Language Models (LLMs) due to their continuous waveform generation and perceptual sensitivity to small numerical perturbations. While aggressive precision reduction techniques such as BlockFloat8 (BFP8) and low-fidelity (LoFi) compute have been widely adopted in language models, applying similar strategies to TTS systems often results in audible artifacts, phase instability, and spectral distortion. In this work, we present Lightning V2, a production-grade TTS model co-optimized for Tenstorrent hardware. Through precision-aware architectural design and hardware-software co-optimization, we achieve over 95% LoFi computational fidelity and more than 80% BlockFloat8 deployment without measurable degradation in audio quality. Leveraging Tenstorrent's Network-on-Chip (NoC), distributed SRAM, and deterministic…
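To make the abstract's point about BlockFloat8 concrete, here is a minimal Python simulation of generic block floating-point quantization. It assumes blocks of 16 values that share one exponent, with a sign plus 7 magnitude bits per value; this is an assumption for illustration, not the paper's or Tenstorrent's exact bit layout or rounding mode. The names `quantize_bfp8`, `BLOCK_SIZE`, and `MANTISSA_BITS` are hypothetical. The sketch shows how one large value in a block can flush its small neighbors to zero, one plausible way low precision can perturb a numerically sensitive model.

```python
import math
from typing import List

BLOCK_SIZE = 16     # assumed: number of values sharing one exponent
MANTISSA_BITS = 7   # assumed: magnitude bits per value (sign stored separately)


def quantize_bfp8(values: List[float]) -> List[float]:
    """Round-trip values through a simulated block floating-point format.

    Hypothetical sketch: each block of BLOCK_SIZE values shares the exponent
    of its largest magnitude, so every value in the block is rounded to a
    multiple of the same quantization step.
    """
    out: List[float] = []
    max_code = 2 ** MANTISSA_BITS - 1
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e is the shared exponent.
        shared_exp = math.frexp(max_abs)[1]
        # Smallest representable increment for every value in this block.
        step = 2.0 ** (shared_exp - MANTISSA_BITS)
        for v in block:
            # Round the magnitude to the nearest step, clamping to the mantissa range.
            code = min(round(abs(v) / step), max_code)
            out.append(math.copysign(code * step, v))
    return out


if __name__ == "__main__":
    # A large value and a small value in the same block:
    print(quantize_bfp8([1.0, 0.001]))
    # The same small value in a block of its own:
    print(quantize_bfp8([0.001]))
```

In the first call, 0.001 is rounded to zero because the shared exponent is set by 1.0; in the second, it survives with about 0.7% relative error. Because the error depends on the largest value in each block, the placement of outliers matters, which is consistent with the abstract's emphasis on precision-aware architectural design.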
