Rewriting TTS Inference Economics: Lightning V2 on Tenstorrent Achieves 4x Lower Cost Than NVIDIA L40S

Ranjith M. S., Akshat Mandloi, and Sudarshan Kamath

arXiv:2604.03279·eess.AS·April 8, 2026

TL;DR

Lightning V2 is a TTS model co-optimized for Tenstorrent hardware and low-precision compute. It cuts inference cost by approximately 4x relative to NVIDIA L40S at similar throughput, without measurable degradation in audio quality.

Contribution

The paper introduces Lightning V2, a production-grade TTS model co-optimized for Tenstorrent hardware. It preserves audio quality under low-precision computation (LoFi and BlockFloat8) at approximately 4x lower cost than NVIDIA L40S.

Findings

01

Over 95% LoFi computational fidelity achieved

02

More than 80% BlockFloat8 deployment without quality loss

03

Approximately 4x lower cost than NVIDIA L40S at similar throughput

Abstract

Text-to-Speech (TTS) models are significantly more numerically fragile than Large Language Models (LLMs) due to their continuous waveform generation and perceptual sensitivity to small numerical perturbations. While aggressive precision reduction techniques such as BlockFloat8 (BFP8) and low-fidelity (LoFi) compute have been widely adopted in language models, applying similar strategies to TTS systems often results in audible artifacts, phase instability, and spectral distortion. In this work, we present Lightning V2, a production-grade TTS model co-optimized for Tenstorrent hardware. Through precision-aware architectural design and hardware-software co-optimization, we achieve over 95% LoFi computational fidelity and more than 80% BlockFloat8 deployment without measurable degradation in audio quality. Leveraging Tenstorrent's Network-on-Chip (NoC), distributed SRAM, and deterministic…
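To make the precision trade-off in the abstract concrete, here is a minimal sketch of block floating-point quantization in plain Python. It is an illustration, not the paper's implementation or Tenstorrent's kernel: the block size of 16, the 7 magnitude bits per value, and the function names `bfp_quantize_block`, `bfp_dequantize_block`, and `bfp_roundtrip` are all assumptions chosen for the example.

```python
import math

BLOCK_SIZE = 16     # assumption: number of values sharing one exponent
MANTISSA_BITS = 7   # assumption: magnitude bits per value, plus a sign


def bfp_quantize_block(block):
    """Quantize one block to (shared exponent, signed integer mantissas)."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns max_abs = m * 2**e with 0.5 <= m < 1, so every |x| < 2**e.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (shared_exp - MANTISSA_BITS)
    limit = (1 << MANTISSA_BITS) - 1
    # Round to the nearest step and clamp to the representable range.
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize_block(shared_exp, mantissas):
    """Reconstruct floats from a shared exponent and integer mantissas."""
    scale = 2.0 ** (shared_exp - MANTISSA_BITS)
    return [m * scale for m in mantissas]


def bfp_roundtrip(values):
    """Quantize and dequantize a flat list, one block at a time."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        exp, man = bfp_quantize_block(values[i:i + BLOCK_SIZE])
        out.extend(bfp_dequantize_block(exp, man))
    return out


if __name__ == "__main__":
    # One large value sets the step size for the whole block,
    # so the small neighbour is rounded away.
    block = [1.0, 0.001] + [0.0] * 14
    print(bfp_roundtrip(block)[:2])
```

Because every value in a block shares the exponent of the largest one, a single large sample fixes the quantization step for its neighbours, and small values in the same block can round to zero. This is one plausible reason signals with wide dynamic range are sensitive to such formats; the paper's own analysis may differ.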
