Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis

Théodor Lemerle; Téo Guichoux; Axel Roebel; Nicolas Obin

arXiv:2410.23320 · eess.AS · November 18, 2025

TL;DR

Lina-Speech introduces Gated Linear Attention and Initial-State Tuning to enhance multi-sample prompt-based TTS, enabling better voice cloning, style, and emotion adaptation with improved inference efficiency.

Contribution

The paper presents a novel TTS model using Gated Linear Attention and a stateful tuning strategy for flexible, efficient voice cloning and style transfer from multiple speech samples.
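To make "Gated Linear Attention" concrete, here is a minimal NumPy sketch of a simplified GLA recurrence. It assumes the formulation of Yang et al. (Gated Linear Attention Transformers): a matrix-valued state S_t is decayed by per-key-dimension gates α_t in (0, 1), updated with the outer product of key and value, and read out as o_t = q_t S_t. The function names `gla_recurrent` and `gla_parallel` and the random inputs are hypothetical. Lina-Speech's actual layers (multiple heads, gate parameterization, normalization, chunked kernels) are not reproduced here. The sketch checks that the step-by-step recurrent form matches an equivalent causal "parallel" form.

```python
import numpy as np


def gla_recurrent(q, k, v, alpha, S0=None):
    """Simplified gated linear attention, recurrent (decoding) form.

    q, k, alpha: arrays of shape (T, d_k); v: shape (T, d_v).
    alpha[t] holds per-key-dimension forget gates in (0, 1).
    """
    T, d_k = q.shape
    S = np.zeros((d_k, v.shape[1])) if S0 is None else S0.copy()
    out = np.empty((T, v.shape[1]))
    for t in range(T):
        # Decay each row of the state by its gate, then write the key-value outer product.
        S = alpha[t][:, None] * S + np.outer(k[t], v[t])
        # Read out with the query: O(d_k * d_v) work per step, independent of t.
        out[t] = q[t] @ S
    return out, S


def gla_parallel(q, k, v, alpha):
    """Same outputs (with S0 = 0) computed at once with a decay-weighted causal mask."""
    b = np.cumprod(alpha, axis=0)   # b[t] = alpha[0] * ... * alpha[t]
    scores = (q * b) @ (k / b).T    # scores[t, s] = sum_j q[t, j] k[s, j] b[t, j] / b[s, j]
    return np.tril(scores) @ v      # keep only s <= t (causal)


rng = np.random.default_rng(0)
T, d_k, d_v = 16, 4, 3
q, k = rng.standard_normal((2, T, d_k))
v = rng.standard_normal((T, d_v))
alpha = rng.uniform(0.5, 1.0, size=(T, d_k))  # illustrative gates in (0.5, 1)
o_rec, S_T = gla_recurrent(q, k, v, alpha)
print(np.allclose(o_rec, gla_parallel(q, k, v, alpha)))  # True
print(S_T.shape)  # (4, 3)
```

Decoding benefits from the recurrent form because each step touches only the fixed-size state. The parallel form, which suits training, instead materializes a T × T score matrix. Dividing by cumulative gate products becomes numerically fragile for long sequences, which is one reason practical GLA implementations process the sequence in chunks.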

Findings

1. Improved inference throughput with Gated Linear Attention.
2. Effective multi-sample conditioning for voice cloning.
3. Enhanced control over prosody and emotion.
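A rough, standard asymptotic accounting helps explain the throughput finding. It considers a single attention layer and head during autoregressive decoding of T tokens, with head size d (or d_k, d_v for GLA), and ignores projections and feed-forward blocks. This is a general comparison, not the paper's measured results. With softmax attention and a key-value cache, step t attends over all t cached keys. GLA instead carries a fixed-size state S_t.

```latex
\begin{aligned}
&\text{softmax attention (KV cache):} && \text{step } t \text{ costs } O(t\,d), && \textstyle\sum_{t=1}^{T} O(t\,d) = O(T^{2} d), && \text{memory } O(T\,d)\\
&\text{GLA (recurrent):} && S_t = \operatorname{diag}(\alpha_t)\,S_{t-1} + k_t^{\top} v_t,\quad o_t = q_t S_t, && \text{total } O(T\,d_k d_v), && \text{memory } O(d_k d_v)
\end{aligned}
```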

Abstract

Neural codec language models, built on the transformer architecture, have revolutionized text-to-speech (TTS) synthesis, excelling in voice cloning by treating it as a prefix continuation task. However, their limited context length restricts them to short speech samples. As a result, voice cloning captures only limited coverage and diversity of the speaker's prosody and style. Besides, adapting prosody, accent, or appropriate emotion from a short prefix remains a challenging task. Finally, the quadratic complexity of self-attention limits inference throughput. In this work, we introduce Lina-Speech, a TTS model with Gated Linear Attention (GLA) to replace standard self-attention as a principled backbone, improving inference throughput while matching state-of-the-art performance. Leveraging the stateful property of recurrent architectures, we introduce an…
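The abstract is cut off before it describes the tuning procedure. The NumPy toy below therefore assumes that "initial-state tuning" means learning the state S_0 from which a GLA layer starts, while the rest of the model stays frozen. That assumption comes from the title, not from the text above. Part 1 shows the "stateful property": running GLA over a prompt followed by a target gives the same target outputs as starting from the state the prompt leaves behind. A learned S_0 can therefore play the role of a prompt. Part 2 fits S_0 alone by gradient descent on a squared error. The helpers `gla_run` and `rand_seq`, the random arrays standing in for frozen activations, and the targets `y` are all hypothetical. The real method would train through the full network with its own loss.

```python
import numpy as np


def gla_run(q, k, v, alpha, S0):
    """Simplified GLA recurrence starting from state S0; returns outputs and final state."""
    S = S0.copy()
    out = np.empty((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        S = alpha[t][:, None] * S + np.outer(k[t], v[t])  # decay, then write
        out[t] = q[t] @ S                                   # read out
    return out, S


rng = np.random.default_rng(0)
d_k, d_v = 4, 3


def rand_seq(T):
    """Hypothetical stand-in for frozen per-step activations (q, k, v, gates)."""
    return (rng.standard_normal((T, d_k)), rng.standard_normal((T, d_k)),
            rng.standard_normal((T, d_v)), rng.uniform(0.5, 1.0, (T, d_k)))


# 1) Prefix prompting == starting from the state left behind by the prompt.
prompt, target = rand_seq(10), rand_seq(6)
full = [np.concatenate([a, b]) for a, b in zip(prompt, target)]
out_full, _ = gla_run(*full, S0=np.zeros((d_k, d_v)))
_, S_prompt = gla_run(*prompt, S0=np.zeros((d_k, d_v)))
out_state, _ = gla_run(*target, S0=S_prompt)
print(np.allclose(out_full[10:], out_state))  # True

# 2) Toy initial-state tuning: keep q, k, v, alpha frozen and fit only S0.
q, k, v, alpha = rand_seq(8)
y = rng.standard_normal((8, d_v))     # illustrative targets, not real data
S0 = np.zeros((d_k, d_v))
U = q * np.cumprod(alpha, axis=0)     # outputs are affine in S0: out = U @ S0 + const
lr = 1.0 / np.sum(U ** 2)             # <= 1 / largest Hessian eigenvalue, so loss never rises


def loss(S0):
    out, _ = gla_run(q, k, v, alpha, S0)
    return 0.5 * np.sum((out - y) ** 2)


before = loss(S0)
for _ in range(50):
    out, _ = gla_run(q, k, v, alpha, S0)
    S0 = S0 - lr * U.T @ (out - y)    # exact gradient of the squared error w.r.t. S0
print(loss(S0) < before)  # True
```

The design point is that the per-layer state is a small, fixed-size tensor. If the method works as assumed here, a voice or style summary learned from several samples would occupy the same memory regardless of how much reference audio went into it.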

Code & Models

Repositories

theodorblackbird/lina-speech (PyTorch, official)

Videos

Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis (Underline)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems

Methods: Softmax · Attention Is All You Need