TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

Tong Wu; Junzhe Shen; Zixia Jia; Yuxuan Wang; Zilong Zheng

arXiv:2502.18890 · cs.CL · July 10, 2025

TL;DR

TokenSwift is a framework for accelerating ultra-long sequence generation in large language models. It achieves a speedup of over 3× while preserving the target model's output quality, which makes generating sequences of up to 100K tokens practical.

Contribution

We introduce TokenSwift, a method that addresses three key challenges in ultra-long sequence generation (frequent model reloading, dynamic KV management, and repetitive generation), providing scalable, lossless acceleration for large language models.

Findings

01. Achieves a speedup of over 3× across models of varying scale (1.5B to 14B parameters) and attention architecture (MHA, GQA).

02. Preserves the quality of the generated sequences despite the acceleration.

03. Makes generation of sequences of up to 100K tokens practical.

Abstract

Generating ultra-long sequences with large language models (LLMs) has become increasingly important but remains highly time-intensive, particularly for sequences of up to 100K tokens. Although speculative decoding methods exist, simply extending their generation limits does not accelerate the process and can even be detrimental. Through an in-depth analysis, we identify three major challenges that hinder efficient generation: frequent model reloading, dynamic key-value (KV) management, and repetitive generation. To address these issues, we introduce TokenSwift, a framework designed to substantially accelerate the generation of ultra-long sequences while maintaining the target model's inherent quality. Experimental results demonstrate that TokenSwift achieves a speedup of over 3× across models of varying scales (1.5B, 7B, 8B, 14B) and architectures (MHA, GQA). This…
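
The abstract contrasts TokenSwift with traditional speculative decoding and calls its acceleration lossless. As a generic illustration of what "lossless" means in that setting (this is plain greedy speculative decoding, not TokenSwift's own algorithm, which additionally tackles model reloading, KV management, and repetition), the sketch below drafts a few tokens with a cheap model, keeps only the prefix the target model agrees with, and always appends one token from the target. Every emitted token therefore equals the target's greedy choice, so the output is identical to ordinary greedy decoding. The names `NextToken`, `greedy_decode`, `speculative_decode`, and the draft length `k` are hypothetical, and "models" are toy functions from a token prefix to the next token.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next
# token (equivalent to argmax over a real LLM's logits).
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline autoregressive greedy decoding: one target call per token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding (generic sketch, not TokenSwift itself)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. Draft up to k tokens with the cheap model, never past the goal.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. Verify against the target. For clarity this calls the target per
        #    position; a real system scores all drafted positions in one pass.
        accepted: List[int] = []
        for tok in proposal:
            if tok != target(seq + accepted):
                break  # first disagreement: discard the rest of the draft
            accepted.append(tok)
        seq.extend(accepted)
        # 3. The target always contributes one token (a correction after a
        #    rejection, or a bonus token when the whole draft was accepted).
        if len(seq) < goal:
            seq.append(target(seq))
    return seq


if __name__ == "__main__":
    target = lambda s: (sum(s) * 31 + 7) % 50
    # A draft model that usually, but not always, agrees with the target.
    draft = lambda s: target(s) if len(s) % 3 else 0
    prompt = [1, 2, 3]
    assert speculative_decode(target, draft, prompt, 20) == greedy_decode(target, prompt, 20)
```

The speedup in a real system comes from step 2: verifying several drafted positions in a single target forward pass costs roughly as much as generating one token, so every accepted draft token saves a target call. For sampling rather than greedy decoding, the exact-match check is replaced by a rejection-sampling acceptance rule that preserves the target's output distribution.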

Code & Models

Repositories

bigai-nlco/tokenswift
PyTorch · Official

Videos

TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis