TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference

Jiyoung Park; Hankyu Jang; Changseok Song; Wookeun Jung

arXiv:2602.05145 · cs.LG · February 6, 2026

TL;DR

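TIDE is a framework that speeds up large language model inference by adapting the speculative-decoding draft model online inside the serving engine. It reuses target-model hidden states produced during inference as training signals, which gives zero-overhead updates without reloading the target model, and it maps inference and training onto appropriate GPU classes in heterogeneous clusters.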

Contribution

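TIDE introduces a serving-engine-native approach to real-time draft adaptation in LLM inference. The draft model is trained from hidden states the target model already produces while serving, so no target-model reload is needed, and both serving throughput and draft training efficiency improve.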

Findings

01. Up to 1.15x throughput improvement over static speculative decoding.

02. Reduces draft training time by 1.67x compared to recomputation methods.

03. Effective across diverse real-world workloads.

Abstract

Speculative decoding can substantially accelerate LLM inference, but realizing its benefits in practice is challenging due to evolving workloads and system-level constraints. We present TIDE (Temporal Incremental Draft Engine), a serving-engine-native framework that integrates online draft adaptation directly into high-performance LLM inference systems. TIDE reuses target model hidden states generated during inference as training signals, enabling zero-overhead draft adaptation without reloading the target model, and employs adaptive runtime control to activate speculation and training only when beneficial. TIDE exploits heterogeneous clusters by mapping decoupled inference and training to appropriate GPU classes. Across diverse real-world workloads, TIDE achieves up to 1.15x throughput improvement over static speculative decoding while reducing draft training time by 1.67x compared to…
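
The abstract says speculation is activated "only when beneficial" but does not describe the control rule. As a rough illustration of what such a rule could look like, the sketch below uses the standard speculative-decoding estimate of expected tokens per verification step, (1 - α^(γ+1)) / (1 - α), where α is the per-token acceptance rate and γ is the draft length. This is not TIDE's controller: the class name `SpeculationController`, the `draft_cost` parameter, the window size, and the "speedup > 1" threshold are all hypothetical choices made for this example.

```python
from collections import deque


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target verification step, assuming each of
    the `gamma` drafted tokens is accepted independently with probability
    `alpha` (standard speculative-decoding analysis, not specific to TIDE)."""
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


class SpeculationController:
    """Hypothetical on/off switch for speculation based on recent acceptance."""

    def __init__(self, gamma: int = 4, draft_cost: float = 0.1, window: int = 256):
        self.gamma = gamma            # drafted tokens per step (assumed fixed)
        self.draft_cost = draft_cost  # one draft pass relative to one target pass (assumed)
        self.history = deque(maxlen=window)  # recent (accepted, proposed) pairs

    def record(self, accepted: int, proposed: int) -> None:
        self.history.append((accepted, proposed))

    def acceptance_rate(self) -> float:
        # Crude estimate: accepted / proposed over the window. Tokens after the
        # first rejection are never checked, so this tends to underestimate alpha.
        proposed = sum(p for _, p in self.history)
        if proposed == 0:
            return 0.0
        return sum(a for a, _ in self.history) / proposed

    def estimated_speedup(self) -> float:
        tokens = expected_tokens_per_step(self.acceptance_rate(), self.gamma)
        cost = self.gamma * self.draft_cost + 1.0  # gamma draft passes + 1 target pass
        return tokens / cost

    def should_speculate(self) -> bool:
        if not self.history:
            return True  # start optimistic so that statistics can be gathered
        return self.estimated_speedup() > 1.0


if __name__ == "__main__":
    controller = SpeculationController(gamma=4, draft_cost=0.1)
    for _ in range(10):
        controller.record(accepted=3, proposed=4)
    print(controller.acceptance_rate(), controller.should_speculate())
```

A real controller would also need to re-probe periodically after turning speculation off, since a disabled draft produces no new acceptance statistics; that logic is omitted here.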

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis