Accelerating Time Series Foundation Models with Speculative Decoding
Pranav Subbaraman, Fang Sun, Yue Yao, Huacong Tang, Xiao Luo, Yizhou Sun

TL;DR
This paper introduces a speculative decoding framework that accelerates large Transformer-based time-series models. A smaller draft model proposes future patches, and a larger target model verifies them in parallel. This reduces the number of sequential forward passes without modifying either model.
Contribution
The authors adapt speculative decoding from discrete language tokens to continuous time-series data. This enables faster inference for large models without architectural changes and makes them more practical for latency-sensitive web applications.
Findings
Significant speedups in inference time on benchmark datasets.
Maintains accuracy competitive with standard models.
No architectural modifications needed for existing models.
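The standard expected-throughput analysis from language-model speculative decoding (Leviathan et al., 2023) makes the source of the speedup concrete. The sketch below rests on these hypothetical assumptions:

- Each drafted patch is accepted independently with probability `alpha`.
- The draft proposes `gamma` patches per round.
- One draft forward pass costs `cost_ratio` times a target pass.

The function and parameter names are illustrative and are not taken from the paper. The sketch also assumes the time-series framework behaves like the language-model case, where each verification round yields one extra patch from the target.

```python
def expected_patches_per_target_pass(alpha: float, gamma: int) -> float:
    """Expected patches emitted per target forward pass (hypothetical i.i.d. acceptance).

    Accepted drafts form a prefix of length 0..gamma; the target then supplies
    one more patch (a replacement at the first rejection, or a bonus patch when
    all gamma drafts are accepted).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 1:
        raise ValueError("gamma must be >= 1")
    if alpha == 1.0:
        return float(gamma + 1)  # every draft accepted, plus the target's patch
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def walltime_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Speedup over plain autoregressive target decoding.

    Each round costs gamma draft passes (each cost_ratio of a target pass)
    plus one target verification pass.
    """
    return expected_patches_per_target_pass(alpha, gamma) / (gamma * cost_ratio + 1.0)


# Example: 80% acceptance, 4 drafted patches, draft 10x cheaper than target.
print(expected_patches_per_target_pass(0.8, 4))  # about 3.36 patches per target pass
print(walltime_speedup(0.8, 4, 0.1))             # about 2.40x
```

The formula shows why a draft model that agrees with the target often and costs little is what drives the gains. When `alpha` is low, the extra draft passes can make speculative decoding slower than plain decoding.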
Abstract
Modern web applications--from real-time content recommendation and dynamic pricing to CDN optimization--increasingly rely on time-series forecasting to deliver personalized experiences to billions of users. Large-scale Transformer-based models have achieved state-of-the-art performance in time-series forecasting but suffer from high computational costs, limiting their deployment in latency-sensitive web applications. To address this challenge, we propose a general inference acceleration framework that adapts speculative decoding to autoregressive time-series models. Our approach employs a smaller "draft" model to propose future time-series patches, which are then verified in parallel by a larger "target" model, reducing the number of sequential forward passes required. We address key technical challenges in adapting this technique from discrete language tokens to continuous time-series…
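The draft-then-verify loop described in the abstract can be sketched for point-forecasting models that map a 1-D context to the next fixed-length patch. The model interface, the function names, and the tolerance-based acceptance rule (`tol`) are all hypothetical. The excerpt does not state how the authors accept or reject continuous-valued patches.

In a causal Transformer, the target's predictions for every drafted position come from a single forward pass over the extended sequence. Here, `target_verify` only emulates that output with a loop.

```python
from typing import Callable

import numpy as np

# Hypothetical interface: context (1-D series) -> next patch (1-D array).
Model = Callable[[np.ndarray], np.ndarray]


def target_verify(target: Model, context: np.ndarray, drafts: list) -> list:
    """Target's next-patch prediction after context, context+d1, ..., context+d1..dk.

    A causal Transformer yields all len(drafts)+1 predictions in one forward
    pass; this loop only reproduces those outputs for illustration.
    """
    preds, seq = [target(context)], context
    for d in drafts:
        seq = np.concatenate([seq, d])
        preds.append(target(seq))
    return preds


def speculative_forecast(draft: Model, target: Model, context, n_patches: int,
                         gamma: int = 4, tol: float = 0.05):
    """Return (forecast, number of target verification passes)."""
    out, seq, target_passes = [], np.asarray(context, dtype=float), 0
    while len(out) < n_patches:
        # 1) Cheap draft model proposes gamma patches autoregressively.
        drafts, ctx = [], seq
        for _ in range(gamma):
            p = draft(ctx)
            drafts.append(p)
            ctx = np.concatenate([ctx, p])
        # 2) Target model checks all drafted positions in one (emulated) pass.
        preds = target_verify(target, seq, drafts)
        target_passes += 1
        # 3) Hypothetical rule: accept drafts while within tol of the target.
        accepted = []
        for d, t in zip(drafts, preds):
            if np.max(np.abs(d - t)) <= tol:
                accepted.append(d)
            else:
                accepted.append(t)  # replace first rejected patch with target's own
                break
        else:
            accepted.append(preds[-1])  # all accepted: keep target's bonus patch
        for p in accepted:
            out.append(p)
            seq = np.concatenate([seq, p])
    return np.concatenate(out[:n_patches]), target_passes


if __name__ == "__main__":
    # Toy target: continue the series with slope 1, two values per patch.
    def toy_target(s):
        return s[-1] + np.array([1.0, 2.0])

    ctx = np.array([0.0, 1.0, 2.0, 3.0])
    forecast, passes = speculative_forecast(toy_target, toy_target, ctx, n_patches=10)
    print(passes)  # 2 target passes instead of 10 sequential ones
```

With deterministic point forecasts and `tol=0`, the output matches plain autoregressive decoding with the target exactly. A positive `tol` trades a small deviation from the target's forecast for more accepted patches per target pass.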
Taxonomy
Topics: Recommender Systems and Techniques · Machine Learning in Healthcare · Traffic Prediction and Management Techniques
