Distilling Time Series Foundation Models for Efficient Forecasting
Yuqi Li, Kuiye Ding, Chuanguang Yang, Szu-Yu Chen, Yingli Tian

TL;DR
This paper introduces DistilTS, a novel distillation framework tailored for time series foundation models, enabling significant model compression and faster inference while maintaining high forecasting accuracy.
Contribution
DistilTS is the first distillation method specifically designed for time series foundation models (TSFMs), addressing task difficulty and architecture discrepancies to produce compact, efficient models.
Findings
Achieves forecasting performance comparable to full-sized models.
Reduces the parameter count by up to 150 times.
Speeds up inference by up to 6000 times.
Abstract
Time series foundation models (TSFMs) deliver strong forecasting performance through large-scale pretraining, but their large parameter counts make deployment costly. While knowledge distillation offers a natural and effective approach to model compression, techniques developed for general machine learning tasks are not directly applicable to time series forecasting because of its unique characteristics. To address this, we present DistilTS, the first distillation framework specifically designed for TSFMs. DistilTS addresses two key challenges: (1) task difficulty discrepancy, specific to forecasting, where uniform weighting lets the easier short-term horizons dominate optimization while long-term horizons receive weaker supervision; and (2) architecture discrepancy, a general challenge in distillation, for which we design an alignment mechanism suited to time series forecasting. To overcome…
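To make the first challenge concrete, the sketch below shows one way a distillation loss could weight forecast horizons non-uniformly so that later steps are not drowned out by the easier early ones. It is an illustration only: the abstract does not specify how DistilTS weights horizons, so the power-law schedule, the function names `horizon_weights` and `weighted_distill_loss`, and the parameters `alpha` and `gamma` are all assumptions introduced here.

```python
from typing import List, Sequence


def horizon_weights(horizon: int, gamma: float = 1.0) -> List[float]:
    """Per-step weights proportional to (h + 1) ** gamma, normalised to sum to 1.

    gamma = 0 gives uniform weighting; gamma > 0 puts more weight on later
    steps. This schedule is a hypothetical choice, not the paper's scheme.
    """
    raw = [(h + 1) ** gamma for h in range(horizon)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_distill_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.5,
    gamma: float = 1.0,
) -> float:
    """Horizon-weighted mix of a distillation term and a supervised term.

    For each forecast step h:
      distillation term = (student_h - teacher_h) ** 2
      supervised term   = (student_h - target_h) ** 2
    alpha balances the two terms (assumed hyperparameter).
    """
    if not (len(student) == len(teacher) == len(target)):
        raise ValueError("student, teacher and target must share one horizon length")

    weights = horizon_weights(len(student), gamma)
    loss = 0.0
    for w, s, t, y in zip(weights, student, teacher, target):
        distill = (s - t) ** 2
        supervised = (s - y) ** 2
        loss += w * (alpha * distill + (1.0 - alpha) * supervised)
    return loss
```

With `gamma=0.0` the function reduces to the uniform weighting that the abstract describes as problematic. With `gamma > 0`, the same squared error costs more at a late step than at an early one, which is the general effect a horizon-aware scheme aims for.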
Taxonomy
Topics: Forecasting Techniques and Applications · Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques
