The Forecast After the Forecast: A Post-Processing Shift in Time Series
Daojun Liang, Qi Li, Yinglong Wang, Jing Chen, Hu Zhang, Xiaoxiao Cui, Qizheng Wang, Shuo Li

TL;DR
This paper introduces δ-Adapter, a lightweight post-processing method that enhances existing time series forecasting models by improving accuracy, interpretability, and uncertainty calibration without retraining or modifying the original models.
Contribution
The paper presents δ-Adapter, a novel, architecture-agnostic post-processing technique for time series forecasting that boosts performance and interpretability without retraining models.
Findings
δ-Adapter improves forecasting accuracy across diverse models and datasets.
It enhances uncertainty calibration with minimal computational overhead.
The method offers interpretability through feature selection and feature masking.
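The page does not say how feature selection and masking are implemented. One common way to obtain both from an input-side module, assumed here purely for illustration, is a per-covariate sigmoid gate: scaling each covariate by its gate gives a soft mask, thresholding the gate gives a hard mask, and ranking covariates by gate value gives a feature selection. The covariate names, gate logits, and helper functions below are hypothetical and are not the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_mask(row, gate_logits):
    """Scale each covariate by a gate in (0, 1); gates near 0 effectively mask it."""
    return [x * sigmoid(g) for x, g in zip(row, gate_logits)]


def hard_mask(row, gate_logits, threshold=0.5):
    """Zero out covariates whose gate falls below the threshold."""
    return [x if sigmoid(g) >= threshold else 0.0 for x, g in zip(row, gate_logits)]


def select_features(names, gate_logits, k):
    """Rank covariates by gate value and keep the top k (the interpretable part)."""
    ranked = sorted(zip(names, gate_logits), key=lambda item: sigmoid(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical covariates and gate logits (e.g. as an input adapter might learn them).
names = ["load", "noise", "temperature"]
logits = [2.0, -3.0, 0.5]
row = [1.2, 0.7, -0.4]
print(select_features(names, logits, k=2))  # ['load', 'temperature']
print(hard_mask(row, logits))               # [1.2, 0.0, -0.4]
```

In this sketch the gate values themselves are the explanation: a covariate whose gate stays near zero contributes almost nothing to the backbone's input.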
Abstract
Time series forecasting has long been dominated by advances in model architecture, with recent progress driven by deep learning and hybrid statistical techniques. However, as forecasting models approach diminishing returns in accuracy, a critical yet underexplored opportunity emerges: the strategic use of post-processing. In this paper, we address the last-mile gap in time series forecasting: improving accuracy and uncertainty estimates without retraining or modifying a deployed backbone. We propose δ-Adapter, a lightweight, architecture-agnostic way to boost deployed time series forecasters without retraining. δ-Adapter learns tiny, bounded modules at two interfaces: input nudging (soft edits to covariates) and output residual correction. We provide local descent guarantees, drift bounds, and compositional stability for combined adapters. Meanwhile, it can act…
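The abstract describes two bounded adapter interfaces around a frozen backbone. A minimal sketch of that idea, under several assumptions that are not from the paper: a toy hypothetical `backbone` (mean of the last two observations), linear adapters squashed by `tanh` so that the input edit is bounded by `eps_in` and the output residual by `eps_out`, and central finite differences in place of autodiff so the backbone is only ever queried, never updated. The names `adapted_forecast` and `fit_adapter` are illustrative.

```python
import math


def backbone(window):
    # Hypothetical frozen forecaster: next value = mean of the last two observations.
    return 0.5 * (window[-2] + window[-1])


def adapted_forecast(window, params, eps_in=0.1, eps_out=0.5):
    L = len(window)
    u, w, b = params[:L], params[L:2 * L], params[2 * L]
    # Input nudging: a soft, bounded edit to each input position (|edit| <= eps_in).
    nudged = [x + eps_in * math.tanh(uj) for x, uj in zip(window, u)]
    # Output residual correction: a bounded additive term (|term| <= eps_out).
    residual = eps_out * math.tanh(sum(wj * x for wj, x in zip(w, window)) + b)
    return backbone(nudged) + residual


def mse(params, data):
    return sum((adapted_forecast(x, params) - y) ** 2 for x, y in data) / len(data)


def fit_adapter(data, L, lr=0.05, steps=200, h=1e-5):
    # Zero init makes the adapted model identical to the backbone before training.
    params = [0.0] * (2 * L + 1)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = params.copy(), params.copy()
            up[i] += h
            down[i] -= h
            # Central finite difference: only adapter parameters change.
            grad.append((mse(up, data) - mse(down, data)) / (2 * h))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


L = 4
series = [math.sin(0.3 * t) + 0.5 for t in range(60)]
data = [(series[t:t + L], series[t + L]) for t in range(len(series) - L)]
before = mse([0.0] * (2 * L + 1), data)
params = fit_adapter(data, L)
after = mse(params, data)
print(f"backbone MSE {before:.4f} -> adapted MSE {after:.4f}")
```

Starting from zero parameters means the adapter begins as an exact no-op, so any change in error comes from the learned edits; a practical version would use automatic differentiation through the frozen backbone rather than finite differences.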
Peer Reviews
Decision: ICLR 2026 Poster
1. **Practicality and Novelty**: The paper addresses the critical real-world problem of updating deployed models in a computationally cheap manner. The δ-Adapter is a practical "plug-and-play" solution compared to costly alternatives like full retraining or fine-tuning.
2. **Methodological Completeness**: The framework is versatile, being applicable to different backbone forecasting models. It is also comprehensive, addressing crucial aspects like interpretability (feature selection) and relia
1. **Limited Comparison with Alternative Post-Processing Methods:** While the paper positions δ-Adapter as a post-processing technique, the primary comparisons are against fine-tuning or continued training of the backbone model. However, there is a significant body of work on post-processing and test-time adaptation for time series forecasting designed to handle concept or distribution shifts. This includes methods for both batch-training settings (e.g., [SOLID](https://arxiv.org/abs/2310.14838), [T
The main strengths of the paper include:
1. The post-processing approach is a well-motivated solution for real-world deployments where retraining large models is costly. Improving frozen forecasters is an important and useful problem to tackle.
2. The paper provides rigorous theoretical analysis for several algorithms: local descent guarantees (Theorems 2 & 3), compositional stability for combined adapters (Proposition 3.2), etc.
3. The authors compared against diverse backbones (DistP
The main weaknesses of the paper include:
1. The proposed input/output adapter approach is conceptually similar to existing adapter methods in NLP. The theoretical results (gradient descent, Lipschitz stability) are relatively standard. The main contribution does not appear to be as novel as the paper claims and looks like an application of the NLP concept to time series.
2. Some important ablation studies, such as the impact of adapter architecture (MLP depth, width, etc.), the effect of different
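The abstract mentions drift bounds and compositional stability, and this reviewer characterizes such results as standard Lipschitz arguments. As a generic illustration of that style of argument (not the paper's Proposition 3.2, whose exact statement is not reproduced on this page), assume the frozen backbone $f$ is $K$-Lipschitz, the input adapter adds an edit $\delta_{\text{in}}$ with $\|\delta_{\text{in}}\| \le \varepsilon_{\text{in}}$, and the output adapter adds a residual $\delta_{\text{out}}$ with $\|\delta_{\text{out}}\| \le \varepsilon_{\text{out}}$. The combined forecast $\hat{y}' = f(x + \delta_{\text{in}}) + \delta_{\text{out}}$ then drifts from the backbone's forecast by at most:

```latex
\begin{aligned}
\|\hat{y}' - f(x)\|
  &= \|f(x + \delta_{\text{in}}) - f(x) + \delta_{\text{out}}\| \\
  &\le \|f(x + \delta_{\text{in}}) - f(x)\| + \|\delta_{\text{out}}\|
     && \text{(triangle inequality)} \\
  &\le K\,\|\delta_{\text{in}}\| + \|\delta_{\text{out}}\|
     && \text{($f$ is $K$-Lipschitz)} \\
  &\le K\,\varepsilon_{\text{in}} + \varepsilon_{\text{out}}.
\end{aligned}
```

Bounding each adapter separately therefore bounds the composed adapter, with the input-side budget amplified by the backbone's Lipschitz constant.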
- Overall, I think this is a good paper. The presentation is clear and the theory is solid as far as I can tell.
- The presented experiments consistently show improvements from the input and output adapters.
- The benchmarks look reasonable, drawing on popular datasets from the time series literature.
- Some of the models are relatively old and would not be considered SOTA today (e.g., Autoformer). I would suggest including models such as PatchTST, Non-stationary Transformer, TimesNet, or TimeMixer.
- Additive vs. multiplicative:
- Reported metrics in Table 2 are averaged across prediction lengths. I understand the difficulty of reporting metrics across many horizons, but in my opinion, this makes it harder to interpret the effect over prediction lengths, over which errors may differ significan
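The reviewer's point about averaging across prediction lengths can be made concrete with a small sketch. The hypothetical helpers below report the relative MSE reduction at each prediction length next to the reduction computed from length-averaged MSE, which is what a single averaged table entry would show. The input numbers are made up for illustration and are not results from the paper.

```python
def relative_improvement(base, adapted):
    """Fractional MSE reduction; positive means the adapted model is better."""
    return (base - adapted) / base


def summarize_by_length(results):
    """results: {pred_len: (mse_backbone, mse_adapted)}.

    Returns per-length improvements and the improvement of the length-averaged MSE.
    """
    per_length = {n: relative_improvement(b, a) for n, (b, a) in sorted(results.items())}
    mean_base = sum(b for b, _ in results.values()) / len(results)
    mean_adapted = sum(a for _, a in results.values()) / len(results)
    return per_length, relative_improvement(mean_base, mean_adapted)


# Hypothetical numbers for illustration only (not taken from the paper).
toy = {96: (0.40, 0.38), 720: (0.80, 0.80)}
per_length, averaged = summarize_by_length(toy)
print(per_length, averaged)
```

With these toy inputs the gain is about 5% at length 96 and 0% at length 720, while the averaged entry shows roughly 1.7%, a figure that describes neither horizon.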
Taxonomy
Topics: Forecasting Techniques and Applications · Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques
