Online Continual Learning for Time Series: a Natural Score-driven Approach
Edoardo Urettini, Daniele Atzeni, Ioanna-Yvonni Tsaknaki, Antonio Carta

TL;DR
This paper introduces NatSR, an online continual learning method for time series forecasting. It uses a natural-gradient, score-driven update that is robust to outliers, and it is reported to outperform existing methods.
Contribution
It develops a natural score-driven optimizer, made robust by a Student's t likelihood that bounds the update, and integrates it into an online learning framework for time series forecasting to improve adaptation and accuracy.
Findings
NatSR outperforms state-of-the-art methods in forecasting accuracy on several datasets.
Combining a Student's t likelihood with the natural gradient yields a bounded update, which provides robustness to outliers.
The method improves adaptation during regime shifts.
Abstract
Online continual learning (OCL) methods adapt to changing environments without forgetting past knowledge. Similarly, online time series forecasting (OTSF) is a real-world problem where data evolve in time and success depends on both rapid adaptation and long-term memory. Indeed, time-varying and regime-switching forecasting models have been extensively studied, offering a strong justification for the use of OCL in these settings. Building on recent work that applies OCL to OTSF, this paper aims to strengthen the theoretical and practical connections between time series methods and OCL. First, we reframe neural network optimization as a parameter filtering problem, showing that natural gradient descent is a score-driven method and proving its information-theoretic optimality. Then, we show that using a Student's t likelihood in addition to natural gradient induces a bounded update, which…
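The bounded-update claim in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the NatSR algorithm: it filters a single scalar location parameter online, scaling the likelihood score by the inverse Fisher information (a natural-gradient step). The function names, the fixed scale `sigma`, the degrees of freedom `nu`, the learning rate `lr`, and the synthetic data are all assumptions made for illustration. Under a Gaussian likelihood the scaled score equals the raw error and is unbounded; under a Student's t likelihood it is bounded by (nu + 3) * sigma / (2 * sqrt(nu)).

```python
import numpy as np

def gaussian_natural_score(y, mu, sigma=1.0):
    # Gaussian score e / sigma^2 divided by Fisher information 1 / sigma^2.
    # The result is the raw error, which grows without bound.
    return y - mu

def student_t_natural_score(y, mu, sigma=1.0, nu=5.0):
    # Score of the Student's t log-density with respect to the location mu.
    e = y - mu
    score = (nu + 1.0) * e / (nu * sigma**2 + e**2)
    # Fisher information of the location parameter of a Student's t.
    fisher = (nu + 1.0) / ((nu + 3.0) * sigma**2)
    # Fisher-scaled score: bounded in e, so one outlier cannot move mu far.
    return score / fisher

def filter_location(ys, score_fn, lr=0.2, mu0=0.0):
    # Hypothetical online filter: one score-driven step per observation.
    mu = mu0
    path = []
    for y in ys:
        mu = mu + lr * score_fn(y, mu)
        path.append(mu)
    return np.array(path)

# Synthetic stream (assumption): level 0, then a regime shift to level 3,
# with a single large outlier injected before the shift.
rng = np.random.default_rng(0)
ys = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(3.0, 0.1, 50)])
ys[25] = 100.0

gauss_path = filter_location(ys, gaussian_natural_score)
t_path = filter_location(ys, student_t_natural_score)

print("jump at outlier (Gaussian): ", gauss_path[25] - gauss_path[24])
print("jump at outlier (Student t):", t_path[25] - t_path[24])
print("final estimate (Student t): ", t_path[-1])
```

In this toy setting the Student's t filter barely reacts to the outlier yet still tracks the level shift, which is the qualitative behavior the abstract describes. The paper's method operates on neural network parameters and, according to the reviews, also uses replay and a dynamic scale heuristic, none of which is modeled here.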
Peer Reviews
Decision: Submitted to ICLR 2026
1. The connection between natural gradient descent and score-driven models is interesting and seems new in the time series forecasting literature.
2. Consistent gains over strong baselines on several datasets; the ablation study supports the claims.
3. Sample code is provided to help the reviewer verify the results independently.

1. The writing could be improved. For example, the main algorithm, which, from my perspective, is one of the most important parts of this work, is deferred to the appendix. Moreover, it would be better to include a separate section or use bullet points to state the major contributions.
2. The numerical experiments are limited to end-to-end trained models. The current experimental settings largely follow Pham et al. (2023) and Wen et al. (2023). The small models are end-to-end trained on

1. The paper tackles an important and practical OCL setting in time-series prediction, one that requires high plasticity to adapt while maintaining high stability to avoid forgetting.
2. The analysis of natural gradient descent from a score-driven filtering perspective is reasonable and original.
3. The proposed method is simple and useful, combining natural gradients, replay, and a dynamic scale heuristic in a coherent recipe.

1. Baselines are limited to rehearsal-style OCL. Additional families are relevant to OCL in time series: error-feedback methods (e.g., online model adaptation, Extended Kalman Filter) and feedforward adaptation approaches that pair memory buffers with sample selection and tailored optimization [1], as well as regularization-based continual learning methods that mitigate catastrophic forgetting [2]. Including these baselines (or carefully justifying their omission) would strengthen the empirical
2. The

**Originality:** The conceptual link between natural gradient descent and score-driven models is a significant and novel insight. It provides a fresh, unifying perspective on online optimization.
**Clarity:** Despite the technical density in some sections, the core ideas are effectively communicated. The abstract and introduction succinctly outline the contributions, and the experiments are clearly described.
**Significance:** The paper tackles a fundamental problem (online learning in non-sta

**Presentation:** Some parts, particularly in Sections 4.2 and 4.3, are quite dense and could be challenging for a reader not deeply familiar with both information geometry and score-driven models. A little more high-level intuition before diving into the equations would improve accessibility. The pseudocode in Appendix D is helpful but could be more clearly annotated.
**Performance on High-Dynamics Datasets:** The method underperforms on the ECL and Traffic datasets, which the authors attribut
Taxonomy
Topics: Data Stream Mining Techniques · Traffic Prediction and Management Techniques · Domain Adaptation and Few-Shot Learning
