Contrastive Time Series Forecasting with Anomalies
Joel Ekstrand, Zahra Taghiyarrenani, Slawomir Nowaczyk

TL;DR
This paper introduces Co-TSFA, a novel regularization framework for time series forecasting that distinguishes between relevant and irrelevant anomalies, improving prediction accuracy in the presence of anomalies.
Contribution
Co-TSFA is the first method to explicitly model and differentiate forecast-relevant and irrelevant anomalies using contrastive learning and augmentation techniques.
Findings
Improves forecasting accuracy on benchmarks with anomalies.
Maintains normal data accuracy while handling anomalies.
Demonstrates effectiveness on real-world cash-demand data.
Abstract
Time series forecasting predicts future values from past data. In real-world settings, some anomalous events have lasting effects and influence the forecast, while others are short-lived and should be ignored. Standard forecasting models fail to make this distinction, often either overreacting to noise or missing persistent shifts. We propose Co-TSFA (Contrastive Time Series Forecasting with Anomalies), a regularization framework that learns when to ignore anomalies and when to respond. Co-TSFA generates input-only and input-output augmentations to model forecast-irrelevant and forecast-relevant anomalies, and introduces a latent-output alignment loss that ties representation changes to forecast changes. This encourages invariance to irrelevant perturbations while preserving sensitivity to meaningful distributional shifts. Experiments on the Traffic and Electricity benchmarks, as well…
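The abstract names two augmentation types and a latent-output alignment loss but does not give their form here. The NumPy sketch below is a rough illustration only, and it rests on several assumptions. An input-only augmentation injects a single spike into the history and leaves the target untouched. An input-output augmentation applies a level shift that persists into the target. The alignment loss is the squared gap between the relative change in the latent vector and the relative change in the target. The function names, the random linear encoder `W`, and this specific loss form are hypothetical. This is not the authors' Co-TSFA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def input_only_augment(x, y, scale=3.0, idx=None):
    """Hypothetical forecast-irrelevant anomaly: one spike in the history, target unchanged."""
    x_aug = x.copy()
    i = rng.integers(len(x)) if idx is None else idx
    x_aug[i] += scale * x.std()
    return x_aug, y.copy()


def input_output_augment(x, y, shift=2.0):
    """Hypothetical forecast-relevant anomaly: a level shift that starts mid-history
    and persists through the forecast target."""
    start = len(x) // 2
    x_aug = x.copy()
    x_aug[start:] += shift
    return x_aug, y + shift


def relative_change(a, b, eps=1e-8):
    """L2 norm of (a - b), scaled by the norm of the original vector a."""
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + eps)


def alignment_loss(z, z_aug, y, y_aug):
    """Assumed latent-output alignment: the relative change in the latent
    representation should match the relative change in the target."""
    return (relative_change(z, z_aug) - relative_change(y, y_aug)) ** 2


# Toy usage with a hypothetical linear encoder: 24-step history -> 8-d latent.
W = rng.normal(size=(8, 24))
x = np.sin(np.linspace(0, 4 * np.pi, 24))
y = np.sin(np.linspace(4 * np.pi, 5 * np.pi, 6))
z = W @ x
for name, augment in [("input-only", input_only_augment),
                      ("input-output", input_output_augment)]:
    x_a, y_a = augment(x, y)
    print(name, alignment_loss(z, W @ x_a, y, y_a))
```

Under this assumed form, an input-only augmentation produces zero target change. The loss therefore reduces to the squared relative latent change, which pushes the encoder toward invariance. An input-output augmentation instead rewards a latent change proportional to the change in the target.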
Peer Reviews
Decision · Submitted to ICLR 2026
The proposed method is conceptually straightforward and easy to follow, and its overall technical soundness appears solid. It employs a targeted form of data augmentation to enhance robustness against anomalies and, more generally, to mitigate distribution shifts in time-series forecasting. The experimental evaluation covers multiple types of perturbations (as shown in Table 3), demonstrating the effectiveness and adaptability of Co-TSFA across diverse forecasting scenarios.
1. The overall evaluation in this study is not strong enough to support the claims. (1) The experiments mainly rely on the ECL and Traffic datasets, which are known to be relatively stable, with few irregular or non-stationary patterns. However, the key motivation of this work is to address forecasting under “unclean” or anomalous conditions. Evaluations on more irregular datasets, such as those exhibiting significant trend shifts, sparsity, or frequent spikes, would provide stronger
1. Compared to RobustTSF, which focuses on point-wise anomalies and clean-test settings, Co-TSFA generalizes to both point-wise and continuous-wise anomalies and considers both test-clean and test-noisy settings. 2. The contrastive regularization framework that enforces latent–output alignment is novel and interesting. 3. The experiments are thorough and demonstrate consistent improvements across different anomaly settings (point-wise, continuous-wise, test-clean, and test-noisy).
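The review above contrasts point-wise with continuous-wise anomalies and test-clean with test-noisy settings. A small injection sketch can make these distinctions concrete. It assumes that point-wise anomalies are independent random-sign spikes at a fraction `rate` of time steps, and that a continuous-wise anomaly is a single contiguous window offset by a constant. It also assumes that the test-noisy setting corrupts test inputs with the same injector used for training. The function names, parameters, and definitions are illustrative assumptions. They are not the exact protocol used in the paper or in RobustTSF.

```python
import numpy as np


def inject_pointwise(series, rate, magnitude, rng):
    """Hypothetical point-wise anomalies: random-sign spikes at roughly `rate` of the steps."""
    out = np.array(series, dtype=float)  # np.array copies, so the input is untouched
    mask = rng.random(len(series)) < rate
    signs = rng.choice([-1.0, 1.0], size=len(series))
    out[mask] += magnitude * signs[mask]
    return out, mask


def inject_continuous(series, length, magnitude, rng):
    """Hypothetical continuous-wise anomaly: one contiguous window offset by `magnitude`."""
    out = np.array(series, dtype=float)
    start = rng.integers(0, len(series) - length + 1)  # upper bound exclusive: window always fits
    mask = np.zeros(len(series), dtype=bool)
    mask[start:start + length] = True
    out[mask] += magnitude
    return out, mask


def make_split(train, test, inject, noisy_test):
    """Anomalies always corrupt the training inputs; test inputs are corrupted
    only in the test-noisy setting and left clean in the test-clean setting."""
    train_in, _ = inject(train)
    test_in = inject(test)[0] if noisy_test else np.array(test, dtype=float)
    return train_in, test_in


rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 8 * np.pi, 200))
train, test = series[:150], series[150:]
for noisy in (False, True):
    tr, te = make_split(train, test,
                        lambda s: inject_continuous(s, length=10, magnitude=3.0, rng=rng),
                        noisy_test=noisy)
    print("test-noisy" if noisy else "test-clean",
          "corrupted test steps:", int(np.sum(te != test)))
```

Returning the boolean mask alongside the corrupted series makes it easy to check which steps were altered, or to evaluate errors separately on anomalous and normal steps.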
1. The alignment loss shares a similar concept with [R1] in the context of learning with noisy labels. The authors are encouraged to discuss the differences between the two approaches. 2. The use of augmentations may increase training costs. It would be beneficial to include a training time analysis compared to other methods. 3. The stability of Co-TSFA requires further investigation. For example, it should be examined whether MAE/MSE suddenly increase at certain epochs during training, and w
The strong points of the paper are the following:
- **Clarity:** The paper is easy to follow, and the main ideas are simply explained.
- **Originality:** Although robustness has been a long-term issue in time series forecasting, focusing on anomaly types is a more recent research approach (RobustTSF was introduced in 2024), and definitely merging this with invariance learning is an interesting modeling aspect.
- **Quality:** Different experimental setups are considered with respect to the presen
The weaknesses of the paper are the following: 1. **Poor positioning against related works (Clarity, Originality):** The authors do not sufficiently present related work in contrastive learning, e.g., recent soft contrastive learning for time series (Lee et al., 2023) (please see how extensive the discussion in the relevant paper is). It is hard to understand how the proposed method differs from existing methods in the formulation of the loss and the selection of augmentation for the contrasting
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Stream Mining Techniques · Forecasting Techniques and Applications
