OATS: Online Data Augmentation for Time Series Foundation Models
Junwei Deng, Chang Xu, Jiaqi W. Ma, Ming Jin, Chenghao Liu, Jiang Bian

TL;DR
OATS is a dynamic, sample-guided data augmentation method for Time Series Foundation Models. It uses a diffusion-based generator to tailor synthetic data to the current training stage and improves performance over static augmentation methods.
Contribution
The paper presents OATS, a novel online data augmentation strategy that adaptively generates synthetic time series data conditioned on training samples and stages, outperforming static methods.
Findings
OATS consistently improves model performance over static augmentation methods.
The diffusion-based framework produces realistic and diverse synthetic time series.
OATS achieves substantial gains across multiple datasets and architectures.
Abstract
Time Series Foundation Models (TSFMs) are a powerful paradigm for time series analysis and are often enhanced by synthetic data augmentation to improve the training data quality. Existing augmentation methods, however, typically rely on heuristics and static paradigms. Motivated by dynamic data optimization, which shows that the contribution of samples varies across training stages, we propose OATS (Online Data Augmentation for Time Series Foundation Models), a principled strategy that generates synthetic data tailored to different training steps. OATS leverages valuable training samples as principled guiding signals and dynamically generates high-quality synthetic data conditioned on them. We further design a diffusion-based framework to produce realistic time series and introduce an explore-exploit mechanism to balance efficiency and effectiveness. Experiments on TSFMs demonstrate…
Peer Reviews
Decision: Submitted to ICLR 2026
1. Dynamically augmenting time series according to evolving training signals is important and well motivated.
2. The proposed combination of sample selection, a diffusion generator, and an explicit explore–exploit mechanism is new in this context.
3. Experiments on multiple time series datasets demonstrate the effectiveness of the proposed method over standard augmentation methods.
1. The compared data augmentation methods, such as mix-up augmentations and jittering, are basic and limited. Another family of time series augmentation methods is based on training generative models [1]. Additionally, existing work [2] proposes similar online data augmentation by selecting high-quality data.
2. Evaluation of time series foundation models is limited to 6 datasets. Consider adding more diverse, standardized time series foundation model benchmarks such as GIFT-EVAL to bett
1. Novel online augmentation formulation. The paper introduces a principled online data augmentation framework tailored to time series foundation models, replacing heuristic static methods with influence-based selection. The originality and coherence of this formulation represent a meaningful advance in dataset optimization for TSFMs.
2. Explore–exploit mechanism for computational balance. The explore–exploit design introduces a probabilistic scheduling approach that reuses cached influence sco
1. Unclear definition and maintenance of cached subsets. The paper assumes that the dataset is divided into L disjoint subsets and maintains exponentially moving averages of influence scores for each (Eq. 4), but the rationale for this partitioning is underexplained. There is limited discussion of how subset granularity or the decay factor β affects performance.
2. Limited baseline scope and fairness of comparison. The experiments compare OATS only to Jitter and TSMixup, neglecting other adapti
1. Addresses the significant problem of improving TSFM training via adaptive augmentation, moving beyond static heuristics. The concept of integrating online data attribution with conditional generative modeling is an interesting synthesis.
2. Motivates the approach by highlighting limitations of existing methods. The overall framework is presented clearly. Using influence functions offers a principled motivation. Experiments show positive results compared to static baselines.
1. The TSIS calculation relies on approximations (first-order Taylor, SGD assumption) whose accuracy for large models with adaptive optimizers, such as the AdamW used here, is questionable and unverified. This undermines the reliability of the guiding signal.
2. The efficiency mechanism depends on the "locality of TSIS" heuristic, assuming similar influence within subsets. This may not hold generally, and performance could be sensitive to subset definition. Lack of sensitivity analysis for this and key hy
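The mechanisms the reviews refer to (a first-order influence score under an SGD assumption, exponentially moving averages of scores over L disjoint subsets with decay factor β, and a probabilistic explore-exploit choice) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `first_order_influence`, `SubsetScoreCache`, and `explore_prob` are hypothetical, the exact form of Eq. 4 is not reproduced here, and the rule "explore picks a random subset, exploit picks the highest cached score" is an assumption made for illustration.

```python
import random


def first_order_influence(train_grad, val_grad, lr):
    """First-order Taylor estimate of a training sample's usefulness.

    Under a plain SGD step theta' = theta - lr * g_train, the validation
    loss changes by roughly -lr * <g_val, g_train>. We return
    lr * <g_val, g_train>, so a larger value means a larger estimated
    loss reduction. (Assumed form; adaptive optimizers such as AdamW
    do not follow this update exactly.)
    """
    return lr * sum(a * b for a, b in zip(train_grad, val_grad))


class SubsetScoreCache:
    """Hypothetical cache of per-subset scores with an EMA update."""

    def __init__(self, num_subsets, beta=0.9, explore_prob=0.2, seed=0):
        self.scores = [0.0] * num_subsets  # one cached score per subset
        self.beta = beta                   # EMA decay factor
        self.explore_prob = explore_prob   # chance of exploring
        self.rng = random.Random(seed)

    def update(self, subset_id, new_score):
        # Exponential moving average: keep beta of the old value,
        # blend in (1 - beta) of the freshly computed score.
        old = self.scores[subset_id]
        self.scores[subset_id] = self.beta * old + (1 - self.beta) * new_score

    def choose_subset(self):
        # Explore: pick a random subset (its score would be recomputed).
        if self.rng.random() < self.explore_prob:
            return self.rng.randrange(len(self.scores)), "explore"
        # Exploit: reuse the cache and pick the highest-scoring subset.
        best = max(range(len(self.scores)), key=self.scores.__getitem__)
        return best, "exploit"
```

In such a setup, a training loop would call `choose_subset()` at each step, recompute influence only on "explore" steps, and feed the result back through `update()`. The reviewers' concerns map directly onto the two assumed knobs: `beta` controls how quickly stale scores fade, and the subset partition determines how well a single cached score represents its members.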
Taxonomy
Topics: Time Series Analysis and Forecasting · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis
