CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, and Ievgen Redko

TL;DR
CauKer introduces a novel method combining Gaussian Processes and Structural Causal Models to generate synthetic time series data, enabling more sample-efficient pretraining of classification models with consistent scaling laws.
Contribution
The paper presents CauKer, a new algorithm for generating diverse, causally coherent synthetic time series data for pretraining TSFMs, reducing reliance on large real-world datasets.
Findings
CauKer-generated datasets follow clear scaling laws for size and model capacity.
Synthetic data enables effective pretraining across different TSFM architectures.
Real-world datasets show irregular scaling behavior, unlike CauKer data.
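One way to make "clear" versus "irregular" scaling concrete is to fit a power law in log-log space and check the goodness of fit. The sketch below is illustrative only: the power-law form, the function name `fit_power_law`, and all numbers are assumptions chosen for the example, not results or procedures taken from the paper.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Least-squares fit of error ~ a * size**(-b) in log-log space.

    Returns (a, b, r2); r2 close to 1 indicates a clean power-law trend.
    """
    log_n, log_e = np.log(sizes), np.log(errors)
    slope, intercept = np.polyfit(log_n, log_e, deg=1)
    residual = log_e - (slope * log_n + intercept)
    r2 = 1.0 - residual.var() / log_e.var()
    return float(np.exp(intercept)), float(-slope), float(r2)

# Hypothetical data: an exact power law, and the same curve with
# multiplicative perturbations standing in for irregular behavior.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
clean = 2.0 * sizes ** -0.1
irregular = clean * np.array([1.0, 0.7, 1.3, 0.9])

a, b, r2_clean = fit_power_law(sizes, clean)
_, _, r2_irregular = fit_power_law(sizes, irregular)
print(f"a={a:.3f}, b={b:.3f}, r2_clean={r2_clean:.3f}, r2_irregular={r2_irregular:.3f}")
```

A lower `r2` on one pretraining corpus than on another would be one simple signal that its scaling curve is less regular.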
Abstract
Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require a computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To allow for a sample-efficient pre-training of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCM) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs having different architectures and following different pre-training approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit…
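The general idea of combining GP kernel composition with an SCM can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the contents of the kernel, mean, and activation banks, the graph-sampling rule, the noise scale, and every function name here are hypothetical, and the sketch does not cover how class labels are assigned.

```python
import numpy as np

# Hypothetical function banks; the paper's actual banks are not listed here.
def rbf(t, ell=0.2):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def periodic(t, period=0.25):
    d = t[:, None] - t[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2)

def linear(t, offset=0.5):
    return np.outer(t - offset, t - offset)

KERNEL_BANK = [rbf, periodic, linear]
MEAN_BANK = [np.zeros_like, lambda t: 2.0 * t - 1.0, lambda t: np.sin(2.0 * np.pi * t)]
ACTIVATION_BANK = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0)]

def pick(bank, rng):
    return bank[rng.integers(len(bank))]

def sample_gp(t, rng, n_terms=3):
    """Draw one GP path with a randomly composed kernel and a random mean."""
    K = pick(KERNEL_BANK, rng)(t)
    for _ in range(n_terms - 1):
        K_next = pick(KERNEL_BANK, rng)(t)
        # Sums and elementwise products of PSD kernels remain PSD.
        K = K + K_next if rng.random() < 0.5 else K * K_next
    # Sample through an eigendecomposition, clipping tiny negative eigenvalues.
    w, V = np.linalg.eigh(K)
    noise = V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(t)))
    return pick(MEAN_BANK, rng)(t) + noise

def sample_scm_series(n_nodes=6, length=128, max_parents=2, seed=0):
    """Propagate GP paths through a random DAG (assumed structure)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length)
    nodes = []
    for j in range(n_nodes):
        # Parents are drawn only from earlier nodes, so the graph is acyclic.
        n_par = min(j, int(rng.integers(0, max_parents + 1)))
        if n_par == 0:
            x = sample_gp(t, rng)  # root node: plain GP sample
        else:
            parents = rng.choice(j, size=n_par, replace=False)
            mix = sum(rng.normal() * nodes[p] for p in parents)
            x = pick(ACTIVATION_BANK, rng)(mix) + 0.1 * sample_gp(t, rng)
        nodes.append(x)
    return np.stack(nodes)

series = sample_scm_series()
print(series.shape)  # (6, 128)
```

Each row of `series` is one node of the sampled graph; child nodes inherit structure from their parents through a random nonlinearity, which is one plausible reading of "causally coherent" series with nonlinear interactions.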
Peer Reviews
Decision: ICLR 2026 Oral
* Addresses a clear gap: synthetic pretraining for classification TSFMs.
* The causal kernel composition is conceptually elegant and well motivated.
* Benchmarks across multiple models and datasets.
* Includes scaling law analyses for data, model, and compute.
* Outperforms real-data pretraining in several zero-shot setups.
* The method is explained clearly, with schematic diagrams and pseudocode.
* GP-based and SCM-based data generation both already exist; the novelty lies mostly in combining them.
* Evaluation is confined to zero-shot classification; it would benefit from downstream fine-tuning or transfer learning results.
* The contribution of causal graph depth and branching remains unclear.
* While interesting, the scaling analysis is somewhat descriptive, without deeper theoretical grounding.
1. Integrating kernel composition with SCM-based propagation yields diverse dynamics and inter-series dependencies aligned with classification objectives.
2. Evaluation across contrastive and masked-reconstruction pretraining objectives increases the generality and external validity of the findings.
3. Experiments demonstrate data and model scaling laws and strong zero-shot transfer, offering compelling empirical evidence.
1. Strong results from pretraining on purely synthetic data are not particularly surprising, as prior work (e.g., TabPFN-TS) has already demonstrated the potential of synthetic data. The manuscript would benefit from sharper positioning of what is substantively novel in the methodology.
2. The paper does not clearly articulate the challenges in transferring synthetic data generation methods designed for forecasting tasks to classification tasks: what the specific difficulties are and ho
* **Novelty and Formulation:** The primary strength of this work is its well-motivated formulation. Rather than creating a monolithic generator, the authors identify two key requirements for classification data (realistic temporal dynamics and discriminative clustering structure) and solve them by combining the strengths of two distinct fields. Using GP kernel composition (common in forecasting) for temporal patterns and SCMs (from the causality and tabular learning literature) for creating underlying class
I have concerns about the evaluation, especially regarding the complexity of the proposed generator, the framing of its comparison to real-world data, and the scope of the architectural evaluation.
1. **High Generator Complexity and Opaque Design Choices:** The CauKer pipeline is a complex amalgamation of multiple components: three distinct function banks (kernel, mean, activation), random kernel composition, and random DAG generation. This introduces a large number of "meta-hyperpa
Taxonomy
Topics: Time Series Analysis and Forecasting · Gaussian Processes and Bayesian Inference · Machine Learning in Healthcare
