FeDaL: Federated Dataset Learning for General Time Series Foundation Models

Shengchao Chen; Guodong Long; Michael Blumenstein; Jing Jiang

arXiv:2508.04045·cs.LG·March 17, 2026

3 Reviews

TL;DR

FeDaL introduces a federated learning framework for time series models that effectively handles dataset heterogeneity by learning shared representations and eliminating biases, improving generalization across diverse real-world tasks.

Contribution

The paper proposes FeDaL, a novel federated dataset learning method that explicitly mitigates biases in heterogeneous time series data for better generalization of foundation models.

Findings

01

FeDaL outperforms 54 baselines across eight real-world tasks.

02

Federated scaling analysis shows data volume and client number influence performance.

03

FeDaL effectively reduces domain biases in diverse time series datasets.

Abstract

Dataset-level heterogeneity introduces significant domain biases that fundamentally degrade the generalization of general Time Series Foundation Models (TSFMs), yet this challenge remains underexplored. This paper rethinks the from-scratch training of TSFMs using the paradigm of federated learning. We propose a novel Federated Dataset Learning (FeDaL) approach to tackle heterogeneous time series by learning dataset-agnostic temporal representations. Specifically, the distributed architecture of federated learning is a natural solution to decompose heterogeneous TS datasets into shared generalized knowledge and preserved personalized knowledge. Moreover, based on the TSFM architecture, FeDaL explicitly mitigates both local and global biases by adding two complementary mechanisms: Domain Bias Elimination (DBE) and Global Bias Elimination (GBE). FeDaL's cross-dataset generalization has been…
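The split between shared and personalized knowledge described in the abstract can be illustrated with a minimal sketch. It assumes plain FedAvg-style weighted averaging over a subset of parameters; the names `fedavg_shared`, `broadcast`, `encoder`, and `head` are hypothetical and are not taken from the paper, and the sketch does not include DBE or GBE.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def fedavg_shared(client_params: List[Params],
                  client_sizes: List[int],
                  shared_keys: List[str]) -> Params:
    """Average only the shared parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    averaged: Params = {}
    for key in shared_keys:
        dim = len(client_params[0][key])
        averaged[key] = [
            sum(p[key][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)
        ]
    return averaged


def broadcast(client_params: List[Params], global_shared: Params) -> List[Params]:
    """Overwrite each client's shared parameters; personal ones stay local."""
    return [
        {**p, **{k: list(v) for k, v in global_shared.items()}}
        for p in client_params
    ]


# Two hypothetical clients: "encoder" is shared, "head" stays personalized.
clients = [
    {"encoder": [1.0, 2.0], "head": [0.5]},
    {"encoder": [3.0, 4.0], "head": [-0.5]},
]
sizes = [100, 300]

global_shared = fedavg_shared(clients, sizes, shared_keys=["encoder"])
updated = broadcast(clients, global_shared)
```

After one round, both clients hold the same averaged `encoder` while each keeps its own `head`, which is the sense in which shared and personalized knowledge are separated in this sketch.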

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

The paper is well-written and presents results of a comprehensive experimental study. The problem that it addresses is timely and had previously not been explored enough. The proposed two-level design (DBE for local, GBE for global bias) appears modular and easy to integrate.

Weaknesses

While the paper reports a strong engineering effort, it unfortunately lacks a clear algorithmic or theoretical innovation. The framework combines concepts from a number of prior works, including (Nie et al., 2022, Wu et al., 2021, Zhang et al., 2015, Acar et al., 2021, Killamsetty et al., 2021), to establish DBE and GBE but these are ultimately heuristic -- there is no theoretical analysis of "bias elimination", or of the convergence of the proposed federated scheme. Additionally, it is not clea

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper addresses a well-motivated problem in federated time series modeling, focusing on dataset-level bias and cross-domain generalization.
2. The proposed FeDaL framework is modular and clearly structured, integrating client-side and server-side bias elimination mechanisms (DBE and GBE).
3. The gradient-level correction mechanism via a server-side state vector is a meaningful extension of FedAvg, with clear mathematical formulation.
4. The scaling analysis in Section 4.3 provides valuabl

Weaknesses

1. Unclear Structure in Problem Statement: The introduction claims that existing works face “two major challenges,” but only one (coarse-grained treatment of heterogeneity) is explicitly developed. The second challenge is not clearly introduced or elaborated, which weakens the framing of the paper’s motivation.
2. The Global Bias Elimination (GBE) module introduces a cumulative server-side state vector sr, which may be prone to instability over long training rounds if not properly scaled. While
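The stability concern about a cumulative state vector can be made concrete with a small sketch. The update rule below is an assumption chosen for illustration (a running sum of the mean client drift, with an optional `decay` factor); it is not the paper's GBE update, and `update_state`, `scale`, and `decay` are hypothetical names.

```python
import math
from typing import List


def update_state(state: List[float], mean_delta: List[float],
                 scale: float = 1.0, decay: float = 1.0) -> List[float]:
    """Accumulate the mean client drift into the server-side state vector."""
    return [decay * s + scale * d for s, d in zip(state, mean_delta)]


def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def run(rounds: int, scale: float, decay: float) -> float:
    """Return the state norm after `rounds` rounds of a constant drift."""
    state = [0.0, 0.0]
    drift = [0.3, -0.4]  # hypothetical constant mean drift with norm 0.5
    for _ in range(rounds):
        state = update_state(state, drift, scale=scale, decay=decay)
    return l2_norm(state)


unscaled = run(rounds=200, scale=1.0, decay=1.0)  # grows linearly with rounds
damped = run(rounds=200, scale=1.0, decay=0.9)    # stays below 0.5 / (1 - 0.9)
```

With a persistent drift, the undamped accumulator grows in proportion to the number of rounds, whereas a decay factor below 1 keeps the norm bounded by a geometric series. Whether FeDaL applies any such scaling is not stated in this excerpt.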

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The paper shows many experimental results and ablation studies to show how it performs on multiple aspects. The breadth of baselines covered is impressive.
- Performance results are good.
- The research problem is very relevant for the community.

Weaknesses

- The novelty of the method is weak. The method simply combines existing works and plugs them into the framework. For example, DBE applies decomposition (Wu et al. 2021) and EMA from (Zhang et al. 2015). GBE uses drift correction from (Acar et al. 2021) and core set tuning from (Killamsetty et al. 2021).
- Many of the federated baselines in Table 1 are quite old, except for FFTS. Newer baselines like Time-FFM (Liu et al. 2024) are missing.
- Table 1 also does not compare against strong centralized
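For readers unfamiliar with the two building blocks the reviewer attributes to DBE, the following is a minimal sketch of moving-average series decomposition (in the style of Wu et al. 2021) and an exponential moving average. How FeDaL combines them is not described in this excerpt; the function names, the kernel size, and the momentum value are assumptions for illustration only.

```python
from typing import List, Tuple


def moving_average(x: List[float], kernel: int) -> List[float]:
    """Same-length moving average with replicated edge values (odd kernel)."""
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    half = (kernel - 1) // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]


def decompose(x: List[float], kernel: int = 3) -> Tuple[List[float], List[float]]:
    """Split a series into a smooth trend and the remaining seasonal part."""
    trend = moving_average(x, kernel)
    seasonal = [value - t for value, t in zip(x, trend)]
    return trend, seasonal


def ema(values: List[float], momentum: float = 0.9) -> List[float]:
    """Exponential moving average, initialized at the first value."""
    running = values[0]
    out = []
    for v in values:
        running = momentum * running + (1.0 - momentum) * v
        out.append(running)
    return out


series = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, seasonal = decompose(series, kernel=3)
smoothed = ema(series, momentum=0.9)
```

By construction, `trend` and `seasonal` sum back to the original series, and the EMA lags behind a rising input because most of the weight stays on past values.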
