Are Time-Series Foundation Models Deployment-Ready? A Systematic Study of Adversarial Robustness Across Domains

Jiawen Zhang; Zhenwei Zhang; Shun Zheng; Xumeng Wen; Jia Li; Jiang Bian

arXiv:2505.19397·cs.LG·December 9, 2025

Open Access · 3 Reviews

TL;DR

This paper systematically evaluates the adversarial robustness of Time-Series Foundation Models (TSFMs). It finds them fragile under attack and proposes adversarial fine-tuning as an effective mitigation, arguing that robustness is crucial for safe deployment.

Contribution

The paper introduces an evaluation framework tailored to TSFMs, uncovers specific vulnerability patterns, and demonstrates that adversarial fine-tuning significantly improves robustness.

Findings

01. Current TSFMs are highly brittle to small perturbations.

02. Vulnerabilities increase with longer context windows and are model-specific.

03. Adversarial fine-tuning enhances robustness effectively.

Abstract

Time-Series Foundation Models (TSFMs) are rapidly transitioning from research prototypes to core components of critical decision-making systems, driven by their impressive zero-shot forecasting capabilities. However, as their deployment surges, a critical blind spot remains: their fragility under adversarial attacks. This lack of scrutiny poses severe risks, particularly as TSFMs enter high-stakes environments vulnerable to manipulation. We present a systematic, diagnostic study arguing that for TSFMs, robustness is not merely a secondary metric but a prerequisite for trustworthy deployment comparable to accuracy. Our evaluation framework, explicitly tailored to the unique constraints of time series, incorporates normalized, sparsity-aware perturbation budgets and unified scale-invariant metrics across white-box and black-box settings. Across six representative TSFMs, we demonstrate…
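The abstract mentions normalized, sparsity-aware perturbation budgets and scale-invariant metrics without giving their definitions here. The following is a minimal sketch of what such a setup could look like, not the paper's actual formulation. The function names `project_hybrid` and `nmae`, the parameters `eps_rel` and `sparsity`, the choice of the context's mean absolute value as the normalizer, and the top-k sparsity rule are all assumptions made for illustration.

```python
import numpy as np


def project_hybrid(delta, context, eps_rel=0.1, sparsity=0.25):
    """Project a perturbation onto an ASSUMED hybrid budget.

    Assumed budget (illustrative, not taken from the paper):
      - magnitude: each entry is bounded by eps_rel times the mean
        absolute value of the clean context (a normalized L-inf bound);
      - sparsity: at most a `sparsity` fraction of time steps may change.
    """
    delta = np.asarray(delta, dtype=float)
    context = np.asarray(context, dtype=float)

    # Normalize the budget by the scale of the clean series.
    scale = np.mean(np.abs(context)) + 1e-8
    eps = eps_rel * scale

    # Keep only the k largest-magnitude entries; zero out the rest.
    k = max(1, int(np.floor(sparsity * delta.size)))
    keep = np.argsort(np.abs(delta))[-k:]
    sparse = np.zeros_like(delta)
    sparse[keep] = delta[keep]

    # Clip the surviving entries to the normalized magnitude bound.
    return np.clip(sparse, -eps, eps)


def nmae(y_true, y_pred):
    """ASSUMED scale-normalized MAE: MAE divided by mean |y_true|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred)) / (np.mean(np.abs(y_true)) + 1e-8)
```

Under these assumptions, an iterative attack would call `project_hybrid` after every update step so that the perturbed context stays inside the budget, and `nmae` would let degradation be compared across datasets with very different value ranges.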

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Clear, comprehensive threat modeling & eval setup: Covers white-box (PGD) and black-box (SimBA/ZOO), targeted and untargeted goals, with unified robustness metrics across six TSFMs and eight datasets.
2. Finds pervasive but model-specific vulnerabilities and quantifies factors that modulate attack success (context length, attack location, model size).

Weaknesses

1. Some robustness signals may reflect gradient obfuscation: MoE-style models appear PGD-resistant, but single-step and query-based attacks still work.
2. The technical contribution seems limited. I don't like to use this argument in a paper review, but vulnerability to adversarial attacks is well known across the entire ML community.

Reviewer 02 · Rating 4 · Confidence 5

Strengths

The research topic is important and timely. The vulnerability and robustness of time series foundation models remain underexplored. The experimental design is comprehensive. Six representative models are evaluated across eight diverse datasets, providing convincing evidence to support the study’s findings.

Weaknesses

The primary weakness of this submission lies in its limited technical novelty. The manuscript mainly applies existing adversarial attack and defense techniques to zero-shot time series forecasting models, without introducing fundamentally new methodologies. Consequently, the proposed attacks can largely be mitigated by existing defense mechanisms. More specifically:
1. **Relation to Prior Work**: The submission does not clearly articulate its relationship or distinction from prior studies. For

Reviewer 03 · Rating 2 · Confidence 5

Strengths

1. The unified threat model (goal / capability / knowledge) and hybrid-norm constraint are technically well-defined and well-motivated for time-series data.
2. The work systematically examines model-specific failure modes, horizon sensitivity, and context-length effects.
3. Defense results are quantitatively compelling. In-domain LAT improves worst-case NMAE by up to 10× under PGD and generalizes well out-of-domain.
4. The reproducibility statement and released code enhance reliability.

Weaknesses

**1. Over-claimed novelty and limited contribution boundary**
The paper repeatedly claims to be **“the first large-scale, systematic robustness evaluation of TSFMs.”** However, two peer-reviewed works have already addressed adversarial robustness of TSFMs directly:
**Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting - AISTATS 2025**
Performs a systematic, cross-model and cross-dataset robustness analysis including TSFMs such as TimeGPT, demonstrating that small, s

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Fault Detection and Control Systems