GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series Data

Cheng He; Xu Huang; Gangwei Jiang; Zhaoyi Li; Defu Lian; Hong Xie; Enhong Chen; Xijie Liang; Zengrong Zheng; Patrick P. C. Lee

arXiv:2502.03264 · cs.LG · March 13, 2026

Open Access · 3 Reviews

TL;DR

GTM introduces a novel frequency-domain attention mechanism and a hybrid pre-training strategy to improve time-series representation learning, enabling versatile adaptation to multiple tasks and outperforming state-of-the-art models.

Contribution

The paper presents GTM, the first generative-task-agnostic time-series model with a frequency-aware attention mechanism and a unified pre-training approach for enhanced generalization.

Findings

1. GTM outperforms SOTA models on various generative tasks.
2. GTM achieves strong classification results with minimal adaptation.
3. GTM's accuracy improves with larger model size and more pre-training data.

Abstract

Despite recent progress in time-series foundation models, challenges persist in improving representation learning and adapting to diverse downstream tasks. We introduce a General Time-series Model (GTM), which advances representation learning via a novel frequency-domain attention mechanism that captures time-granularity-aware features, an aspect underexplored in prior research. We further propose a novel pre-training strategy that unifies reconstruction and autoregressive objectives through a hybrid masking mechanism. Our pre-training strategy, combined with 2D positional encoding and span shuffling, enhances the robustness and generalization of representations. GTM is established as the first generative-task-agnostic model for time-series analysis, enabling seamless adaptation to various generative tasks without any task-specific modifications. Extensive experiments demonstrate that…
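The hybrid masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hybrid_mask`, the parameters `random_ratio` and `tail_ratio`, and their default values are all hypothetical, chosen only to show how scattered masks (a reconstruction-style objective) and a consecutive masked tail (an autoregressive-style objective) might coexist in one mask.

```python
import random


def hybrid_mask(n_tokens, random_ratio=0.15, tail_ratio=0.25, seed=0):
    """Return a list of booleans where True marks a [MASK] position.

    Assumption (not from the paper): the last `tail_ratio` share of tokens
    is masked as one consecutive span, and `random_ratio` of the remaining
    head tokens are masked at random positions.
    """
    if not 0.0 <= random_ratio <= 1.0 or not 0.0 <= tail_ratio <= 1.0:
        raise ValueError("ratios must lie in [0, 1]")

    rng = random.Random(seed)
    n_tail = int(round(n_tokens * tail_ratio))   # consecutive masks at the tail
    n_head = n_tokens - n_tail                   # tokens eligible for random masking
    n_random = int(round(n_head * random_ratio))

    mask = [False] * n_tokens
    # Reconstruction-style part: scattered masks inside the head.
    for i in rng.sample(range(n_head), n_random):
        mask[i] = True
    # Autoregressive-style part: every tail token is masked.
    for i in range(n_head, n_tokens):
        mask[i] = True
    return mask


mask = hybrid_mask(20, random_ratio=0.2, tail_ratio=0.25, seed=0)
print("".join("M" if m else "." for m in mask))
```

In this sketch, setting `tail_ratio` to 0 leaves only the scattered masks, while setting `random_ratio` to 0 leaves only the masked tail, which is one plausible way to read "unifies reconstruction and autoregressive objectives". The actual proportions and control mechanism used by GTM are not stated in this excerpt.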

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. This paper addresses a critical gap in existing time series foundation models by introducing a Fourier attention mechanism that explicitly captures time-granularity-aware patterns from the frequency domain.
2. As far as I am concerned, GTM is the first TSFM that supports seamless adaptation to all generative tasks (forecasting, imputation, anomaly detection) without task-specific modifications (e.g., tokenization adjustments, projection header changes).
3. The paper conducts rigorous experi

Weaknesses

1. The paper does not deeply explore the computational overhead of the Fourier attention mechanism, specifically how FFT/iFFT operations (integrated into each decoder block) impact inference latency for real-time applications (e.g., streaming sensor monitoring).
2. The model analysis of this paper is insufficient. For example, no trade-off analysis (e.g., simplifying frequency modules to reduce latency) is provided, which is critical for practical deployment.
3. The TSFM baselines are outdated.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The writing is clear and straightforward, laying out the reasoning for the model components and describing them in sufficient detail. The architecture seems to have a lot of backing from literature, and the authors build nicely on previous work.
- It is encouraging to see that the combination of lots of disparate components of time series analysis results in a performant model. I think this is an important contribution to demonstrate, and the authors motivate the use of these features very w

Weaknesses

- One of the biggest weaknesses of the model is the resources required to run this architecture on practical time series data. A number of components of the model are very computationally expensive, including the full temporal attention and the internal FFT calls in the model. Some analysis of computational efficiency would be helpful.
- The performance, while better than baselines across most tasks, is only marginally better than other models. In the field of time series foundation models, wher

Reviewer 03 · Rating 8 · Confidence 3

Strengths

1. The paper is exceptionally well-motivated. It begins with a clear, empirical analysis using FFT and 2D KDE to demonstrate that the joint probability distributions of amplitude-frequency and phase-frequency vary significantly across different temporal granularities. This finding provides a strong, intuitive justification for the core architectural novelty of the proposed model.
2. The paper introduces a novel Fourier attention mechanism designed to explicitly capture the granularity-aware fe

Weaknesses

1. The hybrid masking strategy is a core contribution, but a critical detail is underspecified. The paper mentions applying a "controlled proportion of consecutive [MASK] tokens to at the tail". The value or control mechanism for this proportion, which presumably balances the reconstruction and autoregressive objectives, is not defined in the main paper or the appendix's implementation details.
2. The proposed Fourier attention module adds a significant number of operations within each decoder

Taxonomy

Topics: Time Series Analysis and Forecasting · Neural Networks and Applications

Methods: Softmax · Attention Is All You Need · Matching The Statements