PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
Jiaming Ma, Qihe Huang, Haofeng Ma, Guanjun Wang, Sheng Huang, Zhengyang Zhou, Pengkun Wang, Binwu Wang, Yang Wang

TL;DR
PHAT introduces a novel transformer-based approach that models periodic heterogeneity in multivariate time series, effectively capturing variable-specific periodic patterns to improve forecasting accuracy.
Contribution
The paper proposes PHAT, a new model that arranges data into a periodic bucket tensor and employs a positive-negative attention mechanism to better capture diverse periodicities.
Findings
PHAT outperforms 18 baselines on 14 datasets.
It effectively captures variable-specific periodic patterns.
The model achieves state-of-the-art forecasting accuracy.
Abstract
While existing multivariate time series forecasting models have advanced significantly in modeling periodicity, they largely neglect the periodic heterogeneity common in real-world data, where variables exhibit distinct and dynamically changing periods. To effectively capture this periodic heterogeneity, we propose PHAT (Period Heterogeneity-Aware Transformer). Specifically, PHAT arranges multivariate inputs into a three-dimensional "periodic bucket" tensor, where the dimensions correspond to variable group characteristics with similar periodicity, time steps aligned by phase, and offsets within the period. By restricting interactions within buckets and masking cross-bucket connections, PHAT effectively avoids interference from inconsistent periods. We also propose a positive-negative attention mechanism, which captures periodic dependencies from two perspectives: periodic alignment and…
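The "periodic bucket" arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each variable has a single dominant period taken from its strongest FFT peak, that variables with exactly equal periods share a bucket, and that each bucket is folded over its most recent whole cycles. The function names `dominant_period` and `build_buckets` are hypothetical.

```python
import numpy as np


def dominant_period(x):
    """Estimate one dominant period of a 1-D series from its strongest FFT peak.

    Assumption for illustration: a single peak per variable, rounded to an integer.
    """
    x = np.asarray(x, dtype=float)
    amp = np.abs(np.fft.rfft(x - x.mean()))
    amp[0] = 0.0  # ignore the zero-frequency (mean) component
    k = int(np.argmax(amp))  # index of the strongest frequency bin
    if k == 0:
        return len(x)  # no oscillation found; treat the whole window as one period
    return max(1, int(round(len(x) / k)))


def build_buckets(X):
    """Group variables by period and fold each group into a 3-D array.

    X has shape (T, N): T time steps, N variables.
    Returns {period: (variable_indices, folded)}, where folded has shape
    (n_variables_in_bucket, n_cycles, period).
    """
    T, N = X.shape
    periods = [dominant_period(X[:, j]) for j in range(N)]
    buckets = {}
    for p in sorted(set(periods)):
        idx = [j for j, q in enumerate(periods) if q == p]
        n_cycles = T // p
        # Keep the most recent n_cycles * p steps so the fold is exact.
        tail = X[T - n_cycles * p:, idx]  # (n_cycles * p, n_vars)
        # folded[v, c, o] is variable v at cycle c, offset o within the period.
        folded = tail.T.reshape(len(idx), n_cycles, p)
        buckets[p] = (idx, folded)
    return buckets
```

For example, a 96-step input holding two variables with period 24 and one with period 12 yields two buckets under these assumptions: one of shape (2, 4, 24) and one of shape (1, 8, 12). Restricting attention to within a bucket then amounts to processing each folded array separately.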
Peer Reviews
Decision: ICLR 2026 Poster
- The proposed PHAT design is principled and intuitive, integrating adaptive temporal decomposition into the Transformer framework without major architectural overhead.
- The method is generalizable and can be incorporated into existing video backbones.
- Experimental results are strong and consistent across multiple datasets, showing both improved accuracy and efficiency.

- The novelty is moderate, as the idea of handling multi-frequency or periodic dynamics has appeared in previous works on temporal Fourier attention and spectral modeling.
- The mathematical formulation of heterogeneity modeling could be more rigorous; the “adaptive period tokens” are primarily empirical and not theoretically justified.
- The comparisons focus mainly on uniform-period baselines but omit stronger contemporaneous temporal adaptation models.
- The ablation studies are limited
1. Simple and efficient method, easy to understand.
2. Clear motivation and well-organized structure.
3. Extensive experiments with diverse datasets and baselines, offering strong empirical support.

1. The experimental validation regarding “attention ignoring negative correlations” is not convincing; raw data analysis alone is insufficient to justify modeling implications at the feature level.
2. The paper should include an ablation that isolates the Frequency-based Multi-period Prediction component to clarify the exact gain contributed by the core modules, especially since prediction head size can directly affect performance in many settings.
3. Table 4 only reports FLOPs, but **actual inf
This paper proposes PHAT (Period Heterogeneity-Aware Transformer) for multivariate time-series forecasting. The key idea is to (i) detect per-variable dominant periods via FFT, (ii) group variables into periodic buckets that share a period, fold sequences into a 3-D tensor (bucket × period-offset × period-aligned), and (iii) apply a Positive-Negative Attention (PNA) with X-shaped receptive field that models phase-aligned vs. within-period relations and explicitly decomposes positive/negative cor
1. The main protocol tunes the look-back T per model and reports the best; a fixed-T comparison is deferred to the appendix. While both settings are shown, the primary table mixing tuned-T results across diverse baselines can blur fairness. Please foreground the fixed-T tables in the main paper (or add both side-by-side) and state the exact search ranges for T and other critical hyperparameters per baseline.
2. Sensitivity to period detection & K. Periods are extracted by FFT Top-K peaks and rounded to
Taxonomy
Topics: Time Series Analysis and Forecasting · Forecasting Techniques and Applications · Machine Learning in Healthcare
