ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies

Itay Katav; Aryeh Kontorovich

arXiv:2507.13998·cs.LG·September 26, 2025

Open Access · 3 Reviews

TL;DR

ParallelTime introduces a dynamic weighting mechanism that adaptively balances short- and long-term dependencies in time series forecasting, instead of weighting them equally, and reports improved performance over existing methods.

Contribution

The paper proposes the ParallelTime architecture, built around a novel dynamic weighting mechanism (the ParallelTime Weighter), which enables better dependency modeling and state-of-the-art results in time series forecasting.

Findings

01

Achieves state-of-the-art performance across benchmarks.

02

Uses fewer parameters and lower FLOPs.

03

Scales effectively to longer prediction horizons.

Abstract

Modern multivariate time series forecasting primarily relies on two architectures: the Transformer with attention mechanism and Mamba. In natural language processing, an approach has been used that combines local window attention for capturing short-term dependencies and Mamba for capturing long-term dependencies, with their outputs averaged to assign equal weight to both. We find that for time-series forecasting tasks, assigning equal weight to long-term and short-term dependencies is not optimal. To mitigate this, we propose a dynamic weighting mechanism, ParallelTime Weighter, which calculates interdependent weights for long-term and short-term dependencies for each token based on the input and the model's knowledge. Furthermore, we introduce the ParallelTime architecture, which incorporates the ParallelTime Weighter mechanism to deliver state-of-the-art performance across diverse…
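The contrast the abstract draws between equal-weight averaging and per-token dynamic weighting can be illustrated with a minimal sketch. This is not the authors' ParallelTime Weighter: the softmax gate, the `gate_weights` projection, and the function names below are assumptions chosen only to show how two weights that sum to 1 (and are therefore interdependent) might be computed per token from the two branch outputs.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(short_out, long_out):
    """Baseline: equal weight (0.5 / 0.5) for both branches at every token."""
    return [
        [0.5 * s + 0.5 * l for s, l in zip(s_tok, l_tok)]
        for s_tok, l_tok in zip(short_out, long_out)
    ]


def dynamic_fusion(short_out, long_out, gate_weights):
    """Per-token weighted fusion (illustrative assumption, not the paper's code).

    short_out / long_out: lists of per-token feature lists, each of length d.
    gate_weights: hypothetical 2 x (2*d) matrix mapping one token's
    concatenated branch outputs to two logits, one per branch.
    Returns (fused, weights) with weights[t] == (w_short, w_long).
    """
    fused, weights = [], []
    for s_tok, l_tok in zip(short_out, long_out):
        features = list(s_tok) + list(l_tok)  # concatenation, length 2*d
        logits = [sum(w * f for w, f in zip(row, features)) for row in gate_weights]
        w_short, w_long = softmax(logits)  # the two weights sum to 1
        fused.append([w_short * s + w_long * l for s, l in zip(s_tok, l_tok)])
        weights.append((w_short, w_long))
    return fused, weights


if __name__ == "__main__":
    random.seed(0)
    num_tokens, d = 4, 3
    # Stand-ins for the outputs of a local-attention branch and a Mamba branch.
    short_out = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_tokens)]
    long_out = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_tokens)]
    gate = [[random.gauss(0, 1) for _ in range(2 * d)] for _ in range(2)]
    _, weights = dynamic_fusion(short_out, long_out, gate)
    for t, (w_s, w_l) in enumerate(weights):
        print(f"token {t}: short={w_s:.2f} long={w_l:.2f}")
```

With an all-zero gate the sketch reduces to the equal-weight average, which makes the averaging baseline a special case of the weighted fusion. In a trained model the gate would be learned jointly with the two branches.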

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper as a whole is content-rich, with strong logical connections between the parts. The entire work forms a closed-loop logic of problem, solution, verification, and explanation. The explanation section, in particular, discusses the underlying logic of the patterns and method performance, which enhances the interpretability of the approach.
2. The paper demonstrates good originality, both in the global design of the ParallelTime framework and in specific details (for example, Section 3.

Weaknesses

1. In Section 3.3, when introducing global registers, the paper defines their content as globally shared information but does not specify how this information is selected or what type it is (for instance, statistical features such as mean values or periodic features such as peaks), which makes the register a black-box component.
2. The paper verifies the method's efficiency only through experimentally measured FLOPs and lacks a theoretical explanation of the method's superiority. For example

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. This paper is the first to combine the Mamba and Transformer architectures for time series, aiming to gain both short- and long-term dependency modeling from the two architectures.
2. A novel Parallel Weighter is proposed to combine the outputs of the Mamba blocks and the local attention blocks, and experimental results show that it improves performance over simply summing or averaging (with ablation studies for justification).
3. Detailed experiments are conducted to show that the ParallelTime model a

Weaknesses

This paper is solid and well-written, except for some minor weaknesses:
1. The definition of $\mathbf{y}_t$ is missing in line 222, and the relationship between $\text{Mamba}(\cdot)$ and $\mathbf{x}, \mathbf{y}, \mathbf{h}$ is missing in line 227.
2. Figure 6 lacks a comparison with pure Mamba and pure window attention.
3. The efficiency comparison covers only PatchTST and omits potentially more lightweight models such as DLinear.
4. ParallelTime was designed to capture more long-term dependencies (or gl

Reviewer 03 · Rating 4 · Confidence 5

Strengths

This paper uses local attention and Mamba to extract information at different time intervals and combines them through an adaptive weighted ensemble.

Weaknesses

1. The method is essentially a multi-scale framework that divides the time series into long-term and short-term paths for separate feature extraction, without introducing any new modeling mechanism.
2. The fusion of long- and short-term features relies on simple weighting or concatenation, lacking an adaptive interaction mechanism.
3. Moreover, it merely employs Mamba and Attention to extract their respective effective features, but contributes no architectural or algorithmic innovation.
4. There

Taxonomy

Topics: Mental Health Research Topics · Complex Network Analysis Techniques · Data Visualization and Analytics