PRISM: A hierarchical multiscale approach for time series forecasting
Zihao Chen, Alexandre Andre, Wenrui Ma, Ian Knight, Sergey Shuvaev, Eva Dyer

TL;DR
PRISM introduces a hierarchical, tree-based method for time series forecasting that captures both global trends and local features across multiple scales, improving accuracy over existing methods.
Contribution
The paper proposes a novel hierarchical approach using learnable tree partitioning and scale-specific features for improved multiscale time series forecasting.
Findings
Outperforms state-of-the-art forecasting methods on benchmark datasets
Effectively captures global and local signal structures
Provides a lightweight, flexible forecasting framework
Abstract
Forecasting is critical in areas such as finance, biology, and healthcare. Despite progress in the field, making accurate forecasts remains challenging because real-world time series contain global trends, local fine-grained structure, and features on multiple scales in between. Here, we present a new forecasting method, PRISM (Partitioned Representation for Iterative Sequence Modeling), that addresses this challenge through a learnable tree-based partitioning of the signal. At the root of the tree, a global representation captures coarse trends in the signal, while recursive splits reveal increasingly localized views of the signal. At each level of the tree, data are projected onto a time-frequency basis (e.g., wavelets or exponential moving averages) to extract scale-specific features, which are then aggregated across the hierarchy. This design allows the model to jointly…
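To make the tree-plus-filter idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes fixed midpoint splits in place of PRISM's learnable partitioning, and uses exponential moving averages (one of the bases the abstract names) with arbitrarily chosen smoothing factors. The function names `ema`, `ema_bands`, and `build_tree` are hypothetical.

```python
import numpy as np

def ema(x, alpha):
    """Exponential moving average with smoothing factor alpha in (0, 1]."""
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1.0 - alpha) * out[t - 1]
    return out

def ema_bands(x, alphas=(0.5, 0.1)):
    """Split a segment into bands that sum back to the segment.

    Differences of successively smoother EMAs act as crude band-pass
    filters; the last band is the smoothest trend. The alphas are
    illustrative values, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    smooths = [x] + [ema(x, a) for a in alphas]
    bands = [smooths[i] - smooths[i + 1] for i in range(len(alphas))]
    bands.append(smooths[-1])
    return bands

def build_tree(x, depth):
    """Return one list per tree level; level d holds 2**d segments.

    Assumption: each segment is halved at its midpoint. The paper
    describes a learnable partitioning instead.
    """
    levels = [[np.asarray(x, dtype=float)]]
    for _ in range(depth):
        nxt = []
        for seg in levels[-1]:
            mid = len(seg) // 2
            nxt.extend([seg[:mid], seg[mid:]])
        levels.append(nxt)
    # Each segment is replaced by its list of scale-specific bands.
    return [[ema_bands(seg) for seg in level] for level in levels]

x = np.sin(np.linspace(0, 8 * np.pi, 64)) + 0.01 * np.arange(64)
tree = build_tree(x, depth=2)
print([len(level) for level in tree])  # number of segments per level
```

The root level sees the whole signal, while deeper levels filter shorter segments, so the same band decomposition yields coarse features near the root and local features near the leaves. Because the bands telescope, summing them and concatenating the segments recovers the input at every level.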
Peer Reviews
Decision·Submitted to ICLR 2026
- The paper presents an interesting technique to enhance hierarchical temporal architectures with frequency filtering. Frequency filtering seems to be an essential component of time series forecasting, helping to identify cyclical patterns in the dataset.
- Experimental results show that the method performs strongly compared to most baselines, and performs closely to D-PAD, which is a more complicated architecture.
- The paper is well presented and the ideas are quite clear. The experimen
- The main weakness of the paper is that the method performs closely to D-PAD and isn't a significant improvement over it; however, D-PAD is a much more involved architecture than the proposed method.
- While the paper evaluates on the most popular univariate datasets, more complex datasets such as M4 and Wikipedia could be added. Although they are multivariate datasets, they could help prevent techniques from overfitting to the existing datasets.
* Clear problem framing and gap statement. The paper argues that prior work typically builds hierarchy in only time or only frequency, or mixes domains without a reconstructable shared hierarchy.
* Coherent architecture. Overlapped binary splits (time) + band partition (frequency) + learnable band routing + an explicit reconstruction path form a consistent design.
* Broad empirical coverage and ablations. Results across 32 settings, with component-wise ablations showing 5–14% average performance
* (Primary) Limited conceptual novelty relative to recent multiscale “decompose–mix” lines. The high-level philosophy—multiscale decomposition and mixing—strongly overlaps with recent TimeMixer-style approaches. The paper does cite such work in Related Work (e.g., Ref. [20]), but the manuscript does not clearly establish a qualitative leap beyond “engineered combination” of known ideas (time hierarchy + frequency filters + learned weighting + auxiliary reconstruction). Claimed distinction is a r
1. The learnable importance scores provide insights into which frequency components drive predictions, adding transparency to the model’s behavior.
2. The paper is well-structured and easy to follow.
1. The binary partitioning strategy is manually defined rather than data-adaptive; this might limit flexibility for non-stationary or irregularly sampled signals.
2. **Dependence on pre-defined transforms**. The method relies on fixed wavelet or FFT bases. Learned or adaptive frequency decompositions could potentially capture more expressive features.
3. The benchmarks are still limited to standard ones. More varied or larger datasets should be included to demonstrate the effectiveness of the
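The learnable importance scores that the reviews mention can be pictured with a small sketch. It assumes one scalar logit per frequency band, normalized with a softmax; the excerpts above do not say how PRISM parameterizes or normalizes its scores, so this is an illustration only. The names `softmax` and `weight_bands` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def weight_bands(bands, logits):
    """Mix equally sized bands using softmax-normalized importance scores.

    Assumption: one logit per band. In a trained model the logits would
    be learned parameters; here they are passed in by hand.
    """
    w = softmax(logits)
    mixed = sum(wi * b for wi, b in zip(w, bands))
    return mixed, w

bands = [np.ones(4), 3 * np.ones(4)]
mixed, w = weight_bands(bands, logits=[0.0, 5.0])
print(w)  # the larger logit gives the second band most of the weight
```

Reading off `w` after training is what would give the transparency the reviewers describe: bands with larger weights contribute more to the mixed representation.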
Taxonomy
Topics: Machine Learning in Healthcare · Time Series Analysis and Forecasting · Stock Market Forecasting Methods
