TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting
Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou

TL;DR
TimeMixer introduces a multiscale mixing approach for time series forecasting. It disentangles complex temporal variations by viewing the series at different sampling scales, and achieves state-of-the-art results with an efficient architecture.
Contribution
The paper proposes a novel multiscale-mixing framework with PDM and FMM blocks, enhancing time series decomposition and prediction beyond existing methods.
Findings
Achieves state-of-the-art forecasting accuracy on multiple benchmarks.
Demonstrates effectiveness in both long-term and short-term forecasting.
Maintains favorable run-time efficiency compared to existing models.
Abstract
Time series forecasting is widely used in extensive applications, such as traffic planning and weather forecasting. However, real-world time series usually present intricate temporal variations, making forecasting extremely challenging. Going beyond the mainstream paradigms of plain decomposition and multiperiodicity analysis, we analyze temporal variations in a novel view of multiscale-mixing, which is based on an intuitive but important observation that time series present distinct patterns in different sampling scales. The microscopic and the macroscopic information are reflected in fine and coarse scales respectively, and thereby complex variations can be inherently disentangled. Based on this observation, we propose TimeMixer as a fully MLP-based architecture with Past-Decomposable-Mixing (PDM) and Future-Multipredictor-Mixing (FMM) blocks to take full advantage of disentangled…
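The multiscale observation in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the excerpt does not say how the coarser scales are produced, so repeated average pooling with a factor of 2 is an assumption here, and the helper name `multiscale_views` and the toy series are hypothetical.

```python
import numpy as np


def multiscale_views(series, num_scales=3, factor=2):
    """Return [finest, ..., coarsest] views of a 1-D series.

    Assumption: each coarser scale is obtained by non-overlapping
    average pooling of the previous one with the given factor.
    """
    x = np.asarray(series, dtype=float)
    views = [x]
    for _ in range(num_scales):
        usable = (len(x) // factor) * factor  # drop a ragged tail, if any
        if usable == 0:
            break
        x = x[:usable].reshape(-1, factor).mean(axis=1)
        views.append(x)
    return views


# Toy series: a fast oscillation (period 4) on top of a slow linear trend.
t = np.arange(64)
series = np.sin(2 * np.pi * t / 4) + 0.05 * t

views = multiscale_views(series, num_scales=3, factor=2)
lengths = [len(v) for v in views]

# At the scale pooled by 4 in total, the period-4 oscillation averages out
# and only the trend remains.
trend_only = 0.05 * (4 * np.arange(16) + 1.5)
```

In this toy case the finest view carries the oscillation (microscopic detail), while the view pooled over four steps matches the linear trend alone (macroscopic information), which is the kind of disentanglement across scales the abstract describes.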
Peer Reviews
Decision: ICLR 2024 poster
S1. The paper has a clear and easily understandable structure.
S2. The experiments are extensive, involving long time series forecasting without the use of highly noisy exchange-rate and illness datasets. Additionally, a new solar-energy dataset is introduced, and the provided code and configurations enhance the credibility of the experimental results.
W1. In general, upsampling results in more data points, while downsampling results in fewer data points (as illustrated in Figure 1, leftmost). Based on this, I believe the authors' descriptions 'Up-mixing' and 'Down-Mixing' in Figure 2 may not be appropriate and should perhaps be reversed.
W2. The paper lacks significant innovation. (1) Multiscale decoupling [1] and seasonal-trend disentanglement [2] are common modules that have already been proposed. (2) The so-called FMM module appear
1. **Solid Motivation**:
   - The motivation behind the paper is robust and well-justified.
   - The significance of **TIME SERIES FORECASTING** is inherently evident and requires no further validation.
   - Echoing the authors' sentiments, the multiscale analysis paradigm stands out as a classic yet crucial methodology to model the intricate temporal variations inherent in time series data.
2. **Coherent Conceptual Framework**:
   - The core idea of decomposing the signal into various s
1. **Analysis and Explanation of the Results**:
   - While the empirical experiments, inclusive of the ablation study, provide evidence of the effectiveness of TimeMixer, the underlying reasons for its superior performance remain somewhat opaque. Specifically, when juxtaposed against competing models, it's not lucidly expounded how TimeMixer excels in capturing temporal variations. A deeper dive into this comparative analysis would have been enlightening. Introducing spectral analysis or simila
**1. Easy to understand**
In simpler terms, the paper is written well. It explains its ideas clearly and in a way that anyone can understand. This makes it easier for readers to grasp the concepts and improves the paper's overall clarity.
**2. Good performance with simple models**
Empirically, TimeMixer exhibits remarkably low computational and memory costs because it consists of only simple linear models. However, it's important not to overlook its performance, which should not be underest
In spite of the good aspects of this paper, I hesitate to recommend acceptance because of some minor but important concerns.
**1. Absence of some important baselines**
Because this paper is based on decomposition, I think the authors have to include decomposition-based methods [1] among the baselines. Furthermore, although they have not been accepted to any conference, it would be beneficial to include methods based on MLP-Mixer [2,3]. I think that this absence makes the second and fourth strengths fade.
**2
Taxonomy
Topics: Time Series Analysis and Forecasting · Complex Systems and Time Series Analysis · Neural Networks and Applications
