MSMixer: Learned Multi-Scale Temporal Mixing with Complementary Linear Shortcut for Long-Term Time Series Forecasting
Ahmed Cherif

TL;DR
MSMixer is a lightweight, channel-independent multi-scale MLP model for long-term time series forecasting. It mixes temporal patterns at three resolutions (1x, 4x, and 16x down-sampling) and outperforms DLinear and NLinear in 12 of 16 configurations.
Contribution
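The paper introduces MSMixer, a multi-scale MLP architecture that combines three parallel down-sampled branches, a learnable softmax gate that weighs the branch outputs, and a DLinear complementary shortcut supplying full-window trend and seasonality context, enabling long-term forecasting at O(T) complexity.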
Findings
MSMixer achieves the lowest average MSE among lightweight models on ETT benchmarks.
MSMixer outperforms DLinear and NLinear in 12 of 16 configurations.
MSMixer uses 5x fewer parameters than Transformer-based baselines while maintaining high accuracy.
Abstract
Long-term time series forecasting requires models that simultaneously capture rapid oscillations, medium-range periodicities, and slowly evolving macro-trends from a fixed look-back window. Existing lightweight MLP-based models typically operate on a single temporal resolution, limiting their ability to explicitly model patterns at multiple scales. We propose MSMixer, a channel-independent multi-scale MLP architecture that addresses this limitation through three complementary innovations: (i) three parallel scale branches at down-sample factors {1x, 4x, 16x} with independent MLP blocks, (ii) a learnable softmax gate that dynamically weighs branch outputs, and (iii) a DLinear complementary shortcut that provides full-window trend and seasonality context. MSMixer contains only 112K parameters at H=96 and runs at O(T) complexity. Evaluated on four ETT benchmarks with standard chronological…
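The three components listed in the abstract can be pictured with a minimal NumPy sketch of a single-channel forward pass. This is an illustration under stated assumptions, not the authors' implementation: the look-back length T=96, the hidden width, average pooling as the down-sampling operator, a 25-point moving average for the trend, and an input-independent gate (a plain learnable logit vector) are all choices made here for concreteness. The names `init_params`, `forward`, `avg_pool`, and `moving_average` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, factor):
    """Down-sample a 1-D series by averaging non-overlapping windows (assumed operator)."""
    return x.reshape(-1, factor).mean(axis=1)

def moving_average(x, kernel):
    """Edge-padded moving average that keeps the series length unchanged."""
    pad = kernel // 2
    padded = np.pad(x, (pad, kernel - 1 - pad), mode="edge")
    return np.convolve(padded, np.ones(kernel) / kernel, mode="valid")

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(T, H, hidden=64, factors=(1, 4, 16)):
    """Randomly initialised weights; sizes are illustrative, not the paper's."""
    def lin(n_in, n_out):
        return rng.normal(0.0, n_in ** -0.5, (n_in, n_out)), np.zeros(n_out)
    return {
        "factors": factors,
        # One independent two-layer MLP per scale branch.
        "branches": [(lin(T // f, hidden), lin(hidden, H)) for f in factors],
        # Assumption: the gate is one learnable logit per branch.
        "gate_logits": np.zeros(len(factors)),
        # DLinear-style shortcut: separate linear maps for trend and seasonality.
        "trend": lin(T, H),
        "season": lin(T, H),
    }

def forward(x, p, kernel=25):
    """Map one channel's look-back window of length T to a forecast of length H."""
    outs = []
    for f, ((w1, b1), (w2, b2)) in zip(p["factors"], p["branches"]):
        h = np.maximum(avg_pool(x, f) @ w1 + b1, 0.0)  # ReLU hidden layer
        outs.append(h @ w2 + b2)
    gate = softmax(p["gate_logits"])          # weights sum to 1 across branches
    mixed = gate @ np.stack(outs)             # (3,) @ (3, H) -> (H,)
    trend = moving_average(x, kernel)         # full-window trend
    season = x - trend                        # full-window seasonal remainder
    (wt, bt), (ws, bs) = p["trend"], p["season"]
    shortcut = trend @ wt + bt + season @ ws + bs
    return mixed + shortcut

# Channel independence: the same parameters are applied to every channel.
T, H, C = 96, 96, 7
params = init_params(T, H)
window = rng.normal(size=(T, C))
forecast = np.stack([forward(window[:, c], params) for c in range(C)], axis=1)
```

Every operation in the sketch is a fixed-size pooling, convolution, or linear map over the window. A gate conditioned on the input window would be another plausible reading of "dynamically weighs"; the abstract excerpt does not settle which form is used.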
