Seg-MoE: Multi-Resolution Segment-wise Mixture-of-Experts for Time Series Forecasting Transformers
Evandro S. Ortigossa, Eran Segal

TL;DR
Seg-MoE introduces segment-wise routing for Transformer-based time series forecasting. By exploiting temporal locality, it improves scalability and accuracy, outperforming earlier token-wise MoE and dense models across multiple benchmarks.
Contribution
This work proposes a segment-wise MoE routing mechanism for time series Transformers that aligns the model architecture with the structure of temporal data to improve forecasting performance.
Findings
Achieves state-of-the-art forecasting accuracy on multiple benchmarks.
Segment-level routing outperforms token-wise routing in experiments.
Ablation confirms the importance of segment-level routing for the improvements.
Abstract
Transformer-based models have recently made significant advances in accurate time-series forecasting, but even these architectures struggle to scale efficiently while capturing long-term temporal dynamics. Mixture-of-Experts (MoE) layers are a proven solution to scaling problems in natural language processing. However, existing MoE approaches for time-series forecasting rely on token-wise routing mechanisms, which may fail to exploit the natural locality and continuity of temporal data. In this work, we introduce Seg-MoE, a sparse MoE design that routes and processes contiguous time-step segments rather than making independent expert decisions. Token segments allow each expert to model intra-segment interactions directly, naturally aligning with inherent temporal patterns. We integrate Seg-MoE layers into a time-series Transformer and evaluate it on multiple multivariate long-term…
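To make the contrast with token-wise routing concrete, here is a minimal sketch of a segment-level routing decision, in plain Python. It is not the authors' implementation: the mean-pooled segment summary, the linear gate, the fixed `segment_len`, and the names `segment_route` and `gate_weights` are all assumptions made for illustration. It also covers only the routing step, not the experts that then process each segment.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def segment_route(tokens, gate_weights, segment_len, top_k):
    """Pick top_k experts once per contiguous segment (hypothetical helper).

    tokens: list of T vectors, each a list of d floats
    gate_weights: list of E vectors (d floats each), one per expert
    Returns a list of (start, end, [(expert_index, weight), ...]).
    """
    routes = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        d = len(segment[0])
        # Assumption: summarise the segment by mean pooling before gating.
        pooled = [sum(tok[j] for tok in segment) / len(segment) for j in range(d)]
        # Assumption: a linear gate gives one logit per expert.
        logits = [sum(w[j] * pooled[j] for j in range(d)) for w in gate_weights]
        # One routing decision is shared by every time step in the segment.
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
        weights = softmax([logits[e] for e in top])
        routes.append((start, start + len(segment), list(zip(top, weights))))
    return routes


random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(12)]
gate_weights = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
routes = segment_route(tokens, gate_weights, segment_len=4, top_k=2)
for start, end, experts in routes:
    print(f"steps {start}..{end - 1} -> experts {[e for e, _ in experts]}")
```

With token-wise routing the gate would instead run once per time step, so neighbouring steps could be sent to different experts. Here all steps in a segment share one expert assignment, which is what lets an expert see the segment as a unit.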
Taxonomy
Topics: Time Series Analysis and Forecasting · Forecasting Techniques and Applications · Traffic Prediction and Management Techniques
