GateTS: Versatile and Efficient Forecasting via Attention-Inspired routed Mixture-of-Experts
Kyrylo Yemets, Mykola Lukashchuk, Ivan Izonin

TL;DR
GateTS introduces a simplified, efficient MoE-based forecasting model with an attention-inspired gating mechanism that improves accuracy and resource utilization across diverse time-series datasets.
Contribution
The paper presents a novel gating mechanism for MoE models that removes the need for auxiliary losses, simplifying training while improving forecasting accuracy.
Findings
Achieves superior accuracy compared to classical MoE and transformer models.
Promotes balanced expert utilization naturally through the gating mechanism.
More computationally efficient than LSTM for various forecasting horizons.
Abstract
Accurate univariate forecasting remains a pressing need in real-world systems, such as energy markets, hydrology, retail demand, and IoT monitoring, where signals are often intermittent and horizons span both short- and long-term. While transformers and Mixture-of-Experts (MoE) architectures are increasingly favored for time-series forecasting, a key gap persists: MoE models typically require complicated training with both the main forecasting loss and auxiliary load-balancing losses, along with careful routing/temperature tuning, which hinders practical adoption. In this paper, we propose a model architecture that simplifies the training process for univariate time series forecasting and effectively addresses both long- and short-term horizons, including intermittent patterns. Our approach combines sparse MoE computation with a novel attention-inspired gating mechanism that replaces…
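For context on the training burden the abstract describes, here is a minimal NumPy sketch of conventional sparse MoE routing: temperature-scaled softmax gating, top-k expert selection, and an auxiliary load-balancing term in the style used by Switch-Transformer-like models. It is not the GateTS gating mechanism, which the truncated abstract does not specify. The function names, the choice of k, and the exact form of the auxiliary loss are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, temperature=1.0):
    """Numerically stable softmax over the last axis with a temperature."""
    z = x / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def top_k_route(logits, k=2, temperature=1.0):
    """Classical sparse routing (illustrative, not the paper's gate).

    logits: array of shape (n_tokens, n_experts) from a router layer.
    Returns the indices of the k selected experts per token, their
    renormalized mixing weights, and the full routing probabilities.
    """
    probs = softmax(logits, temperature=temperature)
    top_idx = np.argsort(-probs, axis=-1)[:, :k]
    top_p = np.take_along_axis(probs, top_idx, axis=-1)
    weights = top_p / top_p.sum(axis=-1, keepdims=True)
    return top_idx, weights, probs


def load_balancing_loss(probs, top_idx):
    """Auxiliary loss of the form N * sum_i f_i * P_i (assumed form).

    f_i is the fraction of tokens whose first-choice expert is i, and
    P_i is the mean routing probability assigned to expert i.
    """
    n_tokens, n_experts = probs.shape
    counts = np.bincount(top_idx[:, 0], minlength=n_experts)
    f = counts / n_tokens
    p_mean = probs.mean(axis=0)
    return n_experts * float(np.sum(f * p_mean))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    router_logits = rng.normal(size=(8, 4))  # 8 tokens, 4 experts
    idx, w, p = top_k_route(router_logits, k=2, temperature=1.0)
    aux = load_balancing_loss(p, idx)
    # In conventional MoE training this term is added to the forecasting
    # loss with a tuned coefficient: total = forecast_loss + alpha * aux
    print(idx.shape, w.shape, aux)
```

Under this form of the loss, a router that assigns equal probability to every expert yields a value of 1, while a router that collapses onto a single expert approaches the number of experts. The coefficient on this term and the softmax temperature are the extra tuning knobs that the abstract identifies as a barrier to practical adoption.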
