GateTS: Versatile and Efficient Forecasting via Attention-Inspired routed Mixture-of-Experts

Kyrylo Yemets; Mykola Lukashchuk; Ivan Izonin

arXiv:2508.17515 · cs.LG · August 26, 2025

TL;DR

GateTS introduces a simplified, efficient MoE-based forecasting model with an attention-inspired gating mechanism that improves accuracy and resource utilization across diverse time-series datasets.

Contribution

The paper presents a novel gating mechanism for MoE models that removes the need for auxiliary losses, simplifying training while improving forecasting accuracy.

Findings

01. Achieves superior accuracy compared to classical MoE and transformer models.

02. Promotes balanced expert utilization naturally through the gating mechanism.

03. More computationally efficient than LSTM for various forecasting horizons.

Abstract

Accurate univariate forecasting remains a pressing need in real-world systems, such as energy markets, hydrology, retail demand, and IoT monitoring, where signals are often intermittent and horizons span both short- and long-term. While transformers and Mixture-of-Experts (MoE) architectures are increasingly favored for time-series forecasting, a key gap persists: MoE models typically require complicated training with both the main forecasting loss and auxiliary load-balancing losses, along with careful routing/temperature tuning, which hinders practical adoption. In this paper, we propose a model architecture that simplifies the training process for univariate time series forecasting and effectively addresses both long- and short-term horizons, including intermittent patterns. Our approach combines sparse MoE computation with a novel attention-inspired gating mechanism that replaces…
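
To make the idea of sparse MoE computation with attention-style routing more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract above is truncated before the gating mechanism is fully described, so everything below is an assumption made for illustration. In particular, the class name `AttentionGatedMoE`, the scaled dot-product between a projected query and one key vector per expert, the linear experts, and the parameters `key_dim` and `top_k` are all hypothetical. The weights are random and untrained; the sketch only shows the forward routing path.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


class AttentionGatedMoE:
    """Hypothetical sparse MoE layer with attention-style routing (illustrative only)."""

    def __init__(self, input_len, horizon, num_experts, key_dim, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            scale = 1.0 / math.sqrt(cols)
            return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

        self.key_dim = key_dim
        self.top_k = top_k
        # Assumption: the input window is projected to a query vector.
        self.w_query = rand_matrix(key_dim, input_len)
        # Assumption: each expert owns one key vector.
        self.expert_keys = rand_matrix(num_experts, key_dim)
        # Assumption: each expert is a single linear map from window to forecast.
        self.experts = [rand_matrix(horizon, input_len) for _ in range(num_experts)]

    def route(self, window):
        """Score experts by scaled dot-product, keep the top_k, normalize their weights."""
        query = matvec(self.w_query, window)
        scores = [dot(query, key) / math.sqrt(self.key_dim) for key in self.expert_keys]
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        weights = softmax([scores[i] for i in chosen])
        return chosen, weights

    def forward(self, window):
        chosen, weights = self.route(window)
        forecast = [0.0] * len(self.experts[0])
        for idx, weight in zip(chosen, weights):
            # Sparse computation: only the selected experts are evaluated.
            expert_out = matvec(self.experts[idx], window)
            forecast = [f + weight * o for f, o in zip(forecast, expert_out)]
        return forecast, chosen, weights


# Usage: route one univariate input window of length 8 to 2 of 4 experts.
model = AttentionGatedMoE(input_len=8, horizon=3, num_experts=4, key_dim=5, top_k=2)
window = [0.1 * t for t in range(8)]
forecast, chosen, weights = model.forward(window)
```

In this sketch the routing weights are a softmax over the selected experts only, so they sum to one, and experts outside the top `top_k` are never evaluated. Whether the paper normalizes scores this way, or how it achieves balanced expert utilization without an auxiliary loss, cannot be determined from the truncated abstract.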
