Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration

Btissame El Mahtout; Florian Ziel

arXiv:2605.10330·stat.ML·May 12, 2026

TL;DR

This paper introduces an adaptive Mixture-of-Experts framework for time series forecasting that integrates expert-specific loss information and online learning to improve accuracy and efficiency.

Contribution

It presents a novel expert loss integration method combined with online updates, reducing training costs and enhancing predictive performance over existing models.

Findings

01. Outperforms statistical and neural models in accuracy and efficiency.

02. Effective expert loss integration improves forecasting performance.

03. Online learning strategy reduces computational costs.

Abstract

We propose a novel adaptive Mixture-of-Experts (MoE) framework for time series forecasting that enhances expert specialization by incorporating expert-specific loss information directly into the training process. Notably, the overall objective comprises the base forecasting loss and expert-specific losses, allowing expert-level prediction errors to jointly shape training alongside the global forecasting loss. This framework is further combined with a partial online learning strategy, enabling incremental updates of both the gating mechanism and expert parameters. This approach significantly reduces computational cost by eliminating the need for repeated full model retraining. By integrating expert-level loss awareness with efficient online optimization, the proposed method achieves improved learning efficiency while maintaining strong predictive performance. Empirical results across…
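The objective described in the abstract, a base forecasting loss combined with expert-specific losses and updated incrementally, can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption: squared error for both loss terms, gate-weighted averaging of the expert losses, the hypothetical trade-off weight `lam`, the learning rate `lr`, and all function names. For brevity the expert forecasts are held fixed and only the gating logits receive an online gradient step, whereas the paper also updates expert parameters.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of gate logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def combined_loss(gate_logits, expert_preds, y, lam=0.5):
    """Base forecasting loss plus gate-weighted expert-specific losses.

    Assumed form (not taken from the paper):
        (w . p - y)^2 + lam * sum_k w_k * (p_k - y)^2
    """
    w = softmax(gate_logits)
    mixture = w @ expert_preds                # combined forecast
    expert_losses = (expert_preds - y) ** 2   # one squared error per expert
    return float((mixture - y) ** 2 + lam * (w @ expert_losses))


def gate_gradient(gate_logits, expert_preds, y, lam=0.5):
    """Analytic gradient of combined_loss with respect to the gate logits."""
    w = softmax(gate_logits)
    mixture = w @ expert_preds
    expert_losses = (expert_preds - y) ** 2
    # Softmax chain rule: d(w . v)/dz_j = w_j * (v_j - w . v)
    grad_base = 2.0 * (mixture - y) * w * (expert_preds - mixture)
    grad_expert = lam * w * (expert_losses - w @ expert_losses)
    return grad_base + grad_expert


def online_gate_step(gate_logits, expert_preds, y, lam=0.5, lr=0.1):
    """One incremental update of the gate on a newly observed target y."""
    grad = gate_gradient(gate_logits, expert_preds, y, lam)
    return gate_logits - lr * grad


# Hypothetical data: three experts, one new observation.
logits = np.zeros(3)
preds = np.array([1.0, 2.0, 4.0])
target = 2.0
new_logits = online_gate_step(logits, preds, target)
```

Under these assumptions, a single step moves gate weight away from the expert with the largest error without refitting anything from scratch, which is the kind of incremental update the abstract contrasts with repeated full retraining.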
