Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
Btissame El Mahtout, Florian Ziel

TL;DR
This paper introduces an adaptive Mixture-of-Experts framework for time series forecasting that integrates expert-specific loss information and online learning to improve accuracy and efficiency.
Contribution
It presents a novel expert loss integration method combined with online updates, reducing training costs and enhancing predictive performance over existing models.
Findings
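The proposed framework outperforms statistical and neural baselines in both accuracy and efficiency.
Integrating expert-specific losses into the training objective improves forecasting performance.
The partial online learning strategy reduces computational cost by avoiding repeated full retraining.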
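To make the idea of incremental updates concrete, here is a minimal sketch of a single online step that adjusts both the gate and the experts when one new observation arrives, instead of refitting on the full history. Everything in it is an illustrative assumption rather than the authors' method: linear experts, an input-independent softmax gate, squared-error losses, and the names `objective`, `online_step`, `lr`, and `lam` are all hypothetical.

```python
import numpy as np

def _gate_weights(theta):
    # Numerically stable softmax over the gate logits.
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def objective(theta, W, x, y, lam=0.5):
    """Squared error of the mixture plus lam times the summed expert errors.

    theta : (K,)   gate logits (assumed input-independent for simplicity)
    W     : (K, d) weights of K linear experts (an assumption)
    x, y  : one feature vector of shape (d,) and its scalar target
    """
    preds = W @ x                      # one forecast per expert
    mix = _gate_weights(theta) @ preds # gated combination
    return (mix - y) ** 2 + lam * np.sum((preds - y) ** 2)

def online_step(theta, W, x, y, lr=0.01, lam=0.5):
    """One gradient step on the newest observation only (no full retraining)."""
    w = _gate_weights(theta)
    preds = W @ x
    mix = w @ preds
    err = mix - y
    # Expert gradient: mixture term (scaled by gate weight) plus own-loss term.
    grad_W = 2.0 * (err * w + lam * (preds - y))[:, None] * x[None, :]
    # Gate gradient via the softmax derivative: d mix / d theta_j = w_j (preds_j - mix).
    grad_theta = 2.0 * err * w * (preds - mix)
    return theta - lr * grad_theta, W - lr * grad_W
```

Calling `online_step` once per incoming observation keeps the per-step cost independent of the length of the history, which is the kind of saving the listing attributes to online updating.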
Abstract
We propose a novel adaptive Mixture-of-Experts (MoE) framework for time series forecasting that enhances expert specialization by incorporating expert-specific loss information directly into the training process. Notably, the overall objective comprises the base forecasting loss and expert-specific losses, allowing expert-level prediction errors to jointly shape training alongside the global forecasting loss. This framework is further combined with a partial online learning strategy, enabling incremental updates of both the gating mechanism and expert parameters. This approach significantly reduces computational cost by eliminating the need for repeated full model retraining. By integrating expert-level loss awareness with efficient online optimization, the proposed method achieves improved learning efficiency while maintaining strong predictive performance. Empirical results across…
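The objective described in the abstract, a base forecasting loss combined with expert-specific losses, can be sketched as follows. This is only one plausible reading: the squared-error losses, the softmax gate, the additive combination, and the weight `lam` are assumptions made for illustration, and the function name `moe_objective` is hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise, numerically stable softmax.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_objective(gate_logits, expert_preds, y, lam=0.5):
    """Base loss of the gated mixture plus lam times the summed per-expert losses.

    gate_logits  : (n, K) unnormalized gate scores per sample and expert
    expert_preds : (n, K) forecast of each expert for each sample
    y            : (n,)   observed targets
    lam          : weight on the expert-specific term (hypothetical parameter)
    """
    w = softmax(gate_logits)                          # gate weights, rows sum to 1
    mixture = (w * expert_preds).sum(axis=1)          # combined forecast
    base_loss = np.mean((mixture - y) ** 2)           # global forecasting loss
    expert_losses = np.mean((expert_preds - y[:, None]) ** 2, axis=0)  # one per expert
    total = base_loss + lam * expert_losses.sum()
    return total, base_loss, expert_losses
```

With this form, an expert's own error contributes to the gradient even when the gate currently gives it little weight, so every expert keeps receiving a training signal.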
