TL;DR
Dynamic TMoE is a drift-aware, adaptive Mixture-of-Experts (MoE) framework for non-stationary time series forecasting, designed to handle distribution shifts and abrupt regime changes.
Contribution
It unifies architectural evolution with temporal continuity: experts are instantiated and pruned dynamically when a distribution shift is detected, and a temporal memory router keeps expert selection stable.
Findings
Achieves state-of-the-art performance on nine benchmarks.
Reduces MSE by 10.4% and MAE by 7.8%.
Effectively adapts to abrupt regime shifts.
Abstract
Non-stationary time series forecasting is challenged by evolving distribution shifts that static models struggle to capture. While Mixture-of-Experts (MoE) architectures offer a promising paradigm for decoupling complex drift patterns, existing approaches are limited by fixed expert pools and memoryless routing, hampering their ability to adapt to abrupt regime shifts. To address this, we propose Dynamic TMoE, a framework that unifies architectural evolution with temporal continuity during the learning phase. By detecting distribution shifts via Maximum Mean Discrepancy (MMD), we dynamically instantiate heterogeneous experts and prune redundant ones to optimize capacity. Additionally, a temporal memory router leverages recurrent states and an anomaly repository to ensure stable, context-aware expert selection without requiring test-time updates. Experiments on nine benchmarks demonstrate…
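To make the MMD-based shift detection mentioned in the abstract concrete, here is a minimal sketch of how a squared MMD estimate can flag a change between a reference window and a recent window. It is not the paper's implementation: the RBF kernel, the biased estimator, the bandwidth, the threshold, and the names rbf_kernel, mmd2 and drift_detected are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def drift_detected(reference, recent, threshold=0.05, bandwidth=1.0):
    """Hypothetical decision rule: flag a shift when MMD^2 exceeds a fixed threshold."""
    return mmd2(reference, recent, bandwidth) > threshold


# Synthetic windows: 200 samples with 4 features each (illustrative data only).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(200, 4))
same_regime = rng.normal(0.0, 1.0, size=(200, 4))
new_regime = rng.normal(3.0, 1.0, size=(200, 4))

print("same regime flagged:", drift_detected(reference, same_regime))
print("new regime flagged:", drift_detected(reference, new_regime))
```

In a framework like the one described, a positive detection would presumably be the signal to instantiate a new expert. In practice a fixed threshold is often replaced by a permutation test or a bandwidth heuristic, and the abstract does not say which choice the authors make.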
