Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts
Roussel Desmond Nzoyem, Grant Stevens, Amarpal Sahota, David A.W. Barton, Tom Deakin

TL;DR
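This paper introduces MixER, a sparse top-1 Mixture of Experts (MoE) layer whose gating is updated with a custom algorithm based on K-means and least squares, to improve hierarchical dynamical system reconstruction (DSR). It reports efficient, scalable training and examines how the structure of the data affects the learned representations.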
Contribution
The paper proposes MixER, a new sparse MoE layer with a K-means-based gating update for hierarchical DSR. It targets the limitations of naive MoEs, whose gradient descent-based gating leads to slow updates and conflicted routing during training.
Findings
MixER trains efficiently and scales to systems with up to ten ODEs.
MixER underperforms in high-data regimes when the data are highly related.
The hierarchical structure of the data influences representation quality.
Abstract
As foundational models reshape scientific discovery, a bottleneck persists in dynamical system reconstruction (DSR): the ability to learn across system hierarchies. Many meta-learning approaches have been applied successfully to single systems, but falter when confronted with sparse, loosely related datasets requiring multiple hierarchies to be learned. Mixture of Experts (MoE) offers a natural paradigm to address these challenges. Despite their potential, we demonstrate that naive MoEs are inadequate for the nuanced demands of hierarchical DSR, largely due to their gradient descent-based gating update mechanism which leads to slow updates and conflicted routing during training. To overcome this limitation, we introduce MixER: Mixture of Expert Reconstructors, a novel sparse top-1 MoE layer employing a custom gating update algorithm based on K-means and least squares. Extensive…
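The abstract names the ingredients of the gating update (K-means, least squares, top-1 routing) but not how they are combined. The sketch below is one plausible reading, not the authors' algorithm: it assumes each environment is summarized by a context vector, clusters those vectors with K-means, and then fits a linear gate by least squares so that it reproduces the cluster assignments. The function names `kmeans`, `fit_gate_least_squares`, and `route_top1`, and the `contexts` array, are hypothetical and chosen for illustration only.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points (fancy indexing copies).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def fit_gate_least_squares(contexts, labels, k):
    """Fit a linear gate W so that [contexts, 1] @ W approximates one-hot labels."""
    A = np.hstack([contexts, np.ones((len(contexts), 1))])  # append bias column
    Y = np.eye(k)[labels]                                   # one-hot targets (n, k)
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W

def route_top1(contexts, W):
    """Sparse top-1 routing: each context is sent to its highest-scoring expert."""
    A = np.hstack([contexts, np.ones((len(contexts), 1))])
    return (A @ W).argmax(axis=1)

# Toy data (assumption): two well-separated families of environment contexts.
rng = np.random.default_rng(0)
contexts = np.vstack([
    rng.normal(loc=-5.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=+5.0, scale=0.3, size=(10, 2)),
])

_, labels = kmeans(contexts, k=2)
W = fit_gate_least_squares(contexts, labels, k=2)
routes = route_top1(contexts, W)
```

In this sketch the gate is refit in closed form rather than nudged by gradient steps, which is the contrast the abstract draws with naive MoEs. How the context vectors are obtained and how often the gate is refit during training are left open here, since the abstract does not say.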
Taxonomy
Topics: Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference · Time Series Analysis and Forecasting
Methods: Mixture of Experts
