Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing
Ellwil Sharma, Arastu Sharma

TL;DR
This paper introduces Shodh-MoE, a sparse mixture-of-experts neural architecture that learns multiple physics regimes through separate, specialized experts, reducing negative transfer and improving accuracy in scientific modeling.
Contribution
The paper proposes a novel sparse-activated transformer with dynamic routing for multi-physics transport, enabling specialized learning and mitigating interference between different PDE regimes.
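To illustrate the general idea of sparse activation with dynamic routing, here is a minimal NumPy sketch of generic top-k gating. It is not the paper's router: the function names (`top_k_route`, `sparse_moe`), the tanh experts, the tensor shapes, and the choice of k = 2 are all assumptions made for illustration.

```python
import numpy as np

def top_k_route(tokens, gate_weights, k=2):
    """Generic top-k gating (illustrative assumption, not the paper's router)."""
    logits = tokens @ gate_weights                     # (n_tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]      # indices of the k largest logits
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    top_logits = top_logits - top_logits.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(top_logits)
    weights /= weights.sum(axis=-1, keepdims=True)     # normalize over the chosen experts
    return top_idx, weights

def sparse_moe(tokens, gate_weights, expert_weights, k=2):
    """Send each token only to its k selected experts and mix their outputs."""
    top_idx, weights = top_k_route(tokens, gate_weights, k)
    out = np.zeros_like(tokens)
    for e in range(expert_weights.shape[0]):
        # Tokens that selected expert e, and the slot (0..k-1) in which they did so.
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue                                   # this expert stays inactive
        expert_out = np.tanh(tokens[token_ids] @ expert_weights[e])  # toy expert
        out[token_ids] += weights[token_ids, slot][:, None] * expert_out
    return out, top_idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))            # 8 hypothetical latent tokens of width 4
gate_weights = rng.normal(size=(4, 3))      # gate for 3 hypothetical experts
expert_weights = rng.normal(size=(3, 4, 4))
output, chosen = sparse_moe(tokens, gate_weights, expert_weights, k=2)
```

Because each token touches only k experts, gradients from one regime update only the experts that regime is routed to, which is the mechanism usually credited with reducing interference in sparse models.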
Findings
The model conserves mass by construction, with a reported divergence of approximately 2.8 × 10^-10.
Routing telemetry shows domain-specific expert specialization.
Latent-space and physical-space mean squared errors (MSEs) indicate high accuracy across regimes.
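As a sketch of how routing telemetry of the kind mentioned in the findings could be tabulated, the snippet below counts how often tokens from each domain are sent to each expert. The function name `routing_telemetry`, the integer domain labels, and the toy assignments are hypothetical. The summary does not describe the paper's actual logging.

```python
import numpy as np

def routing_telemetry(expert_ids, domain_ids, n_experts, n_domains):
    """Fraction of each domain's tokens routed to each expert (hypothetical helper)."""
    counts = np.zeros((n_domains, n_experts))
    # Unbuffered accumulation so repeated (domain, expert) pairs are all counted.
    np.add.at(counts, (domain_ids, expert_ids), 1)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1)   # avoid division by zero for empty domains

# Toy data: domain 0 mostly uses expert 0, domain 1 uses only expert 2.
expert_ids = np.array([0, 0, 1, 2, 2, 2])
domain_ids = np.array([0, 0, 0, 1, 1, 1])
usage = routing_telemetry(expert_ids, domain_ids, n_experts=3, n_domains=2)
```

Rows of `usage` that concentrate on different experts would be one simple indication of domain-specific specialization.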
Abstract
Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operators. In particular, broadband open-channel fluid dynamics and boundary-dominated porous media flows impose incompatible spectral and geometric demands on a single dense parameter path. We introduce Shodh-MoE, a sparse-activated latent transformer architecture for multi-physics transport. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder with an intra-tokenizer Helmholtz-style velocity parameterization, restricting decoded states to divergence-free velocity manifolds. The model guarantees exact mass conservation, achieving a physically verifiable…
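The abstract does not spell out the Helmholtz-style parameterization. One common way to restrict a velocity field to be divergence-free is to take it as the curl of a vector potential, u = ∇ × A, since ∇ · (∇ × A) = 0. The sketch below assumes that construction on a periodic grid with central differences. The grid size, the random potential, and the helper names (`ddx`, `curl`, `divergence`) are illustrative assumptions, not the paper's decoder.

```python
import numpy as np

def ddx(f, axis, h):
    """Periodic central difference along one axis."""
    return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h)

def curl(A, h):
    """Discrete curl of a vector potential A with shape (3, n, n, n)."""
    Ax, Ay, Az = A
    return np.stack([
        ddx(Az, 1, h) - ddx(Ay, 2, h),
        ddx(Ax, 2, h) - ddx(Az, 0, h),
        ddx(Ay, 0, h) - ddx(Ax, 1, h),
    ])

def divergence(u, h):
    """Discrete divergence using the same difference operator as curl."""
    return ddx(u[0], 0, h) + ddx(u[1], 1, h) + ddx(u[2], 2, h)

rng = np.random.default_rng(0)
n = 16                                # illustrative grid size (assumption)
h = 1.0 / n
A = rng.normal(size=(3, n, n, n))     # arbitrary potential standing in for a decoder output
u = curl(A, h)                        # velocity field, divergence-free by construction
max_div = np.abs(divergence(u, h)).max()
```

Because the difference operators along different axes commute, the discrete divergence of `u` cancels term by term, so `max_div` is nonzero only through floating-point rounding. This holds for any potential `A`, which is why such a parameterization constrains the output rather than relying on a penalty term.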
