Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

Ellwil Sharma; Arastu Sharma

arXiv:2605.15179 · cs.LG · May 15, 2026

TL;DR

This paper introduces Shodh-MoE, a sparse mixture-of-experts architecture for scientific modeling that routes different physics regimes to specialized experts, reducing negative transfer between regimes and improving accuracy.

Contribution

The paper proposes a sparsely activated transformer with dynamic expert routing for multi-physics transport, which lets experts specialize by regime and mitigates interference between different PDE regimes.
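To make "sparsely activated with dynamic routing" concrete, here is a minimal sketch of a generic top-k mixture-of-experts layer, plus a per-domain tally of expert selections of the kind that routing telemetry reports. It assumes a linear router, top-2 selection, and a softmax taken over only the selected logits (one common variant); the router, toy experts, sizes, and domain labels are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(token, gate_weights, k=2):
    """Pick the k highest-scoring experts for one token.

    gate_weights holds one weight vector per expert (a hypothetical linear router).
    The returned weights are a softmax over the k selected logits, so they sum to 1.
    """
    logits = [sum(w * x for w, x in zip(wvec, token)) for wvec in gate_weights]
    chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    return list(zip(chosen, softmax([logits[e] for e in chosen])))

def moe_forward(token, gate_weights, experts, k=2):
    """Sparse MoE layer: only the k routed experts are evaluated for this token."""
    out = [0.0] * len(token)
    for e, weight in top_k_route(token, gate_weights, k):
        y = experts[e](token)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

def routing_telemetry(tokens_by_domain, gate_weights, k=2):
    """Count how often each expert is selected, separately for each domain label."""
    return {
        domain: Counter(e for t in toks for e, _ in top_k_route(t, gate_weights, k))
        for domain, toks in tokens_by_domain.items()
    }

# Usage with hypothetical sizes: 4-dim tokens, 4 experts, top-2 routing.
rng = random.Random(0)
D, E = 4, 4
gate = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(E)]
experts = [lambda x, s=i + 1: [s * xi for xi in x] for i in range(E)]  # toy scaling experts
tok = [rng.gauss(0.0, 1.0) for _ in range(D)]
print(top_k_route(tok, gate))
print(moe_forward(tok, gate, experts))

# Hypothetical domain labels standing in for two physics regimes.
domains = {
    "open_channel": [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(50)],
    "porous_media": [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(50)],
}
print(routing_telemetry(domains, gate))
```

Only k of the E experts run per token, which is what keeps compute sparse as the expert count grows. With an untrained random router and identically distributed tokens, the per-domain counts show no meaningful specialization; in a trained model, diverging per-domain histograms are where domain-specific specialization would become visible.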

Findings

01. The model conserves mass exactly by construction; the measured velocity divergence is about 2.8 × 10^-10.

02. Routing telemetry shows domain-specific expert specialization.

03. Low latent-space and physical-space MSEs indicate high accuracy across regimes.
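Why a divergence-free parameterization gives mass conservation "by construction" can be illustrated with a toy. A minimal sketch, assuming the decoder outputs a vector potential A and sets velocity u = curl A on a periodic grid with central differences (an assumption: the paper's intra-tokenizer Helmholtz-style parameterization, grid, and boundary handling are not specified in this summary). Because discrete difference operators along different axes commute on a periodic grid, div(curl A) vanishes up to floating-point round-off, whereas an unconstrained velocity field has O(1) divergence.

```python
import random

N = 8          # hypothetical grid size per axis (the paper's latents are 16^3)
H = 1.0 / N    # grid spacing on the unit periodic box

def grid():
    return [(i, j, k) for i in range(N) for j in range(N) for k in range(N)]

def d(f, axis):
    """Periodic central difference of scalar field f (dict keyed by (i, j, k)) along axis."""
    out = {}
    for idx in f:
        p, m = list(idx), list(idx)
        p[axis] = (p[axis] + 1) % N
        m[axis] = (m[axis] - 1) % N
        out[idx] = (f[tuple(p)] - f[tuple(m)]) / (2 * H)
    return out

def curl(a):
    """Velocity u = curl A, using the same discrete operator along every axis."""
    ax, ay, az = a
    u = {p: d(az, 1)[p] - d(ay, 2)[p] for p in ax}
    v = {p: d(ax, 2)[p] - d(az, 0)[p] for p in ax}
    w = {p: d(ay, 0)[p] - d(ax, 1)[p] for p in ax}
    return u, v, w

def divergence(vel):
    u, v, w = vel
    du, dv, dw = d(u, 0), d(v, 1), d(w, 2)
    return {p: du[p] + dv[p] + dw[p] for p in u}

def random_field(rng):
    return {p: rng.uniform(-1.0, 1.0) for p in grid()}

rng = random.Random(0)
potential = tuple(random_field(rng) for _ in range(3))  # stands in for a hypothetical decoder output
vel = curl(potential)
raw = tuple(random_field(rng) for _ in range(3))        # unconstrained velocity, for contrast

max_div_curl = max(abs(x) for x in divergence(vel).values())
max_div_raw = max(abs(x) for x in divergence(raw).values())
print(f"max |div| (curl-parameterized): {max_div_curl:.2e}")
print(f"max |div| (unconstrained):      {max_div_raw:.2e}")
```

Note that `curl` recomputes each derivative inside the dict comprehension for brevity; caching the six partial derivatives first would be faster. The tiny residual here is round-off in this toy setup; the paper's reported 2.8 × 10^-10 comes from its own pipeline, which this sketch does not reproduce.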

Abstract

Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operators. In particular, broadband open-channel fluid dynamics and boundary-dominated porous media flows impose incompatible spectral and geometric demands on a single dense parameter path. We introduce Shodh-MoE, a sparse-activated latent transformer architecture for multi-physics transport. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder with an intra-tokenizer Helmholtz-style velocity parameterization, restricting decoded states to divergence-free velocity manifolds. The model guarantees exact mass conservation, achieving a physically verifiable…
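Gradient conflict, one of the failure modes the abstract names, is commonly diagnosed by the cosine similarity between per-task gradients on shared parameters: a negative value means a step that helps one regime hurts the other. A minimal toy illustration, assuming a shared linear model and three hypothetical regimes (two with opposite targets, one compatible); nothing here is taken from the paper's training setup or data.

```python
import math

def grad_mse(theta, xs, ys):
    """Gradient of mean squared error for a shared linear model y_hat = theta . x."""
    g = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        r = sum(t * xi for t, xi in zip(theta, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * r * xi / len(xs)
    return g

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

theta = [0.0, 0.0]                     # shared parameters: a single dense path
xs = [[1.0, 0.5], [2.0, 1.0], [-1.0, 0.2]]
ys_a = [1.0, 2.0, -1.0]                # hypothetical regime A: y = +x0
ys_b = [-1.0, -2.0, 1.0]               # hypothetical regime B: y = -x0 (conflicting)
ys_c = [0.8, 1.5, -0.9]                # hypothetical regime C: roughly aligned with A

g_a = grad_mse(theta, xs, ys_a)
g_b = grad_mse(theta, xs, ys_b)
g_c = grad_mse(theta, xs, ys_c)
cos_ab = cosine(g_a, g_b)              # negative: the regimes pull theta apart
cos_ac = cosine(g_a, g_c)              # positive: the regimes cooperate
print(cos_ab, cos_ac)
print([a + b for a, b in zip(g_a, g_b)])  # summed gradient: the A/B updates cancel
```

In this extreme case the co-trained gradient for regimes A and B cancels entirely, so the shared parameters stop moving. Routing the two regimes to different experts, as the paper proposes, sends their gradients to different parameters instead; this toy only shows the conflict, not the paper's remedy.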
