Towards Universal Neural Operators through Multiphysics Pretraining

Mikhail Masliaev; Dmitry Gusarov; Ilya Markov; Alexander Hvatov

arXiv:2511.10829·cs.LG·November 17, 2025

Open Access·3 Reviews

TL;DR

This paper explores the use of transformer-based neural operators pretrained on simple problems to efficiently solve complex PDEs, demonstrating effective transfer learning across diverse physical simulation tasks.

Contribution

It introduces a general transfer learning framework for transformer-based neural operators applied to multiple PDE problems, extending their applicability beyond specific cases.

Findings

01

Transformers effectively transfer knowledge across diverse PDE tasks.

02

Pretraining reduces computational costs for complex physical simulations.

03

Neural operators generalize well to unseen parameters and new variables.

Abstract

Although neural operators are widely used in data-driven physical simulations, their training remains computationally expensive. Recent advances address this issue via downstream learning, where a model pretrained on simpler problems is fine-tuned on more complex ones. In this research, we investigate transformer-based neural operators, which have previously been applied only to specific problems, in a more general transfer learning setting. We evaluate their performance across diverse PDE problems, including extrapolation to unseen parameters, incorporation of new variables, and transfer from multi-equation datasets. Our results demonstrate that advanced neural operator architectures can effectively transfer knowledge across PDE problems.

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01·Rating 2·Confidence 3

Strengths

- Tackles the difficult problem of training foundation models applicable across a range of different PDEs and parameters.
- Introduces two novel architectures allowing pre-training and efficient fine-tuning.
- Studies a diverse benchmark of PDEs and groups of PDEs, showcasing the robustness of the models.

Weaknesses

- No ablation studies demonstrate the importance of the adapters or of other layers, such as the self-attention.
- The gains in accuracy and training time do not appear significant, either relative to other neural operator architectures that allow pre-training and fine-tuning or even in comparison with FNO.

Reviewer 02·Rating 4·Confidence 4

Strengths

- The paper studies foundation models for PDEs, which is a very interesting and important problem.
- The paper systematically studies generalization to new coefficients, forcing terms, and PDEs.
- Experiments show that pretraining is generally helpful.
- The new architectures seem to have better accuracy on extended equations and on new PDEs.

Weaknesses

- A similar pretraining stage has been studied in Poseidon and [1].
- One key question remains unanswered: is pretraining always helpful? Should we train on all the data (across PDEs, forcing terms, and coefficients), or should we limit pretraining to a certain subset?
- The new architectures, which combine Mamba and Perceiver, are not a very significant contribution. It is unclear whether these modifications are helpful.

[1] McCabe, Michael, et al. "Multiple physics pretraining for spatiotemporal surrogate mod

Reviewer 03·Rating 4·Confidence 3

Strengths

(i) Clear recipe: freeze the backbone and train adapters. The pretrain-then-fine-tune scheme is explicit: fix the common integral operator stack and update only small input/output adapters, which highlights what transfers and cuts training cost.
(ii) Well-designed evaluation: the authors test three distinct scenarios (new parameters, added physics/inputs, and cross-PDE transfer) without changing the modeling recipe.
(iii) Consistent empirical gains: across tables, pretraining plus adapters improves ac
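The freeze-the-backbone, train-the-adapters recipe described in (i) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's architecture: a single scalar `backbone_w` stands in for the pretrained integral operator stack, `in_scale` and `out_scale` stand in for the input/output adapters, and the downstream data is synthetic. Only the adapter parameters receive gradient updates.

```python
import math

# Hypothetical toy model: y_hat = out_scale * tanh(backbone_w * in_scale * x)
params = {
    "in_scale": 0.5,    # input adapter (trainable)
    "backbone_w": 1.3,  # stand-in for the pretrained operator stack (frozen)
    "out_scale": 0.5,   # output adapter (trainable)
}
TRAINABLE = ("in_scale", "out_scale")  # the backbone is deliberately excluded


def predict(p, x):
    return p["out_scale"] * math.tanh(p["backbone_w"] * p["in_scale"] * x)


def loss(p, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(p, x) - y) ** 2 for x, y in data) / len(data)


def grads(p, data):
    """Analytic MSE gradients with respect to the adapter parameters only."""
    g = {name: 0.0 for name in TRAINABLE}
    for x, y in data:
        t = math.tanh(p["backbone_w"] * p["in_scale"] * x)
        err = 2.0 * (p["out_scale"] * t - y) / len(data)
        g["out_scale"] += err * t
        g["in_scale"] += err * p["out_scale"] * (1.0 - t * t) * p["backbone_w"] * x
    return g


def finetune(p, data, lr=0.1, steps=200):
    """Gradient descent that updates adapters and leaves the backbone untouched."""
    p = dict(p)
    for _ in range(steps):
        g = grads(p, data)
        for name in TRAINABLE:
            p[name] -= lr * g[name]
    return p


# Synthetic downstream task: same backbone, different adapter values.
target = {"in_scale": 1.0, "backbone_w": 1.3, "out_scale": 2.0}
data = [(x / 4.0, predict(target, x / 4.0)) for x in range(-8, 9)]

loss_before = loss(params, data)
tuned = finetune(params, data)
loss_after = loss(tuned, data)
```

In this sketch the number of updated parameters is two out of three; in a real neural operator the adapters would be a correspondingly small fraction of the full model, which is where the reported per-epoch savings would come from.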

Weaknesses

(i) Scope limited to same dimensionality and curated grids: the method is explicitly evaluated only when the pretraining and fine-tuning tasks have the same problem dimensionality, and the pipeline resamples all data to a shared fixed grid. That leaves open transfer from 2D to 3D, to irregular meshes, or to complex geometries/boundaries.
(ii) Speedups reported per epoch, not end to end: tables report Avg. epoch time (s) but not the total wall-clock time including pretraining. This makes it hard to judge the true effic
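The concern in (ii) comes down to simple accounting, sketched below. The function name and all numbers are hypothetical and chosen only to show the arithmetic; none are taken from the paper's tables. The sketch assumes pretraining cost is shared evenly across the downstream tasks that reuse the pretrained model.

```python
def total_wall_clock(epoch_time_s, epochs, pretrain_time_s=0.0, n_downstream_tasks=1):
    """Seconds spent per downstream task, with pretraining cost amortized evenly."""
    return epoch_time_s * epochs + pretrain_time_s / n_downstream_tasks


# Hypothetical numbers, for illustration only.
scratch = total_wall_clock(epoch_time_s=10.0, epochs=100)
finetuned = total_wall_clock(epoch_time_s=4.0, epochs=100, pretrain_time_s=3000.0)
amortized = total_wall_clock(epoch_time_s=4.0, epochs=100,
                             pretrain_time_s=3000.0, n_downstream_tasks=10)

print(scratch, finetuned, amortized)  # 1000.0 3400.0 700.0
```

With these made-up figures, fine-tuning is faster per epoch yet slower end to end for a single task, and only becomes cheaper once the pretraining cost is spread over several downstream tasks. Reporting total wall-clock time would let readers make this comparison directly.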

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science