Strategies for Pretraining Neural Operators
Anthony Zhou, Cooper Lorsung, AmirPouya Hemmasian, Amir Barati Farimani

TL;DR
This paper compares various pretraining strategies for neural operators in PDE modeling, highlighting their dependence on model and dataset choices, and demonstrating benefits in transfer learning, data augmentation, and scarce data regimes.
Contribution
It provides a systematic comparison of pretraining methods for neural operators, without architecture optimization, to understand their effects on generalization and scaling.
Findings
Pretraining effectiveness varies with model and dataset.
Transfer learning and physics-based pretraining strategies generally perform best.
Data augmentation enhances pretraining performance.
Abstract
Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets that make it challenging to compare or examine different pretraining frameworks. To address this, we compare various pretraining methods without optimizing architecture choices to characterize pretraining dynamics on different models and datasets as well as to understand its scaling and generalization behavior. We find that pretraining is highly dependent on model and dataset choices, but in general transfer learning or physics-based pretraining strategies work best. In addition, pretraining performance can be further improved by using data…
Taxonomy
Topics: Neural Networks and Applications
