Multi-Operator Few-Shot Learning for Generalization Across PDE Families
Yile Li, Shandian Zhe

TL;DR
This paper introduces MOFS, a multimodal framework enabling neural operators to generalize to unseen PDEs with minimal data, combining self-supervised pretraining, text-conditioned embeddings, and multimodal prompting.
Contribution
The work presents a novel multi-operator few-shot learning approach that integrates multimodal data and contrastive fine-tuning for PDE operator generalization.
Findings
Outperforms existing methods in few-shot PDE operator learning
Effective use of multimodal prompts improves generalization
Validated on Darcy Flow and Navier-Stokes benchmarks
Abstract
Learning solution operators for partial differential equations (PDEs) has become a foundational task in scientific machine learning. However, existing neural operator methods require abundant training data for each specific PDE and lack the ability to generalize across PDE families. In this work, we propose MOFS: a unified multimodal framework for multi-operator few-shot learning, which aims to generalize to unseen PDE operators using only a few demonstration examples. Our method integrates three key components: (i) multi-task self-supervised pretraining of a shared Fourier Neural Operator (FNO) encoder to reconstruct masked spatial fields and predict frequency spectra, (ii) text-conditioned operator embeddings derived from statistical summaries of input-output fields, and (iii) memory-augmented multimodal prompting with gated fusion and cross-modal gradient-based attention. We adopt a…
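The three components named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the masking ratio, the sentence template, and the gate parameterization are all assumptions made for illustration, and a plain FFT stands in for the FNO encoder. It shows (i) how masked-field and frequency-spectrum targets might be built, (ii) how statistical summaries of an input-output pair might be rendered as text, and (iii) one common form of gated fusion between two feature vectors.

```python
import numpy as np


def mask_field(field, mask_ratio=0.5, rng=None):
    """Hide a random subset of grid points (assumed masking scheme).

    Returns the masked field and a boolean mask that is True where hidden.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(field.shape) < mask_ratio
    return np.where(mask, 0.0, field), mask


def amplitude_spectrum(field):
    """Magnitude of the 2D real FFT, one possible frequency-spectrum target."""
    return np.abs(np.fft.rfft2(field))


def describe_pair(a, u):
    """Render summary statistics of an input/output pair as a sentence.

    The template is hypothetical; the text could then be fed to any
    text encoder to obtain an operator embedding.
    """
    return (
        f"Input field: mean {a.mean():.3f}, std {a.std():.3f}. "
        f"Output field: mean {u.mean():.3f}, std {u.std():.3f}, "
        f"range [{u.min():.3f}, {u.max():.3f}]."
    )


def gated_fusion(x, y, w, b):
    """Sigmoid-gated convex combination of two feature vectors.

    x, y: shape (d,); w: shape (2d, d); b: shape (d,).
    """
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([x, y]) @ w + b)))
    return gate * x + (1.0 - gate) * y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((16, 16))   # stand-in input field
    u = a ** 2                          # stand-in output field
    masked, mask = mask_field(a, 0.5, rng)
    print(masked.shape, amplitude_spectrum(a).shape)
    print(describe_pair(a, u))
```

With a zero-initialized gate (`w` and `b` all zeros) the fusion reduces to the average of the two inputs, which makes it a convenient sanity check before any training.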
Peer Reviews
Decision · Submitted to ICLR 2026
- Very innovative multi-modal approach.
- Strong performance is achieved on the benchmark PDEs considered.
- The comparisons with other models, such as DeepONet or FNO, are a bit unclear; is a similar number of parameters used for these models?
- Not much information is given regarding how expensive the various phases of training are, in particular the contrastive learning.
- The text conditioning is hard-coded, and could possibly hinder the generalization of the method to more complex PDEs.
Exploring operator learning techniques which can generalize via few-shot learning across different PDEs is an important area of research. Using pretraining approaches for such methods is a promising way forward to more efficient and broadly applicable models for the fast numerical surrogate solutions for PDEs.
The paper is unclear overall in its presentation of the method and gives little motivation for each of the many components. The use of a language model to encode the statistics of the samples for each PDE dataset feels particularly unmotivated. What is the additional benefit of a natural language representation of these scalar statistics compared to dealing with them directly? The paper writes that this captures both "physical statistics and linguistic priors." It is unclear what a linguisti
1. The paper is easy to follow.
2. The attempt to use textual statistics as priors and align them with spectral/visual features is interesting and novel within the operator-learning literature.
1. Lack of conceptual novelty. Most components (FNO encoder, contrastive learning, text conditioning, memory-based prompting) are directly adapted from existing architectures (e.g., FNO, CLIP, Flamingo) with minimal innovation specific to operator learning. The paper does not articulate why multimodal fusion or text embeddings are theoretically beneficial for PDEs beyond empirical combination.
2. Weak empirical results. Although large and diverse public benchmarks such as PDEBench,
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
