Multi-Operator Few-Shot Learning for Generalization Across PDE Families

Yile Li; Shandian Zhe

arXiv:2508.01211 · cs.LG · August 5, 2025

TL;DR

This paper introduces MOFS, a multimodal framework enabling neural operators to generalize to unseen PDEs with minimal data, combining self-supervised pretraining, text-conditioned embeddings, and multimodal prompting.
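The text-conditioned embeddings start from statistical summaries of the input and output fields, which a language model then encodes. The sketch below covers only the first step: it renders one sampled input-output pair as a short description. The chosen statistics (mean, standard deviation, min, max), the prompt template, and the names summarize_field and operator_prompt are hypothetical. The paper's actual template and language-model encoder are not reproduced here.

```python
import statistics
from typing import Sequence


def summarize_field(name: str, values: Sequence[float]) -> str:
    """Render simple summary statistics of one sampled field as text."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return (
        f"{name}: mean={mean:.3f}, std={std:.3f}, "
        f"min={min(values):.3f}, max={max(values):.3f}"
    )


def operator_prompt(inputs: Sequence[float], outputs: Sequence[float]) -> str:
    """Hypothetical template describing one input-output pair of a PDE operator.

    In MOFS the resulting text would be passed to a language model to obtain
    an embedding; that encoding step is not shown here.
    """
    return (
        "PDE operator sample. "
        + summarize_field("input", inputs)
        + "; "
        + summarize_field("output", outputs)
        + "."
    )
```

For example, operator_prompt([0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]) returns a string containing "input: mean=1.500, std=1.118, min=0.000, max=3.000".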

Contribution

The work presents a novel multi-operator few-shot learning approach that integrates multimodal data and contrastive fine-tuning for PDE operator generalization.
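Contrastive fine-tuning aligns representations from different modalities. One common choice, assumed here and not confirmed as the paper's exact objective, is a CLIP-style symmetric InfoNCE loss. The i-th field embedding and the i-th text embedding form a positive pair, and every other pairing in the batch acts as a negative. The function name clip_style_loss and the default temperature are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(field_emb: List[Vector], text_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: field i should match text i and no other text."""
    f = [_normalize(v) for v in field_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(f)
    # logits[i][j] = cosine similarity of field i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(f[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Field -> text: cross-entropy with the diagonal entries as targets
    loss_f2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> field: the same on the transposed logit matrix
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2f = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_f2t + loss_t2f)
```

Aligned pairs give a loss close to zero, and mismatched pairs give a large loss. Minimizing this loss pulls matching field and text representations together.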

Findings

1. Outperforms existing methods in few-shot PDE operator learning
2. Effective use of multimodal prompts improves generalization
3. Validated on Darcy Flow and Navier–Stokes benchmarks

Abstract

Learning solution operators for partial differential equations (PDEs) has become a foundational task in scientific machine learning. However, existing neural operator methods require abundant training data for each specific PDE and lack the ability to generalize across PDE families. In this work, we propose MOFS: a unified multimodal framework for multi-operator few-shot learning, which aims to generalize to unseen PDE operators using only a few demonstration examples. Our method integrates three key components: (i) multi-task self-supervised pretraining of a shared Fourier Neural Operator (FNO) encoder to reconstruct masked spatial fields and predict frequency spectra, (ii) text-conditioned operator embeddings derived from statistical summaries of input-output fields, and (iii) memory-augmented multimodal prompting with gated fusion and cross-modal gradient-based attention. We adopt a…
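The first component, multi-task self-supervised pretraining, has two targets: masked grid points to reconstruct and a frequency spectrum to predict. The sketch below builds both targets for a 1D field and adds the two squared errors. Several parts are assumptions for illustration: the masking scheme, the use of DFT amplitudes as the spectral target, the weight parameter, and the names random_mask, amplitude_spectrum, and pretraining_loss. The FNO encoder that would produce pred_field and pred_spectrum is omitted. Darcy flow and Navier–Stokes benchmarks are typically posed on 2D grids, so a 2D version would need a 2D transform in place of the naive 1D DFT.

```python
import cmath
import random
from typing import List, Sequence, Tuple


def random_mask(field: Sequence[float], ratio: float,
                rng: random.Random) -> Tuple[List[float], List[bool]]:
    """Hide a random fraction of grid points by zeroing them; return (masked field, mask)."""
    n = len(field)
    hidden = set(rng.sample(range(n), int(round(ratio * n))))
    mask = [i in hidden for i in range(n)]
    masked = [0.0 if m else x for x, m in zip(field, mask)]
    return masked, mask


def amplitude_spectrum(field: Sequence[float]) -> List[float]:
    """|DFT| at the non-negative frequencies k = 0..n//2 (naive O(n^2) sum)."""
    n = len(field)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(field)))
        for k in range(n // 2 + 1)
    ]


def pretraining_loss(pred_field: Sequence[float], pred_spectrum: Sequence[float],
                     field: Sequence[float], mask: Sequence[bool],
                     weight: float = 1.0) -> float:
    """Masked-reconstruction MSE plus a weighted spectrum MSE (hypothetical weighting)."""
    hidden = [i for i, m in enumerate(mask) if m]
    # Reconstruction is scored only on the hidden points, as in masked autoencoding.
    recon = sum((pred_field[i] - field[i]) ** 2 for i in hidden) / max(len(hidden), 1)
    # The spectral target is computed from the full, unmasked field.
    target = amplitude_spectrum(field)
    spec = sum((p - s) ** 2 for p, s in zip(pred_spectrum, target)) / len(target)
    return recon + weight * spec
```

For a single sine period such as [0.0, 1.0, 0.0, -1.0], amplitude_spectrum returns approximately [0.0, 2.0, 0.0]. pretraining_loss is exactly zero when the predictions equal the field and amplitude_spectrum(field).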

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 2

Strengths

- Very innovative multimodal approach.
- Strong performance is achieved on the benchmark PDEs considered.

Weaknesses

- The comparisons with other models, such as DeepONet or FNO, are a bit unclear; do these models have a similar number of parameters?
- Not much information is given about how expensive the various phases of training are, in particular the contrastive learning.
- The text conditioning is hard-coded, which could hinder the generalization of the method to more complex PDEs.

Reviewer 02 · Rating 2 · Confidence 4

Strengths

Exploring operator learning techniques which can generalize via few-shot learning across different PDEs is an important area of research. Using pretraining approaches for such methods is a promising way forward to more efficient and broadly applicable models for the fast numerical surrogate solutions for PDEs.

Weaknesses

The paper is unclear overall in its presentation of the method and gives little motivation for each of the many components. The use of a language model to encode the statistics of the samples for each PDE dataset feels particularly unmotivated. What is the additional benefit of a natural language representation of these scalar statistics compared to dealing with them directly? The paper writes that this captures both "physical statistics and linguistic priors." It is unclear what a linguisti

Reviewer 03 · Rating 2 · Confidence 3

Strengths

1. The paper is easy to follow.
2. The attempt to use textual statistics as priors and align them with spectral/visual features is interesting and novel within the operator-learning literature.

Weaknesses

1. Lack of conceptual novelty.
   - Most components (FNO encoder, contrastive learning, text conditioning, memory-based prompting) are directly adapted from existing architectures (e.g., FNO, CLIP, Flamingo) with minimal innovation specific to operator learning. The paper does not articulate why multimodal fusion or text embeddings are theoretically beneficial for PDEs beyond their empirical combination.
2. Weak empirical results.
   - Although large and diverse public benchmarks such as PDEBench,

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning