Pre-Generating Multi-Difficulty PDE Data for Few-Shot Neural PDE Solvers

Naman Choudhary; Vedant Singh; Ameet Talwalkar; Nicholas Matthew Boffi; Mikhail Khodak; Tanya Marwah

arXiv:2512.00564·cs.LG·January 26, 2026

TL;DR

This paper shows that pre-generating PDE data at multiple difficulty levels and combining it across levels can cut the compute spent on generating training data for neural PDE solvers (by 8.9x in the reported findings), especially for complex physics problems.

Contribution

It introduces a method for pre-generating PDE data at multiple difficulty levels to improve neural solver training efficiency and accuracy.

Findings

01. Pre-generating low and medium difficulty data aids learning high-difficulty physics.

02. Combining data from multiple difficulty levels reduces pre-generation compute by 8.9x.

03. Principled data curation across difficulty levels enhances neural PDE solver performance.

Abstract

A key aspect of learned partial differential equation (PDE) solvers is that the main cost often comes from generating training data with classical solvers rather than learning the model itself. Another is that there are clear axes of difficulty--e.g., more complex geometries and higher Reynolds numbers--along which problems become (1) harder for classical solvers and thus (2) more likely to benefit from neural speedups. Towards addressing this chicken-and-egg challenge, we study difficulty transfer on 2D incompressible Navier-Stokes, systematically varying task complexity along geometry (number and placement of obstacles), physics (Reynolds number), and their combination. Similar to how it is possible to spend compute to pre-train foundation models and improve their performance on downstream tasks, we find that by classically solving (analogously pre-generating) many low and medium…
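The cost accounting behind the abstract's argument can be illustrated with a small sketch. It compares the classical-solver cost of a hard-only training set against a mixed-difficulty set of the same total size. The level names, the per-sample costs, and the mixture fractions are all hypothetical values chosen for illustration, and the helpers `split_samples` and `generation_cost` are not from the paper. The resulting ratio is therefore not the paper's reported 8.9x.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    cost_per_sample: float  # HYPOTHETICAL solver cost per sample, arbitrary units


def split_samples(total: int, fractions: dict[str, float]) -> dict[str, int]:
    """Split a fixed total sample count across difficulty levels."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    counts = {name: int(total * frac) for name, frac in fractions.items()}
    # Give any rounding remainder to the first level so counts sum to `total`.
    first = next(iter(counts))
    counts[first] += total - sum(counts.values())
    return counts


def generation_cost(counts: dict[str, int], levels: list[Level]) -> float:
    """Total classical-solver cost of pre-generating `counts` samples."""
    cost = {lvl.name: lvl.cost_per_sample for lvl in levels}
    return sum(n * cost[name] for name, n in counts.items())


# HYPOTHETICAL costs: harder problems cost more to solve classically.
LEVELS = [Level("easy", 1.0), Level("medium", 4.0), Level("hard", 20.0)]

# Same total sample count, two different difficulty mixtures (both assumed).
hard_only = split_samples(1000, {"easy": 0.0, "medium": 0.0, "hard": 1.0})
mixed = split_samples(1000, {"easy": 0.45, "medium": 0.45, "hard": 0.10})

hard_only_cost = generation_cost(hard_only, LEVELS)
mixed_cost = generation_cost(mixed, LEVELS)
print(f"hard-only cost: {hard_only_cost:.0f}")
print(f"mixed cost:     {mixed_cost:.0f}")
print(f"compute ratio:  {hard_only_cost / mixed_cost:.2f}x")
```

The sketch only counts data-generation cost. Whether a given mixture matches hard-only accuracy is an empirical question that the paper's experiments address and this example does not.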

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- This paper uses multiple SOTA Neural Operators (CNO, FFNO, and Poseidon variants) to verify the training effectiveness of different data difficulty distributions within the dataset.
- The paper conducts extensive experiments across various scenarios, including fixed total sample size, fixed hard sample size, and few-shot downstream tasks to validate its conclusions.

Weaknesses

See Questions

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper tackles an important and underexplored problem: how to allocate data-generation effort across different difficulty levels in neural PDE solver training.
2. The study is thorough and methodologically sound, with carefully controlled experiments along geometry, physics, and combined difficulty axes.
3. Results are consistent across multiple model families (CNO, FFNO, Poseidon), reinforcing the robustness of the findings.
4. The proposed idea of difficulty transfer and the notion of

Weaknesses

1. The work is entirely empirical and lacks a theoretical explanation or analytical framework for the observed difficulty transfer phenomenon. It does not explain why medium-difficulty data generalize better than easy data toward harder regimes.
2. The experimental scope is somewhat limited. All results are based on 2D incompressible Navier–Stokes simulations. It remains unclear whether the conclusions hold for other PDE families or multi-physics problems.
3. The evaluation focuses mainly on n

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The paper is overall well written, with clear motivation and reasonable structure.
- The problem studied is important for efficiently building practical neural PDE solvers.
- The experiments clearly show the increased computational cost with increased difficulty settings.

Weaknesses

My major concerns about this paper are regarding its contributions.
- From my perspective, it is intuitively accurate and well known that mixing data with different levels of difficulty facilitates model training. Some existing papers have already analyzed the transfer learning behaviour of scientific machine learning foundation models, which show that models trained with a specific set of PDE coefficients can be more data-efficient than models trained from scratch. [1] I think this conclusio

Code & Models

Datasets

sage-lab/PreGen-NavierStokes-2D
dataset· 79 dl

Taxonomy

Topics: Model Reduction and Neural Networks · Advanced Graph Neural Networks · Machine Learning in Materials Science