Data-Augmented Few-Shot Neural Emulator for Computer-Model System Identification

Sanket Jantre, Deepak Akhare, Zhiyuan Wang, Xiaoning Qian, and Nathan M. Urban

arXiv:2508.19441·cs.LG·September 26, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a data-augmentation method for training neural PDEs that improves sample efficiency and generalization by space-filling sampling of local states, outperforming traditional emulators.

Contribution

It proposes a novel data-augmentation strategy for neural PDE training that reduces redundancy and enhances accuracy with limited simulation data.

Findings

1. Neural PDEs trained on augmented data outperform those trained on trajectory data.
2. The method achieves high accuracy with only 10 simulation steps of data.
3. Augmented data improves long-horizon stability and generalization.

Abstract

Partial differential equations (PDEs) underpin the modeling of many natural and engineered systems. It can be convenient to express such models as neural PDEs rather than using traditional numerical PDE solvers by replacing part or all of the PDE's governing equations with a neural network representation. Neural PDEs are often easier to differentiate, linearize, reduce, or use for uncertainty quantification than the original numerical solver. They are usually trained on solution trajectories obtained by long-horizon rollout of the PDE solver. Here we propose a more sample-efficient data-augmentation strategy for generating neural PDE training data from a computer model by space-filling sampling of local "stencil" states. This approach removes a large degree of spatiotemporal redundancy present in trajectory data and oversamples states that may be rarely visited but help the neural PDE…
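As a rough illustration of what space-filling sampling of local stencil states could look like, the sketch below draws 3-point stencils from a Latin hypercube and labels each one by querying a model's right-hand side. Everything here is an assumption made for illustration and is not taken from the paper: the Latin hypercube design, the 3-point stencil width, the value range [-1, 1], and the `diffusion_rhs` function, which is a hypothetical stand-in for the computer model.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, low, high, rng):
    """Space-filling design: one sample per equal-width stratum in every dimension."""
    # Stratum index per sample, independently permuted in each dimension.
    strata = np.stack([rng.permutation(n_samples) for _ in range(n_dims)], axis=1)
    # Jitter each point uniformly inside its stratum, then rescale to [low, high].
    unit = (strata + rng.random((n_samples, n_dims))) / n_samples
    return low + (high - low) * unit

def diffusion_rhs(stencil, dx=0.1, nu=0.01):
    """Hypothetical stand-in for the computer model: the central-difference
    right-hand side of 1D diffusion, evaluated at each stencil's center point."""
    left, center, right = stencil[:, 0], stencil[:, 1], stencil[:, 2]
    return nu * (left - 2.0 * center + right) / dx**2

rng = np.random.default_rng(0)
# 256 local states (left, center, right) spread over the cube [-1, 1]^3.
stencils = latin_hypercube(256, 3, low=-1.0, high=1.0, rng=rng)
# One model query per sampled stencil gives the regression target.
targets = diffusion_rhs(stencils)
```

The pairs `(stencils, targets)` would then serve as training data for a local emulator. Unlike states taken from a trajectory, the sampled stencils are not correlated in space or time, and they also cover combinations of values that a rollout might rarely visit.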

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. Problem motivation is solid: Long rollouts of PDE solvers are expensive and often contain massive redundancy. The idea to cut this redundancy by working directly with stencil-level updates is both intuitive and practical.
2. Technical formulation is clear and well-scoped. The NSE learns discretized RHS mappings in function space, rather than full-field transitions. This leads to much lower model complexity and dramatically higher effective sample size.
3. The paper evaluates across multiple

Weaknesses

1. The method relies on learned approximations of discretized RHS terms, but there is no discussion on convergence guarantees (either for training or for rollout error accumulation). Would have been helpful to see bounds, or at least qualitative analysis on failure modes.
2. The entire setup assumes access to clean simulator outputs and perfect labels. It is unclear how well the NSE would perform in a setting where the simulation is imperfect, noisy, or partially observed.
3. The NSE does not

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The paper is well-motivated and well-written. The proposed methods are novel, and the experiments are performed thoroughly. I appreciate that the authors compared against several different surrogate modeling paradigms, such as PINNs and neural operators.

Weaknesses

While the proposed method’s results are impressive, there are a few areas of improvement. I appreciate the authors’ comparisons with FNO, U-Net, and PINNs, but there have been many recent advancements in these architectures for neural surrogate modeling. To ensure the fairest comparisons, it would be interesting to see the proposed method being directly applied to an existing scientific machine learning benchmark (if the current datasets are not already taken from such a benchmark). This would e

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- In general, the paper is well written and the augmentation strategies are discussed well.
- Learning a local neural network approximation is interesting and not conventionally done.
- The work addresses a problem that a lot of PDE data is redundant, which is a real problem.

Weaknesses

### Major Concerns

- The discussed baselines (FNO/Unet) should be reported using the same explicit time integrator as well. From prior work, it is known that for simple, 2D PDEs that are discretized finely in time (in your case you use 1000 timesteps), training networks to predict the current time derivative and using a temporal integrator is more effective (https://www.sciencedirect.com/science/article/pii/S0045782525002622). Reporting both the base Unet/FNO and the derivative Unet/FNO would
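
The setup this concern refers to, a model that predicts the current time derivative and is advanced by an explicit time integrator, can be sketched as follows. This is a minimal illustration under stated assumptions and not the paper's or the reviewer's code: `rhs_model` is a hypothetical stand-in for a trained derivative network (here the exact periodic 1D diffusion right-hand side), and classical RK4 is chosen only as an example, since the source does not name the integrator.

```python
import numpy as np

def rhs_model(u, dx, nu=0.01):
    """Hypothetical stand-in for a trained network that predicts du/dt;
    here it is the exact periodic 1D diffusion right-hand side."""
    return nu * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2

def rollout_rk4(u0, dt, n_steps, dx):
    """Advance the state with classical RK4 applied to the predicted derivative."""
    u = u0.copy()
    for _ in range(n_steps):
        k1 = rhs_model(u, dx)
        k2 = rhs_model(u + 0.5 * dt * k1, dx)
        k3 = rhs_model(u + 0.5 * dt * k2, dx)
        k4 = rhs_model(u + dt * k3, dx)
        u = u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return u

n = 64
dx = 2.0 * np.pi / n
x = np.arange(n) * dx
u0 = np.sin(x)
# 1000 explicit steps, matching the step count mentioned in the review.
u_final = rollout_rk4(u0, dt=0.01, n_steps=1000, dx=dx)
```

A baseline without an integrator would instead learn the map from the state at one step directly to the state at the next step. In the derivative form, the same integrator can be shared across all compared models.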

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems · Real-time simulation and control systems