Resolving Extreme Data Scarcity by Explicit Physics Integration: An Application to Groundwater Heat Transport

Julia Pelzer; Corné Verburg; Alexander Heinlein; Miriam Schulte

arXiv:2507.06062·cs.LG·February 3, 2026

Open Access · 1 Repo · 4 Reviews

TL;DR

This paper introduces the Local-Global CNN (LGCNN), a hybrid approach that models groundwater heat transport in data-scarce settings by combining a lightweight numerical model with CNNs, enabling accurate predictions from very little training data.

Contribution

The authors develop a Local-Global CNN that integrates physics-based models with deep learning to address data scarcity in complex advection-diffusion problems.

Findings

01 · LGCNN generalizes well to larger domains
02 · Effective with fewer than five training simulations
03 · Successfully applied to real subsurface data from Munich

Abstract

Real-world flow applications in complex scientific and engineering domains, such as geosciences, challenge classical simulation methods due to large spatial domains, high spatio-temporal resolution requirements, and potentially strong material heterogeneities that lead to ill-conditioning and long runtimes. While machine learning-based surrogate models can reduce computational cost, they typically rely on large training datasets that are often unavailable in practice. To address data-scarce settings, we revisit the structure of advection-diffusion problems and decompose them into multiscale processes of locally and globally dominated components, separating spatially localized interactions and long-range effects. We propose a Local-Global Convolutional Neural Network (LGCNN) that combines a lightweight numerical model for global transport with two convolutional neural networks addressing…
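The "lightweight numerical model for global transport" mentioned in the abstract can be pictured with a small sketch. The following is an illustration only, not the authors' implementation: it traces a streamline through a 2D velocity field by integrating the initial value problem dx/dt = v(x) and writes the result into a raster that a CNN could take as an input channel. The function names, the midpoint (RK2) integrator, the step size, and the linear fading rule are all assumptions made for this example.

```python
# Illustrative sketch only. Function names, the midpoint integrator, the step
# size, and the fading rule are assumptions, not details taken from the paper.

def trace_streamline(velocity, start, width, height, step=0.5, max_steps=1000):
    """Integrate dx/dt = v(x) from `start` with the explicit midpoint method.

    velocity: callable (x, y) -> (vx, vy), e.g. an interpolated flow field
    start:    (x, y) source location, e.g. a heat pump position
    Stops when the path leaves the domain [0, width-1] x [0, height-1].
    """
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        vx, vy = velocity(x, y)
        # Evaluate the velocity at the midpoint of a trial half step.
        mx, my = x + 0.5 * step * vx, y + 0.5 * step * vy
        vmx, vmy = velocity(mx, my)
        x, y = x + step * vmx, y + step * vmy
        if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
            break
        path.append((x, y))
    return path


def rasterize(path, height, width):
    """Write the path into a 2D grid; values fade linearly along the path."""
    grid = [[0.0] * width for _ in range(height)]
    n = len(path)
    for k, (x, y) in enumerate(path):
        # Nearest-cell lookup; coordinates are non-negative inside the domain.
        i, j = int(y + 0.5), int(x + 0.5)
        if 0 <= i < height and 0 <= j < width:
            grid[i][j] = max(grid[i][j], 1.0 - k / n)
    return grid


if __name__ == "__main__":
    # Uniform flow in +x direction as a toy velocity field.
    path = trace_streamline(lambda x, y: (1.0, 0.0), (2.0, 3.0), width=16, height=8)
    grid = rasterize(path, height=8, width=16)
    for row in grid:
        print(" ".join(f"{v:.1f}" for v in row))
```

In this toy setting, the raster carries the long-range information (where the flow takes heat from the source), so a convolutional network with a limited receptive field would only need to learn the local spreading around that path.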

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1) Well-motivated physics factorization. The advection-dominated regime (Pe ≫ 1) motivates separating global transport from local effects, with streamlines serving as an informative, compact conditioning signal. This reduces the burden on CNN receptive fields and training data.
2) Data efficiency under scarcity. Matching UNet/DDUNet trained on ~73–101 datapoints using only 1–3 simulations for LGCNN in critical steps is impressive and practically important for scientific settings with expensive

Weaknesses

1) Pipeline error propagation & calibration. The full pipeline’s error notably increases vs. Step-3-isolated results (using simulated v). There’s limited analysis of where velocity errors matter most (e.g., bifurcations), how sensitive temperature predictions are to streamline integration tolerances, or whether outputs are calibrated (e.g., reliability vs PAT/SSIM). A quantitative uncertainty or sensitivity analysis is missing.
2) Global surrogate design choices. The IVP solver and 2D raster e

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- Results sound promising: the model seems to learn and generalize well from few training samples.
- Models and data will be available after publication for reference.
- The architecture and general idea of physical formulas inside/between CNN seem interesting

Weaknesses

- Comparisons are lacking: as far as I can tell the proposed LGCNN is never compared directly to other state of the art solutions or even non-deep learning solutions, this makes it difficult to judge how useful LGCNN is in practice.
- Also, the novelty is not entirely clear: physics informed neural networks are known and applied to a variety of domains, what makes LGCNN special in this regard? Could LGCNN be compared to other, established, physics informed neural network architectures? Comparis

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The paper is motivated by a real and relevant application
- The idea of combining ML and classical methods to speed up simulations is promising
- The experiments are based on both real and synthetic datasets

Weaknesses

- The contributions of the paper to ML are not clear: the paper focuses on a narrow application and proposes a simple solution integrating CNNs and a numerical solver. It is not clear how the proposed approach will lead to advances in ML for simulations. Moreover, the paper doesn’t contextualize the work within the large related literature on simulations recently published at ICLR, NeurIPS, ICML, AAAI, IJCAI, etc. A quick look at the citations shows that the paper is much more focused on the app

Reviewer 04 · Rating 2 · Confidence 4

Strengths

The operator splitting concept is sensible - recognising that CNNs struggle with long-range advective transport and handling it with a numerical surrogate is the right intuition. The physics decomposition is clean.

Weaknesses

1. Training on 3 datapoints (literally 1 train, 1 val, 1 test) and claiming "strongly reduced data requirements" doesn't pass the smell test. Even with the 101-sample comparison, there's no proper cross-validation or statistical validation.
2. You cite this work, Pelzer et al. (2024), which describes essentially the same two-stage idea (numerical surrogate for global transport + CNN for local processes). What exactly is new here beyond that paper? The three-step breakdown feels like an impleme

Code & Models

Repositories

corne00/ddunetforheatplumeprediction
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks · Geothermal Energy Systems and Applications · Integrated Energy Systems Optimization

Methods: Diffusion