Elliptic PDE learning is provably data-efficient

Nicolas Boull\'e; Diana Halikias; Alex Townsend

arXiv:2302.12888·cs.LG·September 20, 2023

Elliptic PDE learning is provably data-efficient

Nicolas Boull\'e, Diana Halikias, Alex Townsend

PDF

Open Access 1 Repo

TL;DR

This paper provides the first theoretical guarantees for data efficiency in elliptic PDE learning, demonstrating an exponentially fast convergence rate with high probability.

Contribution

It introduces a provably data-efficient algorithm for learning solution operators of 3D elliptic PDEs using randomized linear algebra and PDE theory.

Findings

01

Achieves exponential convergence rate of error with respect to training data size

02

Provides high-probability guarantees on data efficiency

03

Demonstrates theoretical bounds for PDE learning performance

Abstract

PDE learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of 3D uniformly elliptic PDEs from input-output data and achieves an exponential convergence rate of the error with respect to the size of the training dataset with an exceptionally high probability of success.

Equations8

L u = - \nabla \cdot (A (x) \nabla u) = f, x \in Ω, u ∣_{\partial Ω} = 0,

L u = - \nabla \cdot (A (x) \nabla u) = f, x \in Ω, u ∣_{\partial Ω} = 0,

F (f) = \int_{Ω} G (x, y) f (y) d y,

F (f) = \int_{Ω} G (x, y) f (y) d y,

∥ F - \tilde{F} ∥_{2} \leq ϵ ∥ F ∥_{HS},

∥ F - \tilde{F} ∥_{2} \leq ϵ ∥ F ∥_{HS},

∥ F - \tilde{F} ∥_{2} \leq ϵ ∥ F ∥_{HS},

∥ F - \tilde{F} ∥_{2} \leq ϵ ∥ F ∥_{HS},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

nboulle/pde-learning
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsModel Reduction and Neural Networks · Seismic Imaging and Inversion Techniques · Reservoir Engineering and Simulation Methods

Full text

\templatetype

pnasbriefreport

\leadauthorBoullé \authorcontributionsAuthor contributions: N.B., D.H., and A.T. designed research; performed research; analyzed data; and wrote the paper. \authordeclarationThe authors declare no competing interest. \correspondingauthor1To whom correspondence should be addressed. Email: [email protected].

Elliptic PDE learning is provably data-efficient

Nicolas Boullé

Isaac Newton Institute for Mathematical Sciences, University of Cambridge, Cambridge CB3 0EH, United Kingdom

Diana Halikias

Mathematics Department, Cornell University, Ithaca, NY 14853-4201, United States

Alex Townsend

Mathematics Department, Cornell University, Ithaca, NY 14853-4201, United States

Abstract

PDE learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of 3D uniformly elliptic PDEs from input-output data and achieves an exponential convergence rate of the error with respect to the size of the training dataset with an exceptionally high probability of success.

keywords:

deep learning $|$ inverse problems $|$ sample complexity $|$ neural operators

doi:

\dates

This manuscript was compiled on .

\dropcap

Many scientific breakthroughs have come from deriving new partial differential equations (PDEs) from first principles to model real-world phenomena and simulating them on a computer to make predictions. However, many crucial problems currently lack an adequate mathematical formulation. It is not clear how to derive PDEs to describe how turbulence sheds off the wing of a hypersonic aircraft, how E. coli bacteria swim in unison to form an active fluid, or how atomic particles behave with long-range interactions. Rather than working from first principles, scientists are now looking to derive PDEs from real-world data using deep learning techniques (1).

The success of deep learning in language models, visual object recognition, and drug discovery is well known (2). The emerging field of PDE learning hopes to extend this to discovering new physical laws by supplying deep learning models with experimental or observational data (1, 3). PDE learning commonly seeks to recover features such as symmetries, conservation laws, solution operators, and the parameters of a family of hypothesized PDEs. In most deep-learning applications, a large amount of data is needed, which is often unrealistic in engineering and biology. However, PDE learning can be shockingly data-efficient in practice (4). In particular, surprisingly little data is used to learn the solution operator, which maps the forcing term to the solution of the PDE.

In this paper, we provide a theoretical explanation of this behavior by showing that, for $\epsilon>0$ sufficiently small, one can recover an $\epsilon$ -approximation to the solution operator of a three-dimensional (3D) elliptic PDE with a training dataset of size about $\mathcal{O}(\log^{5}(1/\epsilon))$ . Elliptic PDEs, such as the steady state heat equation, are ubiquitous in physics and model diffusion phenomena. Solution operators can produce surrogate data for data-intensive machine learning approaches such as learning reduced order models for design optimization in engineering, uncovering physics in climate models, and PDE recovery (1).

To illustrate the observed data-efficiency of PDE learning, we compare the performance of three techniques (5, 4, 6) for recovering the solution operator associated with the 2D Poisson equation in Fig. 1. We vary the size of the training dataset, consisting of random forcing terms and corresponding solutions obtained by a numerical solver. We then evaluate the accuracy of the predicted solutions on a testing dataset with new forcing terms. The three methods are based on deep learning and differ in their neural network architectures. While the Fourier Neural Operator (5) exploits the fast Fourier transform for computationally efficient training, DeepONet (4) and GreenLearning (6) achieve a faster convergence rate on small training datasets. Here, DeepONet employs a complex network architecture with many parameters. In contrast, GreenLearning leverages prior knowledge that the solution operator is an integral operator and the approximation power of rational neural networks (7). Green’s function learning is observed to be the most data-efficient in Fig. 1, as for a fixed training dataset size, it achieves the smallest testing error. All methods plateau due to discretization errors, and the training procedure gets stuck in a local minimum of the loss landscape rather than finding the global minimum. The rapid decay of testing errors prior to the plateau motivates our main result.

There is a lack of understanding for the efficiency of PDE learning methods with limited training data (4). This work provides theoretical insights by constructing a provably data-efficient algorithm, showing that one can achieve exponential convergence when learning solution operators of elliptic PDEs.

Consider an unknown uniformly elliptic PDE in three dimensions, defined on a bounded domain $\Omega\subset\mathbb{R}^{3}$ with Lipschitz smooth boundary, with variable coefficients of the form:

[TABLE]

where the coefficient matrix $A$ has bounded coefficient functions and is symmetric positive definite for all $x\in\Omega$ . The weak assumptions on $\Omega$ and $A(x)$ allow for corner singularities and low regularity of the coefficients. The training data consists of pairs of random forcing terms $f_{1},\ldots,f_{N}$ and corresponding solutions $u_{1},\ldots,u_{N}$ such that $\mathcal{L}u_{j}=f_{j}$ for $1\leq j\leq N$ . Deep learning techniques use this data to predict solutions to Eq. 1 at new forcing terms by recovering the action of the solution operator $\mathscr{F}$ , which is given by

[TABLE]

where $G$ is the associated Green’s function. For example, we visualize in Fig. 2A the Green’s function associated with the 1D Poisson equation. The random forcing terms in the training dataset are sampled from a Gaussian process (GP), i.e., they follow a multivariate Gaussian distribution when sampled on a grid, and the covariance kernel determines the correlation between the function’s entries and its smoothness.

Recent work (11) proves that for any $\epsilon>0$ and 3D elliptic PDEs, a large number of input-output training pairs of size about $\mathcal{O}(\epsilon^{-6})$ is sufficient to recover an $\epsilon$ -approximation $\tilde{\mathscr{F}}$ to $\mathscr{F}$ such that

[TABLE]

where $\|\cdot\|_{2}$ is the solution operator norm and $\|\cdot\|_{\text{HS}}$ is the Hilbert–Schmidt norm. Once the $\epsilon$ -approximation to $\mathscr{F}$ has been constructed, $\tilde{\mathscr{F}}$ can be used to study the stability and regularity of solutions of the PDE. For example, to see whether small perturbations of the input function lead to small changes in the output solution, or whether the solution has certain smoothness or decay properties for all forcing terms. Moreover, $\tilde{\mathscr{F}}$ can be used in numerical methods for approximating the solution of the PDE. By discretizing the input function and applying $\tilde{\mathscr{F}}$ as a surrogate for $\mathscr{F}$ , one can obtain a numerical solution of the PDE that approximates the true solution. The integral kernel associated with the Hilbert–Schmidt operator $\tilde{\mathscr{F}}$ is also of interest, as it is an approximation to the Green’s function, which can be exploited to recover linear conservation laws, symmetries, boundary effects, and dominant modes (6).

Our main result dramatically improves the required amount of training data to construct an $\epsilon$ -approximation to $\mathscr{F}$ by exploiting the hierarchical structure of $G$ (9) and randomized linear algebra techniques (10, 12). We derive a randomized algorithm that provably succeeds with exceptionally high probability and needs a training dataset size of only $\mathcal{O}(\log(1/\epsilon)^{5}[\log(\log(1/\epsilon))+\log(1/\Gamma_{\epsilon})]^{4})$ input-output pairs.

Theorem 1.

Let $\epsilon>0$ be sufficiently small, and $\mathscr{F}$ be the solution operator associated with a 3D uniformly elliptic PDE of the form in Eq. 1. There exists a randomized algorithm that constructs an $\epsilon$ -approximation $\tilde{\mathscr{F}}$ to $\mathscr{F}$ such that

[TABLE]

using $\mathcal{O}(\log(1/\epsilon)^{5}[\log(\log(1/\epsilon))+\log(1/\Gamma_{\epsilon})]^{4})$ input-output pairs with probability $\geq 1-e^{-\log(1/\epsilon)^{3}}$ .

The main contribution of Theorem 1 is a theoretical upper bound on the amount of training data required in elliptic PDE learning problems, which should deepen our understanding of existing deep learning techniques. Hence, the exponential convergence rate in Theorem 1 matches the one observed in the deep learning experiments of Fig. 1A. We believe that this learning rate is near-optimal, as it exploits the multi-scale structure of Green’s functions (see Fig. 2B,C) and depends on the training dataset. The factor $0<\Gamma_{\epsilon}\leq 1$ measures the quality of the training dataset at probing the dominant modes of the PDE, and a technical definition is available in the SI Appendix. We emphasize that the error bound must include a factor that quantifies the quality of the training dataset. If the forcing terms are too smooth, then $\Gamma_{\epsilon}$ is small. In contrast, choosing the covariance kernel of the GP such that the sampled functions are oscillatory usually ensures that $\Gamma_{\epsilon}$ is reasonable for learning $G$ . In short, a small number of sufficiently diverse forcing terms is required (see Fig. 2D).

The algorithm constructed in the proof of Theorem 1 achieves an approximation error measured in the solution operator norm. This mimics the typical measurement of accuracy of PDE learning techniques by comparing true and predicted solutions on a testing dataset of square-integrable forcing terms. Additionally, Theorem 1 employs random input-output pairs, where the forcing terms are sampled from a GP, so there is always some probability of failure. Fortunately, we show this probability is exceptionally small. For $\epsilon<10^{-3}$ , failure is a once-in-a-cosmic-epoch event (see Fig. 2F).

Theorem 1 is challenging to prove, and the whole argument is in the SI Appendix. The proof relies on the fact that the solution operator associated with a 3D elliptic PDE is an integral operator in the form of Eq. 2. Firstly, the Green’s functions related to 3D elliptic operators are square-integrable and have a bounded decay rate away from the diagonal of $\Omega\times\Omega$ (13). Secondly, they possess a hierarchical structure (9) in the sense that they have rapidly decaying singular values when restricted to off-diagonal parts of the domain (green blocks in Fig. 2B). We leverage the hierarchical structure, which has been historically exploited by fast solvers, in a data-driven context where the PDE is unknown. Combining these properties enables a generalization of the randomized SVD (10) known as the peeling algorithm (8) to simultaneously learn the off-diagonal blocks at any level of the hierarchy.

While the peeling algorithm is traditionally used to recover hierarchical matrices efficiently from matrix-vector products, we generalize it to approximate infinite-dimensional integral operators. To do so, we leverage insights from recent work that extends the peeling algorithm to arbitrary hierarchical partitions and dimensions (14). This gives us a strategy to recover the Green’s function level-by-level. However, proving the stability of peeling is an open question in numerical linear algebra. This is because the approximation errors from one level can potentially accumulate exponentially at later levels, thus degrading the convergence rate (11, 8).

We overcome this theoretical obstacle in the infinite-dimensional context by requiring an adaptive approximation accuracy at each level of the hierarchy. The peeling algorithm ensures that the large-scale features of a Green’s function are first learned to high accuracy by the randomized SVD. Then, we progressively decrease the accuracy requirement at subsequent levels, ensuring an overall $\epsilon$ -approximation on each level of the partition at the end (see Fig. 2E). The rapidly decaying singular values of the Green’s function on off-diagonal parts of the domain (see Fig. 2C) enable us to maintain a near-optimal exponential convergence rate with respect to the size of the training dataset. We then construct a global $\epsilon$ -approximant by neglecting $G$ near the diagonal of the domain.

As one usually employs deep learning techniques to learn solution operators, our theoretical contributions can also lead to practical benefits. We believe that future training datasets benefit from taking into account prior knowledge of the PDE to improve the quality of the forcing terms at learning the solution operator. Similar ideas have already been employed in the field of visual object recognition through data-augmentation techniques. There is also an opportunity to design neural network architectures with hierarchical structures to capture the long-range interactions in PDE models. Finally, enforcing a different accuracy at different scales might improve the computational efficiency of existing PDE learning approaches.

In summary, we constructed a randomized algorithm that provably achieves an exponential convergence rate for approximating the solution operator associated with 3D elliptic PDEs in terms of the size of the training dataset. This provides a theoretical explanation for the observed performance of recent deep learning techniques in PDE learning. The proof techniques can be adapted to include elliptic PDEs in any dimension and time-dependent PDEs (15). Recovering solution operators associated with hyperbolic PDEs, like wave equations, remains a significant open challenge. Moving forward, we plan to ramp up PDE learning techniques to handle noisy experimental data, deal with data from emerging transient dynamics, and enforce conservation laws onto our solutions.

Data availability

All data and codes used in this article are publicly available on GitHub at https://github.com/NBoulle/pde-learning. The proof of Theorem 1 and details of the numerical experiments are available in the SI Appendix.

\acknow

This work is supported by National Science Foundation grants DMS-1952757, DMS-2045646, and DGE-2139899. N.B. was supported by an INI-Simons Postdoctoral Research Fellowship.

\showacknow

Bibliography15

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1(1) GE Karniadakis, et al., Physics-informed machine learning. \Journal Title Nat. Rev. Phys. 3 , 422–440 (2021).
2(2) Y Le Cun, Y Bengio, G Hinton, Deep learning. \Journal Title Nature 521 , 436–444 (2015).
3(3) M Raissi, A Yazdani, GE Karniadakis, Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. \Journal Title Science 367 , 1026–1030 (2020).
4(4) L Lu, P Jin, G Pang, Z Zhang, GE Karniadakis, Learning nonlinear operators via Deep O Net based on the universal approximation theorem of operators. \Journal Title Nat. Mach. Intell. 3 , 218–229 (2021).
5(5) Z Li, et al., Fourier Neural Operator for Parametric Partial Differential Equations in ICLR . (2021).
6(6) N Boullé, CJ Earls, A Townsend, Data-driven discovery of Green’s functions with human-understandable deep learning. \Journal Title Sci. Rep. 12 , 1–9 (2022).
7(7) N Boullé, Y Nakatsukasa, A Townsend, Rational neural networks in Neur IPS . Vol. 33, pp. 14243–14253 (2020).
8(8) L Lin, J Lu, L Ying, Fast construction of hierarchical matrix representation from matrix-vector multiplication. \Journal Title J. Comput. Phys. 230 , 4071–4087 (2011).

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Code & Models

Videos

Taxonomy

Elliptic PDE learning is provably data-efficient

Abstract

keywords:

doi:

Theorem 1**.**

Data availability

Theorem 1.