Variational system identification of the partial differential equations governing pattern-forming physics: Inference under varying fidelity and noise
Zhenlin Wang, Xun Huan, Krishna Garikipati

TL;DR
This paper introduces a variational system identification method for PDEs that models pattern formation physics, accounting for data noise and fidelity, to determine the most accurate mathematical description of complex biological and material systems.
Contribution
It advances PDE system identification by integrating variational frameworks and statistical tests, improving robustness under data noise and varying fidelity conditions.
Findings
Effective identification of PDEs from noisy data
Robustness to data fidelity variations demonstrated
Improved model selection accuracy over previous methods
Abstract
We present a contribution to the field of system identification of partial differential equations (PDEs), with emphasis on discerning between competing mathematical models of pattern-forming physics. The motivation comes from developmental biology, where pattern formation is central to the development of any multicellular organism, and from materials physics, where phase transitions similarly lead to microstructure. In both these fields there is a collection of nonlinear, parabolic PDEs that, over suitable parameter intervals and regimes of physics, can resolve the patterns or microstructures with comparable fidelity. This observation frames the question of which PDE best describes the data at hand. This question is particularly compelling because identification of the closest representation to the true PDE, while constrained by the functional spaces considered relative to the data at…
| 1 | 40 | 0.1 | -1 | 1 | 0.9 | -1 | 0.1 | 0.1 | 0.1 | 10 | 10 | 0.4 | 0.7 |
| type of basis | basis in weak form |
|---|---|
| non-gradient | |
| boundary condition |
| mesh size | results | error |
|---|---|---|
| 0% | ||
| 0% | ||
| 1.15% | ||
| 0.56% | ||
| 5.14% | ||
| 1.83% | ||
| 13.9% | ||
| 4.83% |
| mesh size | results | error |
|---|---|---|
| 0% | ||
| 0% | ||
| 30% | ||
| 0.57% | ||
| 47.1% | ||
| 1.87% | ||
| 17% | ||
| 5% |
| mesh size | results | error |
|---|---|---|
| 0% | ||
| 0% | ||
| 1.01% | ||
| 0.92% | ||
| 3.9% | ||
| 4.9% | ||
| 24% | ||
| 30% | ||
| mesh size | results | error |
|---|---|---|
| 0% | ||
| 0% | ||
| 1.2% | ||
| 1.14% | ||
| 3.28% | ||
| 4.58% | ||
| 20.6% | ||
| 14.7% | ||
| mesh size | results | error |
|---|---|---|
| 4.05% | ||
| identify wrongly | N/A | |
| 1.42% | ||
| 53% | ||
| 5.15% | ||
| 6.19% | ||
| 13.8% | ||
| 4.3% |
| mesh size | results | error |
|---|---|---|
| 1.8% | ||
| identify wrongly | N/A | |
| 30.3% | ||
| identify wrongly | N/A | |
| 47% | ||
| 7.3% | ||
| 17.22% | ||
| 4.38% |
| mesh size | results | error |
| identify wrongly | N/A | |
| identify wrongly | N/A | |
| identify wrongly | N/A | |
| identify wrongly | N/A | |
| 30.2% | ||
| 30.3% | ||
| 24.1% | ||
| 30.3% | ||
| mesh size | results | error |
| identify wrongly | N/A | |
| 1.065% | ||
| identify wrongly | N/A | |
| 1.066% | ||
| 9.17% | ||
| 4.57% | ||
| 20.7% | ||
| 14.7% | ||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Variational system identification of the partial differential equations governing the physics of pattern-formation: Inference under varying fidelity and noise
Z. Wang, X. Huan and K. Garikipati Department of Mechanical Engineering, University of MichiganDepartment of Mechanical Engineering, Michigan Institute for Computational Discovery & Engineering, University of Michigan, University of MichiganDepartments of Mechanical Engineering, and Mathematics, Michigan Institute for Computational Discovery & Engineering, University of Michigan, corresponding author, [email protected]
Abstract
We present a contribution to the field of system identification of partial differential equations (PDEs), with emphasis on discerning between competing mathematical models of pattern-forming physics. The motivation comes from developmental biology, where pattern formation is central to the development of any multicellular organism, and from materials physics, where phase transitions similarly lead to microstructure. In both these fields there is a collection of nonlinear, parabolic PDEs that, over suitable parameter intervals and regimes of physics, can resolve the patterns or microstructures with comparable fidelity. This observation frames the question of which PDE best describes the data at hand. This question is particularly compelling because identification of the closest representation to the true PDE, while constrained by the functional spaces considered relative to the data at hand, immediately delivers insights to the physics underlying the systems. While building on recent work that uses stepwise regression, we present advances that leverage the variational framework and statistical tests. We also address the influences of variable fidelity and noise in the data.
1 Introduction
The widespread use of sensors, high throughput experiments and simulations, as well high performance computing have made “big data” available for a range of engineering physics systems. This has sparked an explosion of interest in data-driven modeling for these systems. An extreme manifestation of this still-developing field is seen in “model-free” approaches that do not rely on physics-based knowledge. Such a path to modelling holds the promise of very high computational efficiency by circumventing solutions to potentially complex physics. However, it draws criticism for its “black box” nature, and often for lack of interpretability. More seriously, these approaches offer scant openings for analysis when the model fails. Conversely, the availability of abundant data also presents opportunities to discover mathematical frameworks, with well-understood physical meaning, which govern the underlying behavior. This has fueled the field of system identification, which is particularly compelling for determining the governing partial differential equations (PDEs). This is so, because knowledge of the PDE directly translates to deep insights to the physics, guided by differential and integral calculus.
Approaches for data-assisted and data-driven discovery of PDEs have proceeded along several fronts. (a) Early work in parameter identification can be traced to nonlinear regression approaches [1, 2]. (b) If the governing equation is unknown, an approximate model—whether physical, empirical, or mixed—could be learned from data. For example, the approximate model could be a neural network [3], a reduced-order model [4, 5], a phenomenological effective model formed by a linear combinations of basis functions in finite dimensions [6, 7], or a linear mapping from a current to a future state where the dynamic characteristics of the mapping could be learned by Dynamic Mode Decomposition [8, 9]. (c) Lastly, the complete underlying governing equations could be extracted from data by combining symbolic regression and genetic programming to infer algebraic expressions along with their coefficients [10, 11]. In this context, while genetic programming balances model accuracy and complexity, it proves very expensive when searching for a few relevant terms from a large library of candidates. In a recent approach to solving inverse problems [12], the strong form of a specified PDE was directly embedded in the loss function while training deep neural network representations of the solution variable. This approach results in the minimum-residual solution subject to a deep neural network global basis. The work contained examples where up to two unknown PDE coefficients were estimated, and did not explore system identification from a larger set of possible operators. A somewhat more ambitious and extensive approach is based on the recognition that most physical systems have only a few relevant terms, thus creating an opportunity to develop sparse regression techniques for system identification. This class of approaches has recently been applied with success [13, 14, 15] to determining the operators in PDEs from a comprehensive library of candidates, thus efficiently discovering governing equations from data. More recent extensions have been applied to model recovery using sparse data after abrupt changes in the system [16], hybrid dynamical systems [17] and multiscale systems [18]. The work presented here is based on this approach, inspired by its direct treatment of operator forms in PDEs, and their connections with physical mechanisms, while presenting several novel advances. These include the adoption of the variational setting, via the weak forms of PDEs; step-wise regression with the statistical F-test; a treatment of noise and its amelioration by varying fidelity.
For context, we briefly discuss the role of pattern forming systems of equations in developmental biology and materials physics. Following Alan Turing’s seminal work on reaction-diffusion systems [19], a robust literature has developed on the application of nonlinear versions of this class of PDEs to model pattern formation in developmental biology [20, 21, 22, 23, 24, 25, 26, 27, 28]. The Cahn-Hilliard phase field equation [29] has been applied to model other biological processes with evolving fronts, such as tumor growth and angiogenesis [30, 31, 32, 33, 34, 35, 36, 37]. Pattern formation during phase transformations in materials physics can happen as the result of instability-induced bifurcations from a uniform composition [38, 39, 40], which was the original setting of the Cahn-Hilliard treatment. Another phase field treatment, the Allen-Cahn equation [41] models nucleation and growth of precipitates and also has seen wide use [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. All these pattern forming systems fall into the class of nonlinear, parabolic PDEs, and have spawned vast literature in mathematical physics. They can be written as systems of first-order dynamics driven by a number of time-independent terms of algebraic and differential form. The spatio-temporal, differentio-algebraic operators act on either a composition (normalized concentration) or an order parameter. It also is common for the algebraic and differential terms to be coupled across multiple species.
It is compelling to attempt to discover the physics governing these pattern forming systems by identifying their PDE forms from data. However proper evaluation of the numerical derivatives is very challenging. The commonly used finite-difference approximations require data to be available in a very finely discretized space. Polynomial interpolation has proven to be reliable and robust [53], and recently has found use in data-driven discovery of PDEs [15]. However, this approach encounters substantial difficulties in estimating the coefficients of high-order derivatives, because of the significant error associated with numerical differentiation. Even still, to the best of our knowledge, the published literature in discovering governing equations has remained focused on trying to identify the strong form of the PDEs. Every strong form, however, has equivalent weak form(s), which seem to have not been exploited for system identification. As will be seen in this communication, the advantages of using weak forms include: the natural occurrence of the loss function for the regression problem, allowing identification of the boundary conditions, which has proven to be a challenge, and successfully identifying higher-order derivatives with the aid of smooth basis functions (such as non-uniform rational B-splines or NURBS in this work).
Here, after briefly reviewing how the Galerkin weak form may be employed for identification of a general dynamical system (Section 1.1), we focus on introducing our methods for identifying PDEs in this variational setting, using stepwise regression (Section 2). In this first communication, we focus on examples to identify reaction-diffusion equations evolving under Schnakenberg kinetics [54], and the Cahn-Hilliard and Allen-Cahn equations driven by non-convex energy density functions (Section 3). This choice is driven by the fact that, as explained above, these are widely used models in biological patterning and morphogenesis, and have well-established traditions of use in materials physics, also. The datasets we use are all obtained by high fidelity direct numerical simulations. The more challenging problem of datasets from physical experiments will be addressed in a subsequent communication. We do call attention to our treatment of both high and low fidelity data, with and without noise, as a precursor to using data from physical experiments. Concluding remarks appear under Discussion and Conclusions (Section 4).
We note in passing that, very recently, there has been growing interest in combining data-driven methods with PDEs in a more direct manner: to find solutions to the canonical initial and boundary value problems (IBVPs) of mathematical physics. These approaches include Gaussian processes with kernels defined by specific PDEs in linearized form for one- and two-dimensions [55], and deep neural networks for solution of high-dimensional IBVPs on semi-linear parabolic PDEs [56] or with strong forms of the target one- and two-dimensional PDEs embedded in the loss functions [12].
1.1 The Galerkin weak form
We first provide a brief discussion of how the weak form of PDEs will be used in this work. We start with the general strong form for first-order dynamics written as
[TABLE]
where is a vector containing all possible independent terms expressed as algebraic and differential operators on the scalar solution :
[TABLE]
and is the vector of pre-factors for each term. Thus, the one-field diffusion reaction equation
[TABLE]
with constant diffusivity and reaction rate has
[TABLE]
Note that the time derivative is treated separately from the other terms in order to highlight the first-order dynamics of all problems we target in this work.
Next, we recall the weak form that is equivalent to a general strong form written as above with vector of operators and vector of pre-factors . For infinite-dimensional problems with Dirichlet boundary conditions on the weak form can be stated as: where , find where such that
[TABLE]
For finite-dimensional fields , the weak form is as follows: find where , such that where , the finite-dimensional (Galerkin) weak form of the problem is satisfied. The choice of as the Sobolev space is motivated by the differential operators, which reach the highest order of two in the weak forms we consider (four in strong form). The variations and trial solutions are defined component-wise using a finite number of basis functions,
[TABLE]
where is the dimensionality of the function spaces and , and represents the basis functions. Substituting and in Equation (5) by Equations (6) and (7) and decomposing the integration over by a sum over subdomains leads to:
[TABLE]
Upon integration by parts, application of appropriate boundary conditions, and accounting for the arbitrariness of in , the finite-dimensionality leads to a system of residual equations for each degree of freedom (DOF):
[TABLE]
where is the component of the residual vector, a functional of satisfying for the Galerkin solution of the problem, .
2 Identification of governing parabolic PDEs in weak form
The PDE identification problem in strong form is to find the correct pre-factor in Equation (1), which chooses among many possible candidate basis operators acting on collected in , given the solution/data of the dynamical system. Alternately, we could identify the PDE in finite-dimensional weak form by adopting finite-dimensional basis functions and also finding for many possible basis operators, but now written in weak form, acting on . In preparation, we first recall several basis operators, including Neumman boundary condition operators, all written in weak form. We next introduce NURBS basis functions, which have been popularized for isogeometric analysis [57]. The algorithm of stepwise regression for model selection is introduced following these developments.
2.1 Candidate basis operators in weak form
Suppose we have data for the field on certain degrees of freedom (DOFs) of a discretization for a series of time steps. These DOFs are the control variables if NURBS basis functions are used. Selecting all or a subset of these DOFs, a Galerkin representation of the field could be constructed over the domain. The use of fewer DOFs represents lower fidelity data, which is discussed separately in Section 2.4. We can then obtain the system of residual equations in (9), with the independent candidate basis terms in weak form populating . Below we illustrate the procedure with four example terms.
Constant term :
[TABLE]
Since the are arbitrary, we obtain a vector of size , where the th component is
[TABLE]
Time derivative at time : We write a backward difference approximation of the time derivative at as
[TABLE]
where is the time step, and and are the control variable values at times and .
Laplace operator at time : Multiplying by the weighting function and integrating by parts:
[TABLE]
Note that the second term on the right is the Neumann boundary term. For a flux boundary condition on , such that -\nabla C^{h}_{n}\cdot\mbox{\boldmathn}=1. We thus have two basis functions after accounting for the arbitrariness of :
[TABLE]
is the basis for (with zero Neumann boundary condition), and
[TABLE]
is the basis for the Neumann boundary term arising from at the surface . As is well known, in the variational setting the weak form of the divergence operator yields volume (15) and surface (16) terms, with the latter corresponding to the Neumann boundary condition. In this work we treat them as distinct bases, thus allowing us to separately identify the Neumann boundary term.
We also call attention to the fact that for the unsteady diffusion problem, equating (12) to the sum of (15) and (16) amounts to the Backward Euler time integration algorithm, which is first-order accurate. The mid-point rule, on the other hand is second-order in time.
Biharmonic operator at time : Multiplying by the weighting function and integrating by parts:
[TABLE]
Since the are arbitrary, we define the first term
[TABLE]
as the basis for the biharmonic operator (with zero Neumann boundary condition). The last two terms in (17) are, respectively, a higher-order Dirichlet boundary condition (as emerges from variational calculus applied in the context of the full equation [39]) and the Neumman boundary condition, from which we define:
[TABLE]
The second-order gradients in the Equation (18) require the solutions and basis functions to lie in , while the Lagrange polynomial basis functions traditionally used in finite element analysis only lie in . We therefore draw the basis functions, , from the family of Non-Uniform Rational B-Splines (NURBS), and adopt Isogeomeric Analysis (IGA) in our simulations to find the solutions in . A discussion of the NURBS basis and IGA is beyond the scope of this paper; interested readers are directed to the original works on this topic [57]. We briefly present the construction of the NURBS basis functions in the following subsection.
2.2 NURBS basis functions
Similar to Lagrange polynomial basis functions traditionally used in the Finite Element Method (FEM), NURBS basis functions are partitions of unity with compact support, satisfy affine covariance (i.e., an affine transformation of the basis is obtained by the affine transformation of its nodes/control points), and support an isoparametric formulation, thereby making them suitable for a Galerkin framework. They enjoy advantages over Lagrange polynomial basis functions in being able to ensure -continuity, possessing the positive basis and convex hull properties, and being variation diminishing.
The building blocks of the NURBS basis functions are univariate B-spline functions defined as follows. Consider two positive integers and , and a non-decreasing sequence of values , where is the polynomial order, is the number of basis functions, are coordinates in the parametric space referred to as knots (equivalent to nodes in FEM), and is the knot vector. The B-spline basis functions are defined starting with the zeroth order basis
[TABLE]
and higher orders using the Cox-de Boor recursive formula for [58]
[TABLE]
The knot vector divides the parametric space into intervals referred to as knot spans (equivalent to elements in FEM). A B-spline basis function is -continuous inside knot spans and -continuous at the knots. If an interior knot value repeats, it is referred to as a multiple knot. At a knot of multiplicity , the continuity is . Using a quadratic B-spline basis (Figure (1)), a -continuous one dimensional NURBS basis can now be constructed:
[TABLE]
where are the weights associated with each of the B-spline functions. In higher dimensions, NURBS basis functions are constructed as a tensor product of their one-dimensional counterparts:
[TABLE]
Using NURBS basis functions, we are able to smoothly evaluate the higher order derivatives of the data by interpolating them from the DOF control variable values. This is crucial in enabling the identification of higher order derivatives (e.g., the second order derivatives that arise in the weak form of the Cahn-Hilliard equations) using the stepwise regression methods described next.
2.3 Identification of basis operators via stepwise regression
To identify a dynamical system, we need to generate all possible basis operators acting on the solution, compute the non-zero pre-factor for each basis operator that is in the model (relevant bases), while also attaining pre-factors of zero for the the basis operators that are not in the model (irrelevant bases). For situations where the true model is not contained within the set of candidate basis operators, our method then finds the optimal model (optimal in the sense of the loss function to be defined shortly) out of the set of all possible models. The resolution of such situations requires investigations into model inadequacy/misspecification (e.g., discussed in [59]), which is a challenging problem that we do not attempt to tackle here; instead, we limit our scope to work within a given set of candidate operators. We begin by putting together the time derivative basis operators at each time . Let
[TABLE]
be our target vector. Note that each element \mbox{\boldmath\Xi}^{\dot{C}}|_{n}\in\mathbb{R}^{N_{DOF}} in (33) is a vector formed in Equation (12):
[TABLE]
Likewise we can form the matrix, , containing all possible terms:
[TABLE]
The residual accounting for all DOFs and time steps can then be expressed as
[TABLE]
The solution gives
[TABLE]
Note that the target vector in (40) and matrix in (46) are currently structured to contain all DOFs. However we can always exclude some DOFs by deleting their corresponding rows. Additional attention is needed if one wishes to eliminate an entire basis operator. For instance, doing so to the Neumann boundary basis requires deletion of the corresponding column. The reduced Equation (48) would then represent data with the homogeneous Neumann boundary condition.
If Equation (48) were directly solved for as an ordinary least squares regression problem, the analytic solution is
[TABLE]
where a nontrivial leads to a non-trivial solution to . The least squares formulation can also be interpreted as an optimization problem with the loss function
[TABLE]
which is simply the squared Euclidean norm of the residual vector obtained from the weak form. The system identification problem can then be viewed as finding that minimizes the loss:
[TABLE]
If the linear system in Equation (48) is free of error, the loss function and the prefactors for irrelevant terms can all be driven down to machine zero. For a linear system containing error terms (further discussed in Section 2.4), however, performing a standard regression will result in a solution of with nonzero contributions in all component of this vector. Such a solution of the system identification will not sharply delineate the relevant bases. Additionally, we could apply a separate least squares regression for each possible combination of all bases and choose the best one [60].aaaA technique called best subset selection. However the problem of selecting the best model from among the possibilities grows exponentially in , the number of all candidate basis. A viable alternative is to select the relevant bases by shrinking the pre-factors towards zero for the irrelevant bases. Ridge regression and regression (also known as least absolute shrinkage and selection operator (LASSO), a popular problem in compressive sensing research) are two well-known techniques to induce regularization. The optimization problem Equation (51) is updated with additional penalty terms:
[TABLE]
where and are regularization parameters. For both methods, however, selecting a good regularization parameter is critical, and may need to leverage prior knowledge about the magnitude of pre-factors[60].
2.3.1 Stepwise regression
In this work, we use backward model selection by stepwise regression [60]. The algorithm is summarized below.
Algorithm for Model selection by Stepwise regression:
Step 0: Establish target vector and matrix of bases, .
Step 1: Solve \mbox{\boldmath\omega}^{i} in the linear regression problem, Equation (48) using ordinary least squares regression. Calculate the loss function at this iteration, .
Step 2: Eliminate basis terms in matrix by deleting their columns, using the criterion to be introduced below. Set to zero the corrpesonding components of \mbox{\boldmath\omega}^{i}. GOTO Step 1. Note that in this case the value of loss function remains small (), and the solution may be overfitted.
Step 3: The algorithm stops if the pre-specified criterion does not allow us to eliminate any more basis terms. Beyond this, the loss function increases dramatically for any further reduction.
There are several choices for the criterion for eliminating basis terms. Kutz and co-workers applied a hard threshold on the pre-factors in each iteration [13, 14, 15]. Since the results of sparse regression and sparsity profiles of vary with different values of the threshold, a different method is needed for the final solution corresponding to the optimal tolerance. In the work mentioned above, Kutz and co-workers used Pareto front analysis by cross-validation to find the optimal tolerance. The drawback of this approach, in our experience, is that its performance degrades if the pre-factors of the relevant basis operators differ significantly in magnitude. For the approach to perform well, a carefully chosen rescaling is needed for the basis operators along with cross-validation.
Here, we explore, as an alternative, a widely used statistical criterion called the -test [60], which can be motivated as follows: Since the model at iteration contains fewer bases than at iteration , and is therefore more restrictive, the loss function, in general, is higher at iteration than at iteration . In this sense, model always becomes “worse” after eliminating any basis. We want to determine whether the model at iteration is significantly worse than at iteration . If not, the basis could be eliminated. We can evaluate the significance of the change by the -test:
[TABLE]
where is the number of bases at iteration and is the total number of bases. The value essentially evaluates the increasing loss function under the consideration of model complexity. If does not exceed a pre-defined tolerance, , meaning that the model at iteration is not significantly worse than at iteration , we eliminate the considered basis. Using the -test in Step 2 of the above algorithm, this step can be re-stated as following:
Step 2 with -test:
Step 2.1:
Tentatively eliminate the basis corresponding to pre-factors in \mbox{\boldmath\omega}^{i} which are smaller than the pre-defined threshold, evaluate the value followed by ordinary least squares regression on the reduced bases set.
Step 2.2:
IF
THEN formally eliminate these bases in matrix , by deleting the corresponding columns. GOTO Step 1.
ELSE GOTO Step 2.1, and choose another basis.
In this work we defined the threshold to be:
[TABLE]
where is the smallest value in \mbox{\boldmath\omega}^{i}, and is a small tolerance, and chosen to be . This threshold will eliminate the basis with the smallest pre-factor, as well as all bases with pre-factors within of the smallest one. Note that in the -test is the only hyperparameter in the algorithm. System identification is significantly more challenging if greater numbers of candidate bases are retained in the first few iterations. A small value of may not eliminate all irrelevant bases at once, but can serve as a good initial value as it will filter some irrelevant bases, thus mollifying the problem to a degree. In this work, we choose initially, and increase it to 10 when no more basis can be eliminated for .bbbFor a different problem, the value can be rigorously chosen by cross-validation.. We have summarized the step-wise algorithm in Figure 2.
As mentioned previously, our approach builds on recent work in sparse regression [13, 14, 15]. However, it differs prominently from those papers in adopting a variational setting, using the -test for determining bases, and, as will be seen in Section 2.4, in considering the influence of data fidelity and noise.
More broadly, the purpose of the -test is to perform model selection: a criterion to judge whether the new candidate model with fewer operators is more suitable than the previous one. At the core, the comparison is made to balance the desires of (a) fitting the data and (b) reducing the model complexity. Indeed, model selection is a rich and active area of research, encompassing methods such as cross-validation [61, 62], Akaike information criterion (AIC) [63], Bayesian information criterion (BIC) [64], and Bayes factor [65, 66]. We will explore additional model selection techniques in the future, especially those that require a Bayesian version of our system identification framework.
2.4 Low fidelity and noisy data
In practice, we may not be able to collect high fidelity data in space. As illustrated in Figure 3, we consider the situation in which data is gathered at fewer locations in space, or equivalently from fewer instances in time. The low fidelity data yields larger element sizes when constructing the finite dimensional spatial bases, and larger time steps, for use in the weak form.
Furthermore, the data could be contaminated by background noise as well as data collection error, such that the observed data may be noisy. We model these imperfections using an additive form
[TABLE]
Consequently the target vector \widehat{\mbox{\boldmathy}} and every basis in \widehat{\mbox{\boldmath\Xi}} constructed from will contain error terms. For \widehat{\mbox{\boldmathy}}, we have
[TABLE]
We further assume that , the noise on , follows an independent and identically distributed (i.i.d.) Gaussian distributioncccWhile the observation noise may have systematic bias and correlation, we do not explore these possibilities in this paper, and instead take an initial step to assume an i.i.d. setting. with zero mean and standard deviation :
[TABLE]
Since linear combinations of mutually independent Gaussian random variables are also Gaussian, we have:
[TABLE]
where is a Gaussian distribution with zero mean and standard deviation . Since the integral is computed using quadrature, we have:
[TABLE]
where is the determinant of the Jacobin matrix of the element mapping from an isoparametric domain, and is the weight for the quadrature point . The first term on the right hand side reflects the noiseless (true) component of , and the second term is the error. the true term converges to the time derivative:
[TABLE]
however the error term scaled by the time step diverges as . It also indicates that the error due to noise decreases if larger time steps are used:
[TABLE]
For gradient operators, such as the Laplacian and the Biharmonic operator, we have
[TABLE]
[TABLE]
from which, since the forward solution converges with decreasing mesh size, , we have:
[TABLE]
However the error terms are scaled by in the noisy Laplacian operator and in the noisy Biharmonic operator, where the mesh size is introduced by the gradients of the basis functions. The error on is amplified by the mesh size acting through the spatial gradient operators and diverges as . Consequently, we have
[TABLE]
The above analysis illustrate that the final error induced by noise on will decrease with larger time steps and element sizes. In Section 3 we show that for noisy , using low fidelity data improves the results of system identification. For purely algebraic bases, i.e., without differential operators from time derivatives or spatial gradients, time step and element size do not affect the ratio of error to true value. Note also that the approximations in Equations (61), (65) and (66) degrades with increasing time step and element size, bringing in discretization errors.
3 Examples
We now turn to using the framework detailed in the preceding section to identify the parabolic PDEs that govern patterning. Instead of directly applying our algorithms to data from physical experiments, in this first communication, we test our methods on identifying PDEs from data obtained through high-fidelity, direct numerical simulations (DNS).
We consider test cases with the following four data-generating models, with their true coefficients summarized in Table 1:
Model 1:
[TABLE]
where is the domain boundary.
Model 2:
[TABLE]
where and are shown in Figure 5. Models 1 and 2 represent coupled diffusion-reaction equations for two species following Schnakenberg kinetics [54], but with different boundary conditions. For an activator-inhibitor species pair, these equations use auto-inhibition with cross-activation of a short range species, and auto-activation with cross-inhibition of a long range species to form so-called Turing patterns [19].
Model 3:
[TABLE]
where is a non-convex, “homogeneous” free energy density function, whose form has been chosen from [28]:
[TABLE]
Model 4:
[TABLE]
Model 3 is a two field Cahn-Hilliard system with the well-known fourth-order term in the concentration, and . The three-well non-convex free energy density function (See Figure 4), , drives segregation of the system into two distinct types. We have previously used this system to make connections with cell segregation in developmental biology [28].
The diffusion-reaction and Cahn-Hilliard equations are widely used in biological pattern generation, as discussed in the Introduction. The Cahn-Hilliad equation [29] also occupies a central role in the materials physics literature for modelling phase transformations developing from a uniform concentration field in the presence of an instability.
Model 4 is the coupled Cahn-Hilliard/Allen-Cahn equations system [41] where the field describes growth of alloy precipitates from nuclei created by a process of spinodal decomposition that controls the field . The free energy density couples these processes through and in both Models 3 and 4. The Introduction also cites some of the relevant literature on the use of Cahn-Hilliard and Allen-Cahn equations in materials physics. An important difference between Models 3 and 4 is that in Model 4 is a non-conserved order parameter, which defines the identities of the precipitate, described by composition field , and matrix phases. Both Cahn-Hilliard equations in Model 3 are conservative. Note that solving the Cahn-Hilliard and Allen-Cahn equations in the isogeometric analytic framework requires imposition of the higher-order Dirichlet boundary conditions (79) and (86). We use Nitsche’s method [67] to impose these boundary conditions weakly.
Substituting the parameter values from Table 1, we present the weak form of each model:
Model 1:
[TABLE]
Model 2:
[TABLE]
Model 3:
[TABLE]
Model 4:
[TABLE]
The higher order boundary conditions and stabilization terms in Nitsche’s methods are not shown here. We do not infer these terms since they are of a numerical rather than physical nature.
3.1 Data preparation
All computations have been implemented in the mechanoChemIGA framework, a library for modeling mechano-chemical problems using isogeometric analytics, available at https://github.com/mechanoChem/mechanoChem. The IBVPs presented here are two-dimensional. Data generation by DNS was carried out on uniformly discretized meshes. We start all Models from the same initial conditions obtained by running Model 1 for a very short time from a randomized state with small perturbations about (see Figure 5). This state was relabelled as .
With this as initial conditions, the early stage dynamics of Models 1-4 cannot be distinguished by eye (see Figure 6).
However driven by different boundary condition and/or governing equations, the dynamics of the four models soon evolve along distinct trajectories, and form completely different patterns over a longer time as shown in Figure 7.
The mesh yields the DNS data for each model, in terms of the fields , at the corresponding DOFs for each time step. The synthetic, high fidelity DNS data are regularized by having been obtained from PDEs with derivative constraints. Therefore, they are noise-free, which contributes to finding solutions that drive the residual equations (9) down to machine zero. However the data collected from physical experiments will not satisfy PDEs to the same precision, in general: The data could be at lower fidelity, having been collected at fewer spatial positions and time instants, and therefore may not approximate derivatives well. Background noise and collection error will further contribute to a poorer approximation by PDEs, as discussed in Section 2.4. To mimic the experimental data, we take two approaches: i) As illustrated in Figure 3, we use nested meshes for coarse-graining the data after generating them by DNS on the mesh. Doing so introduces a loss of information on derivatives. ii) We superpose noise with a Gaussian distribution having zero mean and standard deviation , on the collected . We first collect the DNS data, from all four models, on DOFs corresponding to the mesh, which was used for the forward computations. Data from the same forward computations were then collected on the DOFs corresponding only to the nested , and meshes. This yielded datasets with four different fidelities for each model, and 16 different datasets in total. We then superimpose noise with for data generated from Models 1 and 2, and for data generated from Models 3 and 4, yielding another 16 data sets. Having the clean and noisy data at different fidelity, we generate 38 candidate bases in addition to the time derivative terms, summarized in Table 2. Note that to construct the time derivative terms, we need data at two time steps, and . Data at is only used in the time derivative terms, while other bases are constructed using data at (Backward Euler integration).
3.2 System identification with data at different fidelity without noise
In order challenge our algorithms to distinguish the underlying PDEs from the simulation results, we first use only DNS results over certain time intervals that, to the eye, appear identical to each other. For this purpose, we first apply stepwise regression on the 16 data sets (four datasets of different fidelity from four models) without noise, generated at (Figure 5) and (Figure 6). For the four high fidelity data ( mesh), we found that in the first iteration, all the relevant bases have almost converged to the correct pre-factor, and irrelevant bases have trivial (approaching zero) pre-factors. By setting the tolerance in the threshold equation (55), all relevant bases are selected with exact pre-factors in the second iteration. The results are not shown here since the coefficients on operators are identical to those in Equations (90 - 95) for the four datasets respectively. In particular, our system identification algorithms correctly identify the zero flux boundary condition in Model 1, constant flux boundary condition on in Model 2, and higher order terms of the form (see Table 2), in Models 3 and 4. The algorithm terminates after the second iteration since no more bases can be eliminated by the test. As shown in Figure 8, for the first two iterations, the loss function remains machine zero as the synthetic data drive the residual equations (9) down to this limit. If any more bases are eliminated, the loss function increases dramatically in Iteration 3 due to underfitting.
Initially and are close to uniformly distributed with small perturbations. This uniformity yields bases, such as and that appear to be linearly dependent (left plot in Figure in 9). The high fidelity data ( mesh) drive the residual equations (9) down to machine zero, allowing the linear regression model Equation (48) to converge. The algorithms can therefore select the relevant bases, distinguishing from other irrelevant bases that are close to being linearly dependent. However, low fidelidy data using NURBS basis functions yields poor representation of derivatives such as those in Equations (61), (65) and (66). As a result, the linear regression model contains greater error using low fidelity data, which are created by directly sparsifying the high-fidelity data instead of being generated from a coarse grid. Therefore, an unresolvable discrepancy remains in the low-fidelity representation, and two terms that are nearly linearly dependent cannot be distinguished. Using low fidelity data at and , the algorithms are unable to identify any of the four governing PDEs (Models 1-4).
This spurious linearity however disappears as the system evolves (right plot in Figure 9).
By choosing data at 10 time steps: from with time step , the algorithms are able to correctly identify all PDEs (Models 1-4) using low fidelity data. To quantify the algorithm’s performance, we define the error of the estimated results to be:
[TABLE]
where is the estimated pre-factor, and is the true pre-factor. Figures 10 - 13 show the scaled results at the final iteration. The exact values and the error are summarized in Tables 3 - 6.
Note that the Neumman boundary condition terms only have contributions from DOFs on the boundary, which are sparse relative to the total number. Consequently the basis for the Neumman boundary condition has a negligible contribution to the linear regression model compared to operators that appear via volume integrals in the weak forms of the PDEs. This results in large errors of the pre-factor for the Neumann terms. As shown in the left panel of Figure 11, for the results using data generated from Model 2, the Neumman boundary condition operator is more poorly estimated than others. Using very low fidelity data ( mesh), the Neumman boundary condition basis fails to be be identified. However, the errors in the other bases are similar to those obtained when using data generated from Model 1, for which the Neumann boundary condition and the corresponding operator are zero.
The estimated results for data generated from Models 3 and 4, match the true values very well except for the lowest fidelity dataset using the mesh (shown in Figures 12 and 13). This lowest fidelity data set poses even greater challenges, because, to identify the governing PDE for , using the mesh, we need to place a bias on the basis to protect it from elimination in the first few iterations. Although we are able to select all relevant terms with this prior, the error increases significantly as the mesh cannot properly resolve the small variation of the basis as we see below in Figure 14.
Of the four datasets of different fidelity from each model, the high fidelity dataset ( mesh) yields exact pre-factors for all bases. However for the lower fidelity datasets, variations in the data field over a small neighborhood of the domain may diminish as the decreasing mesh size resolves the variation poorly, i.e. the approximation in Equations (65) and (66) becomes poor. The variations may even completely disappear, as seen on the mesh (see one example shown in Figure 14).
Therefore, all the results deteriorate when using lower fidelity data. Specifically, for data generated from Model 1, the error gradually increases for decreasing mesh size. Due to the small pre-factor for the constant term, the error in the governing equation of is bigger than that in the governing equation of .
3.3 System identification under noisy data
Using meshes of different fidelity, we superimpose noise with for data generated from Models 1 and 2, and for data generated from Models 3 and 4. The noise on will be amplified in the time derivative and spatial gradients as discussed in Section 2.4. Figure 15 shows the target vector, with two different time steps, and , on different meshes. At the smaller time step, , the noise, scaled by , washes out the true value. Decreasing the mesh size cannot alleviate the noise, as the ratio of error, denoted by , to its true value largely remains fixed. At the larger time step, , the noise is suppressed, as expressed in Equation (62). For the following analysis, We choose data at the same 10 time steps starting from , with time step .
Figure 16 shows the Laplacian basis operator on , i.e., , with data generated by Model 1 using different meshes.
For the fine mesh, the noise remains at a high level over an element, and washes out the true value. By decreasing the mesh size, the ratio of final error to true value decreases by a factor of as shown in Equation (67), and the true value begins to emerge in the basis generated from noisy data. Consequently, as shown in Figure 17, using high fidelity data (the mesh) we are unable to correctly identify the governing equation for using data generated by Model 1. Using lower fidelity data, however, we are able to identify all the relevant terms. The error also is smaller when using lower fidelity data. For the same reason, the governing equation for using data generated by Model 2 also cannot be identified using the and meshes, but becomes feasible for lower fidelity data (see Figure 18). On the other hand the error is small for the basis operator in both models. As a result, the estimation of the governing PDE for is comparable with that by using same data of the same fidelity, but without noise. Again, as seen in the previous section, the Neumman boundary condition cannot able be identified by using very low fidelity data on the mesh.
Figure 19 shows the biharmonic operator, using data generated by Model 3 with different meshes.dddThis field was computed by projecting the data on to a fourth-order NURBS basis. Similar to the Laplace operator, the true value of the basis is washed out as the noise overwhelms the high fidelity data on the fine meshes. By using low fidelity data, we are able to correctly identify the governing equations for and using data generated from Model 3 as shown in Figure 20. The results for data generated from Model 4 are mixed. The governing equation for is similar to that in Model 3 with the fourth-order biharmonic operator in the concentration. Similar to that case, and for the same reason of noise overwhelming the true data, Model 4 cannot be identified by using high fidelity data, but can be identified by using low fidelity data (see Figure 21). However, the governing equation for does not have the fourth-order biharmonic operator. Given the small error on , the identification of the governing PDE for is comparable with that by using same fidelity data, but without noise. Again, as seen in the last section, for the mesh, we need to place a bias on to protect it from elimination in the first few iterations for data generated from Models 3 and 4. Also note that the error increases significantly to with decreasing fidelity as shown in Table 10.
Finally for all results using noisy data, the errors are, in general, higher than those obtained by using same fidelity data, but without noise.
4 Discussion and conclusions
The development of patterns in biophysics and material physics are governed by a range of spatio-temporal PDEs. It is compelling to attempt to discover the analytic forms of these PDEs from data, because doing so immediately provides insight to the governing physics. System identification has been explored using the strong form of the PDEs [10, 11, 13, 14, 15], as discussed in the Introduction. However the equivalent weak forms, which can offer capabilities not achievable by strong forms, have remained under-exploited for system identification. The weak form transfers derivatives to the weighting function, and allows smooth evaluation of the remaining, possibly higher-order, derivatives when NURBS basis functions are employed. In our numerical experiments, NURBS basis functions allow the fourth-order terms in the Cahn-Hilliard and Allen-Cahn equations to be correctly identified with higher accuracy estimation of the pre-factors using data at different fidelity, with and without noise (results in Tables 5, 6, 9 and 10). Another advantage of writing the PDEs in weak form is that Neumann boundary conditions are explicitly included as surface integrals. During system identification, this class of boundary conditions can be, therefore, constructed along with other operators in the PDEs. By incorporating these terms in the regression model, Equation (48), identification of boundary conditions becomes feasible (results in Tables 4 and 8).
Intuitively high fidelity data is favored as it minimizes the residual equations, and can accurately evaluate the time derivative and spatial gradient. For data without noise, the results enjoy these advantages (results in Tables 3, 4, 5 and 6 and Figures 10 - 13). However, the error embedded in noisy data is exaggerated relative to the true signal by small time steps and element lengths, as discussed in the context of Equations (61), (63) and (64). Low fidelity data in time and space may allow greater variations in the true signal relative to the noise, as shown in Figures 15, 16 and 19, and thus improve the results of system identification (results in Tables 7, 8, 9 and 10). On the other hand, low fidelity data may yield inaccurate estimation of the pre-factors due to poor evaluation of the operators (Figure 14). Data of very low fidelity may be incapable of capturing the high-frequency content of solution fields, thus leading to entirely incorrect identification of the operators. For example, without an injection of prior knowledge, we were unable to identify the high frequency operators such as in Model 3 and 4 using low fidelity data on the 50 50 mesh. The careful selection of data could improve these results, but that direction for study lies beyond the scope of this first communication.
We have demonstrated our algorithm on data at different fidelity with and without noise, generated from three different types of PDEs. In this first communication, we do not aim to systematically study the uncertainty in the models induced by the lack of fidelity and noise, nor to optimize the data extraction from simulations/experiments. However we point out several factors that could influence the results:
Two analytically independent bases may appear to be linearly dependent over certain time intervals (Figure 9). This spurious linearity could disappear as the system evolves. 2. 2.
Low fidelity data on coarse meshes resolves the variation of fields poorly (see Figure 14). However it helps by allowing larger variation of the true signal relative to the noise. It thus can effectively smooth out the error in bases constructed from noisy data (Figures 16 and 19). 3. 3.
Fields may evolve at different rates, and be distributed with different gradients (Figures 14 and 15 ). As a result a chosen basis may be more prominent at certain DOFs than at others, and contribute more in the linear regression model. Using DOFs over subdomains may improve the results of system identification. Using partial data in system identification has been seen to reduce the computational expense [15], but atempting to systematically improve the results by choosing partial data at certain times has not been widely explored. 4. 4.
We found that results improve when the Neumman boundary condition bases are not included. We could correctly identify the governing equation generated from Model 1 with larger noise by not including the boundary condition bases. We suspect that since the Neumman boundary condition basis is zero at all inner DOFs it renders the matrix of bases, , close to singular. 5. 5.
Any prior knowledge of the system would help to select the correct basis from many candidates. For example we placed a bias on certain bases when identifying PDEs using data generated from Model 3 with low fidelity data. Similarly, knowing that Model 3 is a conserved system, all bases contributing to the reaction (source) terms can be pre-eliminated.
We note that our treatment is finite-dimensional by being discretization-based, similar to previous work that has used the finite-difference method for representing operators [13, 14, 15, 16, 17, 18]. A different, neural network-based approach [12] enforced PDE constraints at random collocation points in one or two spatial dimensions and in time, thus effectively introducing a finite-dimensional representation with the global basis induced by activation functions. We also note that the neural network approach, while illustrated for learning coefficients in a predefined PDE, remains to be tested for discovery of PDE structures by downselection from a large set of operators, which is central to the treatment in this work and other recent approaches [13, 14, 15, 16, 17, 18].
Finally, we have presented system identification of parabolic PDEs in two dimensions. This is not a very serious restriction for the targeted physics of pattern formation in materials and biophysics. The preponderance of data in these fields is also restricted to two dimensions. Apart from an increase in computational cost, as the set of differential operators increases under the curse of dimensionality, there are no fundamental limitations. Even in this regard, the physics itself restricts the set of three-dimensional operators actually admissible in PDEs. Such classical knowledge could be leveraged to sparsify the operator basis set in three dimensions.
In this work, we currently use standard linear regression models for each iteration of the step-wise regression algorithm. Bayesian frameworks present a natural path to systematically inject prior knowledge and domain expertise, and to quantify and propagate the uncertainty induced by lack of model fidelity and data noise. Furthermore, they enable techniques for comparing competing models that are rooted in a rigorous probabilistic paradigm, such as with the Bayesian model selection and model averaging methods. Bayesian analysis of our methods will appear in future communications.
Acknowledgements
We acknowledge the support of Toyota Research Institute, Award #849910, “Computational framework for data-driven, predictive, multi-scale and multi-physics modeling of battery materials” (ZW and KG). Additional support: This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR0011199002, “Artificial Intelligence guided multi-scale multi-physics framework for discovering complex emergent materials phenomena” (ZW, XH and KG). Simulations in this work were performed using the Extreme Science and Engineering Discovery Environment (XSEDE) Stampede2 at the Texas Advance Computing Center through allocations TG-MSS160003 and TG-DMR180072. XSEDE is supported by National Science Foundation grant number ACI-1548562.
Appendix
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Voss and Kurths [1997] H. Voss and J. Kurths. Reconstruction of nonlinear time delay models from data by the use of optimal transformations. Phys. Lett. A. , 234:336–344, 1997.
- 2Voss et al. [1999] H. U. Voss, P. Kolodner, M. Abel, and J. Kurths. Amplitude equations from spatiotemporal binary-fluid convection data. Phys. Rev. Lett. , 83, 1999.
- 3González-García et al. [1998] R. González-García, R. Rico-Martínez, and I. G. Kevrekidis. Identification of distributed parameter systems: A neural net based approach. Computers them. Engng. , 22, 1998.
- 4Attar and Dowell [2005] P. J. Attar and E. H. Dowell. A reduced order system id approach to the modelling ofnonlinear structural behavior in aeroelasticity. J Fluids Struct. , 21, 2005.
- 5Khalil et al. [2007] M. Khalil, S. Adhikari, and A. Sarkar. Linear system identification using proper orthogonal decomposition. Mech Syst Signal Process. , 21, 2007.
- 6Guo et al. [2010] L. Z. Guo, S. A. Billings, and D. Coca. Identification of partial differential equation models for a class of multiscale spatio-temporal dynamical systems. Int. J. Control. , 83, 2010.
- 7Daniels and Nemenman [2015] B. C. Daniels and L. Nemenman. Automated adaptive inference of phenomenological dynamical models. Nat. Commun. , 6, 2015.
- 8C. W. Rowley et al. [2009] I. Mezić C. W. Rowley, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows. J. Fluid Mech. , 641, 2009.
