Variational system identification of the partial differential equations   governing pattern-forming physics: Inference under varying fidelity and noise

Zhenlin Wang; Xun Huan; Krishna Garikipati

arXiv:1812.11285·physics.comp-ph·March 28, 2024

Variational system identification of the partial differential equations governing pattern-forming physics: Inference under varying fidelity and noise

Zhenlin Wang, Xun Huan, Krishna Garikipati

PDF

TL;DR

This paper introduces a variational system identification method for PDEs that models pattern formation physics, accounting for data noise and fidelity, to determine the most accurate mathematical description of complex biological and material systems.

Contribution

It advances PDE system identification by integrating variational frameworks and statistical tests, improving robustness under data noise and varying fidelity conditions.

Findings

01

Effective identification of PDEs from noisy data

02

Robustness to data fidelity variations demonstrated

03

Improved model selection accuracy over previous methods

Abstract

We present a contribution to the field of system identification of partial differential equations (PDEs), with emphasis on discerning between competing mathematical models of pattern-forming physics. The motivation comes from developmental biology, where pattern formation is central to the development of any multicellular organism, and from materials physics, where phase transitions similarly lead to microstructure. In both these fields there is a collection of nonlinear, parabolic PDEs that, over suitable parameter intervals and regimes of physics, can resolve the patterns or microstructures with comparable fidelity. This observation frames the question of which PDE best describes the data at hand. This question is particularly compelling because identification of the closest representation to the true PDE, while constrained by the functional spaces considered relative to the data at…

Tables10

Table 1. Table 1: parameters used in the simulations.

$D_{1}$	$D_{2}$	$R_{10}$	$R_{11}$	$R_{13}$	$R_{20}$	$R_{21}$	$j_{n}$	$M_{1}$	$M_{2}$	$k_{1}$	$k_{2}$	$d$	$s$
1	40	0.1	-1	1	0.9	-1	0.1	0.1	0.1	10	10	0.4	0.7

Table 2. Table 2: Candidate basis for model selection. Asterisks, ( ∗ ∗ \ast ) in the left column represent algebraic operators on C 1 subscript 𝐶 1 C_{1} and C 2 subscript 𝐶 2 C_{2} .

type of basis	basis in weak form
$𝒚$	$\int_{Ω} w \frac{\partial C_{1}}{\partial t} d v \int_{Ω} w \frac{\partial C_{2}}{\partial t} d v$
$\nabla (* \nabla C_{1})$	$\int_{Ω} \nabla w \nabla C_{1} d v \int_{Ω} \nabla w C_{1} \nabla C_{1} d v \int_{Ω} \nabla w \nabla C_{2} d v$
	$\int_{Ω} \nabla w C_{1}^{2} \nabla C_{1} d v \int_{Ω} \nabla w C_{1} C_{2} \nabla C_{1} d v \int_{Ω} \nabla w C_{2}^{2} \nabla C_{1} d v$
	$\int_{Ω} \nabla w C_{1}^{3} \nabla C_{1} d v \int_{Ω} \nabla w C_{1}^{2} C_{2} \nabla C_{1} d v \int_{Ω} \nabla w C_{1} C_{2}^{2} \nabla C_{1} d v \int_{Ω} \nabla w C_{2}^{3} \nabla C_{1} d v$
$\nabla (* \nabla C_{2})$	$\int_{Ω} \nabla w \nabla C_{2} d v \int_{Ω} \nabla w C_{1} \nabla C_{2} d v \int_{Ω} \nabla w \nabla C_{2} d v$
	$\int_{Ω} \nabla w C_{1}^{2} \nabla C_{2} d v \int_{Ω} \nabla w C_{1} C_{2} \nabla C_{2} d v \int_{Ω} \nabla w C_{2}^{2} \nabla C_{2} d v$
	$\int_{Ω} \nabla w C_{1}^{3} \nabla C_{2} d v \int_{Ω} \nabla w C_{1}^{2} C_{2} \nabla C_{2} d v \int_{Ω} \nabla w C_{1} C_{2}^{2} \nabla C_{2} d v \int_{Ω} \nabla w C_{2}^{3} \nabla C_{2} d v$
$\nabla^{2} (* \nabla^{2} C)$	$\int_{Ω} \nabla^{2} w \nabla^{2} C_{1} d v \int_{Ω} \nabla^{2} w C_{1} \nabla^{2} C_{1} d v \int_{Ω} \nabla^{2} w \nabla^{2} C_{2} d v \int_{Ω} \nabla^{2} w C_{2} \nabla^{2} C_{2} d v$
non-gradient	$- \int_{Ω} w 1 d v - \int_{Ω} w C_{1} d v - \int_{Ω} w C_{2} d v$
	$- \int_{Ω} w C_{1}^{2} d v - \int_{Ω} w C_{1} C_{2} d v - \int_{Ω} w C_{2}^{2} d v$
	$- \int_{Ω} w C_{1}^{3} d v - \int_{Ω} w C_{1}^{2} C_{2} d v - \int_{Ω} w C_{1} C_{2}^{2} d v - \int_{Ω} w C_{2}^{3} d v$
boundary condition	$- \int_{Γ_{1}} w 1 d s - \int_{Γ_{4}} w 1 d s - \int_{Γ_{3}} w 1 d s - \int_{Γ_{4}} w 1 d s$

Table 3. Table 3: Results using data generated from Model 1 using Δ t = 1 Δ 𝑡 1 \Delta t=1 , with 10 snapshots.

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 1 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1 - C_{1} + 1 C_{1}^{2} C_{2}) d v$	0%
$400 \times 400$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 40 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.9 - 1 C_{1}^{2} C_{2}) d v$	0%
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9971 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.10116 - 0.9998 C_{1} + 0.9985 C_{1}^{2} C_{2}) d v$	1.15%
$200 \times 200$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 40.2205 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.89982 - 0.9998 C_{1}^{2} C_{2}) d v$	0.56%
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9805 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1051 - 0.9995 C_{1} + 0.9936 C_{1}^{2} C_{2}) d v$	5.14%
$100 \times 100$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 40.73 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (- 0.9019 - 1.0026 C_{1}^{2} C_{2}) d v$	1.83%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9489 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1138 - 1.0007 C_{1} + 0.9858 C_{1}^{2} C_{2}) d v$	13.9%
$50 \times 50$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v \int_{Ω} - 41.9339 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (- 0.9075 - 1.01 C_{1}^{2} C_{2}) d v$	4.83%

Table 4. Table 4: Results using data generating from Model 2 with Δ t = 1 Δ 𝑡 1 \Delta t=1 , with 10 snapshots. The high error in the governing equation of C 1 subscript 𝐶 1 C_{1} is induced by the Neumman boundary condition term, which cannot be identified on the 50 × 50 50 50 50\times 50 mesh.

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 1 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1 - C_{1} + 1 C_{1}^{2} C_{2}) d v + \int_{Γ_{2}} w_{1} 0.1 d s$	0%
$400 \times 400$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 40 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.9 - 1 C_{1}^{2} C_{2}) d v$	0%
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.982 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.106 - 1.0027 C_{1} + 0.997 C_{1}^{2} C_{2}) d v + \int_{Γ_{2}} w_{1} 0.1305 d s$	30%
$200 \times 200$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 40.23 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.90043 - 1.0006 C_{1}^{2} C_{2}) d v$	0.57%
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9793 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1095 - 1.0036 C_{1} + 0.9949 C_{1}^{2} C_{2}) d v + \int_{Γ_{2}} w_{1} 0.1471 d s$	47.1%
$100 \times 100$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 40.751 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.9028 - 1.0031 C_{1}^{2} C_{2}) d v$	1.87%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.884 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1172 - 0.98576 C_{1} + 0.9643 C_{1}^{2} C_{2}) d v$	17%
$50 \times 50$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 41.985 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.9076 - 1.0118 C_{1}^{2} C_{2}) d v$	5%

Table 5. Table 5: Results using data generated from Model 3 using Δ t = 1 Δ 𝑡 1 \Delta t=1 , with 10 snapshots. A bias on ∫ Ω ∇ w C 2 2 ∇ C 1 d v subscript Ω ∇ 𝑤 superscript subscript 𝐶 2 2 ∇ subscript 𝐶 1 d 𝑣 \int_{\Omega}\nabla wC_{2}^{2}\nabla C_{1}\text{d}v is needed to protect it from elimination in the first few iterations.

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 17.8126 + 47.98 C_{1} + 21.591 C_{2} - 47.98 C_{1}^{2} - 15.9933 C_{2}^{2}) \nabla C_{1} d v$	0%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.7955 + 21.591 C_{1} + 15.9933 C_{2} - 31.9867 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 1 \nabla^{2} w_{1} \nabla^{2} C_{1}$	0%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} \nabla w_{2} \cdot (- 10.7955 + 21.591 C_{1} + 15.9933 C_{2} - 31.9867 C_{1} C_{2}) \nabla C_{1} d v$	0%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 12.2149 + 15.9933 C_{1} + 42.3823 C_{2} - 15.9933 C_{1}^{2} - 47.98 C_{2}^{2}) \nabla^{2} C_{2} d v + \int_{Ω} - 1 \nabla^{2} w_{2} \nabla^{2} C_{2}$	0%
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 17.8498 + 48.07 C_{1} + 21.674 C_{2} - 48.07 C_{1}^{2} - 16.078 C_{2}^{2}) \nabla C_{1} d v$	1.01%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.789 + 21.5766 C_{1} + 15.9762 C_{2} - 31.9503 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 1.01 \nabla^{2} w_{1} \nabla^{2} C_{1}$	1.01%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} \nabla w_{2} \cdot (- 10.7845 + 21.567 C_{1} + 15.9752 C_{2} - 31.9493 C_{1} C_{2}) \nabla C_{1} d v$	0.92%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 12.2452 + 16.1062 C_{1} + 42.4017 C_{2} - 16.1108 C_{1}^{2} - 47.985 C_{2}^{2}) \nabla^{2} C_{2} d v + \int_{Ω} - 1.0093 \nabla^{2} w_{2} \nabla^{2} C_{2}$	0.92%
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 17.4653 + 46.9488 C_{1} + 21.445 C_{2} - 46.959 C_{1}^{2} - 16.0377 C_{2}^{2}) \nabla C_{1} d v$	3.9%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.4004 + 20.7943 C_{1} + 15.367 C_{2} - 30.7248 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 1.0103 \nabla^{2} w_{1} \nabla^{2} C_{1}$	3.9%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} \nabla w_{2} \cdot (- 10.2816 + 20.551 C_{1} + 15.2133 C_{2} - 30.4078 C_{1} C_{2}) \nabla C_{1} d v$	4.9%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 11.9343 + 16.0866 C_{1} + 40.8853 C_{2} - 16.1143 C_{1}^{2} - 46.1839 C_{2}^{2}) \nabla^{2} C_{2} d v + \int_{Ω} - 0.998 \nabla^{2} w_{2} \nabla^{2} C_{2}$	4.9%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 15.1267 + 40.1515 C_{1} + 19.6646 C_{2} - 40.1784 C_{1}^{2} \underline{- 15.2573} C_{2}^{2}) \nabla C_{1} d v$	24%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 8.3232 + 16.6306 C_{1} + 12.16 C_{2} - 24.3068 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 0.9249 \nabla^{2} w_{1} \nabla^{2} C_{1}$	24%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} \nabla w_{2} \cdot (- 7.6813 + 15.2688 C_{1} + 11.2396 C_{2} - 22.3164 C_{1} C_{2}) \nabla C_{1} d v$	30%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.187 + 15.361 C_{1} + 33.0694 C_{2} - 15.4857 C_{1}^{2} - 37.0552 C_{2}^{2}) \nabla^{2} C_{2} d v + \int_{Ω} - 0.8645 \nabla^{2} w_{2} \nabla^{2} C_{2}$	30%

Table 6. Table 6: Results using data generated from Model 4 using Δ t = 1 Δ 𝑡 1 \Delta t=1 , with 10 snapshots. A bias on ∫ Ω ∇ w C 2 2 ∇ C 1 d v subscript Ω ∇ 𝑤 superscript subscript 𝐶 2 2 ∇ subscript 𝐶 1 d 𝑣 \int_{\Omega}\nabla wC_{2}^{2}\nabla C_{1}\text{d}v is needed to protect it from elimination in the first few iterations.

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 17.8126 + 47.98 C_{1} + 21.591 C_{2} - 47.98 C_{1}^{2} - 15.9933 C_{2}^{2}) \nabla C_{1} d v$	0%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.7955 + 21.591 C_{1} + 15.9933 C_{2} - 31.9867 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 1 \nabla^{2} w_{1} \nabla^{2} C_{1}$	0%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.5085 - 10.7955 C_{1} - 12.2149 C_{2}$	0%
	$+ 10.7955 c_{1}^{2} + 15.9933 C_{1} C_{2} + 21.1912 C_{2}^{2} - 15.9933 C_{1}^{2} C_{2} - 15.9933 C_{2}^{3}) d v$	0%
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 17.8589 + 48.06 C_{1} + 21.7044 C_{2} - 48.0635 C_{1}^{2} - 16.09 C_{2}^{2}) \nabla C_{1} d v$	1.2%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.8207 + 21.64 C_{1} + 16.0478 C_{2} - 32.089 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 1.0123 \nabla^{2} w_{1} \nabla^{2} C_{1}$	1.2%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1.0114 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.498 - 10.7636 C_{1} - 12.191 C_{2}$	1.14%
	$+ 10.7663 C_{1}^{2} + 15.9459 C_{1} C_{2} + 21.1655 C_{2}^{2} - 15.9499 C_{1}^{2} C_{2} - 15.9752 C_{2}^{3}) d v$	1.14%
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 17.3435 + 46.4025 C_{1} + 21.4151 C_{2} - 46.4119 C_{1}^{2} - 15.9544 C_{2}^{2}) \nabla C_{1} d v$	3.28%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.4451 + 20.8839 C_{1} + 15.5847 C_{2} - 31.1354 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 1.0092 \nabla^{2} w_{1} \nabla^{2} C_{1}$	3.28%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1.0458 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.46 - 10.6416 C_{1} - 12.096 C_{2}$	4.58%
	$+ 10.6545 C_{1}^{2} + 15.7652 C_{1} C_{2} + 21.056 C_{2}^{2} - 15.7843 C_{1}^{2} C_{2} - 15.8966 C_{2}^{3}) d v$	4.58%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 14.9026 + 38.6896 C_{1} + 19.735 C_{2} - 38.689 C_{1}^{2} - \underline{14.989} C_{2}^{2}) \nabla C_{1} d v$	20.6%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 8.5697 + 17.1244 C_{1} + 13.1685 C_{2} - 26.231 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 0.9216 \nabla^{2} w_{1} \nabla^{2} C_{1}$	20.6%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1.1476 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.3323 - 10.2516 C_{1} - 11.7393 C_{2}$	14.7%
	$+ 10.3 c_{1}^{2} + 15.191 C_{1} C_{2} + 20.5589 C_{2}^{2} - 15.2618 C_{1}^{2} C_{2} - 15.5277 C_{2}^{3}) d v$	14.7%

Table 7. Table 7: Results using data generated from Model 1 using Δ t = 1 Δ 𝑡 1 \Delta t=1 , with 10 snapshots and σ = 10 − 4 𝜎 superscript 10 4 \sigma=10^{-4}

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9693 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1041 - 0.9991 C_{1} + 0.9941 C_{1}^{2} C_{2}) d v$	4.05%
$400 \times 400$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v$ identify wrongly	N/A
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9950 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1014 - 0.9998 C_{1} + 0.9981 C_{1}^{2} C_{2}) d v$	1.42%
$200 \times 200$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 18.6482 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.5309 - 0.4921 C_{1}^{2} C_{2}) d v$	53%
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9804 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1052 - 0.9995 C_{1} + 0.9936 C_{1}^{2} C_{2}) d v$	5.15%
$100 \times 100$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 37.9655 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.8473 - 0.9381 C_{1}^{2} C_{2}) d v$	6.19%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9488 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1139 - 1.0007 C_{1} + 0.9858 C_{1}^{2} C_{2}) d v$	13.8%
$50 \times 50$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 41.7368 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.90375 - 1.0057 C_{1}^{2} C_{2}) d v$	4.3%

Table 8. Table 8: Results using data generated from Model 2 using Δ = 1 Δ 1 \Delta=1 , with 10 snapshots and σ = 10 − 4 𝜎 superscript 10 4 \sigma=10^{-4} .

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9831 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1018 - 0.9983 C_{1} + 0.9958 C_{1}^{2} C_{2}) d v + \int_{Γ_{2}} w_{1} 0.0984 d s$	1.8%
$400 \times 400$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v$ identify wrongly	N/A
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.983 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1059 - 1.0028 C_{1} + 0.9973 C_{1}^{2} C_{2}) d v + \int_{Γ_{2}} w_{1} 0.1305 d s$	30.3%
$200 \times 200$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v$ identify wrongly	N/A
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.9794 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1094 - 1.0036 C_{1} + 0.9949 C_{1}^{2} C_{2}) d v + \int_{Γ_{2}} w_{1} 0.1472 d s$	47%
$100 \times 100$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 47.5561 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.8375 - 0.9269 C_{1}^{2} C_{2}) d v$	7.3%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} - 0.884 \nabla w_{1} \cdot \nabla C_{1} d v + \int_{Ω} w_{1} (0.1172 - 1.0036 C_{1} + 0.9856 C_{1}^{2} C_{2}) d v$	17.22%
$50 \times 50$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 41.755 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} w_{2} (0.903 - 1.0064 C_{1}^{2} C_{2}) d v$	4.38%

Table 9. Table 9: results using data generating from Model 3 using dt=1, 10 snapshot. σ = 10 − 5 𝜎 superscript 10 5 \sigma=10^{-5}

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v$ identify wrongly	N/A
$400 \times 400$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v$ identify wrongly	N/A
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v$ identify wrongly	N/A
$200 \times 200$	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v$ identify wrongly	N/A
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 13.7499 + 36.3723 C_{1} + 17.9356 C_{2} - 36.3702 C_{1}^{2} - 13.9 C_{2}^{2}) \nabla C_{1} d v$	30.2%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 7.6263 + 15.2494 C_{1} + 11.158 C_{2} - 22.32 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 0.7919 \nabla^{2} w_{1} \nabla^{2} C_{1}$	30.2%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} \nabla w_{2} \cdot (- 6.9381 + 13.8269 C_{1} + 10.1406 C_{2} - 20.2006 C_{1} C_{2}) \nabla C_{1} d v$	30.3%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 9.1114 + 13.4938 C_{1} + 29.8767 C_{2} - 13.5301 C_{1}^{2} - 33.5644 C_{2}^{2}) \nabla^{2} C_{2} d v + \int_{Ω} - 0.735 \nabla^{2} w_{2} \nabla^{2} C_{2}$	30.3%
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 15.1116 + 40.105 C_{1} + 19.6542 C_{2} - 40.132 C_{1}^{2} - \underline{15.2351} C_{2}^{2}) \nabla C_{1} d v$	24.1%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 8.3106 + 16.6052 C_{1} + 12.1407 C_{2} - 24.2681 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 0.9238 \nabla^{2} w_{1} \nabla^{2} C_{1}$	24.1%
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} \nabla w_{2} \cdot (- 7.6666 + 15.2396 C_{1} + 11.2172 C_{2} - 22.272 C_{1} C_{2}) \nabla C_{1} d v$	30.3%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 10.1748 + 15.3511 C_{1} + 33.0199 C_{2} - 15.4756 C_{1}^{2} - 36.9985 C_{2}^{2}) \nabla^{2} C_{2} d v + \int_{Ω} - 0.8631 \nabla^{2} w_{2} \nabla^{2} C_{2}$	30.3%

Table 10. Table 10: results using data generating from Model 4 using dt=1, 10 snapshot, σ = 10 − 5 𝜎 superscript 10 5 \sigma=10^{-5} .

mesh size	results	error
$400 \times 400$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v$ identify wrongly	N/A
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 0.9894 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.4733 - 10.6805 C_{1} - 12.1079 C_{2}$	1.065%
	$+ 10.6816 c_{1}^{2} + 15.8249 C_{1} C_{2} + 21.0192 C_{2}^{2} - 15.8266 C_{1}^{2} C_{2} - 15.8597 C_{2}^{3}) d v$
$200 \times 200$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v$ identify wrongly	N/A
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1.0107 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.497 - 10.756 C_{1} - 12.183 C_{2}$	1.066%
	$+ 10.7587 C_{1}^{2} + 15.9347 C_{1} C_{2} + 21.1536 C_{2}^{2} - 15.9389 C_{1}^{2} C_{2} - 15.966 C_{2}^{3}) d v$
$100 \times 100$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 16.4594 + 43.8416 C_{1} + 20.5547 C_{2} - 43.8509 C_{1}^{2} - 15.3705 C_{2}^{2}) \nabla C_{1} d v$	9.17%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 9.8091 + 19.6113 C_{1} + 14.6763 C_{2} - 29.3139 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 0.9544 \nabla^{2} w_{1} \nabla^{2} C_{1}$
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1.0457 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.46 - 10.641 C_{1} - 12.096 C_{2}$	4.57%
	$+ 10.6537 C_{1}^{2} + 15.7641 C_{1} C_{2} + 21.055 C_{2}^{2} - 15.7832 C_{1}^{2} C_{2} - 15.8958 C_{2}^{3}) d v$
$50 \times 50$	$\int_{Ω} w_{1} \frac{\partial C_{1}}{\partial t} d v = \int_{Ω} \nabla w_{1} \cdot (- 14.898 + 38.6644 C_{1} + 19.7295 C_{2} - 38.678 C_{1}^{2} - \underline{14.984} C_{2}^{2}) \nabla C_{1} d v$	20.7%
	$+ \int_{Ω} \nabla w_{2} \nabla \cdot (- 8.5672 + 17.119 C_{1} + 13.1645 C_{2} - 26.22 C_{1} C_{2}) \nabla C_{2} d v + \int_{Ω} - 0.9213 \nabla^{2} w_{1} \nabla^{2} C_{1}$
	$\int_{Ω} w_{2} \frac{\partial C_{2}}{\partial t} d v = \int_{Ω} - 1.148 \nabla w_{2} \cdot \nabla C_{2} + \int_{Ω} + w_{2} (3.33 - 10.2485 C_{1} - 11.739 C_{2}$	14.7%
	$+ 10.2984 c_{1}^{2} + 15.19 C_{1} C_{2} + 20.5588 C_{2}^{2} - 15.26 C_{1}^{2} C_{2} - 15.5276 C_{2}^{3}) d v$

Equations142

\displaystyle\frac{\partial C}{\partial t}-\mbox{\boldmath$\chi$}\cdot\mbox{\boldmath$\omega$}=0,

\displaystyle\frac{\partial C}{\partial t}-\mbox{\boldmath$\chi$}\cdot\mbox{\boldmath$\omega$}=0,

\displaystyle\mbox{\boldmath$\chi$}=[1,C,C^{2},...,\nabla^{2}C,...],

\displaystyle\mbox{\boldmath$\chi$}=[1,C,C^{2},...,\nabla^{2}C,...],

\frac{\partial C}{\partial t} - D \nabla^{2} C - f = 0

\frac{\partial C}{\partial t} - D \nabla^{2} C - f = 0

\displaystyle\mbox{\boldmath$\chi$}=[1,C,C^{2},\nabla^{2}C]\hskip 20.00003pt\mathrm{and}\hskip 20.00003pt\mbox{\boldmath$\omega$}=[f,0,0,D].

\displaystyle\mbox{\boldmath$\chi$}=[1,C,C^{2},\nabla^{2}C]\hskip 20.00003pt\mathrm{and}\hskip 20.00003pt\mbox{\boldmath$\omega$}=[f,0,0,D].

\displaystyle\int_{\Omega}w\left(\frac{\partial C}{\partial t}-\mbox{\boldmath$\chi$}\cdot\mbox{\boldmath$\omega$}\right)\text{d}v=0.

\displaystyle\int_{\Omega}w\left(\frac{\partial C}{\partial t}-\mbox{\boldmath$\chi$}\cdot\mbox{\boldmath$\omega$}\right)\text{d}v=0.

w^{h} = a = 1 \sum n_{b} d^{a} N^{a}

w^{h} = a = 1 \sum n_{b} d^{a} N^{a}

C^{h} = a = 1 \sum n_{b} c^{a} N^{a},

\displaystyle\sum_{e}\int_{\Omega^{e}}\sum_{a=1}^{n_{\mathrm{b}}}d^{a}N^{a}\left(\frac{\partial C^{h}}{\partial t}-\mbox{\boldmath$\chi$}(C^{h})\cdot\mbox{\boldmath$\omega$}\right)\text{d}v=0.

\displaystyle\sum_{e}\int_{\Omega^{e}}\sum_{a=1}^{n_{\mathrm{b}}}d^{a}N^{a}\left(\frac{\partial C^{h}}{\partial t}-\mbox{\boldmath$\chi$}(C^{h})\cdot\mbox{\boldmath$\omega$}\right)\text{d}v=0.

R_{i} = F_{i} (\frac{\partial C ^{h}}{\partial t}, C^{h}, \nabla C^{h}, ..., N, \nabla N ...),

R_{i} = F_{i} (\frac{\partial C ^{h}}{\partial t}, C^{h}, \nabla C^{h}, ..., N, \nabla N ...),

\int_{Ω} w 1 d v = \int_{Ω} i = 1 \sum n_{b} d^{i} N^{i} a = 1 \sum n_{b} 1 N^{a} d v

\int_{Ω} w 1 d v = \int_{Ω} i = 1 \sum n_{b} d^{i} N^{i} a = 1 \sum n_{b} 1 N^{a} d v

Ξ_{i}^{cons} = \int_{Ω} N^{i} a = 1 \sum n_{b} N^{a} d v .

Ξ_{i}^{cons} = \int_{Ω} N^{i} a = 1 \sum n_{b} N^{a} d v .

Ξ_{i}^{\dot{C}} ∣_{n} = \int_{Ω} N^{i} a = 1 \sum n_{b} \frac{c _{n}^{a} - c _{n - 1}^{a}}{Δ t} N^{a} d v

Ξ_{i}^{\dot{C}} ∣_{n} = \int_{Ω} N^{i} a = 1 \sum n_{b} \frac{c _{n}^{a} - c _{n - 1}^{a}}{Δ t} N^{a} d v

\int_{Ω} w \nabla^{2} C_{n}^{h} d v =

\int_{Ω} w \nabla^{2} C_{n}^{h} d v =

\displaystyle+\int_{\Gamma}\sum_{i=1}^{n_{\mathrm{b}}}d^{i}N^{i}\sum_{a=1}^{n_{\mathrm{b}}}c_{n}^{a}\nabla N^{a}\cdot\mbox{\boldmath$n$}\text{d}s.

Ξ_{i}^{\nabla^{2} C} ∣_{n} = - \int_{Ω} \nabla N^{i} \cdot a = 1 \sum n_{b} c_{n}^{a} \nabla N^{a} d v

Ξ_{i}^{\nabla^{2} C} ∣_{n} = - \int_{Ω} \nabla N^{i} \cdot a = 1 \sum n_{b} c_{n}^{a} \nabla N^{a} d v

Ξ_{i}^{\nabla^{2} C_{BC}} ∣_{n} = \int_{Γ} N^{i} a = 1 \sum n_{b} 1 d s

Ξ_{i}^{\nabla^{2} C_{BC}} ∣_{n} = \int_{Γ} N^{i} a = 1 \sum n_{b} 1 d s

\int_{Ω} w \nabla^{4} C_{n}^{h} d v

\int_{Ω} w \nabla^{4} C_{n}^{h} d v

\displaystyle-\int_{\Gamma}\sum_{i=1}^{n_{\mathrm{b}}}d^{i}\nabla N^{i}\cdot\mbox{\boldmath$n$}\sum_{a=1}^{n_{\mathrm{b}}}c_{n}^{a}\nabla^{2}N^{a}\text{d}s

\displaystyle+\int_{\Gamma}\sum_{i=1}^{n_{\mathrm{b}}}d^{i}N^{i}\sum_{a=1}^{n_{\mathrm{b}}}c_{n}^{a}\nabla(\nabla^{2}N^{a})\cdot\mbox{\boldmath$n$}\text{d}s.

Ξ_{i}^{\nabla^{4} C} ∣_{n} = - \int_{Ω} \nabla^{2} N^{i} a = 1 \sum n_{b} c_{n}^{a} \nabla^{2} N d v

Ξ_{i}^{\nabla^{4} C} ∣_{n} = - \int_{Ω} \nabla^{2} N^{i} a = 1 \sum n_{b} c_{n}^{a} \nabla^{2} N d v

Ξ_{i}^{\nabla^{4} C_{DBC}} ∣_{n}

Ξ_{i}^{\nabla^{4} C_{DBC}} ∣_{n}

Ξ_{i}^{\nabla^{4} C_{NBC}} ∣_{n}

B_{0}^{i} (ξ)

B_{0}^{i} (ξ)

B_{p}^{i} (ξ)

B_{p}^{i} (ξ)

N_{p}^{i} (ξ) = \frac{B _{p}^{i} ( ξ ) w _{i}}{\sum _{i^{'} = 1}^{n_{b}} B _{p}^{i^{'}} ( ξ ) w _{i^{'}}}

N_{p}^{i} (ξ) = \frac{B _{p}^{i} ( ξ ) w _{i}}{\sum _{i^{'} = 1}^{n_{b}} B _{p}^{i^{'}} ( ξ ) w _{i^{'}}}

N_{p}^{ij} (ξ, η)

N_{p}^{ij} (ξ, η)

N_{p}^{ij k} (ξ, η, ζ)

\displaystyle\mbox{\boldmath$y$}=\left[\begin{array}[]{c}\vdots\\ \mbox{\boldmath$\Xi$}^{\dot{C}}|_{n-1}\\ \mbox{\boldmath$\Xi$}^{\dot{C}}|_{n}\\ \mbox{\boldmath$\Xi$}^{\dot{C}}|_{n+1}\\ \vdots\\ \end{array}\right]

\displaystyle\mbox{\boldmath$y$}=\left[\begin{array}[]{c}\vdots\\ \mbox{\boldmath$\Xi$}^{\dot{C}}|_{n-1}\\ \mbox{\boldmath$\Xi$}^{\dot{C}}|_{n}\\ \mbox{\boldmath$\Xi$}^{\dot{C}}|_{n+1}\\ \vdots\\ \end{array}\right]

\displaystyle\mbox{\boldmath$\Xi$}^{\dot{C}}|_{n}=\left[\begin{array}[]{c}\vdots\\ \Xi^{\dot{C}}_{i-1}|_{n}\\ \Xi^{\dot{C}}_{i}|_{n}\\ \Xi^{\dot{C}}_{i+1}|_{n}\\ \vdots\\ \Xi^{\dot{C}}_{N_{\text{DOF}}}|_{n}\end{array}\right]

\displaystyle\mbox{\boldmath$\Xi$}^{\dot{C}}|_{n}=\left[\begin{array}[]{c}\vdots\\ \Xi^{\dot{C}}_{i-1}|_{n}\\ \Xi^{\dot{C}}_{i}|_{n}\\ \Xi^{\dot{C}}_{i+1}|_{n}\\ \vdots\\ \Xi^{\dot{C}}_{N_{\text{DOF}}}|_{n}\end{array}\right]

\displaystyle\mbox{\boldmath$\Xi$}=\left[\begin{array}[]{cccccccc}\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ \mbox{\boldmath$\Xi$}^{\text{cons}}|_{n-1}&\mbox{\boldmath$\Xi$}^{C}|_{n}&\mbox{\boldmath$\Xi$}^{{C}^{2}}|_{n-1}&...&\mbox{\boldmath$\Xi$}^{\nabla^{2}C}|_{n-1}&\mbox{\boldmath$\Xi$}^{\nabla^{4}C}|_{n-1}&\Xi^{\nabla^{2}C_{\text{BC}}}|_{n-1}&...\\ \mbox{\boldmath$\Xi$}^{\text{cons}}|_{n}&\mbox{\boldmath$\Xi$}^{C}|_{n}&\mbox{\boldmath$\Xi$}^{{C}^{2}}|_{n}&...&\mbox{\boldmath$\Xi$}^{\nabla^{2}C}|_{n}&\mbox{\boldmath$\Xi$}^{\nabla^{4}C}|_{n}&\Xi^{\nabla^{2}C_{\text{BC}}}|_{n}&...\\ \mbox{\boldmath$\Xi$}^{\text{cons}}|_{n+1}&\mbox{\boldmath$\Xi$}^{C}|_{n+1}&\mbox{\boldmath$\Xi$}^{{C}^{2}}|_{n+1}&...&\mbox{\boldmath$\Xi$}^{\nabla^{2}C}|_{n+1}&\mbox{\boldmath$\Xi$}^{\nabla^{4}C}|_{n+1}&\Xi^{\nabla^{2}C_{\text{BC}}}|_{n+1}&...\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\end{array}\right]

\displaystyle\mbox{\boldmath$\Xi$}=\left[\begin{array}[]{cccccccc}\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ \mbox{\boldmath$\Xi$}^{\text{cons}}|_{n-1}&\mbox{\boldmath$\Xi$}^{C}|_{n}&\mbox{\boldmath$\Xi$}^{{C}^{2}}|_{n-1}&...&\mbox{\boldmath$\Xi$}^{\nabla^{2}C}|_{n-1}&\mbox{\boldmath$\Xi$}^{\nabla^{4}C}|_{n-1}&\Xi^{\nabla^{2}C_{\text{BC}}}|_{n-1}&...\\ \mbox{\boldmath$\Xi$}^{\text{cons}}|_{n}&\mbox{\boldmath$\Xi$}^{C}|_{n}&\mbox{\boldmath$\Xi$}^{{C}^{2}}|_{n}&...&\mbox{\boldmath$\Xi$}^{\nabla^{2}C}|_{n}&\mbox{\boldmath$\Xi$}^{\nabla^{4}C}|_{n}&\Xi^{\nabla^{2}C_{\text{BC}}}|_{n}&...\\ \mbox{\boldmath$\Xi$}^{\text{cons}}|_{n+1}&\mbox{\boldmath$\Xi$}^{C}|_{n+1}&\mbox{\boldmath$\Xi$}^{{C}^{2}}|_{n+1}&...&\mbox{\boldmath$\Xi$}^{\nabla^{2}C}|_{n+1}&\mbox{\boldmath$\Xi$}^{\nabla^{4}C}|_{n+1}&\Xi^{\nabla^{2}C_{\text{BC}}}|_{n+1}&...\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\end{array}\right]

\displaystyle\mathscr{R}=\mbox{\boldmath$y$}-\mbox{\boldmath$\Xi$}\mbox{\boldmath$\omega$}.

\displaystyle\mathscr{R}=\mbox{\boldmath$y$}-\mbox{\boldmath$\Xi$}\mbox{\boldmath$\omega$}.

\displaystyle\mbox{\boldmath$y$}=\mbox{\boldmath$\Xi$}\mbox{\boldmath$\omega$}

\displaystyle\mbox{\boldmath$y$}=\mbox{\boldmath$\Xi$}\mbox{\boldmath$\omega$}

\displaystyle\mbox{\boldmath$\omega$}=(\mbox{\boldmath$\Xi$}^{T}\mbox{\boldmath$\Xi$})^{-1}\mbox{\boldmath$\Xi$}^{T}\mbox{\boldmath$y$},

\displaystyle\mbox{\boldmath$\omega$}=(\mbox{\boldmath$\Xi$}^{T}\mbox{\boldmath$\Xi$})^{-1}\mbox{\boldmath$\Xi$}^{T}\mbox{\boldmath$y$},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Variational system identification of the partial differential equations governing the physics of pattern-formation: Inference under varying fidelity and noise

Z. Wang, X. Huan and K. Garikipati Department of Mechanical Engineering, University of MichiganDepartment of Mechanical Engineering, Michigan Institute for Computational Discovery & Engineering, University of Michigan, University of MichiganDepartments of Mechanical Engineering, and Mathematics, Michigan Institute for Computational Discovery & Engineering, University of Michigan, corresponding author, [email protected]

Abstract

We present a contribution to the field of system identification of partial differential equations (PDEs), with emphasis on discerning between competing mathematical models of pattern-forming physics. The motivation comes from developmental biology, where pattern formation is central to the development of any multicellular organism, and from materials physics, where phase transitions similarly lead to microstructure. In both these fields there is a collection of nonlinear, parabolic PDEs that, over suitable parameter intervals and regimes of physics, can resolve the patterns or microstructures with comparable fidelity. This observation frames the question of which PDE best describes the data at hand. This question is particularly compelling because identification of the closest representation to the true PDE, while constrained by the functional spaces considered relative to the data at hand, immediately delivers insights to the physics underlying the systems. While building on recent work that uses stepwise regression, we present advances that leverage the variational framework and statistical tests. We also address the influences of variable fidelity and noise in the data.

1 Introduction

The widespread use of sensors, high throughput experiments and simulations, as well high performance computing have made “big data” available for a range of engineering physics systems. This has sparked an explosion of interest in data-driven modeling for these systems. An extreme manifestation of this still-developing field is seen in “model-free” approaches that do not rely on physics-based knowledge. Such a path to modelling holds the promise of very high computational efficiency by circumventing solutions to potentially complex physics. However, it draws criticism for its “black box” nature, and often for lack of interpretability. More seriously, these approaches offer scant openings for analysis when the model fails. Conversely, the availability of abundant data also presents opportunities to discover mathematical frameworks, with well-understood physical meaning, which govern the underlying behavior. This has fueled the field of system identification, which is particularly compelling for determining the governing partial differential equations (PDEs). This is so, because knowledge of the PDE directly translates to deep insights to the physics, guided by differential and integral calculus.

Approaches for data-assisted and data-driven discovery of PDEs have proceeded along several fronts. (a) Early work in parameter identification can be traced to nonlinear regression approaches [1, 2]. (b) If the governing equation is unknown, an approximate model—whether physical, empirical, or mixed—could be learned from data. For example, the approximate model could be a neural network [3], a reduced-order model [4, 5], a phenomenological effective model formed by a linear combinations of basis functions in finite dimensions [6, 7], or a linear mapping from a current to a future state where the dynamic characteristics of the mapping could be learned by Dynamic Mode Decomposition [8, 9]. (c) Lastly, the complete underlying governing equations could be extracted from data by combining symbolic regression and genetic programming to infer algebraic expressions along with their coefficients [10, 11]. In this context, while genetic programming balances model accuracy and complexity, it proves very expensive when searching for a few relevant terms from a large library of candidates. In a recent approach to solving inverse problems [12], the strong form of a specified PDE was directly embedded in the loss function while training deep neural network representations of the solution variable. This approach results in the minimum-residual solution subject to a deep neural network global basis. The work contained examples where up to two unknown PDE coefficients were estimated, and did not explore system identification from a larger set of possible operators. A somewhat more ambitious and extensive approach is based on the recognition that most physical systems have only a few relevant terms, thus creating an opportunity to develop sparse regression techniques for system identification. This class of approaches has recently been applied with success [13, 14, 15] to determining the operators in PDEs from a comprehensive library of candidates, thus efficiently discovering governing equations from data. More recent extensions have been applied to model recovery using sparse data after abrupt changes in the system [16], hybrid dynamical systems [17] and multiscale systems [18]. The work presented here is based on this approach, inspired by its direct treatment of operator forms in PDEs, and their connections with physical mechanisms, while presenting several novel advances. These include the adoption of the variational setting, via the weak forms of PDEs; step-wise regression with the statistical F-test; a treatment of noise and its amelioration by varying fidelity.

For context, we briefly discuss the role of pattern forming systems of equations in developmental biology and materials physics. Following Alan Turing’s seminal work on reaction-diffusion systems [19], a robust literature has developed on the application of nonlinear versions of this class of PDEs to model pattern formation in developmental biology [20, 21, 22, 23, 24, 25, 26, 27, 28]. The Cahn-Hilliard phase field equation [29] has been applied to model other biological processes with evolving fronts, such as tumor growth and angiogenesis [30, 31, 32, 33, 34, 35, 36, 37]. Pattern formation during phase transformations in materials physics can happen as the result of instability-induced bifurcations from a uniform composition [38, 39, 40], which was the original setting of the Cahn-Hilliard treatment. Another phase field treatment, the Allen-Cahn equation [41] models nucleation and growth of precipitates and also has seen wide use [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. All these pattern forming systems fall into the class of nonlinear, parabolic PDEs, and have spawned vast literature in mathematical physics. They can be written as systems of first-order dynamics driven by a number of time-independent terms of algebraic and differential form. The spatio-temporal, differentio-algebraic operators act on either a composition (normalized concentration) or an order parameter. It also is common for the algebraic and differential terms to be coupled across multiple species.

It is compelling to attempt to discover the physics governing these pattern forming systems by identifying their PDE forms from data. However proper evaluation of the numerical derivatives is very challenging. The commonly used finite-difference approximations require data to be available in a very finely discretized space. Polynomial interpolation has proven to be reliable and robust [53], and recently has found use in data-driven discovery of PDEs [15]. However, this approach encounters substantial difficulties in estimating the coefficients of high-order derivatives, because of the significant error associated with numerical differentiation. Even still, to the best of our knowledge, the published literature in discovering governing equations has remained focused on trying to identify the strong form of the PDEs. Every strong form, however, has equivalent weak form(s), which seem to have not been exploited for system identification. As will be seen in this communication, the advantages of using weak forms include: the natural occurrence of the loss function for the regression problem, allowing identification of the boundary conditions, which has proven to be a challenge, and successfully identifying higher-order derivatives with the aid of smooth basis functions (such as non-uniform rational B-splines or NURBS in this work).

Here, after briefly reviewing how the Galerkin weak form may be employed for identification of a general dynamical system (Section 1.1), we focus on introducing our methods for identifying PDEs in this variational setting, using stepwise regression (Section 2). In this first communication, we focus on examples to identify reaction-diffusion equations evolving under Schnakenberg kinetics [54], and the Cahn-Hilliard and Allen-Cahn equations driven by non-convex energy density functions (Section 3). This choice is driven by the fact that, as explained above, these are widely used models in biological patterning and morphogenesis, and have well-established traditions of use in materials physics, also. The datasets we use are all obtained by high fidelity direct numerical simulations. The more challenging problem of datasets from physical experiments will be addressed in a subsequent communication. We do call attention to our treatment of both high and low fidelity data, with and without noise, as a precursor to using data from physical experiments. Concluding remarks appear under Discussion and Conclusions (Section 4).

We note in passing that, very recently, there has been growing interest in combining data-driven methods with PDEs in a more direct manner: to find solutions to the canonical initial and boundary value problems (IBVPs) of mathematical physics. These approaches include Gaussian processes with kernels defined by specific PDEs in linearized form for one- and two-dimensions [55], and deep neural networks for solution of high-dimensional IBVPs on semi-linear parabolic PDEs [56] or with strong forms of the target one- and two-dimensional PDEs embedded in the loss functions [12].

1.1 The Galerkin weak form

We first provide a brief discussion of how the weak form of PDEs will be used in this work. We start with the general strong form for first-order dynamics written as

[TABLE]

where $\chi$ is a vector containing all possible independent terms expressed as algebraic and differential operators on the scalar solution $C$ :

[TABLE]

and $\omega$ is the vector of pre-factors for each term. Thus, the one-field diffusion reaction equation

[TABLE]

with constant diffusivity $D$ and reaction rate $f$ has

[TABLE]

Note that the time derivative $\partial C/\partial t$ is treated separately from the other terms in order to highlight the first-order dynamics of all problems we target in this work.

Next, we recall the weak form that is equivalent to a general strong form written as above with vector of operators $\boldsymbol{\chi}$ and vector of pre-factors $\boldsymbol{\omega}$ . For infinite-dimensional problems with Dirichlet boundary conditions on $\Gamma^{c}$ the weak form can be stated as: $\forall~{}w\in\mathscr{V}$ where $\mathscr{V}=\{w|~{}w=~{}0\;\mathrm{on}\;\Gamma^{c}\}$ , find $c\in\mathscr{S}$ where $\mathscr{S}=\{C|C=\bar{C}\text{ on }\Gamma^{c}\}$ such that

[TABLE]

For finite-dimensional fields $C^{h},w^{h}$ , the weak form is as follows: find $C^{h}\in\mathscr{S}^{h}\subset\mathscr{S}$ where $\mathscr{S}^{h}=\{C^{h}\in\mathscr{H}^{2}(\Omega)~{}|~{}C^{h}=~{}\bar{u}\;\mathrm{on}\;\Gamma^{u}\}$ , such that $\forall~{}w^{h}\in\mathscr{V}^{h}\subset\mathscr{V}$ where $\mathscr{V}^{h}=\{w^{h}\in\mathscr{H}^{2}(\Omega)~{}|~{}w^{h}=~{}0\;\mathrm{on}\;\Gamma^{u}\}$ , the finite-dimensional (Galerkin) weak form of the problem is satisfied. The choice of $\mathscr{H}^{2}(\Omega)$ as the Sobolev space is motivated by the differential operators, which reach the highest order of two in the weak forms we consider (four in strong form). The variations $w^{h}$ and trial solutions $C^{h}$ are defined component-wise using a finite number of basis functions,

[TABLE]

where $n_{\mathrm{b}}$ is the dimensionality of the function spaces $\mathscr{S}^{h}$ and $\mathscr{V}^{h}$ , and $N^{a}$ represents the basis functions. Substituting $w^{h}$ and $C^{h}$ in Equation (5) by Equations (6) and (7) and decomposing the integration over $\Omega$ by a sum over subdomains $\Omega^{e}$ leads to:

[TABLE]

Upon integration by parts, application of appropriate boundary conditions, and accounting for the arbitrariness of $w^{h}$ in $\mathscr{V}^{h}$ , the finite-dimensionality leads to a system of residual equations for each degree of freedom (DOF):

[TABLE]

where $\mathscr{R}_{i}[C^{h}]$ is the $i^{\text{th}}$ component of the residual vector, a functional of $C^{h}$ satisfying $\mathscr{R}_{i}[C^{h}]=0$ for the Galerkin solution of the problem, $C^{h}$ .

2 Identification of governing parabolic PDEs in weak form

The PDE identification problem in strong form is to find the correct pre-factor $\omega$ in Equation (1), which chooses among many possible candidate basis operators acting on $C$ collected in $\chi$ , given the solution/data of the dynamical system. Alternately, we could identify the PDE in finite-dimensional weak form by adopting finite-dimensional basis functions and also finding $\omega$ for many possible basis operators, but now written in weak form, acting on $C^{h}$ . In preparation, we first recall several basis operators, including Neumman boundary condition operators, all written in weak form. We next introduce NURBS basis functions, which have been popularized for isogeometric analysis [57]. The algorithm of stepwise regression for model selection is introduced following these developments.

2.1 Candidate basis operators in weak form

Suppose we have data for the field $C$ on certain degrees of freedom (DOFs) of a discretization for a series of time steps. These DOFs are the control variables if NURBS basis functions are used. Selecting all or a subset of these DOFs, a Galerkin representation of the field could be constructed over the domain. The use of fewer DOFs represents lower fidelity data, which is discussed separately in Section 2.4. We can then obtain the system of residual equations in (9), with the independent candidate basis terms in weak form populating $\chi$ . Below we illustrate the procedure with four example terms.

Constant term $1$ :

[TABLE]

Since the $d^{i}$ are arbitrary, we obtain a vector of size $N_{\text{DOF}}$ , where the $i$ th component is

[TABLE]

Time derivative $\partial C/\partial t$ at time $t_{n}$ : We write a backward difference approximation of the time derivative at $t_{n}$ as

[TABLE]

where $\Delta t=t_{n}-t_{n-1}$ is the time step, and $c_{n}$ and $c_{n-1}$ are the control variable values at times $t_{n}$ and $t_{n-1}$ .

Laplace operator $\nabla^{2}C$ at time $t_{n}$ : Multiplying by the weighting function and integrating by parts:

[TABLE]

Note that the second term on the right is the Neumann boundary term. For a flux boundary condition on $\Gamma$ , such that $-\nabla C^{h}_{n}\cdot\mbox{\boldmath$ n $}=1$ . We thus have two basis functions after accounting for the arbitrariness of $d^{i}$ :

[TABLE]

is the basis for $\nabla^{2}C^{h}_{n}$ (with zero Neumann boundary condition), and

[TABLE]

is the basis for the Neumann boundary term arising from $\nabla^{2}C$ at the surface $\Gamma$ . As is well known, in the variational setting the weak form of the divergence operator yields volume (15) and surface (16) terms, with the latter corresponding to the Neumann boundary condition. In this work we treat them as distinct bases, thus allowing us to separately identify the Neumann boundary term.

We also call attention to the fact that for the unsteady diffusion problem, equating (12) to the sum of (15) and (16) amounts to the Backward Euler time integration algorithm, which is first-order accurate. The mid-point rule, on the other hand is second-order in time.

Biharmonic operator $\nabla^{4}C_{n}$ at time $t_{n}$ : Multiplying by the weighting function and integrating by parts:

[TABLE]

Since the $d^{i}$ are arbitrary, we define the first term

[TABLE]

as the basis for the biharmonic operator $\nabla^{4}C$ (with zero Neumann boundary condition). The last two terms in (17) are, respectively, a higher-order Dirichlet boundary condition (as emerges from variational calculus applied in the context of the full equation [39]) and the Neumman boundary condition, from which we define:

[TABLE]

The second-order gradients in the Equation (18) require the solutions and basis functions to lie in $\mathscr{H}^{2}(\Omega)$ , while the Lagrange polynomial basis functions traditionally used in finite element analysis only lie in $\mathscr{H}^{1}(\Omega)$ . We therefore draw the basis functions, $N$ , from the family of Non-Uniform Rational B-Splines (NURBS), and adopt Isogeomeric Analysis (IGA) in our simulations to find the solutions in $\mathscr{H}^{2}(\Omega)$ . A discussion of the NURBS basis and IGA is beyond the scope of this paper; interested readers are directed to the original works on this topic [57]. We briefly present the construction of the NURBS basis functions in the following subsection.

2.2 NURBS basis functions

Similar to Lagrange polynomial basis functions traditionally used in the Finite Element Method (FEM), NURBS basis functions are partitions of unity with compact support, satisfy affine covariance (i.e., an affine transformation of the basis is obtained by the affine transformation of its nodes/control points), and support an isoparametric formulation, thereby making them suitable for a Galerkin framework. They enjoy advantages over Lagrange polynomial basis functions in being able to ensure $C^{n}$ -continuity, possessing the positive basis and convex hull properties, and being variation diminishing.

The building blocks of the NURBS basis functions are univariate B-spline functions defined as follows. Consider two positive integers $p$ and $n$ , and a non-decreasing sequence of values $\chi=[\xi_{1},\xi_{2},....,\xi_{n+p+1}]$ , where $p$ is the polynomial order, $n$ is the number of basis functions, $\xi_{i}$ are coordinates in the parametric space referred to as knots (equivalent to nodes in FEM), and $\chi$ is the knot vector. The B-spline basis functions $B_{i,p}(\xi)$ are defined starting with the zeroth order basis

[TABLE]

and higher orders using the Cox-de Boor recursive formula for $p\geq 1$ [58]

[TABLE]

The knot vector divides the parametric space into intervals referred to as knot spans (equivalent to elements in FEM). A B-spline basis function is $C^{\infty}$ -continuous inside knot spans and $C^{p-1}$ -continuous at the knots. If an interior knot value repeats, it is referred to as a multiple knot. At a knot of multiplicity $k$ , the continuity is $C^{p-k}$ . Using a quadratic B-spline basis (Figure (1)), a $C^{1}$ -continuous one dimensional NURBS basis can now be constructed:

[TABLE]

where $w_{i}$ are the weights associated with each of the B-spline functions. In higher dimensions, NURBS basis functions are constructed as a tensor product of their one-dimensional counterparts:

[TABLE]

Using NURBS basis functions, we are able to smoothly evaluate the higher order derivatives of the data by interpolating them from the DOF control variable values. This is crucial in enabling the identification of higher order derivatives (e.g., the second order derivatives that arise in the weak form of the Cahn-Hilliard equations) using the stepwise regression methods described next.

2.3 Identification of basis operators via stepwise regression

To identify a dynamical system, we need to generate all possible basis operators acting on the solution, compute the non-zero pre-factor for each basis operator that is in the model (relevant bases), while also attaining pre-factors of zero for the the basis operators that are not in the model (irrelevant bases). For situations where the true model is not contained within the set of candidate basis operators, our method then finds the optimal model (optimal in the sense of the loss function to be defined shortly) out of the set of all possible models. The resolution of such situations requires investigations into model inadequacy/misspecification (e.g., discussed in [59]), which is a challenging problem that we do not attempt to tackle here; instead, we limit our scope to work within a given set of candidate operators. We begin by putting together the time derivative basis operators at each time $\{\dots,t_{n-1},t_{n},t_{n+1},\dots\}$ . Let

[TABLE]

be our target vector. Note that each element $\mbox{\boldmath$ \Xi $}^{\dot{C}}|_{n}\in\mathbb{R}^{N_{DOF}}$ in (33) is a vector formed in Equation (12):

[TABLE]

Likewise we can form the matrix, $\Xi$ , containing all possible terms:

[TABLE]

The residual accounting for all DOFs and time steps can then be expressed as

[TABLE]

The solution $\mathscr{R}=0$ gives

[TABLE]

Note that the target vector in (40) and matrix $\Xi$ in (46) are currently structured to contain all DOFs. However we can always exclude some DOFs by deleting their corresponding rows. Additional attention is needed if one wishes to eliminate an entire basis operator. For instance, doing so to the Neumann boundary basis requires deletion of the corresponding column. The reduced Equation (48) would then represent data with the homogeneous Neumann boundary condition.

If Equation (48) were directly solved for $\boldsymbol{\omega}$ as an ordinary least squares regression problem, the analytic solution is

[TABLE]

where a nontrivial $y$ leads to a non-trivial solution to $\omega$ . The least squares formulation can also be interpreted as an optimization problem with the loss function

[TABLE]

which is simply the squared Euclidean norm of the residual vector obtained from the weak form. The system identification problem can then be viewed as finding $\omega$ that minimizes the loss:

[TABLE]

If the linear system in Equation (48) is free of error, the loss function $l$ and the prefactors for irrelevant terms can all be driven down to machine zero. For a linear system containing error terms (further discussed in Section 2.4), however, performing a standard regression will result in a solution of $\omega$ with nonzero contributions in all component of this vector. Such a solution of the system identification will not sharply delineate the relevant bases. Additionally, we could apply a separate least squares regression for each possible combination of all bases and choose the best one [60].aaaA technique called best subset selection. However the problem of selecting the best model from among the $2^{m}$ possibilities grows exponentially in $m$ , the number of all candidate basis. A viable alternative is to select the relevant bases by shrinking the pre-factors towards zero for the irrelevant bases. Ridge regression and $\ell_{1}$ regression (also known as least absolute shrinkage and selection operator (LASSO), a popular problem in compressive sensing research) are two well-known techniques to induce regularization. The optimization problem Equation (51) is updated with additional penalty terms:

[TABLE]

where $\lambda_{2}$ and $\lambda_{1}$ are regularization parameters. For both methods, however, selecting a good regularization parameter is critical, and may need to leverage prior knowledge about the magnitude of pre-factors[60].

2.3.1 Stepwise regression

In this work, we use backward model selection by stepwise regression [60]. The algorithm is summarized below.

Algorithm for Model selection by Stepwise regression:

Step 0: Establish target vector $y$ and matrix of bases, $\Xi$ .

Step 1: Solve $\mbox{\boldmath$ \omega $}^{i}$ in the linear regression problem, Equation (48) using ordinary least squares regression. Calculate the loss function at this iteration, $l^{i}$ .

Step 2: Eliminate basis terms in matrix $\Xi$ by deleting their columns, using the criterion to be introduced below. Set to zero the corrpesonding components of $\mbox{\boldmath$ \omega $}^{i}$ . GOTO Step 1. Note that in this case the value of loss function remains small ( $l^{i}\sim l^{i-1}$ ), and the solution may be overfitted.

Step 3: The algorithm stops if the pre-specified criterion does not allow us to eliminate any more basis terms. Beyond this, the loss function increases dramatically for any further reduction.

There are several choices for the criterion for eliminating basis terms. Kutz and co-workers applied a hard threshold on the pre-factors in each iteration [13, 14, 15]. Since the results of sparse regression and sparsity profiles of $\omega$ vary with different values of the threshold, a different method is needed for the final solution corresponding to the optimal tolerance. In the work mentioned above, Kutz and co-workers used Pareto front analysis by cross-validation to find the optimal tolerance. The drawback of this approach, in our experience, is that its performance degrades if the pre-factors of the relevant basis operators differ significantly in magnitude. For the approach to perform well, a carefully chosen rescaling is needed for the basis operators along with cross-validation.

Here, we explore, as an alternative, a widely used statistical criterion called the $F$ -test [60], which can be motivated as follows: Since the model at iteration $i$ contains fewer bases than at iteration $i-1$ , and is therefore more restrictive, the loss function, in general, is higher at iteration $i$ than at iteration $i-1$ . In this sense, model always becomes “worse” after eliminating any basis. We want to determine whether the model at iteration $i$ is significantly worse than at iteration $i-1$ . If not, the basis could be eliminated. We can evaluate the significance of the change by the $F$ -test:

[TABLE]

where $p^{i}$ is the number of bases at iteration $i$ and $n$ is the total number of bases. The $F$ value essentially evaluates the increasing loss function under the consideration of model complexity. If $F$ does not exceed a pre-defined tolerance, $\alpha$ , meaning that the model at iteration $i$ is not significantly worse than at iteration $i-1$ , we eliminate the considered basis. Using the $F$ -test in Step 2 of the above algorithm, this step can be re-stated as following:

Step 2 with $F$ -test:

Step 2.1:

Tentatively eliminate the basis corresponding to pre-factors in $\mbox{\boldmath$ \omega $}^{i}$ which are smaller than the pre-defined threshold, evaluate the $F$ value followed by ordinary least squares regression on the reduced bases set.

Step 2.2:

IF $F<\alpha$

THEN formally eliminate these bases in matrix $\Xi$ , by deleting the corresponding columns. GOTO Step 1.

ELSE GOTO Step 2.1, and choose another basis.

In this work we defined the threshold to be:

[TABLE]

where $\omega^{i}_{\text{smallest}}$ is the smallest value in $\mbox{\boldmath$ \omega $}^{i}$ , and $\epsilon$ is a small tolerance, and chosen to be $10^{-5}$ . This threshold will eliminate the basis with the smallest pre-factor, as well as all bases with pre-factors within $\epsilon$ of the smallest one. Note that $\alpha$ in the $F$ -test is the only hyperparameter in the algorithm. System identification is significantly more challenging if greater numbers of candidate bases are retained in the first few iterations. A small value of $\alpha$ may not eliminate all irrelevant bases at once, but can serve as a good initial value as it will filter some irrelevant bases, thus mollifying the problem to a degree. In this work, we choose $\alpha=1$ initially, and increase it to 10 when no more basis can be eliminated for $\alpha=1$ .bbbFor a different problem, the value can be rigorously chosen by cross-validation.. We have summarized the step-wise algorithm in Figure 2.

As mentioned previously, our approach builds on recent work in sparse regression [13, 14, 15]. However, it differs prominently from those papers in adopting a variational setting, using the $F$ -test for determining bases, and, as will be seen in Section 2.4, in considering the influence of data fidelity and noise.

More broadly, the purpose of the $F$ -test is to perform model selection: a criterion to judge whether the new candidate model with fewer operators is more suitable than the previous one. At the core, the comparison is made to balance the desires of (a) fitting the data and (b) reducing the model complexity. Indeed, model selection is a rich and active area of research, encompassing methods such as cross-validation [61, 62], Akaike information criterion (AIC) [63], Bayesian information criterion (BIC) [64], and Bayes factor [65, 66]. We will explore additional model selection techniques in the future, especially those that require a Bayesian version of our system identification framework.

2.4 Low fidelity and noisy data

In practice, we may not be able to collect high fidelity data in space. As illustrated in Figure 3, we consider the situation in which data is gathered at fewer locations in space, or equivalently from fewer instances in time. The low fidelity data yields larger element sizes when constructing the finite dimensional spatial bases, and larger time steps, for use in the weak form.

Furthermore, the data could be contaminated by background noise as well as data collection error, such that the observed data $\widehat{C}$ may be noisy. We model these imperfections using an additive form

[TABLE]

Consequently the target vector $\widehat{\mbox{\boldmath$ y $}}$ and every basis in $\widehat{\mbox{\boldmath$ \Xi $}}$ constructed from $\widehat{C}^{h}$ will contain error terms. For $\widehat{\mbox{\boldmath$ y $}}$ , we have

[TABLE]

We further assume that $\epsilon$ , the noise on $C^{h}$ , follows an independent and identically distributed (i.i.d.) Gaussian distributioncccWhile the observation noise may have systematic bias and correlation, we do not explore these possibilities in this paper, and instead take an initial step to assume an i.i.d. setting. with zero mean and standard deviation $\sigma$ :

[TABLE]

Since linear combinations of mutually independent Gaussian random variables are also Gaussian, we have:

[TABLE]

where $\epsilon_{\dot{C}}$ is a Gaussian distribution with zero mean and standard deviation $\sqrt{2}\sigma$ . Since the integral is computed using quadrature, we have:

[TABLE]

where $|J|$ is the determinant of the Jacobin matrix of the element mapping from an isoparametric domain, and $w_{q}$ is the weight for the quadrature point $q$ . The first term on the right hand side reflects the noiseless (true) component of $\Xi^{\dot{\widehat{C}}}_{i}|_{t}$ , and the second term is the error. the true term converges to the time derivative:

[TABLE]

however the error term scaled by the time step diverges as $\Delta t\longrightarrow 0$ . It also indicates that the error due to noise decreases if larger time steps are used:

[TABLE]

For gradient operators, such as the Laplacian and the Biharmonic operator, we have

[TABLE]

from which, since the forward solution converges with decreasing mesh size, $h$ , we have:

[TABLE]

However the error terms are scaled by $h^{-2}$ in the noisy Laplacian operator and $h^{-4}$ in the noisy Biharmonic operator, where the mesh size $h$ is introduced by the gradients of the basis functions. The error on $\widehat{C}^{h}$ is amplified by the mesh size acting through the spatial gradient operators and diverges as $h\longrightarrow 0$ . Consequently, we have

[TABLE]

The above analysis illustrate that the final error induced by noise on $C^{h}$ will decrease with larger time steps and element sizes. In Section 3 we show that for noisy $C^{h}$ , using low fidelity data improves the results of system identification. For purely algebraic bases, i.e., without differential operators from time derivatives or spatial gradients, time step and element size do not affect the ratio of error to true value. Note also that the approximations in Equations (61), (65) and (66) degrades with increasing time step and element size, bringing in discretization errors.

3 Examples

We now turn to using the framework detailed in the preceding section to identify the parabolic PDEs that govern patterning. Instead of directly applying our algorithms to data from physical experiments, in this first communication, we test our methods on identifying PDEs from data obtained through high-fidelity, direct numerical simulations (DNS).

We consider test cases with the following four data-generating models, with their true coefficients summarized in Table 1:

Model 1:

[TABLE]

where $\Gamma$ is the domain boundary.

Model 2:

[TABLE]

where $\Gamma_{2}$ and $\Gamma_{4}$ are shown in Figure 5. Models 1 and 2 represent coupled diffusion-reaction equations for two species following Schnakenberg kinetics [54], but with different boundary conditions. For an activator-inhibitor species pair, these equations use auto-inhibition with cross-activation of a short range species, and auto-activation with cross-inhibition of a long range species to form so-called Turing patterns [19].

Model 3:

[TABLE]

where $g$ is a non-convex, “homogeneous” free energy density function, whose form has been chosen from [28]:

[TABLE]

Model 4:

[TABLE]

Model 3 is a two field Cahn-Hilliard system with the well-known fourth-order term in the concentration, $C_{1}$ and $C_{2}$ . The three-well non-convex free energy density function (See Figure 4), $g(C_{1},C_{2})$ , drives segregation of the system into two distinct types. We have previously used this system to make connections with cell segregation in developmental biology [28].

The diffusion-reaction and Cahn-Hilliard equations are widely used in biological pattern generation, as discussed in the Introduction. The Cahn-Hilliad equation [29] also occupies a central role in the materials physics literature for modelling phase transformations developing from a uniform concentration field in the presence of an instability.

Model 4 is the coupled Cahn-Hilliard/Allen-Cahn equations system [41] where the field $C_{2}$ describes growth of alloy precipitates from nuclei created by a process of spinodal decomposition that controls the field $C_{1}$ . The free energy density $g(C_{1},C_{2})$ couples these processes through $\mu_{1}$ and $\mu_{2}$ in both Models 3 and 4. The Introduction also cites some of the relevant literature on the use of Cahn-Hilliard and Allen-Cahn equations in materials physics. An important difference between Models 3 and 4 is that $C_{2}$ in Model 4 is a non-conserved order parameter, which defines the identities of the precipitate, described by composition field $C_{1}$ , and matrix phases. Both Cahn-Hilliard equations in Model 3 are conservative. Note that solving the Cahn-Hilliard and Allen-Cahn equations in the isogeometric analytic framework requires imposition of the higher-order Dirichlet boundary conditions (79) and (86). We use Nitsche’s method [67] to impose these boundary conditions weakly.

Substituting the parameter values from Table 1, we present the weak form of each model:

Model 1:

[TABLE]

Model 2:

[TABLE]

Model 3:

[TABLE]

Model 4:

[TABLE]

The higher order boundary conditions and stabilization terms in Nitsche’s methods are not shown here. We do not infer these terms since they are of a numerical rather than physical nature.

3.1 Data preparation

All computations have been implemented in the mechanoChemIGA framework, a library for modeling mechano-chemical problems using isogeometric analytics, available at https://github.com/mechanoChem/mechanoChem. The IBVPs presented here are two-dimensional. Data generation by DNS was carried out on uniformly discretized $400\times 400$ meshes. We start all Models from the same initial conditions obtained by running Model 1 for a very short time from a randomized state with small perturbations about $C_{1},C_{2}=0.5$ (see Figure 5). This state was relabelled as $t=0$ .

With this as initial conditions, the early stage dynamics of Models 1-4 cannot be distinguished by eye (see Figure 6).

However driven by different boundary condition and/or governing equations, the dynamics of the four models soon evolve along distinct trajectories, and form completely different patterns over a longer time as shown in Figure 7.

The $400\times 400$ mesh yields the DNS data for each model, in terms of the fields $(C_{1},C_{2})$ , at the corresponding $401\times 401$ DOFs for each time step. The synthetic, high fidelity DNS data are regularized by having been obtained from PDEs with derivative constraints. Therefore, they are noise-free, which contributes to finding solutions $\omega$ that drive the residual equations (9) down to machine zero. However the data collected from physical experiments will not satisfy PDEs to the same precision, in general: The data could be at lower fidelity, having been collected at fewer spatial positions and time instants, and therefore may not approximate derivatives well. Background noise and collection error will further contribute to a poorer approximation by PDEs, as discussed in Section 2.4. To mimic the experimental data, we take two approaches: i) As illustrated in Figure 3, we use nested meshes for coarse-graining the data $(C_{1},C_{2})$ after generating them by DNS on the $400\times 400$ mesh. Doing so introduces a loss of information on derivatives. ii) We superpose noise with a Gaussian distribution having zero mean and standard deviation $\sigma$ , on the collected $(C_{1},C_{2})$ . We first collect the DNS data, from all four models, on DOFs corresponding to the $400\times 400$ mesh, which was used for the forward computations. Data from the same forward computations were then collected on the DOFs corresponding only to the nested $200\times 200$ , $100\times 100$ and $50\times 50$ meshes. This yielded datasets with four different fidelities for each model, and 16 different datasets in total. We then superimpose noise with $\sigma=10^{-4}$ for data generated from Models 1 and 2, and $\sigma=10^{-5}$ for data generated from Models 3 and 4, yielding another 16 data sets. Having the clean and noisy data at different fidelity, we generate 38 candidate bases in addition to the time derivative terms, summarized in Table 2. Note that to construct the time derivative terms, we need data at two time steps, $t_{n-1}$ and $t_{n}$ . Data at $t_{n-1}$ is only used in the time derivative terms, while other bases are constructed using data at $t_{n}$ (Backward Euler integration).

3.2 System identification with data at different fidelity without noise

In order challenge our algorithms to distinguish the underlying PDEs from the simulation results, we first use only DNS results over certain time intervals that, to the eye, appear identical to each other. For this purpose, we first apply stepwise regression on the 16 data sets (four datasets of different fidelity from four models) without noise, generated at $t=0$ (Figure 5) and $t=0.01$ (Figure 6). For the four high fidelity data ( $400\times 400$ mesh), we found that in the first iteration, all the relevant bases have almost converged to the correct pre-factor, and irrelevant bases have trivial (approaching zero) pre-factors. By setting the tolerance $\epsilon=1.0\times 10^{-5}$ in the threshold equation (55), all relevant bases are selected with exact pre-factors in the second iteration. The results are not shown here since the coefficients on operators are identical to those in Equations (90 - 95) for the four datasets respectively. In particular, our system identification algorithms correctly identify the zero flux boundary condition in Model 1, constant flux boundary condition on $\Gamma_{4}$ in Model 2, and higher order terms of the form $\nabla^{2}(\ast\nabla^{2}C)$ (see Table 2), in Models 3 and 4. The algorithm terminates after the second iteration since no more bases can be eliminated by the $F$ test. As shown in Figure 8, for the first two iterations, the loss function remains machine zero as the synthetic data drive the residual equations (9) down to this limit. If any more bases are eliminated, the loss function increases dramatically in Iteration 3 due to underfitting.

Initially $C_{1}$ and $C_{2}$ are close to uniformly distributed with small perturbations. This uniformity yields bases, such as $\int_{\Omega}\nabla w\nabla C_{1}\text{d}v$ and $\int_{\Omega}\nabla wC_{1}\nabla C_{1}\text{d}v$ that appear to be linearly dependent (left plot in Figure in 9). The high fidelity data ( $400\times 400$ mesh) drive the residual equations (9) down to machine zero, allowing the linear regression model Equation (48) to converge. The algorithms can therefore select the relevant bases, distinguishing from other irrelevant bases that are close to being linearly dependent. However, low fidelidy data using NURBS basis functions yields poor representation of derivatives such as those in Equations (61), (65) and (66). As a result, the linear regression model contains greater error using low fidelity data, which are created by directly sparsifying the high-fidelity data instead of being generated from a coarse grid. Therefore, an unresolvable discrepancy remains in the low-fidelity representation, and two terms that are nearly linearly dependent cannot be distinguished. Using low fidelity data at $t=0$ and $t=0.01$ , the algorithms are unable to identify any of the four governing PDEs (Models 1-4).

This spurious linearity however disappears as the system evolves (right plot in Figure 9).

By choosing data at 10 time steps: from $t=1$ with time step $\Delta t=1$ , the algorithms are able to correctly identify all PDEs (Models 1-4) using low fidelity data. To quantify the algorithm’s performance, we define the error of the estimated results to be:

[TABLE]

where $\widehat{\omega}$ is the estimated pre-factor, and $\omega$ is the true pre-factor. Figures 10 - 13 show the scaled results at the final iteration. The exact values and the error are summarized in Tables 3 - 6.

Note that the Neumman boundary condition terms only have contributions from DOFs on the boundary, which are sparse relative to the total number. Consequently the basis for the Neumman boundary condition has a negligible contribution to the linear regression model compared to operators that appear via volume integrals in the weak forms of the PDEs. This results in large errors of the pre-factor for the Neumann terms. As shown in the left panel of Figure 11, for the results using data generated from Model 2, the Neumman boundary condition operator is more poorly estimated than others. Using very low fidelity data ( $50\times 50$ mesh), the Neumman boundary condition basis fails to be be identified. However, the errors in the other bases are similar to those obtained when using data generated from Model 1, for which the Neumann boundary condition and the corresponding operator are zero.

The estimated results for data generated from Models 3 and 4, match the true values very well except for the lowest fidelity dataset using the $50\times 50$ mesh (shown in Figures 12 and 13). This lowest fidelity data set poses even greater challenges, because, to identify the governing PDE for $C_{1}$ , using the $50\times 50$ mesh, we need to place a bias on the basis $\int_{\Omega}\nabla wC_{2}^{2}\nabla C_{1}\text{d}v$ to protect it from elimination in the first few iterations. Although we are able to select all relevant terms with this prior, the error increases significantly as the mesh cannot properly resolve the small variation of the basis as we see below in Figure 14.

Of the four datasets of different fidelity from each model, the high fidelity dataset ( $400\times 400$ mesh) yields exact pre-factors for all bases. However for the lower fidelity datasets, variations in the data field over a small neighborhood of the domain may diminish as the decreasing mesh size resolves the variation poorly, i.e. the approximation in Equations (65) and (66) becomes poor. The variations may even completely disappear, as seen on the $50\times 50$ mesh (see one example shown in Figure 14).

Therefore, all the results deteriorate when using lower fidelity data. Specifically, for data generated from Model 1, the error gradually increases for decreasing mesh size. Due to the small pre-factor for the constant term, the error in the governing equation of $C_{1}$ is bigger than that in the governing equation of $C_{2}$ .

3.3 System identification under noisy data

Using meshes of different fidelity, we superimpose noise with $\sigma=10^{-4}$ for data generated from Models 1 and 2, and $\sigma=10^{-5}$ for data generated from Models 3 and 4. The noise on $C$ will be amplified in the time derivative and spatial gradients as discussed in Section 2.4. Figure 15 shows the target vector, $\Xi^{\dot{\widehat{C}}}$ with two different time steps, $\Delta t=0.001$ and $\Delta t=1$ , on different meshes. At the smaller time step, $\Delta t=0.001$ , the noise, scaled by $\Delta t$ , washes out the true value. Decreasing the mesh size cannot alleviate the noise, as the ratio of error, denoted by $\sigma$ , to its true value largely remains fixed. At the larger time step, $\Delta t=1$ , the noise is suppressed, as expressed in Equation (62). For the following analysis, We choose data at the same 10 time steps starting from $t=10$ , with time step $\Delta t=1$ .

Figure 16 shows the Laplacian basis operator on $C_{2}$ , i.e., $\Xi^{\nabla^{2}\widehat{C_{2}}}$ , with data generated by Model 1 using different meshes.

For the fine mesh, the noise remains at a high level over an element, and washes out the true value. By decreasing the mesh size, the ratio of final error to true value decreases by a factor of $h^{-2}$ as shown in Equation (67), and the true value begins to emerge in the basis generated from noisy data. Consequently, as shown in Figure 17, using high fidelity data (the $400\times 400$ mesh) we are unable to correctly identify the governing equation for $C_{2}$ using data generated by Model 1. Using lower fidelity data, however, we are able to identify all the relevant terms. The error also is smaller when using lower fidelity data. For the same reason, the governing equation for $C_{2}$ using data generated by Model 2 also cannot be identified using the $400\times 400$ and $200\times 200$ meshes, but becomes feasible for lower fidelity data (see Figure 18). On the other hand the error is small for the basis operator $C_{1}$ in both models. As a result, the estimation of the governing PDE for $C_{1}$ is comparable with that by using same data of the same fidelity, but without noise. Again, as seen in the previous section, the Neumman boundary condition cannot able be identified by using very low fidelity data on the $50\times 50$ mesh.

Figure 19 shows the biharmonic operator, $\Xi^{\nabla^{4}\widehat{C_{1}}}$ using data generated by Model 3 with different meshes.dddThis field was computed by projecting the data on to a fourth-order $C^{3}$ NURBS basis. Similar to the Laplace operator, the true value of the basis is washed out as the noise overwhelms the high fidelity data on the fine meshes. By using low fidelity data, we are able to correctly identify the governing equations for $C_{1}$ and $C_{2}$ using data generated from Model 3 as shown in Figure 20. The results for data generated from Model 4 are mixed. The governing equation for $C_{1}$ is similar to that in Model 3 with the fourth-order biharmonic operator in the concentration. Similar to that case, and for the same reason of noise overwhelming the true data, Model 4 cannot be identified by using high fidelity data, but can be identified by using low fidelity data (see Figure 21). However, the governing equation for $C_{2}$ does not have the fourth-order biharmonic operator. Given the small error on $(C_{1},C_{2})$ , the identification of the governing PDE for $C_{2}$ is comparable with that by using same fidelity data, but without noise. Again, as seen in the last section, for the $50\times 50$ mesh, we need to place a bias on $\int_{\Omega}\nabla wC_{2}^{2}\nabla C_{1}\text{d}v$ to protect it from elimination in the first few iterations for data generated from Models 3 and 4. Also note that the error increases significantly to $20.7\%$ with decreasing fidelity as shown in Table 10.

Finally for all results using noisy data, the errors are, in general, higher than those obtained by using same fidelity data, but without noise.

4 Discussion and conclusions

The development of patterns in biophysics and material physics are governed by a range of spatio-temporal PDEs. It is compelling to attempt to discover the analytic forms of these PDEs from data, because doing so immediately provides insight to the governing physics. System identification has been explored using the strong form of the PDEs [10, 11, 13, 14, 15], as discussed in the Introduction. However the equivalent weak forms, which can offer capabilities not achievable by strong forms, have remained under-exploited for system identification. The weak form transfers derivatives to the weighting function, and allows smooth evaluation of the remaining, possibly higher-order, derivatives when NURBS basis functions are employed. In our numerical experiments, NURBS basis functions allow the fourth-order terms in the Cahn-Hilliard and Allen-Cahn equations to be correctly identified with higher accuracy estimation of the pre-factors using data at different fidelity, with and without noise (results in Tables 5, 6, 9 and 10). Another advantage of writing the PDEs in weak form is that Neumann boundary conditions are explicitly included as surface integrals. During system identification, this class of boundary conditions can be, therefore, constructed along with other operators in the PDEs. By incorporating these terms in the regression model, Equation (48), identification of boundary conditions becomes feasible (results in Tables 4 and 8).

Intuitively high fidelity data is favored as it minimizes the residual equations, and can accurately evaluate the time derivative and spatial gradient. For data without noise, the results enjoy these advantages (results in Tables 3, 4, 5 and 6 and Figures 10 - 13). However, the error embedded in noisy data is exaggerated relative to the true signal by small time steps and element lengths, as discussed in the context of Equations (61), (63) and (64). Low fidelity data in time and space may allow greater variations in the true signal relative to the noise, as shown in Figures 15, 16 and 19, and thus improve the results of system identification (results in Tables 7, 8, 9 and 10). On the other hand, low fidelity data may yield inaccurate estimation of the pre-factors due to poor evaluation of the operators (Figure 14). Data of very low fidelity may be incapable of capturing the high-frequency content of solution fields, thus leading to entirely incorrect identification of the operators. For example, without an injection of prior knowledge, we were unable to identify the high frequency operators such as $\int_{\Omega}\nabla wC_{2}^{2}\nabla C_{1}\text{d}v$ in Model 3 and 4 using low fidelity data on the 50 $\times$ 50 mesh. The careful selection of data could improve these results, but that direction for study lies beyond the scope of this first communication.

We have demonstrated our algorithm on data at different fidelity with and without noise, generated from three different types of PDEs. In this first communication, we do not aim to systematically study the uncertainty in the models induced by the lack of fidelity and noise, nor to optimize the data extraction from simulations/experiments. However we point out several factors that could influence the results:

Two analytically independent bases may appear to be linearly dependent over certain time intervals (Figure 9). This spurious linearity could disappear as the system evolves. 2. 2.

Low fidelity data on coarse meshes resolves the variation of fields poorly (see Figure 14). However it helps by allowing larger variation of the true signal relative to the noise. It thus can effectively smooth out the error in bases constructed from noisy data (Figures 16 and 19). 3. 3.

Fields may evolve at different rates, and be distributed with different gradients (Figures 14 and 15 ). As a result a chosen basis may be more prominent at certain DOFs than at others, and contribute more in the linear regression model. Using DOFs over subdomains may improve the results of system identification. Using partial data in system identification has been seen to reduce the computational expense [15], but atempting to systematically improve the results by choosing partial data at certain times has not been widely explored. 4. 4.

We found that results improve when the Neumman boundary condition bases are not included. We could correctly identify the governing equation generated from Model 1 with larger noise by not including the boundary condition bases. We suspect that since the Neumman boundary condition basis is zero at all inner DOFs it renders the matrix of bases, $\Xi$ , close to singular. 5. 5.

Any prior knowledge of the system would help to select the correct basis from many candidates. For example we placed a bias on certain bases when identifying PDEs using data generated from Model 3 with low fidelity data. Similarly, knowing that Model 3 is a conserved system, all bases contributing to the reaction (source) terms can be pre-eliminated.

We note that our treatment is finite-dimensional by being discretization-based, similar to previous work that has used the finite-difference method for representing operators [13, 14, 15, 16, 17, 18]. A different, neural network-based approach [12] enforced PDE constraints at random collocation points in one or two spatial dimensions and in time, thus effectively introducing a finite-dimensional representation with the global basis induced by activation functions. We also note that the neural network approach, while illustrated for learning coefficients in a predefined PDE, remains to be tested for discovery of PDE structures by downselection from a large set of operators, which is central to the treatment in this work and other recent approaches [13, 14, 15, 16, 17, 18].

Finally, we have presented system identification of parabolic PDEs in two dimensions. This is not a very serious restriction for the targeted physics of pattern formation in materials and biophysics. The preponderance of data in these fields is also restricted to two dimensions. Apart from an increase in computational cost, as the set of differential operators increases under the curse of dimensionality, there are no fundamental limitations. Even in this regard, the physics itself restricts the set of three-dimensional operators actually admissible in PDEs. Such classical knowledge could be leveraged to sparsify the operator basis set in three dimensions.

In this work, we currently use standard linear regression models for each iteration of the step-wise regression algorithm. Bayesian frameworks present a natural path to systematically inject prior knowledge and domain expertise, and to quantify and propagate the uncertainty induced by lack of model fidelity and data noise. Furthermore, they enable techniques for comparing competing models that are rooted in a rigorous probabilistic paradigm, such as with the Bayesian model selection and model averaging methods. Bayesian analysis of our methods will appear in future communications.

Acknowledgements

We acknowledge the support of Toyota Research Institute, Award #849910, “Computational framework for data-driven, predictive, multi-scale and multi-physics modeling of battery materials” (ZW and KG). Additional support: This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR0011199002, “Artificial Intelligence guided multi-scale multi-physics framework for discovering complex emergent materials phenomena” (ZW, XH and KG). Simulations in this work were performed using the Extreme Science and Engineering Discovery Environment (XSEDE) Stampede2 at the Texas Advance Computing Center through allocations TG-MSS160003 and TG-DMR180072. XSEDE is supported by National Science Foundation grant number ACI-1548562.

Appendix

Bibliography67

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Voss and Kurths [1997] H. Voss and J. Kurths. Reconstruction of nonlinear time delay models from data by the use of optimal transformations. Phys. Lett. A. , 234:336–344, 1997.
2Voss et al. [1999] H. U. Voss, P. Kolodner, M. Abel, and J. Kurths. Amplitude equations from spatiotemporal binary-fluid convection data. Phys. Rev. Lett. , 83, 1999.
3González-García et al. [1998] R. González-García, R. Rico-Martínez, and I. G. Kevrekidis. Identification of distributed parameter systems: A neural net based approach. Computers them. Engng. , 22, 1998.
4Attar and Dowell [2005] P. J. Attar and E. H. Dowell. A reduced order system id approach to the modelling ofnonlinear structural behavior in aeroelasticity. J Fluids Struct. , 21, 2005.
5Khalil et al. [2007] M. Khalil, S. Adhikari, and A. Sarkar. Linear system identification using proper orthogonal decomposition. Mech Syst Signal Process. , 21, 2007.
6Guo et al. [2010] L. Z. Guo, S. A. Billings, and D. Coca. Identification of partial differential equation models for a class of multiscale spatio-temporal dynamical systems. Int. J. Control. , 83, 2010.
7Daniels and Nemenman [2015] B. C. Daniels and L. Nemenman. Automated adaptive inference of phenomenological dynamical models. Nat. Commun. , 6, 2015.
8C. W. Rowley et al. [2009] I. Mezić C. W. Rowley, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows. J. Fluid Mech. , 641, 2009.