Model Order Reduction by Proper Orthogonal Decomposition

Carmen Gräßle; Michael Hinze; Stefan Volkwein

arXiv:1906.05188·math.NA·August 4, 2020

TL;DR

This paper introduces Proper Orthogonal Decomposition (POD) for model order reduction, focusing on nonlinear parametric and time-dependent PDEs, with applications in PDE-constrained optimization and adaptive strategies.

Contribution

It provides a comprehensive overview of POD-MOR, including theoretical foundations, error estimation, adaptivity, and basis update strategies, with practical numerical demonstrations.

Findings

01 Effective reduction of nonlinear PDEs demonstrated

02 Error estimates enable reliable surrogate models

03 Adaptive basis updates improve control applications

Abstract

We provide an introduction to POD-MOR with focus on (nonlinear) parametric PDEs and (nonlinear) time-dependent PDEs, and PDE constrained optimization with POD surrogate models as application. We cover the relation of POD and SVD, POD from the infinite-dimensional perspective, reduction of nonlinearities, certification with a priori and a posteriori error estimates, spatial and temporal adaptivity, input dependency of the POD surrogate model, POD basis update strategies in optimal control with surrogate models, and sketch related algorithmic frameworks. The perspective of the method is demonstrated with several numerical examples.

Tables

Table 1. Run 2. Number of POD basis functions needed to achieve a loss of information below the tolerance $p$ using adaptive finite element meshes (columns 2-5) and uniform finite element discretization (columns 6-9), together with the POD projection error.

$p$	$ℓ_{c}^{ad}$	$\sum_{i > ℓ} λ_{i}^{c}$	$ℓ_{w}^{ad}$	$\sum_{i > ℓ} λ_{i}^{w}$	$ℓ_{c}^{uni}$	$\sum_{i > ℓ} λ_{i}^{c}$	$ℓ_{w}^{uni}$	$\sum_{i > ℓ} λ_{i}^{w}$
$10^{- 1}$	3	$2.0 \cdot 10^{- 3}$	4	$156.9 \cdot 10^{0}$	3	$2.0 \cdot 10^{- 3}$	4	$157.6 \cdot 10^{0}$
$10^{- 2}$	10	$2.1 \cdot 10^{- 4}$	13	$15.8 \cdot 10^{0}$	10	$2.1 \cdot 10^{- 4}$	13	$15.6 \cdot 10^{0}$
$10^{- 3}$	19	$2.5 \cdot 10^{- 5}$	26	$1.8 \cdot 10^{0}$	19	$2.5 \cdot 10^{- 5}$	25	$1.8 \cdot 10^{0}$
$10^{- 4}$	29	$2.0 \cdot 10^{- 6}$	211	$1.8 \cdot 10^{- 1}$	28	$2.6 \cdot 10^{- 6}$	160	$1.9 \cdot 10^{- 1}$
$10^{- 5}$	37	$2.5 \cdot 10^{- 7}$	644	$1.1 \cdot 10^{- 2}$	37	$2.4 \cdot 10^{- 7}$	419	$2.5 \cdot 10^{- 2}$

Table 2. Run 4. Relative $L^{2}(0,T;L^{2}(\Omega))$-error between the POD solution and the finite element solution (columns 2-3) and the true solution (columns 4-5), respectively, using adaptive finite element snapshots which are interpolated onto the finest mesh and using a uniform mesh.

$ℓ$	$ε_{FE}^{ad}$	$ε_{FE}^{uni}$	$ε_{true}^{ad}$	$ε_{true}^{uni}$
1	$1.30 \cdot 10^{0}$	$1.30 \cdot 10^{0}$	$1.28 \cdot 10^{0}$	$1.30 \cdot 10^{0}$
3	$7.49 \cdot 10^{- 1}$	$7.58 \cdot 10^{- 1}$	$7.46 \cdot 10^{- 1}$	$7.60 \cdot 10^{- 1}$
5	$4.39 \cdot 10^{- 1}$	$4.45 \cdot 10^{- 1}$	$4.39 \cdot 10^{- 1}$	$4.46 \cdot 10^{- 1}$
10	$1.37 \cdot 10^{- 1}$	$1.37 \cdot 10^{- 1}$	$1.36 \cdot 10^{- 1}$	$1.38 \cdot 10^{- 1}$
20	$3.08 \cdot 10^{- 2}$	$1.56 \cdot 10^{- 2}$	$2.17 \cdot 10^{- 2}$	$1.60 \cdot 10^{- 2}$
30	$2.59 \cdot 10^{- 2}$	$2.04 \cdot 10^{- 3}$	$1.49 \cdot 10^{- 2}$	$3.00 \cdot 10^{- 3}$
50	$2.63 \cdot 10^{- 2}$	$5.67 \cdot 10^{- 5}$	$1.41 \cdot 10^{- 2}$	$2.07 \cdot 10^{- 3}$
100	$2.61 \cdot 10^{- 2}$	$6.48 \cdot 10^{- 8}$	$1.40 \cdot 10^{- 2}$	$2.06 \cdot 10^{- 3}$
150	$2.61 \cdot 10^{- 2}$	$8.13 \cdot 10^{- 7}$	$1.39 \cdot 10^{- 2}$	$2.07 \cdot 10^{- 3}$

Table 3. Run 4. Relative $L^{2}(0,T;H^{1}(\Omega))$-error between the POD solution and the finite element solution (columns 2-3) and the true solution (columns 4-5), respectively, using adaptive finite element snapshots which are interpolated onto the finest mesh and using a uniform mesh.

$ℓ$	$ε_{FE}^{ad}$	$ε_{FE}^{uni}$	$ε_{true}^{ad}$	$ε_{true}^{uni}$
1	$1.46 \cdot 10^{0}$	$1.46 \cdot 10^{0}$	$1.46 \cdot 10^{0}$	$1.47 \cdot 10^{0}$
3	$1.21 \cdot 10^{0}$	$1.22 \cdot 10^{0}$	$1.22 \cdot 10^{0}$	$1.22 \cdot 10^{0}$
5	$9.39 \cdot 10^{- 1}$	$9.45 \cdot 10^{- 1}$	$9.47 \cdot 10^{- 1}$	$9.51 \cdot 10^{- 1}$
10	$4.22 \cdot 10^{- 1}$	$4.25 \cdot 10^{- 1}$	$4.33 \cdot 10^{- 1}$	$4.31 \cdot 10^{- 1}$
20	$7.76 \cdot 10^{- 2}$	$7.27 \cdot 10^{- 2}$	$1.02 \cdot 10^{- 1}$	$8.19 \cdot 10^{- 2}$
30	$2.92 \cdot 10^{- 2}$	$1.22 \cdot 10^{- 2}$	$7.26 \cdot 10^{- 2}$	$3.52 \cdot 10^{- 2}$
50	$2.61 \cdot 10^{- 2}$	$4.74 \cdot 10^{- 4}$	$7.05 \cdot 10^{- 2}$	$3.27 \cdot 10^{- 2}$
100	$2.79 \cdot 10^{- 2}$	$4.78 \cdot 10^{- 7}$	$6.94 \cdot 10^{- 2}$	$3.27 \cdot 10^{- 2}$
150	$2.93 \cdot 10^{- 2}$	$2.84 \cdot 10^{- 7}$	$6.87 \cdot 10^{- 2}$	$3.27 \cdot 10^{- 2}$

Table 4. Run 4. CPU times for FE and POD simulation using uniform finite element meshes and adaptive finite element snapshots which are interpolated onto the finest mesh, respectively, and using $\ell=50$ POD modes.

	adaptive FE mesh	uniform FE mesh	speedup factor
FE simulation	944 sec	8808 sec	9.3
POD offline computations	264 sec	1300 sec	4.9
POD simulation	0.07 sec		–
speedup factor	13485	125828	–

Table 5. Run 5. Relative $L^{2}(0,T;L^{2}(\Omega))$-error between the POD solution and the finite element solution using adaptive meshes (columns 3-4) and using a uniform mesh (columns 5-6), respectively.

$ℓ^{c}$	$ℓ^{w}$	$c : ε_{FE}^{ad}$	$w : ε_{FE}^{ad}$	$c : ε_{FE}^{uni}$	$w : ε_{FE}^{uni}$
3	4	$8.44 \cdot 10^{- 3}$	$3.00 \cdot 10^{0}$	$8.44 \cdot 10^{- 3}$	$3.75 \cdot 10^{0}$
10	13	$3.30 \cdot 10^{- 3}$	$3.77 \cdot 10^{- 1}$	$3.30 \cdot 10^{- 3}$	$4.32 \cdot 10^{- 1}$
19	26	$1.57 \cdot 10^{- 3}$	$2.12 \cdot 10^{- 1}$	$1.57 \cdot 10^{- 3}$	$2.39 \cdot 10^{- 1}$
29	26	$7.34 \cdot 10^{- 4}$	$1.09 \cdot 10^{- 1}$	$7.32 \cdot 10^{- 4}$	$1.16 \cdot 10^{- 1}$
37	26	$3.57 \cdot 10^{- 4}$	$4.82 \cdot 10^{- 2}$	$3.55 \cdot 10^{- 4}$	$5.04 \cdot 10^{- 2}$
50	50	$1.88 \cdot 10^{- 4}$	$2.17 \cdot 10^{- 2}$	$1.86 \cdot 10^{- 4}$	$2.33 \cdot 10^{- 2}$
65	26	$9.74 \cdot 10^{- 5}$	$1.11 \cdot 10^{- 2}$	$9.56 \cdot 10^{- 5}$	$1.15 \cdot 10^{- 2}$
100	100	$3.37 \cdot 10^{- 5}$	$3.56 \cdot 10^{- 3}$	$3.22 \cdot 10^{- 5}$	$3.42 \cdot 10^{- 3}$

Table 6. Run 5. Computational times, speedup factors and approximation quality for different POD basis lengths and using different free energy potentials.

	$W^{p}$		$W_{s}^{rel}$
FE	1644 s		3129 s
	$ℓ_{c} = 3$	$ℓ_{c} = 19$	$ℓ_{c} = 3$	$ℓ_{c} = 19$
	$ℓ_{w} = 4$	$ℓ_{w} = 26$	$ℓ_{w} = 4$	$ℓ_{w} = 26$
POD offline	355 s	355 s	350 s	349 s
DEIM offline	8 s	8 s	9 s	10 s
ROM	183 s	191 s	2616 s	3388 s
ROM-DEIM	0.05 s	0.1 s	0.04 s	no conv.
ROM-proj	0.008 s	0.03 s	0.01 s	0.03 s
speedup FE-ROM	8.9	8.6	1.1	none
speedup FE-ROM-DEIM	32880	16440	78225	–
speedup FE-ROM-proj	205500	54800	312900	104300
rel $L^{2} (Q)$ error ROM	$5.46 \cdot 10^{- 3}$	$3.23 \cdot 10^{- 4}$	$8.44 \cdot 10^{- 3}$	$1.57 \cdot 10^{- 3}$
rel $L^{2} (Q)$ error ROM-DEIM	$1.46 \cdot 10^{- 2}$	$3.83 \cdot 10^{- 4}$	$8.84 \cdot 10^{- 3}$	–
rel $L^{2} (Q)$ error ROM-proj	$4.70 \cdot 10^{- 2}$	$4.18 \cdot 10^{- 2}$	$8.72 \cdot 10^{- 3}$	$9.80 \cdot 10^{- 3}$

Equations

Y=[y_{1}\,|\,\ldots\,|\,y_{n}]\in\mathbb{R}^{m\times n}.

\bar{\psi}\in\operatorname*{arg\,max}\bigg\{\sum_{j=1}^{n}\big|\langle y_{j},\psi\rangle\big|^{2}\,\Big|\,\psi\in\mathbb{R}^{m}\text{ with }\|\psi\|=1\bigg\}.

\mathscr{V}^{\ell-1}=\mathrm{span}\big\{\psi_{1},\dots,\psi_{\ell-1}\big\}\subset\mathbb{R}^{m},

\psi_{\ell}=\operatorname*{arg\,max}\bigg\{\sum_{j=1}^{n}\big|\langle y_{j},\psi\rangle\big|^{2}\,\Big|\,\psi\in\mathbb{R}^{m}\text{ with }\|\psi\|=1\text{ and }\psi\perp\mathscr{V}^{\ell-1}\bigg\}.

W^{1/2}\psi_{i}=\tilde{\psi}_{i},\quad 1\leq i\leq\mathsf{r},

\bar{Y}\bar{Y}^{\top}\tilde{\psi}_{i}=\lambda_{i}\tilde{\psi}_{i},\quad i=1,\dots,\mathsf{r}\quad\text{and}\quad\lambda_{1}\geq\dots\geq\lambda_{\mathsf{r}}>0,

W=\big(\big(\langle e_{i},e_{j}\rangle\big)\big)_{1\leq i,j\leq m}.

\bar{Y}^{\top}\bar{Y}\phi_{i}=\lambda_{i}\phi_{i},\quad i=1,\dots,\mathsf{r}\quad\text{and}\quad\lambda_{1}\geq\dots\geq\lambda_{\mathsf{r}}>0,

\psi_{i}=\frac{1}{\sigma_{i}}\bar{Y}\phi_{i},\quad i=1,\dots,\mathsf{r},

\mathscr{V}=\mathrm{span}\,\Big\{y_{j}^{k}\,\big|\,1\leq j\leq n_{k}\text{ and }1\leq k\leq K\Big\}\subset X

\left.\begin{aligned}&\max\sum_{i=1}^{\ell}\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\big|\langle y_{j}^{k},\psi_{i}\rangle_{X}\big|^{2}\\&\text{s.t. }\{\psi_{i}\}_{i=1}^{\ell}\subset X\text{ and }\langle\psi_{i},\psi_{j}\rangle_{X}=\delta_{ij},~1\leq i,j\leq\ell\end{aligned}\right\}

\mathcal{R}\Psi_{i}=\lambda_{i}\Psi_{i}\quad\text{for }1\leq i\leq\ell,

\mathcal{R}\Psi=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\langle\Psi,y_{j}^{k}\rangle_{X}\,y_{j}^{k}\quad\text{for }\Psi\in X.

\mathcal{R}=\mathcal{Y}\mathcal{Y}^{*}

\mathcal{Y}:\mathbb{R}^{n}\to X,\quad\mathcal{Y}(\Phi)=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\sqrt{\alpha_{j}^{k}}\,\phi_{j}^{k}\,y_{j}^{k}\quad\text{for }\Phi=\big(\phi_{1}^{1},\ldots,\phi^{K}_{n_{K}}\big)\in\mathbb{R}^{n},

\mathcal{Y}^{*}(\Psi)=\Big(\big\langle\Psi,\sqrt{\alpha_{1}^{1}}\,y_{1}^{1}\big\rangle_{X},\dots,\big\langle\Psi,\sqrt{\alpha_{n_{K}}^{K}}\,y_{n_{K}}^{K}\big\rangle_{X}\Big)^{\top}\quad\text{for }\Psi\in X.

\mathcal{K}\Phi=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\bigg(\sqrt{\alpha_{1}^{1}\alpha_{j}^{k}}\,\phi_{j}^{k}\,\langle y_{j}^{k},y_{1}^{1}\rangle_{X},\dots,\sqrt{\alpha_{n_{K}}^{K}\alpha_{j}^{k}}\,\phi_{j}^{k}\,\langle y_{j}^{k},y_{n_{K}}^{K}\rangle_{X}\bigg)^{\top}

\Phi_{i}=\frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}^{*}\Psi_{i}\quad\text{and}\quad\Psi_{i}=\frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}\Phi_{i}\quad\text{for }i=1,\dots,\mathsf{r}.

\sum_{i=1}^{\ell}\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\big|\langle y_{j}^{k},\Psi_{i}\rangle_{X}\big|^{2}=\sum_{i=1}^{\ell}\lambda_{i},

\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\Big\|y_{j}^{k}-\sum_{i=1}^{\ell}\langle y_{j}^{k},\Psi_{i}\rangle_{X}\,\Psi_{i}\Big\|_{X}^{2}=\sum_{i=\ell+1}^{\mathsf{r}}\lambda_{i}.

\mathcal{E}(\ell)=\frac{\sum_{i=1}^{\ell}\lambda_{i}}{\sum_{i=1}^{\mathsf{r}}\lambda_{i}}\in[0,1].

\sum_{i=1}^{\mathsf{r}}\lambda_{i}=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\|y_{j}^{k}\|_{X}^{2}

\mathcal{E}(\ell)=\frac{\sum_{i=1}^{\ell}\lambda_{i}}{\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\|y_{j}^{k}\|_{X}^{2}}\in[0,1],

D^{k}

D

Y^{k}

Y

\bar{Y}\bar{Y}^{\top}\bar{\Psi}_{i}=\lambda_{i}\bar{\Psi}_{i}\quad\text{for }1\leq i\leq\ell

\bar{Y}^{\top}\bar{Y}\bar{\Phi}_{i}=\lambda_{i}\bar{\Phi}_{i}\quad\text{for }1\leq i\leq\ell

\Uppsi^{\top}\bar{Y}\Upphi=\begin{pmatrix}\Sigma^{\mathsf{r}}&0\\0&0\end{pmatrix}=:\Upsigma\in\mathbb{R}^{m\times n},

V\hookrightarrow H\simeq H^{\prime}\hookrightarrow V^{\prime},

\big|a(t;\varphi,\phi)\big|

Taxonomy

Topics: Model Reduction and Neural Networks · Numerical methods for differential equations · Advanced Numerical Methods in Computational Mathematics

Full text

Model Order Reduction by Proper Orthogonal Decomposition

Carmen Gräßle (Universität Hamburg),

Michael Hinze (Universität Koblenz-Landau),

Stefan Volkwein (Universität Konstanz)

Abstract. We provide an introduction to POD-MOR with focus on (nonlinear) parametric PDEs and (nonlinear) time-dependent PDEs, and PDE constrained optimization with POD surrogate models as application. We cover the relation of POD and SVD, POD from the infinite-dimensional perspective, reduction of nonlinearities, certification with a priori and a posteriori error estimates, spatial and temporal adaptivity, input dependency of the POD surrogate model, POD basis update strategies in optimal control with surrogate models, and sketch related algorithmic frameworks. The perspective of the method is demonstrated with several numerical examples.

Key words. POD model order reduction, (discrete) empirical interpolation, adaptivity, parametric PDEs, evolutionary PDEs, certification with error analysis.

Mathematics Subject Classification. 35B30, 37M99, 41A05, 65K99, 93A15, 93C05

1. Introduction

Proper orthogonal decomposition (POD) is a method which condenses the essential information contained in data sets. Data sets may originate from various sources, e.g., (uncertain) measurements of geophysical processes, numerical simulations of (parameter-dependent) complex physical problems, or (dynamical) imaging. In order to illustrate the POD idea of information extraction, let $\{y_{1},\ldots,y_{n}\}\subset\mathbb{R}^{m}$ denote a vector cloud (which here serves as our data set), where we suppose that at least one of the vectors $y_{j}$ is nonzero. Let us collect the vectors $y_{j}$ in the data matrix

$$Y=[y_{1}\,|\,\ldots\,|\,y_{n}]\in\mathbb{R}^{m\times n}.$$

Then we have $\mathsf{r}=\mathrm{rank}\,Y\in\{1,\ldots,\min(m,n)\}$ . Our aim now is to find a vector $\bar{\psi}\in\mathbb{R}^{m}$ with length one which carries as much information of this vector cloud as possible. Of course, we here have to specify what information in this context means. For this purpose we equip $\mathbb{R}^{m}$ with some inner product $\langle\cdot\,,\cdot\rangle$ and induced norm $\|\cdot\|$ . We define the information content of vector $y$ with respect to some unit vector $\psi$ by the quantity $|\langle y,\psi\rangle|$ . Then we determine the special vector $\bar{\psi}\in\mathbb{R}^{m}$ by solving the maximization problem

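$$\bar{\psi}\in\operatorname*{arg\,max}\bigg\{\sum_{j=1}^{n}\big|\langle y_{j},\psi\rangle\big|^{2}\,\Big|\,\psi\in\mathbb{R}^{m}\text{ with }\|\psi\|=1\bigg\}.\tag{1}$$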

Notice that the solution to the maximization problem in (1) is not unique. If $\bar{\psi}$ is a vector, where the maximum is attained, then $-\bar{\psi}$ is an optimal solution, too. Let us label the vector $\bar{\psi}$ by $\psi_{1}$ . We now iterate this procedure; suppose that for $2\leq\ell\leq\mathsf{r}$ we have already computed such $\ell-1$ orthonormal vectors $\{\psi_{i}\}_{i=1}^{\ell-1}$ , then seek a unit vector $\psi_{\ell}\in\mathbb{R}^{m}$ which is perpendicular to the $(\ell-1)$ -dimensional subspace

$$\mathscr{V}^{\ell-1}=\mathrm{span}\big\{\psi_{1},\dots,\psi_{\ell-1}\big\}\subset\mathbb{R}^{m},$$

and which carries as much information of our vector cloud as possible, i.e., satisfies

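$$\psi_{\ell}=\operatorname*{arg\,max}\bigg\{\sum_{j=1}^{n}\big|\langle y_{j},\psi\rangle\big|^{2}\,\Big|\,\psi\in\mathbb{R}^{m}\text{ with }\|\psi\|=1\text{ and }\psi\perp\mathscr{V}^{\ell-1}\bigg\}.$$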

It is now straightforward to see that the vectors $\{\psi_{i}\}_{i=1}^{\mathsf{r}}$ are given by

$$W^{1/2}\psi_{i}=\tilde{\psi}_{i},\quad 1\leq i\leq\mathsf{r},$$

where the $\tilde{\psi}_{i}$ ’s solve the eigenvalue problem (cf. [42, 53])

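$$\bar{Y}\bar{Y}^{\top}\tilde{\psi}_{i}=\lambda_{i}\tilde{\psi}_{i},\quad i=1,\dots,\mathsf{r}\quad\text{and}\quad\lambda_{1}\geq\dots\geq\lambda_{\mathsf{r}}>0,$$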

where $\bar{Y}=W^{1/2}Y\in\mathbb{R}^{m\times n}$ with the symmetric, positive definite (weighting) matrix

$$W=\big(\big(\langle e_{i},e_{j}\rangle\big)\big)_{1\leq i,j\leq m}.\tag{3}$$

In (3) the vector $e_{i}$ denotes the $i$ -th unit vector in $\mathbb{R}^{m}$ . The modes $\{\psi_{i}\}_{i=1}^{\ell}$ obtained in this way are called POD Modes or Principal Components of our data cloud. If now $m\gg n\geq\mathsf{r}$ it is advantageous to consider the eigenvalue problem

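$$\bar{Y}^{\top}\bar{Y}\phi_{i}=\lambda_{i}\phi_{i},\quad i=1,\dots,\mathsf{r}\quad\text{and}\quad\lambda_{1}\geq\dots\geq\lambda_{\mathsf{r}}>0,$$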

which admits the same eigenvalues $\lambda_{i}$ as before. The modes $\psi_{i}$ and $\phi_{i}$ , $i=1,\ldots,\mathsf{r}$ , are related by singular value decomposition (SVD):

$$\psi_{i}=\frac{1}{\sigma_{i}}\bar{Y}\phi_{i},\quad i=1,\dots,\mathsf{r},$$

and $\sigma_{i}=\sqrt{\lambda_{i}}>0$ is the $i$ -th singular value of the weighted data matrix $\bar{Y}$ . Notice that in contrast to (2) the square root matrix $W^{1/2}$ is not required.
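The two eigenvalue problems above can be compared numerically. The following is a minimal sketch, assuming the Euclidean inner product (so that $W$ is the identity and $\bar{Y}=Y$) and a small synthetic rank-3 data matrix; the function name `pod_modes_euclidean` and the toy data are illustrative choices and not part of the original text.

```python
import numpy as np

def pod_modes_euclidean(Y, ell):
    """First `ell` POD modes and eigenvalues of the data matrix Y (m x n).

    Assumes the Euclidean inner product (W = identity) and ell <= rank(Y),
    so that every selected eigenvalue is strictly positive.
    """
    m, n = Y.shape
    if m <= n:
        # m x m problem: Y Y^T psi_i = lambda_i psi_i
        lam, psi = np.linalg.eigh(Y @ Y.T)
        order = np.argsort(lam)[::-1][:ell]
        return psi[:, order], lam[order]
    # n x n problem: Y^T Y phi_i = lambda_i phi_i, then psi_i = Y phi_i / sigma_i
    lam, phi = np.linalg.eigh(Y.T @ Y)
    order = np.argsort(lam)[::-1][:ell]
    lam, phi = lam[order], phi[:, order]
    return Y @ phi / np.sqrt(lam), lam

rng = np.random.default_rng(0)
# Synthetic vector cloud with m = 200, n = 12 and rank 3 (toy data).
Y = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 12))
psi, lam = pod_modes_euclidean(Y, 3)
sigma = np.linalg.svd(Y, compute_uv=False)[:3]  # leading singular values of Y
```

In this setting the eigenvalues `lam` should agree with the squared singular values `sigma**2`, and the columns of `psi` should be orthonormal. Since $m\gg n$ here, only a $12\times 12$ eigenvalue problem is solved.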

It is now clear that a vector cloud also could be replaced by a function cloud $\{y({\bm{\mu}}_{j})\,|\,j=1,\ldots,n\}\subset X$ in some Hilbert space $(X,\langle\cdot\,,\cdot\rangle_{X})$ , where $\{{\bm{\mu}}_{j}\}_{j=1}^{n}$ are parameters which may refer to, e.g., time instances of a dynamic process, or stochastic variables, and the concept of information extraction by the above maximization problems directly carries over to this situation. As it is shown in the next section we can even extend this concept to general Hilbert spaces. This will be formalized in Section 2.1 below. From the considerations above it also becomes clear that POD is closely related to SVD. This is outlined in Section 2.2. The POD method for abstract nonlinear evolution problems is explained in Section 2.3. The Hilbert space perspective also allows us to treat spatially discrete evolution equations, which include adaptive concepts for the spatial discretization. This is outlined in Section 2.4. The POD-Galerkin procedure is explained in Section 3, including a discussion of the treatment of nonlinearities. The certification of the POD method with a priori and a posteriori error bounds is outlined in Section 4. The POD approach heavily relies on the choice of the snapshots. Related approaches are discussed in Section 5. In Section 6 we briefly address the scope of the POD method in the context of optimal control of PDEs. Finally, in Section 7 we sketch further important research trends related to POD. Our analytical exposition is supported by several numerical experiments which give an impression of the power of the approach.

POD is one of the most successfully used model reduction techniques for nonlinear dynamical systems; see, e.g., [23, 42, 53, 75, 90] and the references therein. It is applied in a variety of fields including fluid dynamics, coherent structures [4, 9] and inverse problems [13]. Moreover in [11] POD is successfully applied to compute reduced-order controllers. The relationship between POD and balancing was considered in [61, 82, 101]. An error analysis for nonlinear dynamical systems in finite dimensions was carried out in [78] and a missing point estimation in models described by POD was studied in [10].

2. Proper Orthogonal Decomposition (POD)

In this section we introduce a discrete variant of the POD method, where we follow partially [42, Section 1.2.1]. For a continuous variant of the POD method and its relationship to the discrete one we refer the reader to [58] and [42, Sections 1.2.2 and 1.2.3].

2.1. The POD method

Suppose that $K,n_{1},\ldots,n_{K}$ are fixed natural numbers. Let the so-called snapshot ensembles $\{y_{j}^{k}\}_{j=1}^{n_{k}}\subset X$ be given for $1\leq k\leq K$ , where $X$ is a separable real Hilbert space. For POD in complex Hilbert spaces we refer the reader to [97]. We set $n=n_{1}+\ldots+n_{K}$ . To avoid a trivial case we suppose that at least one of the $y_{j}^{k}$ ’s is nonzero. Then we introduce the finite dimensional, linear snapshot space

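$$\mathscr{V}=\mathrm{span}\,\Big\{y_{j}^{k}\,\big|\,1\leq j\leq n_{k}\text{ and }1\leq k\leq K\Big\}\subset X$$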

with finite dimension $d\leq n$ . We distinguish two cases:

1) The separable Hilbert space $X$ has finite dimension $m$: Then $X$ is isomorphic to $\mathbb{R}^{m}$; see, e.g., [81, p. 47]. We define the finite index set $\mathbb{I}=\{1,\ldots,m\}$. Clearly, we have $1\leq\mathsf{r}\leq\min(n,m)$. Especially in case of $X=\mathbb{R}^{m}$, the snapshots $y_{j}^{k}=(y_{ij}^{k})_{1\leq i\leq m}$ are vectors in $\mathbb{R}^{m}$ for $k=1,\ldots,K$.

2) $X$ is infinite-dimensional: Since $X$ is separable, each orthonormal basis of $X$ has countably many elements. In this case the real Hilbert space $X$ is isomorphic to the set $\ell_{2}$ of sequences $\{x_{i}\}_{i\in\mathbb{N}}$ of real numbers which satisfy $\sum_{i=1}^{\infty}|x_{i}|^{2}<\infty$; see [81, p. 47], for instance. The index set $\mathbb{I}$ is now the countable, but infinite set $\mathbb{N}$.

The POD method consists in choosing a complete orthonormal basis $\{\psi_{i}\}_{i\in\mathbb{I}}$ in $X$ such that for every $\ell\in\{1,\ldots,\mathsf{r}\}$ the information content of the given snapshots $y_{j}^{k}$ is maximized in the following sense:

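$$\left.\begin{aligned}&\max\sum_{i=1}^{\ell}\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\big|\langle y_{j}^{k},\psi_{i}\rangle_{X}\big|^{2}\\&\text{s.t. }\{\psi_{i}\}_{i=1}^{\ell}\subset X\text{ and }\langle\psi_{i},\psi_{j}\rangle_{X}=\delta_{ij},~1\leq i,j\leq\ell\end{aligned}\right\}\tag{$\mathbf{P}^{\ell}$}$$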

with positive weighting parameters $\alpha_{j}^{k}$ , $j=1,\dots,n_{k}$ and $k=1,\dots,K$ . Here, the symbol $\delta_{ij}$ denotes the Kronecker symbol satisfying $\delta_{ii}=1$ and $\delta_{ij}=0$ for $i\neq j$ .

An optimal solution $\{\Psi_{i}\}_{i=1}^{\ell}$ to ( $\mathbf{P}^{\ell}$ ) is called a POD basis of rank $\ell$ . It is proved in [42, Theorem 1.8] that for every $\ell\in\{1,\ldots,\mathsf{r}\}$ a solution $\{\Psi_{i}\}_{i=1}^{\ell}$ to ( $\mathbf{P}^{\ell}$ ) is characterized by the eigenvalue problem

$$\mathcal{R}\Psi_{i}=\lambda_{i}\Psi_{i}\quad\text{for }1\leq i\leq\ell,\tag{5}$$

where $\lambda_{1}\geq\ldots\geq\lambda_{\mathsf{r}}>0$ denote the largest eigenvalues of the linear, bounded, nonnegative and self-adjoint operator $\mathcal{R}:X\to X$ given as

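$$\mathcal{R}\Psi=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\langle\Psi,y_{j}^{k}\rangle_{X}\,y_{j}^{k}\quad\text{for }\Psi\in X.$$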

Moreover, the operator $\mathcal{R}$ can be presented in the form

$$\mathcal{R}=\mathcal{Y}\mathcal{Y}^{*}$$

with the mapping

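$$\mathcal{Y}:\mathbb{R}^{n}\to X,\quad\mathcal{Y}(\Phi)=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\sqrt{\alpha_{j}^{k}}\,\phi_{j}^{k}\,y_{j}^{k}\quad\text{for }\Phi=\big(\phi_{1}^{1},\ldots,\phi^{K}_{n_{K}}\big)\in\mathbb{R}^{n},$$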

where $\mathcal{Y}^{*}:X\to\mathbb{R}^{n}$ denotes the Hilbert space adjoint of $\mathcal{Y}$ , whose action is given by

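$$\mathcal{Y}^{*}(\Psi)=\Big(\big\langle\Psi,\sqrt{\alpha_{1}^{1}}\,y_{1}^{1}\big\rangle_{X},\dots,\big\langle\Psi,\sqrt{\alpha_{n_{K}}^{K}}\,y_{n_{K}}^{K}\big\rangle_{X}\Big)^{\top}\quad\text{for }\Psi\in X.$$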

The operator $\mathcal{K}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ , $\mathcal{K}:=\mathcal{Y}^{*}\mathcal{Y}$ then admits the same nonzero eigenvalues $\lambda_{1}\geq\ldots\geq\lambda_{\mathsf{r}}>0$ with corresponding eigenvectors $\Phi_{1},\dots,\Phi_{\mathsf{r}}$ , and its action is given by

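$$\mathcal{K}\Phi=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\bigg(\sqrt{\alpha_{1}^{1}\alpha_{j}^{k}}\,\phi_{j}^{k}\,\langle y_{j}^{k},y_{1}^{1}\rangle_{X},\dots,\sqrt{\alpha_{n_{K}}^{K}\alpha_{j}^{k}}\,\phi_{j}^{k}\,\langle y_{j}^{k},y_{n_{K}}^{K}\rangle_{X}\bigg)^{\top}$$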

with the vector $\Phi=(\phi_{1}^{1},\ldots,\phi^{K}_{n_{K}})\in\mathbb{R}^{n}$ . For the eigensystems of $\mathcal{R}$ and $\mathcal{K}$ there holds the relation

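$$\Phi_{i}=\frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}^{*}\Psi_{i}\quad\text{and}\quad\Psi_{i}=\frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}\Phi_{i}\quad\text{for }i=1,\dots,\mathsf{r}.$$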
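The relation between the two eigensystems can be checked in a few lines. The following worked derivation is a sketch under the assumption that the eigenvectors $\Phi_{i}$ of $\mathcal{K}=\mathcal{Y}^{*}\mathcal{Y}$ are orthonormal in $\mathbb{R}^{n}$; the scaling by $1/\sqrt{\lambda_{i}}$ is the one that makes the resulting $\Psi_{i}$ orthonormal in $X$.

```latex
% Assumption: K Phi_i = lambda_i Phi_i with <Phi_i, Phi_j>_{R^n} = delta_ij and lambda_i > 0.
% Define Psi_i := Y Phi_i / sqrt(lambda_i). Then, using R = Y Y^* and K = Y^* Y:
\begin{align*}
\mathcal{R}\Psi_{i}
  &= \frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}\,\mathcal{Y}^{*}\mathcal{Y}\,\Phi_{i}
   = \frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}\,\mathcal{K}\Phi_{i}
   = \frac{\lambda_{i}}{\sqrt{\lambda_{i}}}\,\mathcal{Y}\Phi_{i}
   = \lambda_{i}\Psi_{i},\\
\langle\Psi_{i},\Psi_{j}\rangle_{X}
  &= \frac{\langle\mathcal{Y}\Phi_{i},\mathcal{Y}\Phi_{j}\rangle_{X}}{\sqrt{\lambda_{i}\lambda_{j}}}
   = \frac{\langle\Phi_{i},\mathcal{K}\Phi_{j}\rangle_{\mathbb{R}^{n}}}{\sqrt{\lambda_{i}\lambda_{j}}}
   = \frac{\lambda_{j}\,\delta_{ij}}{\sqrt{\lambda_{i}\lambda_{j}}}
   = \delta_{ij},\\
\frac{1}{\sqrt{\lambda_{i}}}\,\mathcal{Y}^{*}\Psi_{i}
  &= \frac{1}{\lambda_{i}}\,\mathcal{K}\Phi_{i}
   = \Phi_{i}.
\end{align*}
```

Hence each eigenpair of $\mathcal{K}$ with positive eigenvalue yields an eigenpair of $\mathcal{R}$ with the same eigenvalue, which is why the smaller of the two problems can be solved in practice.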

Furthermore, we obtain

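$$\sum_{i=1}^{\ell}\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\big|\langle y_{j}^{k},\Psi_{i}\rangle_{X}\big|^{2}=\sum_{i=1}^{\ell}\lambda_{i},$$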

and for the POD projection error we get

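$$\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\Big\|y_{j}^{k}-\sum_{i=1}^{\ell}\langle y_{j}^{k},\Psi_{i}\rangle_{X}\,\Psi_{i}\Big\|_{X}^{2}=\sum_{i=\ell+1}^{\mathsf{r}}\lambda_{i}.$$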
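Both identities can be verified numerically on a small example. The sketch below assumes $X=\mathbb{R}^{m}$ with the Euclidean inner product, a single snapshot ensemble ($K=1$), and randomly generated snapshots and weights; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, ell = 50, 8, 3
Y = rng.standard_normal((m, n))        # snapshots y_j as columns (toy data)
alpha = rng.uniform(0.5, 2.0, size=n)  # positive weights alpha_j

# Matrix representation of K = Y^* Y: entries sqrt(alpha_i alpha_j) <y_j, y_i>.
Ys = Y * np.sqrt(alpha)                # scales column j by sqrt(alpha_j)
lam, Phi = np.linalg.eigh(Ys.T @ Ys)
lam, Phi = lam[::-1], Phi[:, ::-1]     # eigh returns ascending order; flip to descending

# POD basis of rank ell: Psi_i = Y(Phi_i) / sqrt(lambda_i).
Psi = Ys @ Phi[:, :ell] / np.sqrt(lam[:ell])

# Weighted information captured by the basis and weighted projection error.
coeffs = Psi.T @ Y
captured = np.sum(alpha * np.sum(coeffs**2, axis=0))
residual = Y - Psi @ coeffs
proj_error = np.sum(alpha * np.sum(residual**2, axis=0))
tail = np.sum(lam[ell:])
```

Up to rounding, `captured` should equal the sum of the first `ell` eigenvalues and `proj_error` should equal `tail`, the sum of the neglected eigenvalues.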

Thus, the decay rate of the positive eigenvalues $\{\lambda_{i}\}_{i=1}^{\mathrm{r}}$ plays an essential role for a successful application of the POD method. In general, one has to utilize a complete orthonormal basis $\{\Psi_{i}\}_{i\in\mathbb{I}}\subset X$ to represent elements in the snapshot space $\mathscr{V}$ by their Fourier sum. This leads to a high-dimensional or even infinite-dimensional approximation scheme. Nevertheless, if the term $\sum_{i=\ell+1}^{\mathsf{r}}\lambda_{i}$ is sufficiently small for a not too large $\ell$ , elements in the subspace $\mathscr{V}$ can be approximated by a linear combination of the few basis elements $\{\Psi_{i}\}_{i=1}^{\ell}$ . This offers the chance to reduce the number of terms in the Fourier series using the POD basis of rank $\ell$ , as shown in the following examples. For this reason it is useful to define information content of the basis $\{\Psi_{i}\}_{i=1}^{\ell}$ in $\mathscr{V}$ by the quantity

$$\mathcal{E}(\ell)=\frac{\sum_{i=1}^{\ell}\lambda_{i}}{\sum_{i=1}^{\mathsf{r}}\lambda_{i}}\in[0,1].$$

It can e.g. be utilized to determine a basis of length $\ell\in\{1,\dots,\mathsf{r}\}$ containing $\approx 99\%$ of the information contained in $\mathscr{V}$ by requiring $\mathcal{E}(\ell)\approx 99\%$ . Now it is shown in [42, Section 1.2.1] that

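$$\sum_{i=1}^{\mathsf{r}}\lambda_{i}=\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\|y_{j}^{k}\|_{X}^{2}$$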

holds true. This implies

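$$\mathcal{E}(\ell)=\frac{\sum_{i=1}^{\ell}\lambda_{i}}{\sum_{k=1}^{K}\sum_{j=1}^{n_{k}}\alpha_{j}^{k}\,\|y_{j}^{k}\|_{X}^{2}}\in[0,1],$$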

so that the quantity $\mathcal{E}(\ell)$ can be computed without knowing the eigenvalues $\lambda_{\ell+1},\ldots,\lambda_{\mathsf{r}}$ .
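A rank selection based on $\mathcal{E}(\ell)$ can be sketched as follows. The example assumes the Euclidean inner product, unit weights, and a synthetic snapshot matrix with prescribed singular values; the helper `choose_rank` and the target of 99% are illustrative choices. Only the three leading eigenvalues are passed in, since the denominator is obtained from the snapshot norms.

```python
import numpy as np

def choose_rank(leading_eigs, snapshot_energy, target=0.99):
    """Smallest ell with E(ell) >= target, or None if the supplied eigenvalues fall short.

    leading_eigs: the largest eigenvalues lambda_1 >= lambda_2 >= ... (possibly only a few)
    snapshot_energy: sum over all snapshots of alpha_j^k * ||y_j^k||_X^2
    """
    ratios = np.cumsum(leading_eigs) / snapshot_energy
    hits = np.nonzero(ratios >= target)[0]
    return int(hits[0]) + 1 if hits.size else None

rng = np.random.default_rng(2)
# Synthetic snapshots with known singular values (toy data).
U, _ = np.linalg.qr(rng.standard_normal((40, 5)))
V, _ = np.linalg.qr(rng.standard_normal((10, 5)))
s = np.array([10.0, 3.0, 1.0, 0.1, 0.01])
Y = U @ np.diag(s) @ V.T

alpha = np.ones(Y.shape[1])
snapshot_energy = float(np.sum(alpha * np.sum(Y**2, axis=0)))  # equals sum of all lambda_i
leading = np.linalg.svd(Y, compute_uv=False)[:3] ** 2          # lambda_1, lambda_2, lambda_3
ell = choose_rank(leading, snapshot_energy, target=0.99)
```

For these singular values the eigenvalues are $100$, $9$, $1$, $10^{-2}$ and $10^{-4}$, so $\mathcal{E}(1)\approx 0.909$ and $\mathcal{E}(2)\approx 0.991$, and the sketch selects $\ell=2$.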

2.2. Singular value decomposition and POD

To investigate the relationship between singular value decomposition (SVD) and POD let us discuss the POD method for the specific case $X=\mathbb{R}^{m}$ . Then we define the matrices

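$$D^{k}=\mathrm{diag}\,\big(\alpha_{1}^{k},\ldots,\alpha_{n_{k}}^{k}\big)\in\mathbb{R}^{n_{k}\times n_{k}},\quad D=\mathrm{diag}\,\big(D^{1},\ldots,D^{K}\big)\in\mathbb{R}^{n\times n},$$

$$Y^{k}=\big[y_{1}^{k}\,|\,\ldots\,|\,y_{n_{k}}^{k}\big]\in\mathbb{R}^{m\times n_{k}},\quad Y=\big[Y^{1}\,|\,\ldots\,|\,Y^{K}\big]\in\mathbb{R}^{m\times n},\quad\bar{Y}=W^{1/2}YD^{1/2}\in\mathbb{R}^{m\times n},$$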

where we have introduced the weighting matrix $W\in\mathbb{R}^{m\times m}$ in (3).

Remark 1.

Let us mention that $\bar{Y}=Y$ holds true provided all $\alpha_{j}^{k}$ are equal to one (i.e., $D$ is the identity matrix) and the inner product in $X$ is given by the Euclidean inner product (i.e., $W$ is the identity matrix). $\Diamond$

Now (5) is equivalent to the $m\times m$ eigenvalue problem

$$\bar{Y}\bar{Y}^{\top}\bar{\Psi}_{i}=\lambda_{i}\bar{\Psi}_{i}\quad\text{for }1\leq i\leq\ell\tag{12}$$

with $\Psi_{i}=W^{-1/2}\bar{\Psi}_{i}$ and the $n\times n$ eigenvalue problem

$$\bar{Y}^{\top}\bar{Y}\bar{\Phi}_{i}=\lambda_{i}\bar{\Phi}_{i}\quad\text{for }1\leq i\leq\ell\tag{13}$$

with $\Psi_{i}=YD^{1/2}\bar{\Phi}_{i}/\sqrt{\lambda_{i}}$. If $m\ll n$ holds, we solve (12). However, we then have to solve the linear system $W^{1/2}\Psi_{i}=\bar{\Psi}_{i}$ for every $i=1,\ldots,\ell$ in order to get the POD basis $\{\Psi_{i}\}_{i=1}^{\ell}$. Thus, if $n\leq m$ holds, we compute the solution $\{\bar{\Phi}_{i}\}_{i=1}^{\ell}$ to (13) and get the POD basis by the formula $\Psi_{i}=YD^{1/2}\bar{\Phi}_{i}/\sqrt{\lambda_{i}}$. In that case we also have $\bar{Y}^{\top}\bar{Y}=D^{1/2}Y^{\top}WYD^{1/2}$, so that we do not have to compute the square root matrix $W^{1/2}$; the diagonal matrix $D^{1/2}$, on the other hand, can be computed easily. The relationship between (12) and (13) is given by SVD: There exist real numbers $\sigma_{1}\geq\ldots\geq\sigma_{\mathsf{r}}>0$ and orthogonal matrices $\Uppsi\in\mathbb{R}^{m\times m}$, $\Upphi\in\mathbb{R}^{n\times n}$ with column vectors $\{\bar{\Psi}_{i}\}_{i=1}^{m}$, $\{\bar{\Phi}_{i}\}_{i=1}^{n}$, respectively, such that

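$$\Uppsi^{\top}\bar{Y}\Upphi=\begin{pmatrix}\Sigma^{\mathsf{r}}&0\\0&0\end{pmatrix}=:\Upsigma\in\mathbb{R}^{m\times n},\tag{14}$$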

where $\Sigma^{\mathsf{r}}=\mathrm{diag}\,(\sigma_{1},\ldots,\sigma_{\mathsf{r}})\in\mathbb{R}^{\mathsf{r}\times\mathsf{r}}$ and the zeros in (14) denote matrices of appropriate dimensions. Moreover, the vectors $\{\bar{\Psi}_{i}\}_{i=1}^{\mathsf{r}}$ and $\{\bar{\Phi}_{i}\}_{i=1}^{\mathsf{r}}$ are eigenvectors of $\bar{Y}\bar{Y}^{\top}$ and $\bar{Y}^{\top}\bar{Y}$, respectively, with eigenvalues $\lambda_{i}=(\sigma_{i})^{2}>0$ for $i=1,\ldots,\mathsf{r}$. The vectors $\{\bar{\Psi}_{i}\}_{i=\mathsf{r}+1}^{m}$ and $\{\bar{\Phi}_{i}\}_{i=\mathsf{r}+1}^{n}$ (if $\mathsf{r}<m$ respectively $\mathsf{r}<n$) are eigenvectors of $\bar{Y}\bar{Y}^{\top}$ and $\bar{Y}^{\top}\bar{Y}$ with eigenvalue $0$. We summarize the computation of the POD basis in the pseudo code function [ $\Psi$ , $\Lambda$ ]= POD( $Y$ , $W$ , $D$ , $\ell$ , flag).

function [ $\Psi$ , $\Lambda$ ]= POD( $Y$ , $W$ , $D$ , $\ell$ , flag)

Require: Snapshot matrix $Y=[Y^{1},\ldots,Y^{K}]$ with rank $\mathsf{r}$, weighting matrices $W$, $D$, number $\ell\leq\mathsf{r}$ of POD functions and flag for the solver; eigenvalues and singular values are assumed to be sorted in descending order;

if flag = 0 then
  Set $\bar{Y}=W^{1/2}YD^{1/2}$;
  Compute singular value decomposition $[\Uppsi,\Upsigma,\Upphi]=\mathtt{svd}\,(\bar{Y})$;
  Define $\bar{\Psi}_{i}$ as the $i$-th column of $\Uppsi$ and $\sigma_{i}=\Upsigma_{ii}$ for $1\leq i\leq\ell$;
  Set $\Psi_{i}=W^{-1/2}\bar{\Psi}_{i}$ and $\lambda_{i}=\sigma_{i}^{2}$ for $i=1,\ldots,\ell$;
else if flag = 1 then
  Set $\bar{Y}=W^{1/2}YD^{1/2}$;
  Compute eigenvalue decomposition $[\Uppsi,\Uplambda]=\mathtt{eig}\,(\bar{Y}\bar{Y}^{\top})$;
  Define $\bar{\Psi}_{i}$ as the $i$-th column of $\Uppsi$ and $\lambda_{i}=\Uplambda_{ii}$ for $1\leq i\leq\ell$;
  Set $\Psi_{i}=W^{-1/2}\bar{\Psi}_{i}$ for $i=1,\ldots,\ell$;
else if flag = 2 then
  Compute eigenvalue decomposition $[\Upphi,\Uplambda]=\mathtt{eig}\,(\bar{Y}^{\top}\bar{Y})$ with $\bar{Y}^{\top}\bar{Y}=D^{1/2}Y^{\top}WYD^{1/2}$;
  Define $\bar{\phi}_{i}$ as the $i$-th column of $\Upphi$ and $\lambda_{i}=\Uplambda_{ii}$ for $1\leq i\leq\ell$;
  Set $\Psi_{i}=YD^{1/2}\bar{\phi}_{i}/\sqrt{\lambda_{i}}$ for $i=1,\ldots,\ell$;
end if
return $\Psi=[\Psi_{1}\,|\ldots|\,\Psi_{\ell}]$ and $\Uplambda=[\lambda_{1}\,|\ldots|\,\lambda_{\ell}]$;
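The pseudo code can be turned into a short NumPy routine. The following is a sketch under several assumptions that are not part of the original text: `D` is passed as the vector of diagonal entries of $D$, the square root $W^{1/2}$ is formed by an eigendecomposition of $W$ (other factorizations are possible), and $\ell$ does not exceed the rank of $Y$. Since `numpy.linalg.eigh` returns eigenvalues in ascending order, they are re-sorted explicitly.

```python
import numpy as np

def pod(Y, W, D, ell, flag=0):
    """Sketch of POD(Y, W, D, ell, flag).

    Y: (m, n) snapshot matrix; W: (m, m) symmetric positive definite;
    D: (n,) positive weights (diagonal of the weighting matrix D).
    Returns Psi (m, ell) and the ell largest eigenvalues.
    """
    YD = Y * np.sqrt(D)                          # Y D^{1/2} (column scaling)
    if flag in (0, 1):
        w, Q = np.linalg.eigh(W)                 # W = Q diag(w) Q^T
        W_half = (Q * np.sqrt(w)) @ Q.T          # symmetric square root W^{1/2}
        W_half_inv = (Q / np.sqrt(w)) @ Q.T      # W^{-1/2}
        Ybar = W_half @ YD
        if flag == 0:
            U, s, _ = np.linalg.svd(Ybar, full_matrices=False)
            Psi_bar, lam = U[:, :ell], s[:ell] ** 2
        else:
            lam_all, U = np.linalg.eigh(Ybar @ Ybar.T)
            idx = np.argsort(lam_all)[::-1][:ell]
            Psi_bar, lam = U[:, idx], lam_all[idx]
        Psi = W_half_inv @ Psi_bar
    elif flag == 2:
        K = YD.T @ W @ YD                        # equals Ybar^T Ybar, no W^{1/2} needed
        lam_all, Phi = np.linalg.eigh(K)
        idx = np.argsort(lam_all)[::-1][:ell]
        lam = lam_all[idx]
        Psi = YD @ Phi[:, idx] / np.sqrt(lam)
    else:
        raise ValueError("flag must be 0, 1 or 2")
    return Psi, lam

rng = np.random.default_rng(3)
m, n, ell = 30, 6, 4
Y = rng.standard_normal((m, n))                  # toy snapshots
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)                      # toy symmetric positive definite weight
D = rng.uniform(0.5, 2.0, size=n)
results = [pod(Y, W, D, ell, flag) for flag in (0, 1, 2)]
```

All three variants should return the same eigenvalues and $W$-orthonormal basis vectors that agree up to sign. In this sketch, `flag=2` is the cheapest choice when $n\leq m$ because it avoids the square root of $W$ altogether.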

2.3. The POD method for nonlinear evolution problems

In this subsection we explain the POD method for abstract nonlinear evolution problems. We focus on the numerical realization. For detailed theoretical investigations we refer the reader to [42, 50, 51, 57, 58], for instance.

2.3.1. The nonlinear evolution problems

Let us formulate the nonlinear evolution problem. For that purpose we suppose the following hypotheses.

Assumption 1.

Suppose that $T>0$ holds, where $[0,T]$ is the considered finite time horizon.

1) $V$ and $H$ are real, separable Hilbert spaces and suppose that $V$ is dense in $H$ with compact embedding. By $\langle\cdot\,,\cdot\rangle_{H}$ and $\langle\cdot\,,\cdot\rangle_{V}$ we denote the inner products in $H$ and $V$, respectively. We identify $H$ with its dual (Hilbert) space $H^{\prime}$ by the Riesz isomorphism so that we have the Gelfand triple

$$V\hookrightarrow H\simeq H^{\prime}\hookrightarrow V^{\prime},$$

where each embedding is continuous and dense. The last embedding is understood as follows: For every element $h^{\prime}\in H^{\prime}$ and $v\in V$, we also have $v\in H$ by the embedding $V\hookrightarrow H$, so we can define $\langle h^{\prime},v\rangle_{V^{\prime},V}=\langle h^{\prime},v\rangle_{H^{\prime},H}$.

2) For almost all $t\in[0,T]$ we define a time-dependent bilinear form $a(t;\cdot\,,\cdot):V\times V\to\mathbb{R}$ satisfying

$$\begin{aligned}\big|a(t;\varphi,\phi)\big|&\leq\gamma\,\|\varphi\|_{V}\,\|\phi\|_{V}&&\text{for all }\varphi,\phi\in V\text{ a.e. in }[0,T],\\a(t;\varphi,\varphi)&\geq\gamma_{1}\,\|\varphi\|_{V}^{2}-\gamma_{2}\,\|\varphi\|_{H}^{2}&&\text{for all }\varphi\in V\text{ a.e. in }[0,T]\end{aligned}$$

for time-independent constants $\gamma,\gamma_{2}\geq 0$, $\gamma_{1}>0$ and where a.e. stands for almost everywhere.

3) Assume that $y_{\circ}\in V$ and $f\in L^{2}(0,T;H)$ hold. Here we refer to [27, pp. 469-472] for vector-valued function spaces.

Recall the function space

$$W(0,T)=\big\{\varphi\in L^{2}(0,T;V)\,\big|\,\varphi_{t}\in L^{2}(0,T;V^{\prime})\big\},$$

which is a Hilbert space endowed with the standard inner product; cf. [27, pp. 472-479]. Furthermore, we have

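$$\frac{\mathrm{d}}{\mathrm{d}t}\,\langle\varphi(t),\phi\rangle_{H}=\langle\varphi_{t}(t),\phi\rangle_{V^{\prime},V}\quad\text{for }\varphi\in W(0,T),~\phi\in V$$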

in the sense of distributions in $[0,T]$ . Here, $\langle\cdot\,,\cdot\rangle_{V^{\prime},V}$ stands for the dual pairing between $V$ and its dual $V^{\prime}$ .

Now the evolution problem is given as follows: Find the state $y\in W(0,T)\cap C([0,T];V)$ such that

[TABLE]

Throughout we assume that (16) admits a unique solution $y\in W(0,T)\cap C([0,T];V)$ . Of course, this requires some properties for the nonlinear mapping $\mathcal{N}$ which we will not specify here.

Example 1 (Semilinear heat equation).

Let $\Omega\subset\mathbb{R}^{d}$ , $d\in\{2,3\}$ be a bounded open domain with Lipschitz-continuous boundary $\partial\Omega$ and let $T>0$ be a fixed end time. We set $Q:=(0,T)\times\Omega$ and $\Sigma:=(0,T)\times\partial\Omega$ and $c\geq 0$ . For a given forcing term $f\in L^{2}(Q)$ and initial condition $y_{\circ}\in L^{2}(\Omega)$ , we consider the semilinear heat equation with homogeneous Dirichlet boundary condition:

[TABLE]

The existence of a unique solution to (17) is proved in [83], for example. We can write (17) as an abstract evolution problem of type (16) by deriving a variational formulation for (17) with $V=H_{0}^{1}(\Omega)$ as the space of test functions, $H=L^{2}(\Omega)$ and integrating over the space $\Omega$ . The bilinear form $a:V\times V\to\mathbb{R}$ is introduced by

[TABLE]

and the operator $\mathcal{N}:V\to V^{\prime}$ is defined as $\mathcal{N}(\varphi)=c\varphi^{3}$ for $\varphi\in V$ . For $c\equiv 0$ , the heat equation (17) is linear. $\Diamond$

Example 2 (Cahn-Hilliard equations).

Let $\Omega$ , $T$ , $Q$ and $\Sigma$ be defined as in Example 1. The Cahn-Hilliard system was proposed in [21] as a model for phase separation in binary alloys. Introducing the chemical potential $w$ , the Cahn-Hilliard equations can be formulated in the common setting as a coupled system for the phase field $c$ and the chemical potential $w$ :

[TABLE]

By $\nu_{\Omega}$ we denote the outward normal on $\partial\Omega$ , $\mathsf{m}\geq 0$ is a constant mobility, $\sigma>0$ denotes the surface tension and $0<\varepsilon\ll 1$ represents the interface parameter. Note that the convective term $y\cdot\nabla c$ describes the transport with (constant) velocity $y$ . The transport term represents the coupling to the Navier-Stokes equations in the context of multiphase flow, see e.g. [52] and [2]. The phase field function $c$ describes the phase of a binary material with components $A$ and $B$ and takes the values $c\equiv-1$ in the pure $A$ -phase and $c\equiv+1$ in the pure $B$ -phase. The interfacial region is described by $c\in(-1,1)$ and admits a thickness of order $\mathcal{O}(\varepsilon)$ , see e.g. Fig. 5, left column, where the binary phases are colored in blue and red, respectively, and the interfacial region is depicted in white. The function $W(c)$ represents the free energy and is of double well-type. A typical choice for $W$ is the polynomial free energy function

[TABLE]

with two minima at $c=\pm 1$ , which describe the energetically favorable states. It is infinitely often differentiable. Another choice for $W$ is the $C^{1}$ relaxed double obstacle free energy

[TABLE]

with relaxation parameter $s\gg 0$ , which is introduced in [43] as the Moreau-Yosida relaxation of the double obstacle free energy

[TABLE]

The energies $W^{p}(c)$ and $W^{\text{rel}}_{s}(c)$ later will be used to compare the performance of POD on systems with smooth and less smooth nonlinearities. For more details on the choices for $W$ we refer to [1] and [19], for example. Concerning existence, uniqueness and regularity of a solution to (18), we refer to [19]. In order to derive a variational form of type (16), we rewrite (18) as a single fourth-order parabolic equation for $c$ by

[TABLE]

We choose $V=\{v\in H^{1}(\Omega):\frac{1}{|\Omega|}\int_{\Omega}v=0\}$ equipped with the inner product $(u,v)_{V}:=\int_{\Omega}\nabla u\nabla v$ , so that the dual space of $V$ is given by $V^{\prime}=\{f\in(H^{1}(\Omega))^{\prime}:\langle f,1\rangle=0\}$ such that $V\hookrightarrow H=V^{\prime}$ and $\langle.,.\rangle$ denotes the duality pairing. We note that $(V,(.,.)_{V})$ is a Hilbert space. We define the $V^{\prime}-$ inner product for $f,g\in V^{\prime}$ as $(f,g)_{V^{\prime}}:=\int_{\Omega}\nabla(-\Delta)^{-1}f\cdot\nabla(-\Delta)^{-1}g$ where $(-\Delta)^{-1}$ denotes the inverse of the negative Laplacian with zero Neumann boundary data. Note that $(f,g)_{V^{\prime}}=(f,(-\Delta)^{-1}g)_{L^{2}(\Omega)}=((-\Delta)^{-1}f,g)_{L^{2}(\Omega)}$ . We introduce the bilinear form $a:V\times V\to\mathbb{R}$ by

[TABLE]

and define the nonlinear operator $\mathcal{N}$ by $\mathcal{N}(c)=\frac{\sigma}{\varepsilon}W^{\prime}(c).$ The evolution problem can be written in the form

[TABLE]

We note that this fits our abstract setting formulated in (16) with the Gelfand triple $V\hookrightarrow H\equiv V^{\prime}\hookrightarrow V^{\prime}$ . $\hfill\Diamond$

2.3.2. Temporal discretization and POD method

Let $0=t_{1}<\ldots<t_{n_{t}}=T$ be a given time grid with step sizes $\Delta t_{j}=t_{j}-t_{j-1}$ for $j=2,\ldots,n_{t}$ . Suppose that for any $j\in\{1,\ldots,n_{t}\}$ the element $y_{j}\in V\subset H$ is an approximation of $y(t_{j})$ computed by applying a temporal integration method (e.g., the implicit Euler method) to (16). Then we consider the snapshot ensemble

[TABLE]

with $n=n_{t}$ and $\mathsf{r}=\dim\mathscr{V}\leq n$ . In the context of ( $\mathbf{P}^{\ell}$ ) we choose $K=1$ and $n=n_{1}=n_{t}$ . Moreover, $X$ can be either $V$ or $H$ . For the weighting parameters we take the trapezoidal weights

[TABLE]

Of course, other quadrature weights are also possible. Now, instead of ( $\mathbf{P}^{\ell}$ ) we consider the minimization problem

[TABLE]

with either $X=V$ or $X=H$ .
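The trapezoidal weights can be illustrated by a minimal sketch. It assumes the standard composite trapezoidal rule on the (possibly nonuniform) grid $t_{1}<\ldots<t_{n}$, i.e. $\alpha_{1}=\Delta t_{2}/2$, $\alpha_{j}=(\Delta t_{j}+\Delta t_{j+1})/2$ for $1<j<n$ and $\alpha_{n}=\Delta t_{n}/2$; the function name `trapezoidal_weights` is hypothetical.

```python
import numpy as np

def trapezoidal_weights(t):
    """Composite trapezoidal weights alpha_j for a time grid t_1 < ... < t_n."""
    t = np.asarray(t, dtype=float)
    dt = np.diff(t)              # dt[k] = t[k+1] - t[k] (zero-based indices)
    alpha = np.zeros_like(t)
    alpha[:-1] += 0.5 * dt       # each interval contributes half its length
    alpha[1:] += 0.5 * dt        # to both of its end points
    return alpha

# Nonuniform grid on [0, 1]; the weights sum to T = 1.
t = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
alpha = trapezoidal_weights(t)
```

With these weights, the sum $\sum_{j}\alpha_{j}g(t_{j})$ approximates $\int_{0}^{T}g(t)\,\mathrm{d}t$ and is exact for affine $g$.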

Remark 2.

In [42, Sections 1.2.2 and 1.3.2] a continuous variant of the POD method is considered. In that case the trapezoidal approximation in (24) is replaced by integrals over the time interval $[0,T]$ . More precisely, we consider

[TABLE]

with either $X=V$ or $X=H$ . For the relationship between solutions to (24) and (25) we refer to [58] and [42, Section 1.2.3]. $\Diamond$

To compute the POD basis $\{\Psi_{i}\}_{i=1}^{\ell}$ of rank $\ell$ we have to evaluate the inner products $\langle y_{j},\Psi_{i}\rangle_{X}$ , where either $X=V$ or $X=H$ holds. In typical applications the space $X$ is infinite dimensional. Therefore, a discretization of $X$ is required in order to get a POD method that can be realized on a computer. This is the topic of the next subsection.

2.3.3. Galerkin discretization

We discretize the state equation by applying a spatial approximation scheme. Let us consider here a Galerkin scheme for (16). To this end, let linearly independent elements $\varphi_{1},\ldots,\varphi_{m}\in V$ be given and define the $m$ -dimensional subspace

[TABLE]

endowed with the $V$ topology. Then a Galerkin scheme for (16) is given as follows: find $y^{h}\in W(0,T)\cap C([0,T];V^{h})$ satisfying

[TABLE]

Inserting the representation $y^{h}(t)=\sum_{i=1}^{m}\mathrm{y}^{h}_{i}(t)\varphi_{i}\in V^{h}$ , $t\in[0,T]$ , in (26) and choosing $\varphi^{h}=\varphi_{i}$ for $i=1,\ldots,m$ we derive the following $m$ -dimensional initial value problem

[TABLE]

where we have used the matrices and vectors

[TABLE]

In the pseudo code function [ $Y$ ]= StateSol( $\mathrm{y}_{\circ}^{h}$ ) we present a solution method for (27) using the implicit Euler method.

function [ $Y$ ]= StateSol( $\mathrm{y}_{\circ}^{h}$ )

0: Initial condition $\mathrm{y}_{\circ}^{h}$ ;

1: Compute $\mathrm{y}_{1}^{h}\in\mathbb{R}^{m}$ solving $\mathrm{M}^{h}\mathrm{y}_{1}^{h}=\mathrm{y}_{\circ}^{h}$ ;

2: for $j=2$ to $n_{t}$ do

3: Set $\mathrm{A}^{h}_{j}=\mathrm{A}^{h}(t_{j})\in\mathbb{R}^{m\times m}$ and $\mathrm{F}^{h}_{j}=\mathrm{F}^{h}(t_{j})\in\mathbb{R}^{m}$ ;

4: Solve (e.g., by applying Newton’s method) for $\mathrm{y}_{j}^{h}\in\mathbb{R}^{m}$

$\big{(}\mathrm{M}^{h}+\Delta t_{j}\mathrm{A}_{j}^{h}\big{)}\mathrm{y}^{h}_{j}+\Delta t_{j}\mathrm{N}^{h}(\mathrm{y}^{h}_{j})=\mathrm{M}^{h}\mathrm{y}_{j-1}^{h}+\Delta t_{j}\mathrm{F}^{h}_{j};$

5: end for

6: return matrix $Y=[\mathrm{y}_{1}^{h}\,|\ldots|\,\mathrm{y}_{n_{t}}^{h}]\in\mathbb{R}^{m\times n_{t}}$ ;
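The pseudo code above can be sketched in Python as follows. This is an illustrative sketch and not the authors' implementation: the names `state_sol`, `N`, `dN` and `y0_load` are hypothetical, dense linear algebra is used for brevity, and the demo assumes a one-dimensional piecewise linear finite element discretization of Example 1 in which the cubic nonlinearity is approximated by interpolation in the finite element space.

```python
import numpy as np

def state_sol(M, A, F, N, dN, y0_load, t, tol=1e-10, max_newton=20):
    """Implicit Euler for M y' + A(t) y + N(y) = F(t) with Newton's method."""
    t = np.asarray(t, dtype=float)
    Y = np.zeros((M.shape[0], t.size))
    Y[:, 0] = np.linalg.solve(M, y0_load)        # step 1: M y_1 = y_o
    for j in range(1, t.size):                   # steps 2-5
        dt = t[j] - t[j - 1]
        Aj, Fj = A(t[j]), F(t[j])
        rhs = M @ Y[:, j - 1] + dt * Fj
        y = Y[:, j - 1].copy()                   # initial guess: previous state
        for _ in range(max_newton):
            res = (M + dt * Aj) @ y + dt * N(y) - rhs
            if np.linalg.norm(res) <= tol:
                break
            J = M + dt * Aj + dt * dN(y)         # Jacobian of the residual
            y = y - np.linalg.solve(J, res)
        Y[:, j] = y
    return Y                                     # step 6: snapshot matrix

# Demo (assumption): 1D P1 elements on (0,1), homogeneous Dirichlet data.
m = 49
h = 1.0 / (m + 1)
x = np.linspace(h, 1.0 - h, m)
off = np.eye(m, k=1) + np.eye(m, k=-1)
M = h / 6.0 * (4.0 * np.eye(m) + off)            # mass matrix
S = (2.0 * np.eye(m) - off) / h                  # stiffness matrix
c = 1.0
N = lambda y: M @ (c * y**3)                     # interpolated cubic term
dN = lambda y: M @ np.diag(3.0 * c * y**2)
t = np.linspace(0.0, 0.1, 21)
y_init = np.sin(np.pi * x)
Y = state_sol(M, lambda s: S, lambda s: np.zeros(m), N, dN, M @ y_init, t)
```

For large $m$ one would replace the dense solves by sparse factorizations; the structure of the time loop stays the same.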

In the next subsection we discuss how a POD basis $\{\Psi_{j}\}_{j=1}^{\ell}$ of rank $\ell\leq\mathsf{r}$ can be computed from numerical approximations for the solution $y^{h}$ to (27).

2.3.4. POD method for the fully discretized nonlinear evolution problem

Recall that we have introduced the temporal grid $\{t_{j}\}_{j=1}^{n_{t}}\subset[0,T]$ and set $n=n_{t}$ . Let $y_{1}^{h},\ldots,y_{n}^{h}\in V^{h}$ be numerical approximations to the solution $y^{h}(t)$ to (27) at time instances $t=t_{j}$ , $j=1,\ldots,n_{t}$ . Then, a coefficient matrix $Y\in\mathbb{R}^{m\times n}$ is defined by the elements $Y_{ij}$ given by

[TABLE]

The $j$ -th column of $Y$ (denoted by $\mathrm{y}_{j}=Y_{\cdot,j}$ ) contains the Galerkin coefficients of the snapshot $y_{j}^{h}\in V^{h}$ . We set $\mathsf{r}=\mathrm{rank}\,Y\leq\min(m,n)$ and

[TABLE]

Due to $\mathscr{V}^{h}\subset V^{h}$ we have $\Psi_{j}\in V^{h}$ for $1\leq j\leq\ell$ . Therefore, there exists a coefficient matrix $\Uppsi\in\mathbb{R}^{m\times\ell}$ that is defined by the elements $\Uppsi_{ij}$ satisfying

[TABLE]

where the $j$ -th column $\Uppsi_{\cdot,j}$ of the matrix $\Uppsi$ consists of the Galerkin coefficients of the element $\Psi_{j}$ . Notice that

[TABLE]

hold for $v^{h}=\sum_{i=1}^{m}\mathrm{v}_{i}^{h}\varphi_{i}$ , $w^{h}=\sum_{i=1}^{m}\mathrm{w}_{i}^{h}\varphi_{i}\in V^{h}$ and for the symmetric, positive definite stiffness matrix

[TABLE]

Then, we have for $X=H$

[TABLE]

and for $X=V$

[TABLE]

Thus, we can apply the approach presented in Section 2.2 choosing $W=\mathrm{M}^{h}$ for $X=H$ and $W=\mathrm{S}^{h}$ for $X=V$ . Moreover, we set $K=1$ , $n_{1}=n_{t}=n$ and $\alpha^{1}_{j}=\alpha_{j}$ defined in (23). Now a POD basis of rank $\ell$ for (27) can be computed by the pseudo code function [ $Y$ , $\Psi$ ]= PODState( $\mathrm{y}_{\circ}^{h}$ , $W$ , $D$ , $\ell$ , flag).

function [ $Y$ , $\Psi$ ]= PODState( $\mathrm{y}_{\circ}^{h}$ , $W$ , $D$ , $\ell$ , flag)

0: Initial condition $\mathrm{y}_{\circ}^{h}$ , weighting matrices $W$ , $D$ , number $\ell$ of POD functions and flag for the solver;

1: Call [ $Y$ ]= StateSol( $\mathrm{y}_{\circ}^{h}$ );

2: Call [ $\Psi$ , $\Lambda$ ]= POD( $Y$ , $W$ , $D$ , $\ell$ , flag);

3: return $Y=[\mathrm{y}_{1}^{h}\,|\ldots|\,\mathrm{y}_{n_{t}}^{h}]$ and $\Psi=[\Psi_{1}\,|\ldots|\,\Psi_{\ell}]$ ;
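The call to POD in step 2 refers to Section 2.2. As a hedged illustration, the following sketch computes a POD basis with respect to the weighted inner product $\mathrm{v}^{\top}W\mathrm{w}$ by the method of snapshots, i.e. via the eigenvalue decomposition of $D^{1/2}Y^{\top}WYD^{1/2}$ with $D=\mathrm{diag}(\alpha_{1},\ldots,\alpha_{n})$. Whether this matches the variant selected by flag is an assumption; the function name `pod_snapshots` and the random test data are hypothetical.

```python
import numpy as np

def pod_snapshots(Y, W, alpha, ell):
    """Rank-ell POD basis w.r.t. <v, w>_W = v^T W w and snapshot weights alpha."""
    d = np.sqrt(alpha)
    K = d[:, None] * (Y.T @ W @ Y) * d[None, :]    # K = D^{1/2} Y^T W Y D^{1/2}
    lam, Phi = np.linalg.eigh(K)                   # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:ell]              # keep the ell largest
    lam, Phi = lam[idx], Phi[:, idx]
    Psi = (Y * d[None, :]) @ Phi / np.sqrt(lam)    # Psi_i = Y D^{1/2} Phi_i / sqrt(lam_i)
    return Psi, lam

# Demo with synthetic data: rank-5 snapshot matrix, SPD weighting matrix W.
rng = np.random.default_rng(0)
m, n, ell = 30, 12, 4
Y = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))
B = rng.standard_normal((m, m))
W = B @ B.T + m * np.eye(m)
alpha = np.full(n, 1.0 / n)
Psi, lam = pod_snapshots(Y, W, alpha, ell)

# Weighted projection error of the snapshots onto span{Psi_1, ..., Psi_ell}.
R = Y - Psi @ (Psi.T @ W @ Y)
proj_err = np.sum(alpha * np.einsum("ij,ij->j", R, W @ R))
```

The columns of `Psi` are orthonormal with respect to $W$, and `proj_err` equals the sum of the neglected eigenvalues, in line with the usual POD error formula.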

In the next subsection we discuss in detail how the POD method has to be applied if, instead of a single space $V^{h}$ , we have different spaces $V^{h_{j}}$ for each $j=1,\ldots,n$ .

2.4. The POD method with snapshots generated by spatially adaptive finite element methods

In practical applications it is often desirable to provide POD models for time-dependent PDE systems whose numerical treatment requires adaptive numerical techniques in space and/or time. Snapshots generated by such methods are not directly amenable to the POD procedure described in Section 2.3.4, since spatial adaptivity means that the snapshots at each time instance may have different lengths due to their different spatial resolutions. In fact, there is not one single discrete Galerkin space $V^{h}$ for all snapshots generated by the fully discrete evolution, but at every time instance $t_{j}$ the adaptive procedure generates a discrete Galerkin space $V^{h_{j}}\subset X$ , so that in this case $y_{j}^{h}\equiv y_{j}^{h_{j}}\in V^{h_{j}}$ . For this reason, no snapshot matrix $Y$ can be formed with columns containing the basis coefficient vectors of the snapshots.

To obtain a POD basis also in this situation we inspect the operator $\mathcal{K}$ of (8) and observe that its action can be computed if the inner products $\langle y_{j}^{k},y_{i}^{l}\rangle_{X}$ can be evaluated for all $1\leq i\leq n_{l},1\leq j\leq n_{k}$ and $1\leq k,l\leq K$ .

Let us next demonstrate how to compute a POD basis for snapshots residing in arbitrary finite element (FE) spaces. To begin with we drop the superindex $h$ and set $V_{j}:=V^{h_{j}}$ . For each time instant $j=1,\dotsc,n$ of our time discrete PDE system the snapshots $\{y_{j}\}_{j=1}^{n}$ are taken from different finite element spaces $V_{j}\subseteq X$ $(j=1,\dots,n)$ , where $X$ denotes a common (real) Hilbert space. Let $V_{j}=\mathrm{span}\,\{\varphi_{1}^{j},\dots,\varphi_{m_{j}}^{j}\}$ . Then we have the expansions

[TABLE]

with coefficient vectors

[TABLE]

containing the finite element coefficients. The inner product of the associated functions can thus be computed as

[TABLE]

so that the evaluation of the action $\mathcal{K}\Phi$ only relies on the evaluation of the inner products $\langle\varphi_{k}^{i},\varphi_{l}^{j}\rangle_{X}$ ( $1\leq i,j\leq n$ , $1\leq k\leq m_{i}$ , $1\leq l\leq m_{j}$ ). In other words, once we are able to compute those inner products we are in the position to set up the eigensystem $\{(\lambda_{i},\Phi_{i})\}_{i=1}^{\mathsf{r}}$ of $\mathcal{K}$ from (8). The POD modes $\{\Psi_{i}\}_{i=1}^{\mathsf{r}}$ can then be computed according to (9) by

[TABLE]

Details on this procedure can be found in [62, 39].
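To make the procedure concrete, the following one-dimensional sketch assembles the correlation matrix for piecewise linear snapshots that live on different, non-nested grids of $(0,1)$ and evaluates a POD mode without interpolating the snapshots onto a common mesh. It is a simplified analogue of the approach in [62, 39], not their implementation: the scaling $\mathrm{K}_{ij}=\sqrt{\alpha_{i}\alpha_{j}}\,\langle y_{i},y_{j}\rangle_{L^{2}}$ is assumed, and all function names and the test data are hypothetical. In one dimension the "cut elements" are simply the intervals of the merged grid.

```python
import numpy as np

def l2_inner(xu, u, xv, v):
    """Exact L2 inner product of two P1 functions given on different 1D grids."""
    z = np.union1d(xu, xv)                  # merged grid: both functions are linear per cell
    zm = 0.5 * (z[:-1] + z[1:])
    f = lambda s: np.interp(s, xu, u) * np.interp(s, xv, v)
    # Simpson's rule integrates the cellwise quadratic product exactly.
    return float(np.sum(np.diff(z) / 6.0 * (f(z[:-1]) + 4.0 * f(zm) + f(z[1:]))))

def correlation_matrix(grids, values, alpha):
    n = len(grids)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):               # K is symmetric
            K[i, j] = K[j, i] = np.sqrt(alpha[i] * alpha[j]) * l2_inner(
                grids[i], values[i], grids[j], values[j])
    return K

def pod_mode(x, i, lam, Phi, grids, values, alpha):
    """Evaluate Psi_i = lam_i^{-1/2} sum_j sqrt(alpha_j) (Phi_i)_j y_j at points x."""
    psi = sum(np.sqrt(alpha[j]) * Phi[j, i] * np.interp(x, grids[j], values[j])
              for j in range(len(grids)))
    return psi / np.sqrt(lam[i])

# Demo: snapshots of a decaying two-mode profile on randomly generated grids.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 0.1, 11)
alpha = np.full(t.size, t[1] - t[0])
alpha[[0, -1]] *= 0.5                       # trapezoidal weights
grids, values = [], []
for tj in t:
    x = np.concatenate(([0.0], np.sort(rng.uniform(0.0, 1.0, 40)), [1.0]))
    grids.append(x)
    values.append(np.exp(-np.pi**2 * tj) * np.sin(np.pi * x)
                  + 0.5 * np.exp(-4.0 * np.pi**2 * tj) * np.sin(2.0 * np.pi * x))
K = correlation_matrix(grids, values, alpha)
lam, Phi = np.linalg.eigh(K)
lam, Phi = lam[::-1], Phi[:, ::-1]          # descending order
```

In two or three space dimensions the same structure applies, but the inner products require intersecting the elements of the two meshes, which is the costly part discussed in Run 3.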

To illustrate how this procedure can be implemented we summarize Examples 6.1-6.3 from [39], which deal with nested and non-nested meshes. All coding was done in C++ using FEniCS [8, 66] for the solution of the differential equations and ALBERTA [87] for dealing with hierarchical meshes. The numerical tests were run on a compute server with 512 GB RAM.

Run 1 ([39, Example 6.1]).

We consider Example 1 with homogeneous Dirichlet boundary condition and vanishing nonlinearity, i.e. we set $c\equiv 0$ so that the equation becomes linear. The spatial domain is chosen as $\Omega=(0,1)\times(0,1)\subset\mathbb{R}^{2}$ , the time interval is $[0,T]=[0,1.57]$ . Furthermore, we choose $X=L^{2}(\Omega)$ . For the temporal discretization we introduce the uniform time grid by

[TABLE]

with $\Delta t=0.001$ . For the spatial discretization we use $h$ -adapted piecewise linear, continuous finite elements on hierarchical and nested meshes. Snapshots of the analytical solution at three different time points are shown in Figure 1. Details on the construction of the analytical solution and the corresponding right hand side $f$ are given in [39, Example 6.1].

Due to the steep gradients in the neighborhood of the minimum and maximum, respectively, the use of an adaptive finite element discretization is justified. The resulting computational meshes as well as the corresponding finest mesh (reference mesh at the end of the simulation which is the union of all adaptive meshes generated during the simulation) are shown in Figure 2.

The number of nodes of the adaptive meshes varies between 3637 and 7071 points. The finest mesh has 18628 degrees of freedom. A uniform mesh with grid size of the order of the diameter of the smallest triangles in the adaptive grids ( $h_{\min}=0.0047$ ) would have 93025 degrees of freedom. This clearly reveals the benefit of using adaptive meshes for snapshot generation. The benefit is also reflected in the computational times needed for the snapshot generation: 944 seconds on the adaptive meshes compared to 8808 seconds on the uniform mesh; see Table 4 for the speedup factors obtained by spatial adaptation. In Figure 3, the resulting normalized eigenspectrum of the correlation matrix $\mathrm{K}$ is shown for snapshots obtained by uniform spatial discretization (“uniform FE mesh”), for snapshots obtained by interpolation on the finest mesh (“adaptive FE mesh”), and for snapshots without interpolation (“infPOD”), where $\mathrm{K}$ is associated to the operator $\mathcal{K}$ from (8), see also (30).

We observe that the eigenvalues for both adaptive approaches coincide. This numerically validates what we expect from theory: the information content contained in the matrix $\mathrm{K}$ when we explicitly compute its entries without interpolation is the same as the information content of the eigenvalue problem formulated on the finest mesh. No information is added or lost. Moreover, we recognize that approximately the first 28 eigenvalues of the adaptive simulation coincide with those of the simulation on a uniform mesh. From index 29 on, the methods deliver different results: for the uniform discretization, the normalized eigenvalues fall below machine precision at around index 100 and stagnate. In contrast, the normalized eigenvalues for both adaptive approaches flatten at an order of around $10^{-10}$ . If the error tolerance for the spatial discretization error is chosen larger (or smaller), the stagnation of the eigenvalues in the adaptive method takes place at a higher (or lower) order (see Figure 3, right). Concerning dynamical systems, the magnitude of the eigenvalue corresponds to the characteristic properties of the underlying dynamical system: the larger the eigenvalue, the more information is contained in the corresponding eigenfunction. Since all adaptive meshes are contained in the uniform mesh, the difference in the amplitude of the eigenvalues is due to the interpolation errors during refinement and coarsening. This is the price we have to pay for faster snapshot generation using adaptive methods. A further aspect gained from the decay behavior of the eigenvalues in the adaptive case is the following: the adaptive approach filters out the noise in the system which is related to the modes corresponding to the singular values that are not matched by the eigenvalues of the adaptive approach. In the language of frequencies, this means that the overtones in the system which get lost in the adaptive computations live in the space which is neglected by the POD method based on adaptive finite element snapshots. From this point of view, adaptivity can be interpreted as a smoother.

The first, second and fifth POD modes of Run 1 obtained by the adaptive approach are depicted in Figure 4. We observe the classical appearance of the basis functions. The initial condition is reflected by the first POD basis function. The next basis functions admit a number of minima and maxima corresponding to the index in the basis: $\Psi_{2}$ has two minima and two maxima etc. This behavior is similar to the increasing oscillations in higher frequencies in trigonometric approximations. The POD basis functions corresponding to the uniform spatial discretization have a similar appearance. $\Diamond$

Run 2 ([39, Example 6.2]).

(Cahn-Hilliard system.) We consider Example 2 in the form (18) with $\Omega=(0,1.5)\times(0,0.75)$ , $T=0.025$ , constant mobility $\mathsf{m}\equiv 0.00002$ , and constant surface tension $\sigma\equiv 24.5$ . The interface parameter $\varepsilon$ is set to $\varepsilon=0.02$ , with resulting interface thickness $\pi\cdot\varepsilon\approx 0.0628$ . We use the relaxed double obstacle free energy $W_{s}^{\text{rel}}$ from (20) with $s=10^{4}$ . As initial condition, we choose a circle with radius $r=0.25$ and center $(0.375,0.375)$ . The initial condition is transported horizontally with constant velocity $v=(30,0)^{T}$ . We set

[TABLE]

so that $\Delta t=2.5\cdot 10^{-5}$ . The numerical computations are performed with the semi-implicit Euler scheme. For this purpose let $c^{j-1}\in V$ and $c^{j}\in V$ denote the time-discrete solution at $t_{j-1}$ and $t_{j}$ , respectively. Based on the variational formulation (22) we tackle the time discrete version of (18) in the form: given $c^{j-1}$ , find $c^{j}$ , $w^{j}$ solving

[TABLE]

for all $\varphi_{1},\varphi_{2}\in V$ and $j=2,\ldots,n_{t}$ with $c^{1}=c_{\circ}$ . As in (22), here $V=\{v\in H^{1}(\Omega):\frac{1}{|\Omega|}\int_{\Omega}v\,dx=0\}$ . Note that the free energy function $W$ is split into a convex part $W_{+}$ and a concave part $W_{-}$ , such that $W=W_{+}+W_{-}$ and $W^{\prime}_{+}$ is treated implicitly, whereas $W^{\prime}_{-}$ is treated explicitly with respect to time. This leads to an unconditionally energy stable time marching scheme, compare [33]. The system (29) is discretized in space using piecewise linear and continuous finite elements. The resulting nonlinear systems of equations are solved using a semi-smooth Newton method.

Figure 5 shows the phase field (left) and the chemical potential (right) for the finite element simulation using adaptive meshes. The initial condition $c_{\circ}$ is transported horizontally with constant velocity.

The adaptive finite element meshes as well as the finest mesh which is generated during the adaptive finite element simulation are shown in Figure 6. The number of degrees of freedom in the adaptive meshes varies between 6113 and 8795. The finest mesh (overlay of all adaptive meshes) has 54108 degrees of freedom, whereas a uniform mesh with discretization fineness as small as the smallest triangle in the adaptive meshes has 88450 degrees of freedom.

Figure 7 shows the first, second and fifth POD mode for the phase field $c$ and the chemical potential $w$ . Analogously to Run 1, we observe a periodicity in the POD basis functions corresponding to their basis index numbers.

In the present example we only compare the POD procedure for two kinds of snapshot discretizations, namely the adaptive approach using the finest mesh, and the uniform mesh approach, where the grid size is chosen to be of the same size as the smallest triangle in the adaptive meshes. We choose $X=L^{2}(\Omega)$ and compute a separate POD basis for each of the variables $c$ and $w$ .

In Figure 8, a comparison is visualized concerning the normalized eigenspectrum for the phase field $c$ and the chemical potential $w$ using uniform and adaptive finite element discretization. We note for the phase field $c$ that about the first 180 eigenvalues computed corresponding to the adaptive simulation coincide with the eigenvalues of the simulation on the finest mesh. Then, the eigenvalues corresponding to the uniform simulation decay faster. Similar observations apply for the chemical potential $w$ .

We use the criterion (11) to determine the basis length $\ell$ which is required to represent a prescribed information content with the respective POD space. We will choose the POD basis length $\ell_{c}$ for the phase field $c$ and the number of POD modes $\ell_{w}$ for the chemical potential, such that

[TABLE]

for a given value $p$ representing the loss of information. Alternatively, the POD basis length could be chosen such that the POD projection error (10) is aligned with the expected spatial and/or temporal discretization error, compare e.g. [39, Theorem 5.1]. Let us also refer, e.g., to the recent paper [12], where different adaptive POD basis extension techniques are discussed. Table 1 summarizes how to choose $\ell_{c}$ and $\ell_{w}$ in order to capture a desired amount of information. Moreover, it tabulates the POD projection error (10) depending on the POD basis length, where $\lambda_{i}^{c}$ and $\lambda_{i}^{w}$ denote the eigenvalues for the phase field $c$ and the chemical potential $w$ , respectively. The results in Table 1 agree with our expectations: the smaller the loss of information $p$ is, the more POD modes are needed and the smaller is the POD projection error. $\Diamond$
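The selection of the basis length from the eigenvalues can be sketched as follows. The sketch assumes that the criterion takes the common form of choosing the smallest $\ell$ with $\sum_{i>\ell}\lambda_{i}/\sum_{i}\lambda_{i}\leq p$; the precise normalization used in (11) may differ. The function name `basis_length` and the sample eigenvalues are hypothetical.

```python
import numpy as np

def basis_length(lam, p):
    """Smallest ell with relative information loss sum_{i>ell} lam_i / sum_i lam_i <= p."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]       # descending order
    rev_cumsum = np.cumsum(lam[::-1])[::-1]                  # rev_cumsum[k] = sum_{i>=k} lam[i]
    tail = np.concatenate((rev_cumsum[1:], [0.0]))           # tail[ell-1] = sum_{i>ell} lam_i
    ell = int(np.argmax(tail / lam.sum() <= p)) + 1          # first index meeting the tolerance
    return ell, tail[ell - 1]

# Synthetic spectrum with total sum 8.
lam = [4.0, 2.0, 1.0, 0.5, 0.25, 0.25]
ell_a, err_a = basis_length(lam, 0.1)    # relative losses: 0.5, 0.25, 0.125, 0.0625, ...
ell_b, err_b = basis_length(lam, 0.2)
```

The second return value is the neglected eigenvalue sum $\sum_{i>\ell}\lambda_{i}$, i.e. the quantity reported next to each basis length in Table 1.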

Run 3 ([39, Example 6.3]).

(Linear heat equation revisited). We again consider Example 1 with $c\equiv 0$ . The purpose of this example is to confirm that our POD approach is also applicable in the case of non-nested meshes, as they appear in $r$ -adaptivity, for example. We set up the matrix $\mathrm{K}$ for snapshots generated on sequences of non-nested spatial discretizations. This requires the integration over cut elements, see [39]. We choose $\Omega=(0,1)\times(0,1)\subset\mathbb{R}^{2}$ , $[0,T]=[0,1]$ , and we apply a uniform temporal discretization with time step size $\Delta t=0.01$ . The analytical solution in the present example is given by

[TABLE]

with ${\bm{x}}=(x_{0},x_{1})$ , source term $f:=y_{t}-\Delta y$ and the initial condition $g:=y(0,\cdot)$ . The initial condition is discretized using piecewise linear and continuous finite elements on a uniform spatial mesh which is shown in Figure 9 (left). Then, at each time step, the mesh is disturbed by relocating each mesh node according to the assignment

[TABLE]

where $\theta\in\mathbb{R}_{+}$ is sufficiently small such that all coordinates of the interior nodes fulfill $0<x_{0}<1$ and $0<x_{1}<1$ . After relocating the mesh nodes, the heat equation is solved on this mesh for the next time instance. We use Lagrange interpolation to transfer the finite element solution of the previous time step onto the new mesh. The disturbed meshes at $t=0.5$ and $t=1.0$ as well as an overlap of two meshes are shown in Figure 9. To compute the matrix $\mathrm{K}$ from (30) we have to evaluate the corresponding inner products of the snapshots, where we need to integrate over cut elements.

We compute the eigenvalue decomposition of the matrix representation $\mathrm{K}$ of the operator $\mathcal{K}$ (cf. (30)) for different values of $\theta$ and compare the results with a uniform mesh (i.e. $\theta=0$ ) in Figure 10. We note that the eigenvalues of the disturbed mesh are converging to the eigenvalues of the uniform mesh for $\theta\to 0$ . As expected, the eigenvalue spectrum depends only weakly on the underlying mesh given that the mesh size is sufficiently small.

Concerning the computational complexity of POD with non-nested meshes let us note that solving the heat equation takes 2.1 seconds on the disturbed meshes and 1.8 seconds on the uniform mesh. The computational time needed to compute each entry of the matrix $\mathrm{K}$ is 0.022 seconds and computing the eigenvalue decomposition for $\mathrm{K}$ takes 0.0056 seconds. Note that the cut element integration problem for each matrix entry takes a fraction of the time required to solve the finite element problem. $\Diamond$

3. The POD Galerkin procedure

Once the POD basis is generated it can be used to set up a POD-Galerkin approximation of the original dynamical system. This is discussed in the present section. In this context we recall that the space spanned by the POD basis is used with a Galerkin method to approximate the original system for e.g. other inputs and/or parameters than those used to generate the snapshots for constructing the POD basis. A typical application is given by PDE-constrained optimization, where the PDE system during the optimization is substituted by POD Galerkin surrogates, see Section 6 for more details.

3.1. The POD Galerkin procedure

Suppose that for given snapshots $y_{j}^{h}\in V^{h_{j}}\subset X$ , $1\leq j\leq n$ , we have computed the symmetric matrix

[TABLE]

associated to the operator $\mathcal{K}$ from (8) together with its eigensystem. Its $\ell\in\{1,\ldots,\mathsf{r}\}$ largest eigenvalues are $\{\lambda_{i}\}_{i=1}^{\ell}$ with corresponding eigenvectors $\{\Phi_{i}\}_{i=1}^{\ell}\subset\mathbb{R}^{n}$ . The POD basis $\{\Psi_{i}\}_{i=1}^{\ell}$ is then given by (9), i.e.,

[TABLE]

This POD basis is utilized in order to compute a reduced-order model for (16) along the lines of Section 2.3.3, where the space $V^{h}$ is replaced by the space $V^{\ell}=\mathrm{span}\,\{\Psi_{1},\dots,\Psi_{\ell}\}\subset V$ . More precisely we make the POD Galerkin ansatz

[TABLE]

as an approximation for $y(t)$ , with the Fourier coefficients

[TABLE]

Inserting $y^{\ell}$ into (16) and choosing $V^{\ell}\subset V$ as the test space leads to the system

[TABLE]

for all $\Psi\in V^{\ell}$ and for almost all $t\in(0,T]$ . The system (32) is called POD reduced-order model (POD-ROM). Using the ansatz (31), we can write (32) as an $\ell$ -dimensional ordinary differential equation system for the POD mode coefficients $\eta(t)=(\eta_{i}(t))_{1\leq i\leq\ell}$ , $t\in(0,T]$ , as follows:

[TABLE]

for $i=1,\ldots,\ell$ . Note that $\langle\Psi_{i},\Psi_{j}\rangle_{H}=\delta_{ij}$ if we choose $X=H$ in the context of Section 2.3. In a next step we rewrite this system using the relation between $\Psi_{i}$ and $\Phi_{i}$ given in (9). This leads to

[TABLE]

for $i=1,\ldots,\ell$ . In order to write (34) in a compact matrix-vector form, let us introduce the diagonal matrix

[TABLE]

From the first $\ell$ eigenvectors $\{\Phi_{i}\}_{i=1}^{\ell}$ of $\mathrm{K}$ we build the matrix

[TABLE]

Then, the system (34) can be written as the system

[TABLE]

for the vector-valued mapping $\eta=(\eta_{1},\ldots,\eta_{\ell})^{\top}:[0,T]\to\mathbb{R}^{\ell}$ , for the nonlinearity $\mathrm{N}^{\ell}=(\mathrm{N}_{i}^{\ell}(\cdot))_{1\leq i\leq\ell}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell}$ with

[TABLE]

and for the stiffness matrix $\mathrm{A}^{\ell}=((\mathrm{A}^{\ell}_{ij}))\in\mathbb{R}^{\ell\times\ell}$ given as

[TABLE]

Note that the right hand side $\mathrm{F}^{\ell}(t)=(\mathrm{F}_{i}^{\ell}(t))_{1\leq i\leq\ell}$ and the initial condition $\eta_{\circ}=(\eta_{\circ i})_{1\leq i\leq\ell}$ are given by

[TABLE]

and

[TABLE]

for $i=1,\ldots,\ell$ , respectively. Their calculation can be done explicitly for an arbitrary finite element discretization. For a given function $w\in V$ (for example $w=f(t)$ or $w=y_{\circ}$ ) with finite element discretization $w=\sum_{i=1}^{m_{w}}\mathrm{w}_{i}\chi_{i}$ , nodal basis $\{\chi_{i}\}_{i=1}^{m_{w}}\subset V$ and appropriate mode coefficients $\{\mathrm{w}_{i}\}_{i=1}^{m_{w}}$ we can compute

[TABLE]

for $j=1,\ldots,n$ where $y_{j}^{h}=\sum_{k=1}^{m_{j}}\mathrm{y}_{k}^{j}\varphi_{k}^{j}\in V^{h_{j}}$ denotes the $j$ -th snapshot. Again, for any $i=1,\ldots,m_{w}$ and $k=1,\ldots,m_{j}$ , the computation of the inner product $\langle\chi_{i},\varphi_{k}^{j}\rangle_{X}$ can be done explicitly.

Obviously, for linear evolution equations the POD reduced-order model (35) can be set up and solved using snapshots with arbitrary finite element discretizations. The computation of the nonlinear component $\mathrm{N}^{\ell}(\eta(t))$ needs particular attention. In Section 3.3 we discuss the options to treat the nonlinearity.

3.2. Time-discrete reduced-order model

In order to solve the reduced-order system (32) numerically, we apply the implicit Euler method for time discretization and use for simplicity the same temporal grid $\{t_{j}\}_{j=1}^{n}$ as for the snapshots. It is also possible to use a different time grid, cf. [58]. The time-discrete reduced-order model reads

[TABLE]

for all $\Psi\in V^{\ell}$ and $j=2,\ldots,n$ . Equivalently the following system holds for the coefficient vector $\eta(t)\in\mathbb{R}^{\ell}$ (cf. (35)):

[TABLE]

with the inhomogeneity $\mathrm{F}^{\ell}_{j}=(\mathrm{F}_{ji}^{\ell})_{1\leq i\leq\ell}$ , $j=2,\ldots,n$ , given as

[TABLE]
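For the linear case $\mathcal{N}\equiv 0$, the time-discrete reduced-order model can be illustrated by the following sketch. It is a simplification under several assumptions: all snapshots live on one common mesh (so that the POD basis can be stored as a coefficient matrix rather than through $\Upphi$ and $\Uplambda$), $X=H$, a one-dimensional piecewise linear finite element discretization of the linear heat equation, and nodal values of the initial condition. All variable names are hypothetical.

```python
import numpy as np

def implicit_euler(M, A, F, y0, t):
    """Implicit Euler for M y' + A y = F(t); used for both full and reduced model."""
    Y = np.zeros((M.shape[0], t.size))
    Y[:, 0] = y0
    for j in range(1, t.size):
        dt = t[j] - t[j - 1]
        Y[:, j] = np.linalg.solve(M + dt * A, M @ Y[:, j - 1] + dt * F(t[j]))
    return Y

# Full-order model: 1D linear heat equation, P1 elements, Dirichlet data.
m = 99
h = 1.0 / (m + 1)
x = np.linspace(h, 1.0 - h, m)
off = np.eye(m, k=1) + np.eye(m, k=-1)
M = h / 6.0 * (4.0 * np.eye(m) + off)
S = (2.0 * np.eye(m) - off) / h
F = lambda s: np.zeros(m)
t = np.linspace(0.0, 0.1, 41)
y0 = np.sin(np.pi * x) + 0.5 * np.sin(3.0 * np.pi * x)
Y = implicit_euler(M, S, F, y0, t)

# POD basis for X = H (weighting matrix M) with trapezoidal weights.
alpha = np.full(t.size, t[1] - t[0])
alpha[[0, -1]] *= 0.5
d = np.sqrt(alpha)
lam, Phi = np.linalg.eigh(d[:, None] * (Y.T @ M @ Y) * d[None, :])
ell = 2
lam, Phi = lam[::-1][:ell], Phi[:, ::-1][:, :ell]
Psi = (Y * d) @ Phi / np.sqrt(lam)

# Reduced-order model: Galerkin projection onto span{Psi_1, ..., Psi_ell}.
M_l = Psi.T @ M @ Psi                     # identity up to rounding for X = H
A_l = Psi.T @ S @ Psi
F_l = lambda s: Psi.T @ F(s)
eta0 = Psi.T @ M @ y0                     # Fourier coefficients of y_o
eta = implicit_euler(M_l, A_l, F_l, eta0, t)
err = np.max(np.abs(Psi @ eta - Y))
```

In this particular demo the snapshots span a two-dimensional invariant subspace, so the reduced model with $\ell=2$ reproduces the full-order solution up to rounding; in general the error is governed by the neglected eigenvalues.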

3.3. Discussion of the computation of the nonlinear term

Let us now consider the computation of the nonlinear term $\Uplambda\mathrm{N}^{\ell}(\eta^{j})\in\mathbb{R}^{\ell}$ of the POD-ROM (35). It holds

[TABLE]

for $k=1,\ldots,\ell$ . It is well known that the evaluation of nonlinearities in the reduced-order modeling context is computationally expensive. To make this clear, let us assume that we are given a uniform finite element discretization with $m$ degrees of freedom. Then, in the fully discrete setting, the nonlinear term has the form

[TABLE]

where $\Uppsi=[\Psi_{1}\,|\ldots|\,\Psi_{\ell}]\in\mathbb{R}^{m\times\ell}$ is the matrix in which the POD modes are stored columnwise and $W\in\mathbb{R}^{m\times m}$ is a weighting matrix related to the utilized inner product (cf. (3)). Hence, the treatment of the nonlinearity requires the expansion of $\Uppsi\eta(t)\in\mathbb{R}^{m}$ in the full space for $t\in[0,T]$ a.e. Then the nonlinearity can be evaluated and finally the result is projected back to the POD space. Obviously, this means that the reduced-order model is not fully independent of the high-order dimension $m$ and efficient simulation cannot be guaranteed. Therefore, it is convenient to seek a hyper-reduction, i.e., a treatment of the nonlinearity where the model evaluation cost is related to the low dimension $\ell$ . Common choices are empirical interpolation methods like, e.g., EIM ([14]), DEIM ([24]), and QDEIM ([31]). Another option is dynamic mode decomposition for nonlinear model order reduction, see e.g. [7]. Furthermore, in [99] nonlinear model reduction is realized by replacing the nonlinear term by its interpolation in the finite element space. An alternative approach for the treatment of the nonlinearity is missing point estimation [10], or best points interpolation [70].
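The difference between the lifted evaluation and a hyper-reduced evaluation can be illustrated by a small sketch of DEIM with the usual greedy index selection. The setting is an assumption made for illustration only: a componentwise cubic nonlinearity, $W$ equal to the identity, and a randomly generated orthonormal basis in place of an actual POD basis. The names `deim_indices`, `E` and `Psi_p` are hypothetical.

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM interpolation indices for a basis U (m x k) of nonlinearity snapshots."""
    p = [int(np.argmax(np.abs(U[:, 0])))]
    for l in range(1, U.shape[1]):
        coeff = np.linalg.solve(U[p, :l], U[p, l])   # interpolate u_l at the current indices
        r = U[:, l] - U[:, :l] @ coeff               # interpolation residual
        p.append(int(np.argmax(np.abs(r))))          # next index: largest residual entry
    return np.array(p)

rng = np.random.default_rng(0)
m, ell, k = 200, 2, 4
Psi, _ = np.linalg.qr(rng.standard_normal((m, ell)))    # stand-in for a POD basis
N = lambda y: y**3                                       # componentwise nonlinearity

# Offline: snapshots of the nonlinearity, their basis U, DEIM indices and projector.
snaps = np.column_stack([N(Psi @ rng.standard_normal(ell)) for _ in range(10)])
U = np.linalg.svd(snaps, full_matrices=False)[0][:, :k]
p = deim_indices(U)
E = Psi.T @ U @ np.linalg.inv(U[p, :])                   # ell x k, computed once
Psi_p = Psi[p, :]                                        # k x ell, rows at DEIM indices

# Online: compare the lifted evaluation (cost depends on m) with DEIM (cost depends on k, ell).
eta = rng.standard_normal(ell)
full = Psi.T @ N(Psi @ eta)
reduced = E @ N(Psi_p @ eta)
```

For $\ell=2$ the cubes of elements of the reduced space span at most four directions, so $k=4$ DEIM modes reproduce the nonlinear term exactly here; for general nonlinearities DEIM is an approximation whose quality depends on the decay of the singular values of the nonlinearity snapshots. Note that `p` refers to degrees of freedom of one common mesh, which is the restriction addressed in the following paragraph.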

Most of these methods need a common reference mesh for the computations. To overcome this restriction we propose different paths which allow for more general discrete settings like $r$ -adaptivity discussed in Run 3.

One option is to use EIM [14]. Alternatively, we can linearize and project the nonlinearity onto the POD space. For this approach, let us consider the linear reduced-order system for a fixed given state $\bar{y}$ , which takes the form

[TABLE]

for all $\Psi\in V^{\ell}$ and for almost all $t\in(0,T]$ . The linear evolution problem (38) can be set up and solved explicitly without spatial interpolation. In the numerical examples in Section 6, we take the finite element solution as given state in each time step, i.e., $\bar{y}(t_{j})=y_{j}$ for $j=2,\ldots,n$ .

Furthermore, the linearization of the reduced-order model (32) can be considered:

[TABLE]

for all $\Psi\in V^{\ell}$ and for almost all $t\in(0,T]$ , where $\mathcal{N}^{\prime}$ denotes the Fréchet derivative of the nonlinear operator $\mathcal{N}$ . This linearized problem is of interest e.g. in the context of optimal control, where it occurs in each iteration level within sequential quadratic programming (SQP) methods; see [49], for example. Choosing the finite element solution as given state in each time instance and using (9) leads to

[TABLE]

for $j=2,\ldots,n$ and $i=1,\ldots,\ell$ . Finally, we approximate the nonlinearity $\Uplambda\mathrm{N}^{\ell}(\eta^{j})\in\mathbb{R}^{\ell}$ in (37) by

[TABLE]

for $j=2,\ldots,n$ and $i=1,\ldots,\ell$ , which can be written as

[TABLE]

where

[TABLE]

and with $\tilde{y}_{j}=\sqrt{\alpha_{j}}y_{j}$ , $j=1,\ldots,n$ ,

[TABLE]

For weakly nonlinear systems this approximation may be sufficient, depending on the problem and its goal. A great advantage of linearizing the semilinear partial differential equation is that only linear equations need to be solved, which leads to a further speedup; see Table 6. However, if a more precise approximation is desired or necessary, we can think of approximations including higher order terms, like quadratic approximation, see, e.g., [25] and [84], or Taylor expansions, see, e.g., [73, 74] and [35]. Nevertheless, the efficiency of higher order approximations is limited due to growing memory and computational costs.

3.4. Expressing the POD solution in the full spatial domain

Having determined the solution $\eta(t)$ to (35), we can set up the reduced solution $y^{\ell}(t)$ in a continuous framework:

[TABLE]

Now, let us turn to the fully discrete formulation of (40). For a time-discrete setting, we introduce for simplicity the same temporal grid $\{t_{j}\}_{j=1}^{n}$ as for the snapshots. The snapshots (28) admit the expansion

[TABLE]

Let $\{Q_{r}^{j}\}_{r=1}^{l_{j}}$ denote an arbitrary set of grid points for the reduced system at time level $t_{j}$ . The fully discrete POD solution can be computed by evaluation:

[TABLE]

for $r=1,\ldots,l_{j}$ and $j=1,\ldots,n$ . This allows us to use any grid for expressing the POD solution in the full spatial domain. For example, we can use the same nodes at time level $j$ for the POD simulation as we have used for the snapshots, i.e., for $j=1,\ldots,n$ it holds $l_{j}=m_{j}$ and $Q_{r}^{j}=P_{k}^{j}$ for all $r,k=1,\ldots,m_{j}$ . Another option can be to choose

[TABLE]

i.e., the common finest grid. Obviously, a special and probably the easiest case concerning the implementation is to choose snapshots which are expressed with respect to the same finite element basis functions and utilize the common finest grid for the simulation of the reduced-order system, which is proposed by [95]. After expressing the adaptively sampled snapshots with respect to a common finite element space, the subsequent steps coincide with the common approach of taking snapshots which are generated without adaptivity. Then, expression (41) simplifies to

[TABLE]

where $\{P_{r}\}_{r=1}^{m}$ are the nodes of the common finite element space.

Run 4 ([39, Example 6.1]).

Let us revisit Run 1 and consider its POD Galerkin solutions. The POD solutions for $\ell=10$ and $\ell=50$ POD basis functions using spatially adaptive snapshots which are interpolated onto the finest mesh are shown in Figure 11. As expected, the more POD basis functions we use (until stagnation of the corresponding eigenvalues), the fewer oscillations appear in the POD solution and the better the approximation is.

Table 2 compares the approximation quality in the relative $L^{2}(0,T;L^{2}(\Omega))$ -norm of the POD solution using adaptively generated snapshots which are interpolated onto the finest mesh with snapshots of uniform spatial discretization depending on different POD basis lengths. Then, for $\ell=20$ we obtain a relative $L^{2}(0,T;L^{2}(\Omega))$ -error between the POD solution and the finite element solution of size $\varepsilon_{\text{FE}}^{\text{ad}}=3.08\cdot 10^{-2}$ , and a relative $L^{2}(0,T;L^{2}(\Omega))$ -error between the POD solution and the true solution of size $\varepsilon_{\text{true}}^{\text{ad}}=2.17\cdot 10^{-2}$ .

We note that $\varepsilon_{\text{FE}}^{\text{uni}}$ decays down to $10^{-8}$ ($\ell=100$) and then stagnates if a uniform mesh is used. This behavior is clear, since the more POD basis elements we include (up to stagnation of the corresponding eigenvalues), the better the POD solution approximates the finite element solution. On the other hand, both $\varepsilon_{\text{true}}^{\text{uni}}$ and $\varepsilon_{\text{true}}^{\text{ad}}$ start to stagnate after $\ell=30$ in Table 2, columns 4 and 5. This is due to the fact that at this point the spatial (and temporal) discretization error dominates the modal error. This is in accordance with the decay of the eigenvalues shown in Figure 3 and is accounted for e.g. in the error estimation presented in [39, Theorem 5.1]. Similar observations hold true for the relative $L^{2}(0,T;H^{1}(\Omega))$-error listed in Table 3, with the difference that the $L^{2}(0,T;H^{1}(\Omega))$-error is larger than the respective $L^{2}(0,T;L^{2}(\Omega))$-error.

The computational times for the full and the low order simulation using uniform finite element discretizations and adaptive finite element snapshots, which are interpolated onto the finest mesh, respectively, are listed in Table 4.

Once the POD basis is computed in the offline phase, the POD simulation corresponding to adaptive snapshots is 13485 times faster than the FE simulation using adaptive finite element meshes. This speedup factor is important when one considers e.g. optimal control problems with time-dependent PDEs, where the POD-ROM can be used as surrogate model in repeated solution of the underlying PDE model. In the POD offline phase, the most expensive task is to express the snapshots with respect to the common finite element space, which takes 226 seconds. Since $\mathrm{K}$ (30) is symmetric, it suffices to calculate the entries on and above the diagonal, which are $\sum_{k=1}^{n}k=(n^{2}+n)/2$ entries. Thus, the computation of each entry in the correlation matrix $\mathrm{K}$ using a common finite element space takes around 0.00018 seconds. We note that in the approach explained in Sections 2.4 and 3, the computation of the matrix $\mathrm{K}$ is expensive. For each entry the calculation time is around 0.03 seconds, which leads to a computation time of around 36997 seconds for the matrix $\mathrm{K}$ . The same effort is needed to build $\mathrm{A}^{\ell}=a(\mathcal{Y}\Phi_{j},\mathcal{Y}\Phi_{i})$ . In this case, the offline phase takes therefore around 88271 seconds. For this reason, the approach to interpolate the adaptively generated snapshots onto the finest mesh is computationally more favorable. But since the computation of $\mathrm{K}$ can be parallelized, the offline computation time can be reduced provided that the appropriate hardware is available. $\Diamond$
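The entry count $(n^{2}+n)/2$ and the remark on parallelization can be illustrated with a small sketch. It assembles a symmetric correlation matrix from the entries on and above the diagonal only and counts the inner product evaluations. The function name `correlation_matrix`, the weighting $\sqrt{\alpha_{i}\alpha_{j}}$ of the entries and the toy data are assumptions made for illustration; in the setting of the text, `inner` would be the (expensive) inner product of two snapshots living in different finite element spaces.

```python
import numpy as np

def correlation_matrix(snapshots, alpha, inner):
    """Assemble a symmetric matrix K from entries on and above the diagonal."""
    n = len(snapshots)
    K = np.zeros((n, n))
    evaluations = 0
    for i in range(n):
        for j in range(i, n):
            # Weighted inner product of snapshots i and j (weighting assumed).
            K[i, j] = np.sqrt(alpha[i] * alpha[j]) * inner(snapshots[i], snapshots[j])
            K[j, i] = K[i, j]      # symmetry: no second evaluation needed
            evaluations += 1
    return K, evaluations

# Toy usage (assumption): vectors on a common grid, Euclidean inner product.
rng = np.random.default_rng(1)
n = 6
snapshots = [rng.standard_normal(10) for _ in range(n)]
alpha = np.full(n, 0.1)
K, evaluations = correlation_matrix(snapshots, alpha, np.dot)
```

Each pair `(i, j)` is independent of all others, which is why the assembly can be distributed over several workers. With the rounded timings quoted above, 36997 seconds at about 0.03 seconds per entry corresponds to roughly 1.2 million inner products.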

Run 5 (Cahn-Hilliard equations).

Now let us revisit Run 2 and run the numerical simulations for different combinations of the numbers $\ell_{c}$ and $\ell_{w}$ from Table 1. The approximation quality of the POD solution using adaptive meshes is compared to the use of a uniform mesh in Table 5. As expected, Table 5 shows that the error between the POD surrogate solution and the high-fidelity solution gets smaller for an increasing number of utilized POD basis functions. Moreover, a larger number of POD modes is needed for the chemical potential $w$ than for the phase field $c$ in order to get an error of the same order, which is in accordance with the fact that the decay of the eigenvalues for $w$ is slower than for $c$, as seen in Figure 8.

We now discuss the treatment of the nonlinearity and also investigate the influence of non-smoothness of the model equations on the POD procedure. Using the convex-concave splitting for $W$, we obtain for the Moreau-Yosida relaxed double obstacle free energy the concave part $W_{-}^{\text{rel}}(c)=\frac{1}{2}(1-c^{2})$ and the convex part $W_{+}^{\text{rel}}(c)=\frac{s}{2}(\max(c-1,0)^{2}+\min(c+1,0)^{2})$. This means that the first derivative of the concave part is linear with respect to the phase field variable $c$. The challenging part is the convex term with non-smooth first derivative. For a comparison, we consider the smooth polynomial free energy with concave part $W_{-}^{p}(c)=\frac{1}{4}(1-2c^{2})$ and convex part $W_{+}^{p}(c)=\frac{1}{4}c^{4}$.
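Differentiating the two splittings by hand gives $(W_{-}^{\text{rel}})^{\prime}(c)=(W_{-}^{p})^{\prime}(c)=-c$, $(W_{+}^{p})^{\prime}(c)=c^{3}$ and $(W_{+}^{\text{rel}})^{\prime}(c)=s(\max(c-1,0)+\min(c+1,0))$. The short sketch below evaluates these derivatives; the function names and the value of the relaxation parameter `s` are illustrative assumptions and are not taken from the runs.

```python
import numpy as np

def dW_plus_rel(c, s):
    """Derivative of the convex part of the relaxed double obstacle energy."""
    # Zero for -1 <= c <= 1, linear outside; kink at c = -1 and c = +1.
    return s * (np.maximum(c - 1.0, 0.0) + np.minimum(c + 1.0, 0.0))

def dW_plus_poly(c):
    """Derivative of the convex part c**4 / 4 of the polynomial energy."""
    return c**3

def dW_minus(c):
    """Derivative of the concave part; identical for both energies."""
    return -c

c = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
s = 10.0                              # relaxation parameter (assumed value)
penalty = dW_plus_rel(c, s)           # vanishes on [-1, 1]
smooth = dW_plus_poly(c)
```

The kinks of `dW_plus_rel` at $c=\pm 1$ are the source of the non-smoothness: the derivative is continuous but not differentiable there, which is why a semismooth Newton method is used in the non-smooth case.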

Figure 12 shows the decay of the normalized eigenspectrum for the phase field $c$ (left) and the first derivative of the convex part $W^{\prime}_{+}(c)$ (right) for the polynomial and the relaxed double obstacle free energy. Obviously, in the non-smooth case more POD modes are needed for a good approximation than in the smooth case. This behavior is similar to the decay of the Fourier coefficients in the context of trigonometric approximation, where the decay of the Fourier coefficients depends on the smoothness of the approximated object.

Table 6 summarizes computational times for different finite element runs as well as reduced-order simulations using the polynomial and the relaxed double obstacle free energy, respectively. In addition, the approximation quality is compared. The computational times are rounded averages from various test runs. It turns out that the finite element simulation (row 1) using the smooth potential is around two times faster than using the non-smooth potential. This is due to the fact that in the smooth case, two to three Newton steps are needed for convergence in each time step, whereas in the non-smooth case six to eight iterations are needed in the semismooth Newton method.

Using the smooth polynomial free energy, the reduced-order simulation is 8-9 times faster than the finite element simulation, whereas using the relaxed double obstacle free energy only delivers a very small speedup. The inclusion of DEIM (we use $\ell_{\text{deim}}=\ell_{c}$) in the reduced-order model leads to immense speedup factors for both free energy functions (row 8). This is due to the fact that, without hyper reduction, the evaluation of the nonlinearity in the reduced-order model still depends on the full spatial dimension, so that hyper reduction methods are necessary for useful speedup factors. Note that the speedup factors are of particular interest in the context of optimal control problems. At the same time, the relative $L^{2}(0,T;L^{2}(\Omega))$-error between the finite element solution and the ROM-DEIM solution is close to the quality of the reduced-order model solution (rows 10-11).

However, in the case of the non-smooth free energy function using $\ell_{c}=19$ POD modes for the phase field and $\ell_{w}=26$ POD modes for the chemical potential, the inclusion of DEIM has the effect that the semismooth Newton method does not converge. For this reason, we treat the nonlinearity by applying the technique explained in Section 3.1, i.e., we project the finite element snapshots for $W^{\prime}_{+}(c)$ (which are interpolated onto the finest mesh) onto the POD space. Since this leads to linear systems, the computational times are very small (row 6). The error between the finite element solution and the reduced-order solution using projection of the nonlinearity is of the magnitude $10^{-2}/10^{-3}$. Depending on the motivation, this approximation quality might be sufficient. Nevertheless, we note that for large numbers of POD modes, using the projection of the nonlinearity onto the POD space leads to a large increase of the error. $\Diamond$

To summarize, a POD reduced-order model construction approach is proposed which can be set up and solved for snapshots originating from arbitrary FE (and also other) spaces. The method is applicable for $h$-, $p$- and $r$-adaptive finite elements. It is motivated from an infinite-dimensional perspective. Using the method of snapshots we are able to set up the correlation matrix $K$ from (30) by evaluating the inner products of snapshots which live in different FE spaces. For non-nested meshes, this requires the detection of cell collisions and integration over cut finite elements. A numerical strategy for implementing this in practice is elaborated and numerically tested. Using the eigenvalues and eigenvectors of this correlation matrix, we are able to set up and solve a POD surrogate model that does not need the expression of the snapshots with respect to the basis of a common FE space or the interpolation onto a common reference mesh. Moreover, an error bound for the error between the true solution and the solution to the POD-ROM using spatially adapted snapshots is available in [39, Theorem 5.1]. The numerical tests show that the POD projection error decreases if the number of utilized POD basis functions is increased. However, the error between the POD solution and the true solution stagnates when the spatial discretization error dominates. Moreover, the numerics show that using the correlation matrix calculated explicitly without interpolation in order to build a POD-ROM gives the same results as the approach where the snapshots are interpolated onto the finest mesh. From a computational point of view, sufficient hardware should be available in order to compute the correlation matrix in parallel and make the offline computational time competitive. For semilinear evolution problems, the nonlinearity is treated by linearization. This is of interest in view of optimal control problems, in which a linearized state equation has to be solved in each SQP iteration level. In our applications, an appropriate treatment of the nonlinearity yields a significant speedup of the ROM in computational time when compared to the full simulations. This makes POD-MOR with adaptive finite elements an ideal approach for the construction of surrogate models in e.g. optimal control with nonlinear PDE systems as they arise e.g. in the context of multi-phase flow control problems.

4. Certification with a priori and a posteriori error estimates

As we have seen in Section 3, POD provides a method for deriving low order models of dynamical systems. It can be thought of as a Galerkin approximation in the spatial variable, built from functions corresponding to the solution of the physical system at prespecified time instances. After carrying out a singular value decomposition the leading $\ell$ generalized eigenfunctions are chosen as the POD basis $\{\Psi_{j}\}_{j=1}^{\ell}$ of rank $\ell$. As soon as one uses POD, questions concerning the quality of the approximation properties, convergence, and rate of convergence become relevant. Let us refer, e.g., to the literature [22, 42, 56, 58, 57, 85, 88, 89, 80] for a priori error analysis for POD Galerkin approximations. It turns out that the error depends on the decay of the sum $\sum_{i>\ell}\lambda_{i}$, the error $\Delta t^{\beta}$ (with an appropriate $\beta\geq 1$) due to the used time integration method, the used Galerkin spaces $\{V^{h_{j}}\}_{j=1}^{n}$ and the choice $X=H$ or $X=V$. In particular, best approximation properties hold provided the time differences $\dot{y}^{h}(t_{j})$ (or the finite difference discretizations) are included in the snapshot ensembles; cf. [56, 58, 89].

Let us recall numerical test examples from [42, Section 1.5]. The programs are written in Matlab using the Partial Differential Equation Toolbox for the computation of the piecewise linear FE discretization. For the temporal integration the implicit Euler method is applied based on the equidistant time grid $t_{j}=(j-1)\Delta t$ , $j=1,\ldots,n$ and $\Delta t=T/(n-1)$ .

Run 6 (POD for the heat equation; cf. [42, Run 1]).

We choose the final time $T=3$ , the spatial domain $\Omega=(0,2)\subset\mathbb{R}$ , the Hilbert spaces $H=L^{2}(\Omega)$ , $V=H^{1}_{0}(\Omega)$ , the source term $f(t,{\bm{x}})=t^{3}-{\bm{x}}^{2}$ for $(t,{\bm{x}})\in Q=(0,T)\times\Omega$ and the discontinuous initial value $y_{\circ}({\bm{x}})=\chi_{(0.5,1.0)}-\chi_{(1,1.5)}$ for ${\bm{x}}\in\Omega$ , where, e.g., $\chi_{(0.5,1)}$ denotes the characteristic function on the subdomain $(0.5,1)\subset\Omega$ , $\chi_{(0.5,1)}({\bm{x}})=1$ for ${\bm{x}}\in(0.5,1)$ and $\chi_{(0.5,1)}({\bm{x}})=0$ otherwise. We consider a discretization of the linear heat equation (compare (17) with $c\equiv 0$ )

[TABLE]

To obtain an accurate approximation of the exact solution we choose $n=4000$ so that $\Delta t\approx 7.5\cdot 10^{-4}$ holds. For the FE discretization we choose $m=500$ spatial grid points and the equidistant mesh size $h=2/(m+1)\approx 4\cdot 10^{-3}$ . Thus, the FE error – measured in the $H$ -norm – is of the order $10^{-4}$ . In the left graphic of Figure 13, the FE solution $y^{h}$ to the state equation (43) is visualized.

To compute a POD basis $\{\Psi_{i}\}_{i=1}^{\ell}$ of rank $\ell$ we utilize the multiple discrete snapshots $y^{1}_{j}=y^{h}(t_{j})$ for $1\leq j\leq n_{t}$ as well as $y^{2}_{1}=0$ and $y^{2}_{j}=(y^{h}(t_{j})-y^{h}(t_{j-1}))/\Delta t$, $j=2,\ldots,n_{t}$, i.e., we include the temporal difference quotients in the snapshot ensemble and $K=2$, $n_{1}=n_{2}=n_{t}$. We choose $X=H$ and utilize the (stable) SVD to determine the POD basis of rank $\ell$; compare Section 2.2. We address this issue in more detail in Run 9. Since the snapshots are FE functions, the POD basis elements are also FE functions. In the right plot of Figure 13, the projection and reduced-order error given by

[TABLE]

are plotted for different POD basis ranks $\ell$. The chosen trapezoidal weights $\alpha_{j}$ have been introduced in (23). We observe that both errors decay rapidly and coincide until the accuracy $10^{-12}$, which is already significantly smaller than the FE discretization error. These numerical results reflect the a priori error estimates presented in [42, Theorem 1.29]. $\Diamond$
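The workflow of Run 6 can be sketched in a few lines of NumPy. The sketch is not the Matlab code used for the run and rests on several assumptions: the state equation is taken as $y_{t}-y_{xx}=f$ with homogeneous Dirichlet conditions, the discretization is much coarser than in the run ($m=99$, $n=200$), the source term is interpolated at the nodes, and a Cholesky factor of the mass matrix replaces the symmetric square root $W^{1/2}$. The function name `rom_error` is hypothetical.

```python
import numpy as np

T, m, n = 3.0, 99, 200                 # coarse sizes (assumption)
h, dt = 2.0 / (m + 1), T / (n - 1)
x = np.linspace(0.0, 2.0, m + 2)[1:-1]
t = dt * np.arange(n)

# P1 mass and stiffness matrices on a uniform grid (Dirichlet boundary).
E = np.eye(m, k=1) + np.eye(m, k=-1)
M = h / 6.0 * (4.0 * np.eye(m) + E)
A = 1.0 / h * (2.0 * np.eye(m) - E)

f = lambda tj: tj**3 - x**2            # nodal values of the source term
y0 = ((x > 0.5) & (x < 1.0)).astype(float) - ((x > 1.0) & (x < 1.5)).astype(float)

# Full-order implicit Euler: (M + dt*A) y_j = M (y_{j-1} + dt*f(t_j)).
Y = np.zeros((m, n))
Y[:, 0] = y0
S = M + dt * A
for j in range(1, n):
    Y[:, j] = np.linalg.solve(S, M @ (Y[:, j - 1] + dt * f(t[j])))

# Snapshot ensemble: states plus difference quotients (first one is zero).
DQ = np.zeros_like(Y)
DQ[:, 1:] = (Y[:, 1:] - Y[:, :-1]) / dt
alpha = np.full(n, dt)
alpha[[0, -1]] = dt / 2.0              # trapezoidal weights

# Weighted SVD with M = L L^T; the columns of Psi_all are M-orthonormal.
L = np.linalg.cholesky(M)
Ybar = L.T @ np.hstack([Y, DQ]) * np.sqrt(np.tile(alpha, 2))
U, sigma, _ = np.linalg.svd(Ybar, full_matrices=False)
Psi_all = np.linalg.solve(L.T, U)

def rom_error(ell):
    """Discrete L2(0,T;H) distance between FE and POD Galerkin solution."""
    Psi = Psi_all[:, :ell]
    S_ell = np.eye(ell) + dt * (Psi.T @ A @ Psi)
    eta = Psi.T @ M @ y0
    e = Y[:, 0] - Psi @ eta
    err2 = alpha[0] * (e @ M @ e)
    for j in range(1, n):
        eta = np.linalg.solve(S_ell, eta + dt * (Psi.T @ M @ f(t[j])))
        e = Y[:, j] - Psi @ eta
        err2 += alpha[j] * (e @ M @ e)
    return np.sqrt(err2)

errors = {ell: rom_error(ell) for ell in (2, 12)}
```

Since the reduced mass matrix is the identity by construction, each reduced time step only requires the solution of an $\ell\times\ell$ system.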

Run 7 (POD for a convection dominated heat equation; cf. [42, Run 2]).

Now we consider a more challenging example. We study a convection-reaction-diffusion equation with a source term which is close to being singular: Let $T$ , $\Omega$ , $y_{\circ}$ , $H$ and $V$ be given as in Run 6. The parabolic problem reads as follows

[TABLE]

We choose the diffusivity $c=0.025$, the velocity $\beta=1.0$ that determines the speed at which the initial profile $y_{\circ}$ is shifted to the boundary and the reaction rate $a=-0.001$. Finally, $f(t,{\bm{x}})=\mathbb{P}(\frac{1}{1-t})\cos(\pi{\bm{x}})$ for $(t,{\bm{x}})\in Q$, where $(\mathbb{P}z)(t)=\min(+l,\max(-l,z(t)))$ restricts the image of $z$ to a bounded interval. In this situation, the state solution $y$ develops a jump at $t=1$ for $l\to\infty$; see the left plot of Figure 14.

The right plot of Figure 14 demonstrates that in this case, the decay of the reconstruction residuals and the decay of the errors are much slower than in the right plot of Figure 13. The manifold dynamics of the state solution require an inconveniently large number of POD basis elements. Since the supports of these ansatz functions in general cover the whole domain $\Omega$, the corresponding system matrices of the reduced model are not sparse. This is different for the matrices arising in the FE Galerkin framework. Model order reduction is not effective for this example if a good accuracy of the solution function $y^{\ell}$ is required. Strategies to improve the accuracy and robustness of the POD-ROM in those situations are discussed in e.g. [18, 100]. $\Diamond$

Run 8 (True and exact approximation error; cf. [42, Run 3]).

We consider the setting introduced in Run 6 again. The exact solution to (43) does not possess a representation by elementary functions. Hence, the presented reconstruction and reduction errors actually are the residuals with respect to a high-order FE solution $y^{h}$ . To compute an approximation $y$ of the exact solution $y_{\mathsf{ex}}$ we apply a Crank-Nicolson method (with Rannacher smoothing [77]) ensuring $\|y-y_{\mathsf{ex}}\|_{L^{2}(0,T;H)}=\mathcal{O}(\Delta t^{2}+h^{2})\approx 10^{-5}$ . In the context of model reduction, such a state is sometimes called the “true” solution. To compute the FE state $y^{h}$ we apply the Euler method. In the left plot of Figure 15 we compare the true solution $y_{\mathsf{ex}}$ with the associated POD approximation for different values $n_{t}\in\{64,128,256,...,8192\}$ of the time integration and for the spatial mesh size $h=4\cdot 10^{-3}$ .

For the norm we apply a discrete $L^{2}(0,T;H)$-norm as in Run 6. Let us mention that we compute for every $n_{t}$ a corresponding FE solution $y^{h}$. We observe that the residuals ignore the errors arising from the time and space discretization schemes for the full-order model. The errors decay below the discretization error $10^{-5}$. If these discretization errors are taken into account, the residuals stagnate at the level of the full-order model accuracy instead of decaying to zero; cf. right plot of Figure 15. Due to the implicit Euler method we have $\|y^{h}-y_{\mathsf{ex}}\|_{L^{2}(0,T;H)}=\mathcal{O}(\Delta t+h^{2})$ with the mesh size $h=4\cdot 10^{-3}$. In particular, from $n_{t}\in\{64,128,256,\ldots,8192\}$ it follows that $\Delta t>3\cdot 10^{-4}>h^{2}=1.6\cdot 10^{-5}$. Therefore, the spatial error is dominated by the time error for all values of $n_{t}$. We can observe that the exact residuals do not decay below a limit of the order $\Delta t$. One can observe that for fixed POD basis rank $\ell$, the residuals with respect to the true solution increase if the high-order accuracy is improved by enlarging $n_{t}$, since the reduced-order model has to approximate a more complex system in this case, whereas the residuals with respect to the exact solution decrease due to the lower limit of stagnation $\Delta t=3/(n_{t}-1)$. $\Diamond$
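The comparison of the temporal and spatial error contributions in Run 8 can be checked with a few lines of arithmetic. The sketch only uses the values stated in the text ($T=3$, $h=4\cdot 10^{-3}$, $\Delta t=3/(n_{t}-1)$); the variable names are illustrative.

```python
T, h = 3.0, 4e-3
n_values = [2**k for k in range(6, 14)]           # 64, 128, ..., 8192
dts = {n_t: T / (n_t - 1) for n_t in n_values}    # equidistant step sizes

smallest_dt = min(dts.values())                   # attained for n_t = 8192
spatial_term = h**2                               # order of the spatial error
time_dominates = all(dt > spatial_term for dt in dts.values())
```

Even for the finest time grid the step size stays more than an order of magnitude above $h^{2}$, which is consistent with the observed stagnation of the exact residuals at a level of order $\Delta t$.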

Run 9 (Different strategies for a POD basis computation; cf. [42, Run 4]).

Let $Y\in\mathbb{R}^{m\times n}$ denote the matrix of snapshots with rank $\mathsf{r}$, let $W\in\mathbb{R}^{m\times m}$ be the (sparse) spatial weighting matrix consisting of the elements $\langle\varphi_{j},\varphi_{i}\rangle_{X}$ (introduced in Section 2.3.3) and let $D\in\mathbb{R}^{n\times n}$ be the diagonal matrix containing the nonnegative weighting parameters $\alpha_{j}^{k}$. As we have explained in Section 2.2, the POD basis $\{\Psi_{i}\}_{i=1}^{\ell}$ of rank $\ell\leq\mathsf{r}$ can be determined by an eigenvalue decomposition of the matrix $\bar{Y}\bar{Y}^{\top}=W^{1/2}YDY^{\top}W^{1/2}\in\mathbb{R}^{m\times m}$, by one of $\bar{Y}^{\top}\bar{Y}=D^{1/2}Y^{\top}WYD^{1/2}\in\mathbb{R}^{n\times n}$, or by a singular value decomposition of $\bar{Y}=W^{1/2}YD^{1/2}\in\mathbb{R}^{m\times n}$. Since $n\gg m$ in Runs 6-8, the first variant is the cheapest one from a computational point of view. In case of multiple space dimensions or if a second-order time integration scheme such as a Crank-Nicolson technique is applied, the situation is reversed. On the other hand, a singular value decomposition is more accurate and stable than an eigenvalue decomposition if the POD elements corresponding to eigenvalues/singular values which are close to zero are taken into account: Since $\lambda_{i}=\sigma_{i}^{2}$ holds for all eigenvalues $\lambda_{i}$ and singular values $\sigma_{i}$, the singular values are able to decay to machine precision, whereas the eigenvalues stagnate significantly above it. This is illustrated in the left graphic of Figure 16.

Indeed, for $\ell>20$ the EIG-ROM system matrices become singular due to the numerical errors in the eigenfunctions and the reduced-order system is ill-posed in this case, while the SVD-ROM model remains stable. In the right plot of Figure 16 POD elements are constructed with respect to different scalar products and the resulting ROM errors are compared: $\|\cdot\|_{H}$-residuals for $X=H$ (denoted by POD(H)), $\|\cdot\|_{V}$-residuals for $X=V$ (denoted by POD(V)), $\|\cdot\|_{V}$-residuals for $X=H$ (denoted by POD(H,V)), which also works quite well, the consideration of time derivatives in the snapshot sample (denoted by POD(H,dt)), which allows us to apply the a priori error estimate given in [42, Theorem 1.29-2)], and the corresponding sums of singular values (denoted by SV(H,dt)) corresponding to the unused eigenfunctions in the latter case, which indeed nearly coincide with the ROM errors. $\Diamond$
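The accuracy gap between the eigenvalue and the singular value route can be reproduced on synthetic data. The sketch below assumes $W$ and $D$ to be identity matrices, so that $\bar{Y}=Y$, and builds a matrix with prescribed singular values $10^{0},10^{-1},\ldots,10^{-15}$; sizes, seed and variable names are illustrative and unrelated to Runs 6-8.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 60, 40, 16
Q1, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal columns
Q2, _ = np.linalg.qr(rng.standard_normal((n, r)))
sigma_true = 10.0 ** -np.arange(r)                  # 1, 1e-1, ..., 1e-15
Ybar = (Q1 * sigma_true) @ Q2.T                     # prescribed singular values

# Variant 3: singular value decomposition of Ybar.
sigma_svd = np.linalg.svd(Ybar, compute_uv=False)[:r]

# Variant 2: eigenvalues of the n x n matrix Ybar^T Ybar (descending order).
lam = np.linalg.eigvalsh(Ybar.T @ Ybar)[::-1][:r]
sigma_eig = np.sqrt(np.maximum(lam, 0.0))           # clip rounding noise

# Errors of both routes for the singular value 1e-12.
err_svd = abs(sigma_svd[12] - sigma_true[12])
err_eig = abs(sigma_eig[12] - sigma_true[12])
```

Forming $\bar{Y}^{\top}\bar{Y}$ squares the singular values, so eigenvalues below roughly machine precision times $\lambda_{1}$ are lost in rounding noise, whereas the SVD still resolves the corresponding singular values. The same effect applies to the $m\times m$ matrix $\bar{Y}\bar{Y}^{\top}$.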

Notice that in many applications, the quality of the reduced-order model does not vary significantly whether the weights matrix $W$ refers to the space $X=H$ or $X=V$, or whether time derivatives of the used snapshots are taken into account or not. In particular, the ROM residual decays with the same order as the sum over the remaining singular values, independent of the chosen geometrical framework.

5. Optimal snapshot location for computing POD basis functions

The construction of reduced-order models for nonlinear dynamical systems using proper orthogonal decomposition (POD) is based on the information carried by the so-called snapshots. These provide the spatial distribution of the nonlinear system at discrete time instances. Thus, we are interested in optimizing the choice of these time instances in such a manner that the error between the POD solution and the trajectory of the dynamical system is minimized. This approach was suggested in [59] and was extended in [64] to parametrized elliptic problems. Let us briefly mention some related issues of interest. In [26, 32] the situation of missing snapshot data is investigated and gappy POD is introduced for their reconstruction. An important alternative to POD model reduction is given by reduced basis approximations; we refer to [72] and references given there. In [37] a reduced model is constructed for a parameter dependent family of large scale problems by an iterative procedure that adds new basis variables on the basis of a greedy algorithm. In the Ph.D. thesis [20], model reduction is sought for a family of models corresponding to different operating stages.

Suppose that we are given the $n_{t}$ snapshots $\{y(t_{j})\}_{j=1}^{n_{t}}\subset V\subset X$ . The goal is to determine additional $\mathsf{k}$ snapshots at time instances $\tau=(\tau_{1},\ldots,\tau_{\mathsf{k}})$ with $0\leq\tau_{j}\leq T$ , $j=1,\ldots,\mathsf{k}$ . In [59] we propose to determine $\tau=(\tau_{1},\ldots,\tau_{\mathsf{k}})$ by solving the optimization problem

[TABLE]

where $y$ and $y^{\ell}$ are the solutions to (16) and its POD Galerkin approximation, respectively. Clearly, the definition of the operator $\mathcal{R}$ given in (6) has to be modified as follows:

[TABLE]

with appropriately modified (trapezoidal) weights $\alpha_{j}^{\tau}$ , $j=1,\ldots,\mathsf{k}+n_{t}$ . Consequently, (44) becomes an optimization problem subject to the equality constraints

[TABLE]

Note that no precautions are taken in (44) to avoid multiple appearances of a snapshot. In fact, this would simply imply that a specific snapshot location should be given a higher weight than others. While the presented approach shows how to choose optimal snapshots in evolution equations, a similar strategy is applicable in the context of parameter dependent systems.

It turns out in our numerical tests carried out in [59] that the proposed criterion is sensitive with respect to the choice of the time instances. Moreover, the tests demonstrate the feasibility of the method in determining optimal snapshot locations for concrete diffusion equations.

Run 10 (cf. [59, Run 1]).

For $T=1$ let $Q=(0,T)\times\Omega$ and $\Omega=(0,1)\times(0,1)\subset\mathbb{R}^{2}$ . For the FE triangulation we choose a uniform grid with mesh size $h=1/40$ , i.e., we have 900 degrees of freedom for the spatial discretization. Then, we consider

[TABLE]

where $c=0.1$ , $\beta=(0.1,-10)^{\top}\in\mathbb{R}^{2}$ ,

[TABLE]

and $y_{\circ}({\bm{x}})=\sin(\pi x_{1})\cos(\pi x_{2})$ for ${\bm{x}}=(x_{1},x_{2})\in\Omega$ (see Figure 17, left plot).

Furthermore, we have

[TABLE]

We utilize piecewise linear FE functions. The FE solutions $y^{h}=y^{h}(t,{\bm{x}})$ for $t=0.15$ and $t=T$ are shown in Figure 17. Next we take snapshots on the fixed uniform time grid $t_{j}=(j-1)\Delta t$, $1\leq j\leq n_{t}$, with $n_{t}=10$ and $\Delta t=T/n_{t}=0.1$. The goal is to determine four additional time instances $\bar{\tau}=(\bar{\tau}_{1},\ldots,\bar{\tau}_{4})\in[0,T]$ based on a FE approximation for (44). Since the solution changes more during the initial time interval $[0,0.3]$ than later on, we initialize our quasi-Newton method with the starting value $\tau^{0}=(0.05,0.15,0.25,0.35)\in[0,T]$. The number of POD ansatz functions is fixed to be $\ell=3$. The corresponding value of the ROM error is approximately $0.1093$. The optimal solution is given as $\bar{\tau}=(0.0092,0.0076,0.1336,0.2882)\in[0,T]$, while the associated ROM error is approximately $0.0165$, which is a reduction of about 85 %. In Figure 18 we can see that the shapes of the three POD basis functions change significantly from the initial time instances $\tau^{0}\in\mathbb{R}^{4}$ to the optimal ones $\bar{\tau}\in\mathbb{R}^{4}$. $\Diamond$

6. Optimal control with POD surrogate models

Reduced-order models are used in PDE-constrained optimization in various ways; see, e.g., [50, 86] for a survey. In optimal control problems it is sometimes necessary to compute a feedback control law instead of a fixed optimal control. In the implementation of these feedback laws, reduced-order models can play an important and very useful role; see [11, 40, 60, 65, 68, 79]. Another useful application is the use in optimization problems, where a PDE solver is part of the function evaluation. Obviously, thinking of a gradient evaluation or even a step-size rule in the optimization algorithm, an expensive function evaluation leads to an enormous amount of computing time. Here, the reduced-order model can replace the system given by a PDE in the objective function. It is quite common that a PDE can be replaced by a five- or ten-dimensional system of ordinary differential equations. Computationally, this results in a very fast method for optimization compared to the effort for the computation of a single solution of a PDE. There is a large amount of literature in engineering applications in this regard; we mention only the papers [67, 71]. Recent applications can also be found in finance using the reduced models generated with the reduced basis (RB) method [76] and the POD model [85, 88] in the context of calibration for models in option pricing.

We refer to the survey article [42], where a linear-quadratic optimal control problem in an abstract setting is considered. Error estimates for the POD Galerkin approximations of the optimal control are proved. This is achieved by combining techniques from [28, 29, 44] and [56, 58]. For nonlinear problems we refer the reader to [50, 75, 86]. However, unless the snapshots generate a sufficiently rich state space or are computed from the exact (unknown) optimal controls, it is not clear a priori how far the optimal solution of the POD problem is from the exact one. On the other hand, the POD method is a universal tool that is applicable also to problems with time-dependent coefficients or to nonlinear equations. Moreover, by generating snapshots from the real (large) model, a space is constructed that inherits the main and relevant physical properties of the state system. This, and its ease of use, makes POD very competitive in practical use, despite a certain heuristic flavor. In this context results for a POD a posteriori analysis are important, see e.g., [94] and [41, 54, 55, 91, 93, 96, 98]. Using a fairly standard perturbation method it is deduced how far the suboptimal control, computed on the basis of the POD model, is from the (unknown) exact one. This idea turned out to be very efficient in our examples. It is able to compensate for the lack of a priori analysis for POD methods. Let us also refer to the papers [30, 36, 69], where a posteriori error bounds are computed for linear-quadratic optimal control problems approximated by the reduced basis method.

Data- and/or simulation-based POD models depend on the data (e.g., initial values, right-hand sides, boundary conditions, observations, etc.) used to generate the snapshots. If those models are used as surrogates, e.g., in optimization problems with PDE constraints, the algorithmic framework has to account for this fact by providing mechanisms for updating the surrogate model accordingly during the solution process. Strategies proposed in this context for optimal flow control can be found in, e.g., [3, 4, 9, 17, 34]. One of the most mature methods developed in this context is Trust-Region POD, proposed in [9], which has since been applied successfully in many applications. We also refer to the work [38], where strategies for updating the POD bases are compared.
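The mechanism by which a trust-region framework controls the quality of a surrogate can be illustrated with the usual acceptance test. The following is a generic sketch in notation assumed here, not the specific algorithm of [9]: $\hat J$ is the objective evaluated with the full model, $m_k$ the POD-based model built from snapshots at the current control $u_k$, $s_k$ a trial step with $\|s_k\| \le \Delta_k$ computed from $m_k$, and $0 < \eta_1 < 1$ a user-chosen threshold.

$$
\rho_k = \frac{\hat J(u_k) - \hat J(u_k + s_k)}{m_k(u_k) - m_k(u_k + s_k)}
$$

If $\rho_k \ge \eta_1$, the decrease predicted by the surrogate is confirmed by the full model: the step is accepted, $u_{k+1} = u_k + s_k$, and new snapshots at $u_{k+1}$ are used to rebuild the POD model. Otherwise the step is rejected and the radius $\Delta_k$ is reduced, so that the next trial step stays closer to the control at which the snapshots were generated.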

The quality of the surrogate model depends strongly on its information basis, which for snapshot-based methods is given by the snapshot set; compare Section 5. The location of snapshots and the choice of the initial control in surrogate-based optimal control are discussed in [5]. There, techniques from time-adaptive schemes for optimality systems of parabolic optimal control problems are adjusted to compute optimal time locations for snapshot generation in POD surrogate modeling for parabolic optimal control problems.

Concepts for the construction and use of POD surrogate modeling in robust optimal control of electrical machines are presented in [63, 6]. Those problems are governed by nonlinear partial differential equations with uncertain parameters, so that robustness can be achieved by considering a worst case formulation. The resulting optimization problem then is of bilevel structure and POD reduced-order models in combination with a posteriori error estimators are used to speed up the numerical computations.

7. Miscellaneous

POD model order reduction can also be applied to provide surrogate models for high-fidelity components in networks. The general perspective is discussed in, e.g., [48]. Related research for MOR of electrical networks is reported in, e.g., [16, 46, 47]. The basic idea here consists in decoupling the MOR approaches for the network and for the high-fidelity components, which in general are modeled by PDE systems. For the latter, simulation-based POD MOR techniques are used to construct surrogate models, which are then stamped back into the (reduced) electrical network. Details and performance tests are reported, e.g., in [45, 47]. A short lecture series on related topics is available under Hinze-Pilsen (https://slideslive.com/38894790/mathematical-aspects-of-proper-orthogonal-decomposition-pod-iii). Further contributions to this topic can be found in [15].

Recent trends in data-driven and nonlinear MOR methods are discussed in a YouTube lecture series under Carlberg-YouTube (https://www.youtube.com/watch?v=KOHxCIx04Dg).

Bibliography


[1] H. Abels. Diffuse Interface Models for Two-Phase flows of Viscous Incompressible Fluids. Max-Planck Institut für Mathematik in den Naturwissenschaften, Leipzig, Lecture Note, 36, 2007.
[2] H. Abels, H. Garcke, and G. Grün. Thermodynamically consistent, frame indifferent diffuse interface models for incompressible two-phase flows with different densities. Mathematical Models and Methods in Applied Sciences, 22(3), 2012.
[3] K. Afanasiev and M. Hinze. Adaptive control of a wake flow using proper orthogonal decomposition. Preprint No. 648/1999, Fachbereich Mathematik, TU Berlin, 1999.
[4] K. Afanasiev and M. Hinze. Adaptive control of a wake flow using proper orthogonal decomposition. Lecture Notes in Pure and Applied Mathematics, 216:317-332, 2001.
[5] A. Alla, C. Gräßle, and M. Hinze. A-posteriori snapshot location for POD in optimal control of linear parabolic equations. ESAIM: Mathematical Modelling and Numerical Analysis (M2AN), 52(5):1847-1873, 2018.
[6] A. Alla, M. Hinze, P. Kolvenbach, O. Lass, and S. Ulbrich. A certified model reduction approach for robust parameter optimization with PDE constraints. Adv. Comput. Math., 45:1221-1250, 2019.
[7] A. Alla and J. N. Kutz. Nonlinear model order reduction via dynamic mode decomposition. SIAM Journal on Scientific Computing, 39:B778-B796, 2017.
[8] M.S. Alnaes, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M.E. Rognes, and G.N. Wells. The FEniCS Project Version 1.5. Archive of Numerical Software, 100:9-23, 2015.
