Shallow Neural Networks for Fluid Flow Reconstruction with Limited Sensors

N. Benjamin Erichson; Lionel Mathelin; Zhewei Yao; Steven L. Brunton; Michael W. Mahoney; J. Nathan Kutz

arXiv:1902.07358·physics.comp-ph·December 29, 2020

TL;DR

This paper introduces a shallow neural network approach for reconstructing fluid flow fields from limited sensor data, outperforming traditional methods and requiring fewer sensors, suitable for global monitoring with sparse measurements.

Contribution

The paper presents a novel shallow neural network methodology for fluid flow reconstruction that is data-driven, end-to-end, and requires no prior knowledge or heavy preprocessing.

Findings

01

Outperforms traditional modal approximation techniques.

02

Achieves comparable performance with fewer sensors.

03

Effective in fluid mechanics and oceanography applications.

Abstract

In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance…

Tables

Table 1: Performance for the flow past the cylinder for a varying number of sensors. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. The parameter $k^{*}$ indicates the number of modes that were used for flow reconstruction by the pod method, and $\alpha$ refers to the strength of ridge regularization applied to pod plus.

Method	Sensors	Training NME	Training NFE	Test NME	Test NFE
pod	5	0.465 (0.39)	0.675 (0.57)	0.488 (0.41)	0.698 (0.59)
pod ($k^{*} = 4$)	5	0.217 (0.02)	0.325 (0.01)	0.227 (0.03)	0.324 (0.04)
pod plus ($\alpha = 10^{-8}$)	5	0.198 (0.02)	0.288 (0.03)	0.203 (0.02)	0.291 (0.03)
shallow decoder	5	0.003 (0.00)	0.004 (0.00)	0.006 (0.00)	0.008 (0.00)
pod	10	0.346 (1.54)	0.502 (2.23)	0.379 (1.70)	0.542 (2.43)
pod ($k^{*} = 8$)	10	0.049 (0.00)	0.071 (0.01)	0.051 (0.01)	0.072 (0.01)
pod plus ($\alpha = 10^{-13}$)	10	0.035 (0.01)	0.050 (0.02)	0.035 (0.01)	0.050 (0.02)
shallow decoder	10	0.002 (0.00)	0.003 (0.00)	0.005 (0.00)	0.007 (0.00)
pod	15	0.441 (1.81)	0.639 (2.63)	0.574 (2.44)	0.821 (3.49)
pod ($k^{*} = 12$)	15	0.015 (0.00)	0.023 (0.01)	0.016 (0.01)	0.023 (0.01)
pod plus ($\alpha = 10^{-12}$)	15	0.016 (0.01)	0.023 (0.01)	0.016 (0.01)	0.022 (0.01)
shallow decoder	15	0.002 (0.00)	0.003 (0.00)	0.005 (0.00)	0.007 (0.00)

Table 2: Performance for estimating the flow behind a cylinder using nonlinear sensor measurements. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. The standard POD-based method fails for this task. pod plus is able to reconstruct the flow field, yet the estimation quality is poor. In contrast, the SD method performs well.

Method	Sensors	Training NME	Training NFE	Test NME	Test NFE
pod	10	-	-	-	-
pod plus ($\alpha = 5 \times 10^{-4}$)	10	0.676 (0.00)	0.981 (0.00)	0.682 (0.09)	0.974 (0.00)
shallow decoder	10	0.002 (0.00)	0.003 (0.00)	0.006 (0.00)	0.009 (0.01)

Table 3: Performance for estimating the flow behind a cylinder in the presence of white noise, using 10 sensors. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. pod fails for this task, while pod plus shows better performance. The SD is robust to noisy sensor measurements and outperforms the traditional techniques. The parameter $k^{*}$ indicates the number of modes that were used for flow reconstruction by the pod method, and the parameter $\alpha$ refers to the strength of ridge regularization applied to the pod plus method.

Method	SNR	Training NME	Training NFE	Test NME	Test NFE
pod	10	9.171 (14.7)	12.69 (20.4)	8.746 (12.9)	11.93 (17.6)
pod ($k^{*} = 2$)	10	0.461 (0.02)	0.638 (0.03)	0.468 (0.02)	0.639 (0.02)
pod plus ($\alpha = 5 \times 10^{-5}$)	10	0.468 (0.02)	0.648 (0.02)	0.472 (0.02)	0.644 (0.2)
shallow decoder	10	0.138 (0.02)	0.201 (0.02)	0.278 (0.04)	0.397 (0.05)
pod	50	4.837 (3.08)	6.946 (4.42)	4.520 (2.75)	6.390 (3.89)
pod ($k^{*} = 2$)	50	0.342 (0.01)	0.492 (0.01)	0.349 (0.01)	0.493 (0.01)
pod plus ($\alpha = 10^{-5}$)	50	0.370 (0.03)	0.539 (0.04)	0.371 (0.02)	0.524 (0.03)
shallow decoder	50	0.134 (0.02)	0.198 (0.02)	0.173 (0.02)	0.247 (0.03)

Table 4: Performance for estimating the SST dataset for varying numbers of sensors. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses. The SD outperforms the traditional techniques and is highly invariant to the sensor location. The parameter $k^{*}$ indicates the number of modes that were used for flow reconstruction by the pod method, and $\alpha$ refers to the strength of ridge regularization applied to pod plus.

Method	Sensors	Training NME	Training NFE	Test NME	Test NFE
pod	32	0.637 (0.59)	5.915 (5.56)	0.649 (0.62)	6.04 (5.77)
pod ($k^{*} = 5$)	32	0.036 (0.00)	0.342 (0.01)	0.037 (0.00)	0.344 (0.01)
pod plus ($\alpha = 10^{-5}$)	32	0.036 (0.00)	0.341 (0.01)	0.037 (0.00)	0.343 (0.01)
shallow decoder	32	0.009 (0.00)	0.088 (0.00)	0.014 (0.00)	0.128 (0.00)
pod	64	0.986 (1.34)	9.183 (12.5)	1.007 (1.36)	9.344 (12.7)
pod ($k^{*} = 14$)	64	0.032 (0.00)	0.298 (0.01)	0.032 (0.00)	0.301 (0.01)
pod plus ($\alpha = 5 \times 10^{-5}$)	64	0.032 (0.00)	0.301 (0.00)	0.032 (0.00)	0.301 (0.00)
shallow decoder	64	0.009 (0.00)	0.085 (0.00)	0.012 (0.00)	0.118 (0.00)

Table 5: Flow reconstruction performance for estimating the isotropic flow. Results are averaged over 30 runs with different sensor distributions, with standard deviations in parentheses.

Method	Grids	Training NME	Training NFE	Test NME	Test NFE
shallow decoder	36	0.029 (0.00)	0.041 (0.00)	0.071 (0.00)	0.101 (0.01)
shallow decoder	64	0.027 (0.00)	0.039 (0.00)	0.067 (0.00)	0.096 (0.00)
shallow decoder	121	0.026 (0.00)	0.038 (0.00)	0.066 (0.00)	0.093 (0.00)

Table 6: Architecture of the SD for the flow behind the cylinder. The batch size is set to 32. Here, we set the dropout rate to 0.1 for the noisy situation. We use a small amount of weight decay, $\lambda = 10^{-7}$.

Layer	Weight size	Input Shape	Output Shape	Activation	Batch Norm.	Dropout
FC	sensors $\times$ 35	sensors	35	ReLU	True	-
FC	35 $\times$ 40	35	40	ReLU	True	-
FC	40 $\times$ 76,416	40	76,416	Linear	-	-

Table 7: Architecture of the SD for the SST dataset. Here, the batch size is set to 200.

Layer	Weight size	Input Shape	Output Shape	Activation	Batch Norm.	Dropout
FC	sensors $\times$ 350	sensors	350	ReLU	True	$0.1$
FC	350 $\times$ 400	350	400	ReLU	True	-
FC	400 $\times$ 44,219	400	44,219	Linear	-	-

Table 8: Architecture of the SD for isotropic flow. Here, the batch size is set to 200.

Layer	Weight size	Input Shape	Output Shape	Activation	Batch Norm.	Dropout
FC	sensors $\times$ 350	sensors	350	ReLU	True	$0.1$
FC	350 $\times$ 400	350	400	ReLU	True	-
FC	400 $\times$ 122,500	400	122,500	Linear	-	-

Equations

$$\bm{s} = \bm{H}(\bm{x}),$$

$$\bm{x} = \bm{G}(\bm{s}),$$

$$\widehat{\bm{x}} = \mathcal{F}(\bm{s}),$$

$$\|\mathcal{F}(\bm{s}) - \bm{G}(\bm{s})\|_{2}^{2} < \epsilon,$$

$$\bm{s} \mapsto \text{first hidden layer} \mapsto \text{second hidden layer} \mapsto \text{output layer} \mapsto \widehat{\bm{x}}.$$

$$\bm{x} \approx \widehat{\bm{x}} = \sum_{j=1}^{k} \bm{\phi}_{j} \nu_{j} = \bm{\Phi} \bm{\nu},$$

$$\bm{X} = \bm{U} \bm{\Sigma} \bm{V}^{\mathsf{T}},$$

$$\bm{s} = \bm{H} \bm{x} \approx \bm{H} \bm{\Phi} \bm{\nu}.$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2}.$$

$$\bm{\nu} = (\bm{H} \bm{\Phi})^{+} \bm{s},$$

$$\bm{x} \approx \widehat{\bm{x}} = \bm{\Phi} \bm{\nu}.$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H}(\bm{\Phi} \widetilde{\bm{\nu}})\|_{2}^{2}.$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} + \alpha \|\widetilde{\bm{\nu}}\|_{2}^{2},$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} + \beta \|\widetilde{\bm{\nu}}\|_{1},$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} + \alpha \|\widetilde{\bm{\nu}}\|_{2}^{2} + \beta \|\widetilde{\bm{\nu}}\|_{1}.$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} \quad \text{s.t.} \quad \bm{\Phi}_{(n-k)} \widetilde{\bm{\nu}} = 0,$$

$$\bm{P} \in \operatorname*{arg\,min}_{\widetilde{\bm{P}}} \mathbb{E}_{\mu_{\bm{s},\bm{\nu}}}\left[\|\bm{s} - \widetilde{\bm{P}} \bm{\nu}\|_{2}^{2}\right],$$

$$\bm{P} \in \operatorname*{arg\,min}_{\widetilde{\bm{P}}} \sum_{i=1}^{n} \|\bm{s}_{i} - \widetilde{\bm{P}} \bm{\nu}_{i}\|_{2}^{2}.$$

$$\bm{P} \in \operatorname*{arg\,min}_{\widetilde{\bm{P}} \in \mathbb{R}^{p \times k}} \|\bm{S} - \widetilde{\bm{P}} \bm{N}\|_{F}^{2},$$

$$\bm{P} = \bm{S} \bm{N}^{+} = \bm{S} (\bm{\Phi}^{+} \bm{X})^{+} = \bm{S} \bm{V} \bm{\Sigma}^{+},$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{P} \widetilde{\bm{\nu}}\|_{2}^{2}.$$

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{P} \widetilde{\bm{\nu}}\|_{2}^{2} + \lambda \|\widetilde{\bm{\nu}}\|_{2}^{2},$$

$$\bm{\nu}^{\star} - \bm{\nu} = \left(\bm{\Phi}^{+} - (\bm{H} \bm{\Phi})^{+} \bm{H}\right) \bm{x},$$

$$\|\bm{x} - \widehat{\bm{x}}\| = \left\|\left(\bm{I} - \bm{\Phi} (\bm{H} \bm{\Phi})^{+} \bm{H}\right) \bm{x}\right\|,$$

$$\mathcal{F}(\bm{s}; \bm{W}) := R\left(\bm{W}^{K} R\left(\bm{W}^{K-1} \cdots R\left(\bm{W}^{1} \bm{s}\right)\right)\right),$$

$$\mathcal{F} \in \operatorname*{arg\,min}_{\widetilde{\mathcal{F}} \in \mathscr{F}} \sum_{i=1}^{n} \|\bm{x}_{i} - \widetilde{\mathcal{F}}(\bm{s}_{i})\|_{2}^{2}.$$

$$\mathcal{F}(\bm{s}) = \Omega(\nu(\psi(\bm{s}))).$$

$$\bm{z}^{\psi} = \psi(\bm{s}) := R(\bm{W}^{\psi} \bm{s} + \bm{b}^{\psi}),$$

$$\bm{z}^{\nu} = \nu(\bm{z}^{\psi}) := R(\bm{W}^{\nu} \bm{z}^{\psi} + \bm{b}^{\nu}),$$

$$\widehat{\bm{x}} = \Omega(\bm{z}^{\nu}) := \bm{\Phi} \bm{z}^{\nu} + \bm{b}^{\Phi},$$

Code & Models

Repositories

EiffL/FluidFlowPrediction
tf

Full text

Shallow Neural Networks for Fluid Flow Reconstruction with Limited Sensors

N. Benjamin Erichson
ICSI and UC Berkeley

Lionel Mathelin
Université Paris-Saclay, LIMSI-CNRS

Zhewei Yao
ICSI and UC Berkeley

Steven L. Brunton
University of Washington

Michael W. Mahoney
ICSI and UC Berkeley

J. Nathan Kutz
University of Washington

Abstract

In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance with traditional methods in the area, using significantly fewer sensors. Thus, the mathematical architecture is ideal for emerging global monitoring technologies where measurement data are often limited.

1 Introduction

The ability to reconstruct coherent flow features from limited observation can be critically enabling for applications across the physical and engineering sciences [11, 59, 13, 48, 69]. For example, efficient and accurate fluid flow estimation is critical for active flow control, and it may help to craft more fuel-efficient automobiles as well as high-efficiency turbines. The ability to reconstruct important fluid flow features from limited observation is also central in applications as diverse as cardiac bloodflow modeling and climate science [10]. All of these applications rely on estimating the structure of fluid flows based on limited sensor measurements.

More concretely, the objective is to estimate the flow field $\bm{x}\in\mathbb{R}^{m}$ from sensor measurements $\bm{s}\in\mathbb{R}^{p}$ , that is, to learn the relationship $\bm{s}\mapsto\bm{x}$ . The restriction of limited sensors gives $p\ll m$ . The sensor measurements $\bm{s}$ are collected via a sampling process from the high-dimensional field $\bm{x}$ . We can describe this process as

$$\bm{s} = \bm{H}(\bm{x}),$$

where $\bm{H}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{p}$ denotes a measurement operator. Now, the task of flow reconstruction requires the construction of an inverse model that produces the field $\bm{x}$ in response to the observations $\bm{s}$ , which we may describe as

$$\bm{x} = \bm{G}(\bm{s}),$$

where $\bm{G}:\mathbb{R}^{p}\rightarrow\mathbb{R}^{m}$ denotes a non-linear forward operator. However, the measurement operator $\bm{H}$ may be unknown or highly-nonlinear in practice. Hence, the problem is often ill-posed, and we cannot directly invert the measurement operator $\bm{H}$ to obtain the forward operator $\bm{G}$ .

Fortunately, given a set of training examples $\{\bm{x}_{i},\bm{s}_{i}\}_{i}$ , we may learn a function $\mathcal{F}$ to approximate the forward operator $\bm{G}$ . Specifically, we aim to learn a function $\mathcal{F}:\bm{s}\mapsto\bm{\widehat{x}}$ which maps a limited number of measurements to the estimated state $\bm{\widehat{x}}$ :

$$\widehat{\bm{x}} = \mathcal{F}(\bm{s}),$$

so that the misfit is small, e.g., in a Euclidean sense over all sensor measurements

$$\|\mathcal{F}(\bm{s}) - \bm{G}(\bm{s})\|_{2}^{2} < \epsilon,$$

where $\epsilon$ is a small positive number. Neural network based inversion is common practice in machine learning [50], dating back to the late 80’s [71]. This powerful learning paradigm is also increasingly used for flow reconstruction [45, 15, 30], prediction [65, 25, 4, 33, 60], and simulations [42]. In particular, deep inverse transform learning is an emerging concept [51, 39, 1, 68], which has been shown to outperform traditional methods in applications such as denoising, deconvolution, and super-resolution.

Here, we explore shallow neural networks (SNNs) to learn the input-to-output mapping between the sensor measurements and the flow field. Figure 1 shows a design sketch for the proposed framework for fluid flow reconstruction. We can express the network architecture (henceforth called shallow decoder (SD)), more concisely as follows:

$$\bm{s} \mapsto \text{first hidden layer} \mapsto \text{second hidden layer} \mapsto \text{output layer} \mapsto \widehat{\bm{x}}.$$

SNNs are considered to be networks with very few hidden layers. We favor shallow over deep architectures because the simplicity of SNNs allows faster training, less tuning, and easier interpretation; since the shallow architecture already works well, there is no need to consider deeper ones.

There are several advantages of this mathematical approach over traditional scientific computing methods for fluid flow reconstruction [16, 21, 12, 66, 48]. First, the SD considered here features a linear last layer and provides a supervised joint learning framework for the low-dimensional approximation space of the flow field and the map from the measurements to this low-dimensional space. This allows the approximation basis to be tailored not only to the state space but also to the associated measurements, preventing observability issues. In contrast, these two steps are disconnected in standard methods (discussed in more detail in Section 2). Second, the method allows for flexibility in the measurements, which do not necessarily have to be linearly related to the state, as in many standard methods. Finally, the shallow decoder network produces interpretable features of the dynamics, potentially improving on the low-rank features of classical proper orthogonal decomposition (POD), also known as principal component analysis (PCA). For instance, Figure 2 shows that the basis learned via an SNN exhibits elements resembling physically consistent quantities, in contrast with alternative POD-based modal approximation methods that enforce orthogonality. The last (linear) layer can be interpreted as follows: a given mode is constituted by the values of the spatially localized weights connecting the associated node in the last hidden layer to the nodes of the output layer.

Limitations of our approach are standard to data-driven methods, in that the training data should be as representative as possible of the system, in the sense that it should comprise samples drawn from the same statistical distribution as the testing data.

The paper is organized as follows. Sec. 2 discusses traditional modal approximation techniques. Then, in Sec. 3, the specific implementation and architecture of our shallow decoder is described. Results are presented in Sec. 4 for various applications of interest. We aim to reconstruct (a) the vorticity field of a flow behind a cylinder from a handful of sensors on the cylinder surface, (b) the mean sea surface temperature from weekly sea surface temperatures for the last 26 years, and (c) the velocity field of a turbulent isotropic flow. We show that a very small number of sensor measurements is indeed sufficient for flow reconstruction in these applications. Further, we show that the shallow decoder can handle non-linear measurements and is robust to measurement noise. The results show significantly improved performance compared to traditional modal approximation techniques. The paper concludes in Sec. 5 with a discussion and outlook of the use of SNNs for more general flow field reconstructions.

2 Background on high-dimensional state estimation

Reconstructing a high-dimensional state from a limited number of measurements is made possible by the fact that the dynamics of many complex systems, or datasets, exhibit some sort of low-dimensional structure. This fact has been exploited for state estimation using (i) a tailored basis, such as POD, or (ii) a general basis in which the signal is sparse, typically a Fourier or wavelet basis. In the former, gappy POD methods [26] have been developed for principled reconstruction strategies [16, 21, 12, 66, 48]. In the latter, compressive sensing methods [14, 19, 5] serve as a principled technique for reconstruction. Both techniques exploit the fact that there exists a basis in which the high-dimensional state vector has a sparse, or compressible, representation. In [49], a basis is learned such that it leads to a sparse approximation of the high-dimensional state while enforcing observability from the sensors.

Next, we describe standard techniques for the estimation of a state $\bm{x}$ from observations $\bm{s}$ , and we discuss observability issues. Established techniques for state reconstruction are based on the idea that a field $\bm{x}$ can be expressed in terms of a rank- $k$ approximation

$$\bm{x} \approx \widehat{\bm{x}} = \sum_{j=1}^{k} \bm{\phi}_{j} \nu_{j} = \bm{\Phi} \bm{\nu},$$

where $\displaystyle\left\{{\bm{\phi}_{j}}\right\}_{j}$ are the modes of the approximation and $\left\{{\nu_{j}}\right\}_{j}$ are the associated coefficients. The approximation space is derived from a given training set using unsupervised learning techniques. A typical approach to determine the approximation modes is POD [6, 16, 21, 48]. Randomized methods for linear algebra enable the fast computation of such approximation modes [47, 20, 34, 22, 23, 24]. Given the approximation modes $\bm{\Phi}$ , estimating the state $\bm{x}$ reduces to determining the coefficients $\bm{\nu}$ from the sensor measurements $\bm{s}$ using supervised techniques. These typically aim to find the minimum-energy or minimum-norm solution that is consistent in a least-squares sense with the measured data.

2.1 Standard approach: Estimation via POD based methods

Two POD-based methods are discussed, which we will refer to as pod and pod plus in the following. Both approaches reconstruct the state with POD modes, by estimating the coefficients from sensor information. The POD modes $\bm{\Phi}$ are obtained via the singular value decomposition of the mean centered training set $\bm{{X}}=\left(\bm{x}_{1}\ldots\bm{x}_{n}\right)$ , with typically $n\leq m$ :

$$\bm{X} = \bm{U} \bm{\Sigma} \bm{V}^{\mathsf{T}},$$

where the columns of $\bm{U}\in\mathbb{R}^{m\times n}$ are the left singular vectors and the columns of $\bm{V}\in\mathbb{R}^{n\times n}$ are the right singular vectors. The corresponding singular values are the diagonal elements of $\bm{\Sigma}\in\mathbb{R}^{n\times n}$ . Now, we define the approximation modes as $\bm{\Phi}:=\bm{U}_{k}$ , by selecting $k$ left singular vectors, with $k\leq p$ . Typically, we select the dominant $k$ singular vectors as approximation modes, however, there are exceptions to this rule as discussed below.
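The mode computation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pod_modes` and the synthetic rank-4 snapshot matrix are hypothetical stand-ins for real flow data.

```python
import numpy as np

def pod_modes(X, k):
    """Return the mean field, the k dominant POD modes, and all singular values.

    X has shape (m, n): one column per snapshot.
    """
    mean = X.mean(axis=1, keepdims=True)
    # Economy SVD of the mean-centered snapshots: X - mean = U @ diag(sigma) @ Vt
    U, sigma, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, U[:, :k], sigma

rng = np.random.default_rng(0)
m, n, k = 200, 30, 4
# Synthetic rank-4 data standing in for flow snapshots (an assumption)
X = rng.standard_normal((m, 4)) @ rng.standard_normal((4, n))
mean, Phi, sigma = pod_modes(X, k)
```

The columns of `Phi` are orthonormal, and for this synthetic example the singular values beyond the fourth are numerically zero, which is the low-dimensional structure that POD-based estimation relies on.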

2.1.1 Standard POD-based method

Let a linear measurement operator $\bm{H}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{p}$ describe the relationship between the field and the associated observations, $\bm{s}=\bm{H}\,\bm{x}$ . The approximation of the field $\bm{x}$ with the approximation modes $\left\{{\bm{\phi}_{j}}\right\}_{j}$ is obtained by solving the following equation for $\bm{\nu}\in\mathbb{R}^{n}$ :

$$\bm{s} = \bm{H} \bm{x} \approx \bm{H} \bm{\Phi} \bm{\nu}.$$

A standard approach is to simply solve the following least-squares problem

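$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2}.$$

The solution with the minimum $L^{2}$ -norm is given by:

$$\bm{\nu} = (\bm{H} \bm{\Phi})^{+} \bm{s},$$

with the superscript + denoting the Moore-Penrose pseudo-inverse. In this situation, the high-dimensional state is then estimated as

$$\bm{x} \approx \widehat{\bm{x}} = \bm{\Phi} \bm{\nu}.$$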

This approach is hereafter referred to as POD and has been used in previous efforts, e.g., [52, 54].
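As a rough illustration of the standard estimate, the sketch below applies the pseudo-inverse solution to synthetic data. The snapshot matrix, the random point-sensor locations in `sensor_idx`, and the helper name `pod_estimate` are all assumptions made for the example; mean-centering is skipped for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, p = 200, 30, 4, 6

# Synthetic low-rank snapshots (assumption) and their POD modes
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
Phi = np.linalg.svd(X, full_matrices=False)[0][:, :k]

# Point sensors: H selects p entries of the field (hypothetical locations)
sensor_idx = rng.choice(m, size=p, replace=False)
H = np.zeros((p, m))
H[np.arange(p), sensor_idx] = 1.0

def pod_estimate(s, H, Phi):
    """Minimum-norm least-squares coefficients, followed by reconstruction."""
    nu = np.linalg.pinv(H @ Phi) @ s
    return Phi @ nu

x_true = X[:, 0]
x_hat = pod_estimate(H @ x_true, H, Phi)
rel_err = np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)
```

Because the test field lies exactly in the span of the modes and the six sensors observe all four of them, the relative error here is at the level of round-off. Real data offer no such guarantee, which is what motivates the refinements that follow.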

With a nonlinear measurement operator $\bm{H}$ , the problem formulates similarly as a nonlinear least squares problem:

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H}(\bm{\Phi} \widetilde{\bm{\nu}})\|_{2}^{2}.$$

In this case, no closed form solution is available in general and a nonlinear optimization problem must be solved, whose computational burden limits the online (real-time) field reconstruction capability. Further, the solution of the, often ill-posed, problem is not necessarily unique and does not allow for a reliable estimate. In contrast, the shallow decoder is trained end-to-end and essentially learns to associate measurements to the right solution (see Section 3 for details).

2.1.2 Improved POD-based method

The standard POD-based method has several shortcomings. First, the least-squares problem formulated in Eq. 7 can be underspecified. Thus, it is favorable to introduce some bias in order to reduce the variance by means of regularization. Ridge regularization is the most popular regularization technique for reducing the variance of the estimator:

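$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} + \alpha \|\widetilde{\bm{\nu}}\|_{2}^{2},$$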

where $\alpha>0$ is the penalization parameter. Typically, this parameter is determined by $k$ -fold cross-validation. An alternative approach to reduce the variance is to select a subset of the POD modes, i.e., only a few of the estimated coefficients are non-zero. The so-called Least Absolute Shrinkage and Selection Operator (LASSO) for least-squares [64, 36] can be formulated as:

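$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} + \beta \|\widetilde{\bm{\nu}}\|_{1},$$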

where $\beta>0$ controls the amount of sparsity. One can also combine both LASSO and ridge regularization, resulting in the so-called ElasticNet [72, 36] regularizer:

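$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} + \alpha \|\widetilde{\bm{\nu}}\|_{2}^{2} + \beta \|\widetilde{\bm{\nu}}\|_{1}.$$

This regularization scheme often shows improved predictive performance in practice; however, it requires the user to tune two parameters, $\alpha$ and $\beta$ .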
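Of the penalties discussed here, the ridge problem is the one with a simple closed-form solution, obtained from the normal equations. The sketch below is illustrative only: `A` stands for the product $\bm{H}\,\bm{\Phi}$, and the random matrix, the function name `ridge_coefficients`, and the two values of $\alpha$ are assumptions chosen for the example.

```python
import numpy as np

def ridge_coefficients(A, s, alpha):
    """Solve min ||s - A nu||^2 + alpha ||nu||^2 via the normal equations."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(k), A.T @ s)

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 8))      # fewer sensors than modes: underspecified
s = rng.standard_normal(5)

nu_small = ridge_coefficients(A, s, alpha=1e-8)   # nearly unregularized
nu_large = ridge_coefficients(A, s, alpha=1.0)    # stronger shrinkage
nu_pinv = np.linalg.pinv(A) @ s                   # minimum-norm least squares
```

For a very small penalty the ridge solution approaches the minimum-norm pseudo-inverse solution, while a larger penalty shrinks the coefficient norm, trading bias for reduced variance.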

Yet another approach is to use a shrinkage estimator that only retains the high variance POD modes, i.e., an estimator that selects a subset of all the POD modes that is used for solving the least squares problem. More concretely, we formulate the following constrained problem:

$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{H} \bm{\Phi} \widetilde{\bm{\nu}}\|_{2}^{2} \quad \text{s.t.} \quad \bm{\Phi}_{(n-k)} \widetilde{\bm{\nu}} = 0,$$

where $\bm{\Phi}_{(n-k)}=\{\bm{\phi}_{k+1},\dots,\bm{\phi}_{n}\}$ . Here, $k\leq n$ refers to the number of selected POD modes, reordered with indices $\left\{1,2,\ldots,k\right\}$ . This hard threshold regularizer constrains the solution to the column space of the selected POD modes and is also known as Principal Component Regression (PCR) [36]. In contrast to the smooth shrinkage effect of ridge regularization, the hard threshold regularizer has a discrete shrinkage effect that nullifies the contributions of some of the low variance modes completely. However, based on our experiments, both ridge regression and the hard threshold shrinkage estimator perform on par for the task of flow field reconstruction. This said, the ElasticNet regularizer might lead to a better predictive accuracy, since it can select the POD modes that are most useful for prediction, rather than only selecting the high-variance POD modes. It is known that the POD modes with low variances may also be important for predictive tasks [28, 40] and could help to further improve the performance of the POD-based methods.
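One way to read the hard threshold estimator is as ordinary least squares restricted to the first $k$ modes, with all remaining coefficients set to zero. The following sketch takes that reading; the function name `pcr_coefficients` and the random test matrix `A` (standing in for $\bm{H}\,\bm{\Phi}$ with modes already ordered by variance) are assumptions for illustration.

```python
import numpy as np

def pcr_coefficients(A, s, k):
    """Least squares on the first k columns of A; remaining coefficients are zero."""
    nu = np.zeros(A.shape[1])
    nu[:k] = np.linalg.lstsq(A[:, :k], s, rcond=None)[0]
    return nu

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 8))      # 5 sensors, 8 candidate modes
s = rng.standard_normal(5)
nu = pcr_coefficients(A, s, k=3)     # keep only the 3 leading modes
```

Unlike ridge, which shrinks every coefficient smoothly, this estimator discards the trailing modes entirely, so the choice of $k$ acts as the tuning parameter.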

Another shortcoming of the POD-based approach is that it requires explicit knowledge of the observation operator $\bm{H}$ and is subject to ill-conditioning of the least-squares problem. These limitations render this “vanilla flavored” approach impractical in many situations, and they motivate an alternative formulation. The idea is to learn the map between coefficients and observations without explicitly referring to $\bm{H}$ . It can be implicitly described by a, possibly nonlinear, operator $\bm{P}:\mathbb{R}^{k}\rightarrow\mathbb{R}^{p}$ typically determined offline by minimizing the Bayes risk, defined as the misfit in the $L^{2}$ -sense:

$$\bm{P} \in \operatorname*{arg\,min}_{\widetilde{\bm{P}}} \mathbb{E}_{\mu_{\bm{s},\bm{\nu}}}\left[\|\bm{s} - \widetilde{\bm{P}} \bm{\nu}\|_{2}^{2}\right],$$

where $\mu_{\bm{s},\bm{\nu}}$ is the joint probability measure of the observations $\bm{s}$ and the coefficients $\bm{\nu}$ obtained by projecting the field onto the (orthonormal) POD modes, $\bm{\nu}=\bm{\Phi}^{\mathsf{T}}\,\bm{x}$ . This step only relies on information from the training set and is thus performed offline.

We assume the training set is representative of the underlying system, in the sense that it should contain independent samples drawn from the stationary distribution of the physical system at hand. The Bayes risk is then approximated by an empirical estimate, and the operator $\bm{P}$ is determined as

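$$\bm{P} \in \operatorname*{arg\,min}_{\widetilde{\bm{P}}} \sum_{i=1}^{n} \|\bm{s}_{i} - \widetilde{\bm{P}} \bm{\nu}_{i}\|_{2}^{2}.$$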

When the measurement operator $\bm{H}$ is linear, $\bm{P}$ is then an empirical estimate of $\bm{H}\,\bm{\Phi}$ , the contribution of the basis modes $\left\{{\bm{\phi}_{j}}\right\}_{j}$ to the measurements $\bm{s}$ . This formulation was already considered in our previous work, e.g., [49], and brings flexibility in the properties of the map $\bm{P}$ compared to the closed-form solution in Eq. 8. For instance, regularization by sparsity can be enforced in $\bm{P}$ , via $L^{0}$ - or $L^{1}$ -penalization. Expressing Eq. 16 in matrix form yields:

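$$\bm{P} \in \operatorname*{arg\,min}_{\widetilde{\bm{P}} \in \mathbb{R}^{p \times k}} \|\bm{S} - \widetilde{\bm{P}} \bm{N}\|_{F}^{2},$$

where $\bm{{S}}\in\mathbb{R}^{p\times n}$ and $\bm{N}\in\mathbb{R}^{k\times n}$ refer to the training data measurements $\left\{{\bm{s}_{i}}\right\}_{i}$ and coefficients $\left\{{\bm{\nu}_{i}}\right\}_{i}$ , respectively. It immediately follows that

$$\bm{P} = \bm{S} \bm{N}^{+} = \bm{S} (\bm{\Phi}^{+} \bm{X})^{+} = \bm{S} \bm{V} \bm{\Sigma}^{+},$$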

and the online approximation obtained by pod plus is finally given by the solution to the following least-squares problem

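$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{P} \widetilde{\bm{\nu}}\|_{2}^{2}.$$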

However, $\bm{\nu}\in\mathbb{R}^{k}$ is typically higher-dimensional than $\bm{s}\in\mathbb{R}^{p}$ , and thus the problem is ill-posed. We then make use of the popular Tikhonov regularization, selecting the solution with the minimum $L^{2}$ -norm. This results in a ridge regression problem formulated as:

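$$\bm{\nu} \in \operatorname*{arg\,min}_{\widetilde{\bm{\nu}}} \|\bm{s} - \bm{P} \widetilde{\bm{\nu}}\|_{2}^{2} + \lambda \|\widetilde{\bm{\nu}}\|_{2}^{2},$$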

with $\lambda>0$ . As will be seen in the examples below, penalization of the magnitude of the coefficients can significantly improve the performance of the POD approach.
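The two stages of pod plus, an offline fit of $\bm{P}$ from training pairs followed by an online ridge solve, can be sketched as follows. This is an illustrative reading of the formulas rather than the authors' code: the synthetic snapshots, the sensor locations in `sensor_idx`, the function name `pod_plus_estimate`, and the value of `lam` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k, p = 200, 40, 4, 6

X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # synthetic snapshots
Phi = np.linalg.svd(X, full_matrices=False)[0][:, :k]

# Sensor data from a linear point-sampling operator that the estimator never sees
sensor_idx = rng.choice(m, size=p, replace=False)
S = X[sensor_idx, :]                 # (p, n) training measurements

# Offline: project snapshots onto the modes and fit P = S N^+
N = Phi.T @ X                        # (k, n) training coefficients
P = S @ np.linalg.pinv(N)            # (p, k) empirical estimate of H @ Phi

def pod_plus_estimate(s, P, Phi, lam):
    """Online ridge estimate of the coefficients, using sensors and P only."""
    nu = np.linalg.solve(P.T @ P + lam * np.eye(P.shape[1]), P.T @ s)
    return Phi @ nu

x_new = X @ rng.standard_normal(n) / n      # a field in the span of the training data
x_hat = pod_plus_estimate(x_new[sensor_idx], P, Phi, lam=1e-10)
rel_err = np.linalg.norm(x_new - x_hat) / np.linalg.norm(x_new)
```

In this noise-free linear setting the learned `P` coincides with the rows of `Phi` at the sensor locations, so the measurement operator is recovered implicitly from data.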

2.2 Observability issue

The above techniques are standard in the scientific computing literature for flow reconstruction, but they bear a severe limitation. Indeed, since it is derived in an unsupervised fashion from the set of instances $\left\{{\bm{x}_{i}}\right\}_{i}$ , the approximation basis $\displaystyle\left\{{\bm{\phi}_{j}}\right\}_{j}$ is agnostic to the measurements $\bm{s}$ . In other words, the approximation basis is determined with no supervision by the measurements. To illustrate the impact of this situation, let $\bm{\nu}^{\star}=\bm{\Phi}^{+}\,\bm{x}$ be the least-squares estimate of the approximation coefficients for a given field $\bm{x}$ . The difference between the least-squares estimate $\bm{\nu}^{\star}$ and the coefficients $\bm{\nu}$ obtained from the linear sensor measurements $\bm{s}$ is

$$\bm{\nu}^{\star} - \bm{\nu} = \left(\bm{\Phi}^{+} - (\bm{H} \bm{\Phi})^{+} \bm{H}\right) \bm{x},$$

and the error in the reconstructed field is obtained immediately:

$$\|\bm{x} - \widehat{\bm{x}}\| = \left\|\left(\bm{I} - \bm{\Phi} (\bm{H} \bm{\Phi})^{+} \bm{H}\right) \bm{x}\right\|,$$

where $\bm{I}$ is the identity matrix of suitable dimension.

The error in the reconstructed field is seen to depend on both the approximation basis $\bm{\Phi}$ and the measurement operator $\bm{H}$ . The measurement operator is entirely defined by the sensor locations, and it does not depend on the basis considered to approximate the field. Hence, to reduce (the expectation of) the reconstruction error, the approximation basis must be informed both by the dataset $\left\{{\bm{x}_{i}}\right\}_{i}$ and the sensors available, through $\bm{H}$ . For example, poorly located sensors will cause a large set of $\bm{x}_{i}$ to lie in the nullspace of $\bm{H}$ , preventing their estimation. In addition, if the approximation basis is not carefully chosen, the coefficients of certain approximation modes may be affected by the observation $\bm{H}\,\bm{x}_{i}$ of certain realizations $\bm{x}_{i}$ being severely amplified by $\left(\bm{H}\,\bm{\Phi}\right)^{+}$ .

This remark can be interpreted in terms of the control theory concept of observability of the basis modes by the sensors. Most papers in the literature focus their attention on deriving an approximation basis leading to a good representation [12, 66, 48], i.e., such that the training set is well approximated in the $k$ -dimensional basis $\displaystyle\left\{{\bm{\phi}_{j}}\right\}_{j}$ , $\bm{x}\approx\bm{\Phi}\,\bm{\nu}$ . But how well the associated coefficients $\bm{\nu}=\bm{\nu}\left(\bm{s}\right)$ are informed by the measurements is usually overlooked when deriving the basis. In practice, the decoupling between learning an approximation basis and learning a map to the associated coefficients often leads to a performance bottleneck in the estimation procedure. Enforcing observability of the approximation basis by the sensors is key to a good recovery performance and can dramatically improve upon unsupervised methods, as shown in [49].
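A small contrived example makes the observability issue concrete. Below, two orthonormal modes with disjoint spatial support are used, and the reconstruction error is evaluated for two sensor placements. The basis, the sensor indices, and the function name `reconstruction_error` are hypothetical choices made only to expose the effect.

```python
import numpy as np

m, k = 100, 2

# Two orthonormal modes with disjoint support: mode 0 lives on the left half
# of the domain, mode 1 on the right half (a contrived basis)
Phi = np.zeros((m, k))
Phi[:50, 0] = 1.0 / np.sqrt(50)
Phi[50:, 1] = 1.0 / np.sqrt(50)

def reconstruction_error(x, sensor_idx, Phi):
    """Norm of (I - Phi (H Phi)^+ H) x for point sensors at sensor_idx."""
    H_Phi = Phi[sensor_idx, :]          # H @ Phi for a row-selection H
    s = x[sensor_idx]                   # H @ x
    x_hat = Phi @ (np.linalg.pinv(H_Phi) @ s)
    return np.linalg.norm(x - x_hat)

x = Phi @ np.array([1.0, 2.0])          # field exactly representable in the basis

err_good = reconstruction_error(x, np.array([10, 80]), Phi)   # one sensor per mode
err_bad = reconstruction_error(x, np.array([10, 20]), Phi)    # mode 1 unobserved
```

With one sensor in each half, the field is recovered to round-off. With both sensors in the left half, the second mode is invisible to the measurements and its entire contribution (norm 2 in this example) is lost, even though the basis represents the field perfectly.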

3 Shallow neural networks for flow reconstruction

Shallow learning techniques are widely used for flow reconstruction. For instance, the approximation based approach for flow reconstruction, outlined in Section 2, can be considered to have two levels of complexity. The first level is concerned with computing an approximation basis, while the second level performs a linear weighted combination of the basis elements to estimate the high-dimensional flow field. Such shallow learning techniques are easy to train and tune. In addition, the levels are often physically meaningful, and they may provide some interesting insights into the underlying mechanics of the system under consideration.

In the following, we propose a simple SNN as an alternative to traditional methods, which are typically very shallow, for flow reconstruction problems. Our proposed shallow decoder adds only one or two additional layers of complexity to the problem.

3.1 A shallow decoder for flow reconstruction

We can define a fully-connected neural network (NN) with $K$ layers as a nested set of functions

[TABLE]

where $R(\cdot):\mathbb{R}\rightarrow\mathbb{R}$ denotes a coordinate-wise scalar (non-linear) activation function and $\bm{W}$ denotes the set of weight matrices $\left\{\bm{W}^{k}\right\}_{k}$, $k=1,\ldots,K$, with appropriate dimensions. NN-based learning provides a flexible framework for estimating the relationship between quantities from a collection of samples. Here, we consider SNNs, i.e., networks with very few (often only one, or even no) hidden layers, so that $K$ is very small.

In the following, an estimate of a vector $\bm{y}$ is denoted as $\widehat{\bm{y}}$, while $\widetilde{\bm{y}}$ denotes dummy vectors upon which one optimizes. Relying on a training set $\{\bm{x}_{i},\bm{s}_{i}\}_{i=1}^{n}$, with $n$ examples $\bm{x}_{i}$ and corresponding sensor measurements $\bm{s}_{i}$, we aim to learn a function $\mathcal{F}:\bm{s}\mapsto\bm{\widehat{x}}$ belonging to a class of neural networks $\mathscr{F}$ which minimizes the misfit in a Euclidean sense, over all sensor measurements

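$$\mathcal{F}\in\underset{\widetilde{\mathcal{F}}\in\mathscr{F}}{\operatorname{argmin}}\;\sum_{i=1}^{n}\left\|\bm{x}_{i}-\widetilde{\mathcal{F}}\left(\bm{s}_{i}\right)\right\|_{2}^{2}.$$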

We assume that only a small number of training examples is available. Further, no prior information is assumed to be available, and the estimation method is purely data-driven. Importantly, we assume no knowledge about the underlying measurement operator which is used to collect the sensor measurements. Further, unlike traditional methods for flow reconstruction, this NN-based learning methodology allows the joint learning of both the modes and the coefficients.

3.2 Architecture

We now discuss some general principles guiding the design of a good network architecture for flow reconstruction. These considerations lead to the following nested nonlinear function

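$$\mathcal{F}\left(\bm{s}\right):=\bm{\Omega}\left(\bm{\nu}\left(\bm{\psi}\left(\bm{s}\right)\right)\right).$$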

The architecture design is guided by the paradigm of simplicity. Indeed, the architecture should enable fast training, little tuning, and offer an intuitive interpretation.

Recall that the interpretability of the flow field estimate is favored by representing it in a basis of moderate size, whose modes can be identified with spatial structures of the field. This means that the estimate can be represented as a linear combination of $k$ modes $\left\{{\bm{\phi}_{j}}\right\}_{j}$, weighted by coefficients $\left\{{\nu_{j}}\right\}_{j}$, see Eq. 4. These modes are a function of the inputs. This naturally leads us to consider a network in which the output $\bm{\widehat{x}}$ is given by a linear, fully connected, last layer with $k$ inputs, interpreted as $\bm{\nu}$. These coefficients are informed by the sensor measurements $\bm{s}$ in a nonlinear way.

The nonlinear map $\bm{s}\mapsto\bm{\nu}$ can be described by a hidden layer, whose outputs $\bm{\psi}$ are hereafter termed measurement features, in analogy with kernel-based methods, where raw measurements $\bm{s}$ are nonlinearly lifted as extended measurements to a higher-dimensional space. In this architecture, the measurement features $\bm{\psi}$ essentially describe nonlinear combinations of the input measurement $\bm{s}$ . The nonlinear combinations are then mapped to the coefficients $\bm{\nu}$ associated with the modes $\bm{\phi}$ . While the size of the output layer is that of the discrete field $\bm{x}$ , the size of the last hidden layer ( $\bm{\nu}$ ) is chosen and defines the size $k$ of the dictionary $\bm{\Phi}$ . This size can be estimated from the data $\left\{{\bm{x}_{i}}\right\}_{i}$ by dimensionality estimation techniques [32, 27]. Restricting the description of the training data to a low-dimensional space is of potential interest to practitioners who may interpret the elements of the resulting basis in a physically meaningful way. The additional structure allows one to express the field of interest in terms of modes that practitioners may interpret, i.e., relate to some physics phenomena such as traveling waves, instability patterns (e.g., Kelvin-Helmholtz), etc.

In contrast, the size of the first hidden layer describing $\bm{\psi}$ is essentially driven by the size of the input layer ($\bm{s}$) and the number of nonlinear combinations used to nonlinearly inform the coefficients $\bm{\nu}$. The general shape of the network then bears flexibility in the hidden layers. A popular architecture for decoders consists of non-decreasing layer sizes, so as to progressively increase the size of the representation from the low-dimensional observations to the high-dimensional field. We can model $\mathcal{F}$ as a shallow neural network with two hidden layers $\bm{\psi}$ and $\bm{\nu}$, followed by a linear output layer $\bm{\Omega}$.

Two types of hidden layers, namely fully-connected (FC) and convolution layers, can be considered. The power of convolution layers is key to the success of recent deep learning architectures in computer vision. However, in our problem, we favor fully-connected layers. The reasons are as follows: (i) our sensor measurements have no spatial ordering; (ii) depending on the number of filters, convolution layers require a large number of examples for training, while we assume that only a small number of examples are available for training; (iii) potential dynamical systems that we consider evolve on a curved domain which is typically represented using an unstructured grid. Thus, the first and second hidden layers take the form

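$$\bm{\psi}=R\left(\bm{W}^{\psi}\,\bm{s}+\bm{b}^{\psi}\right)$$

and

$$\bm{\nu}=R\left(\bm{W}^{\nu}\,\bm{\psi}+\bm{b}^{\nu}\right),$$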

where $\bm{{W}}$ denotes a dense weight matrix and $\bm{b}$ is a bias term. The function $R(\cdot)$ denotes an activation function used to introduce nonlinearity into the model as discussed below. The final linear output layer simply takes the form of

$$\bm{\widehat{x}}=\bm{\Omega}\left(\bm{\nu}\right)=\bm{\Phi}\,\bm{\nu},$$

where we interpret the columns of the weight matrix $\bm{\Phi}$ as modes. In summary, the architecture of our shallow decoder can be outlined as

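$$\bm{s}\;\mapsto\;\bm{\psi}\left(\bm{s}\right)\;\mapsto\;\bm{\nu}\left(\bm{\psi}\right)\;\mapsto\;\bm{\Omega}\left(\bm{\nu}\right)=\bm{\widehat{x}}.$$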

Depending on the dataset, we need to adjust the size of each layer. Here, we use narrow rather than wide layers. Prescribing the size of the output layer restricts the dimension of the space in which the estimation lies, and it effectively regularizes the problem, e.g., filtering-out most of the noise which is not living in a low-dimensional space.
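The forward pass through such a decoder can be sketched in a few lines. This is an illustrative sketch only: the layer sizes `p`, `n_psi`, `k`, `m` are hypothetical, the random weights stand in for trained parameters, and the output layer is written without a bias term, which is an assumption. The final check confirms that the output is a linear combination of the columns of `Phi` weighted by the coefficients `nu`.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical sizes: p sensors, n_psi measurement features, k modes, m grid points.
p, n_psi, k, m = 5, 35, 40, 1000

# Random parameters stand in for trained weights (illustration only).
W_psi, b_psi = 0.1 * rng.standard_normal((n_psi, p)), np.zeros(n_psi)
W_nu, b_nu = 0.1 * rng.standard_normal((k, n_psi)), np.zeros(k)
Phi = 0.1 * rng.standard_normal((m, k))     # columns play the role of modes

def shallow_decoder(s):
    psi = relu(W_psi @ s + b_psi)    # first hidden layer: measurement features
    nu = relu(W_nu @ psi + b_nu)     # second hidden layer: mode coefficients
    return Phi @ nu, nu              # linear output layer (no bias: assumption)

s = rng.standard_normal(p)
x_hat, nu = shallow_decoder(s)

# The estimate equals the sum of the modes weighted by their coefficients.
x_sum = sum(nu[j] * Phi[:, j] for j in range(k))
print(x_hat.shape, np.allclose(x_hat, x_sum))
```

Note that with a ReLU on the second hidden layer the coefficients `nu` are non-negative, which is one consequence of this particular choice of activation.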

The rectified linear unit (ReLU) activation function is among the most popular choices in computer vision applications, owing to its favorable properties [31]. The ReLU activation is defined as the positive part of a signal $\bm{z}$ :

$$R\left(\bm{z}\right):=\max\left(\bm{z},0\right).$$

The transformed input signal is also called activation. While the ReLU activation function performs best on average in our experiments, there are other choices. For instance, we have also considered the Swish [2] activation function.
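For concreteness, the two activation functions mentioned above can be written as follows. The Swish form used here, $z\cdot\mathrm{sigmoid}(\beta z)$ with a default $\beta=1$, is the commonly used definition and is stated as an assumption, since the exact variant used in the experiments is not specified in this section.

```python
import numpy as np

def relu(z):
    """Positive part of the signal, applied coordinate-wise."""
    return np.maximum(z, 0.0)

def swish(z, beta=1.0):
    """z * sigmoid(beta * z); beta = 1 is an assumed default."""
    return z / (1.0 + np.exp(-beta * z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(swish(z))
```

Unlike ReLU, Swish is smooth and takes small negative values for negative inputs.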

3.3 Regularization

Overfitting is a common problem in machine learning and occurs if a function fits a limited set of data points too closely. In particular, this is a problem for deep neural networks which often have more neurons (trainable parameters) than can be justified by the limited amount of training examples which are available. There is increasing interest in characterizing and understanding generalization and overfitting in NNs [55, 7]. Hence, additional constraints are required to learn a function which generalizes to new observations that have not been used for training. Standard strategies to avoid overfitting include early stopping rules, and weight penalties ($L^{2}$ regularization) to regularize the complexity of the function (network). In addition to these two strategies, we also use batch normalization (BN) [38] and dropout layers (DL) [61] to improve the convergence and robustness of the shallow decoder. This yields the following architecture:

[TABLE]

Regularization, in its various forms, requires one to “fiddle” with a large number of knobs (i.e., hyper-parameters). However, we have found that SNNs are less sensitive to the particular choice of parameters; hence, SNNs are easier to tune.

Batch normalization.

BN is a technique to normalize the activation (zero mean and unit standard deviation). From a statistical perspective, BN eases the effect of internal covariate shift [38]. In other words, BN accounts for the change of distribution of the output signals (activation) across different mini batches during training. Each BN layer has two parameters which are learned during the training stage. This simple, yet effective, preprocessing step allows one to use higher learning rates for training the network. In addition, it reduces overfitting owing to its regularization effect.

Dropout layer.

DL helps to improve the robustness of an NN. The idea is to switch off (drop) a small fraction of randomly chosen hidden units (neurons) during the training stage. This strategy can be seen as some form of regularization which also helps to reduce interdependent learning between the units of a fully connected layer. In our experiments the drop ratio is set to $p=10\%$.
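A regularized decoder of this kind might be assembled in PyTorch as sketched below. The class name, the layer widths, and in particular the ordering of the BN and dropout layers relative to the activations are assumptions made for illustration; they are not taken from the released code. Only the drop ratio of $10\%$ comes from the text.

```python
import torch
from torch import nn

class ShallowDecoder(nn.Module):
    """Two hidden FC layers with BN and dropout, then a linear output layer."""

    def __init__(self, n_sensors, n_features, n_modes, n_outputs, drop=0.1):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(n_sensors, n_features),   # measurement features (psi)
            nn.ReLU(),
            nn.BatchNorm1d(n_features),         # layer placement is an assumption
            nn.Dropout(p=drop),                 # drop ratio of 10%
            nn.Linear(n_features, n_modes),     # mode coefficients (nu)
            nn.ReLU(),
            nn.BatchNorm1d(n_modes),
        )
        self.output = nn.Linear(n_modes, n_outputs)  # weight columns act as modes

    def forward(self, s):
        return self.output(self.hidden(s))

torch.manual_seed(0)
model = ShallowDecoder(n_sensors=5, n_features=35, n_modes=40, n_outputs=1000)
s = torch.randn(16, 5)            # a mini batch of 16 hypothetical sensor vectors

model.train()                     # dropout active, BN uses batch statistics
x_hat_train = model(s)
model.eval()                      # dropout off, BN uses running statistics
with torch.no_grad():
    x_hat_eval = model(s)
print(x_hat_train.shape, x_hat_eval.shape)
```

Switching between `train()` and `eval()` matters here, since both dropout and BN behave differently at training and at inference time.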

3.4 A note on overparameterized networks

The expressive power of NNs can be seen as a function of the depth (i.e., number of hidden layers) and the width (i.e., number of neurons per hidden layer) of the architecture [46]. Shallow networks typically tend to compensate for the reduced depth by increasing the width of the hidden layers. In turn, this can lead to shallow architectures that have more parameters than a comparable deep and narrow architecture for the same problem. However, such (potentially) overparameterized networks do not necessarily perform worse. On the contrary, recent theory suggests that it can be easier to train very overparameterized models with stochastic gradient descent (SGD) [41, 3].

This may be surprising, since conventional ML wisdom states that overparameterized models tend to overfit and show poor generalization performance. However, recent results show that overparameterized models trained to minimum norm solutions can indeed preserve the ability to generalize well [70, 35, 8, 57, 18].

3.5 Optimization

Given a training set with $n$ targets $\left\{{\bm{x}_{i}}\right\}_{i}$ and corresponding sensor measurements $\left\{{\bm{s}_{i}}\right\}_{i}$ , we minimize the misfit between the reconstructed quantity $\bm{\widehat{x}}=\mathcal{F}(\bm{s})$ and the observed quantity $\bm{x}$ , in terms of the squared $L^{2}$ -norm

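$$\mathcal{F}\in\underset{\widetilde{\mathcal{F}}\in\mathscr{F}}{\operatorname{argmin}}\;\sum_{i=1}^{n}\left\|\bm{x}_{i}-\widetilde{\mathcal{F}}\left(\bm{s}_{i}\right)\right\|_{2}^{2}+\lambda\sum_{k}\left\|\bm{W}^{k}\right\|_{F}^{2}.$$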

The second term on the right hand side introduces $L^{2}$ regularization to the weight matrices, which is controlled via the parameter $\lambda>0$. It is well known that the $L^{2}$-norm is sensitive to outliers; the $L^{1}$-norm can be used as a more robust loss function.

We use the ADAM optimization algorithm [43] to train the shallow decoder, with learning rate $10^{-2}$ and weight decay $10^{-4}$ (also known as $L^{2}$ regularization). The learning rate, also known as step size, controls how much we adjust the weights in each update step. The weight decay parameter is important since it allows one to regularize the complexity of the network. In practice, we can improve the performance by changing the learning rate during training. We decay the learning rate by a factor of $0.9$ after $100$ epochs. Indeed, the reconstruction performance in our experiments is considerably improved by this dynamic scheme, compared to a fixed parameter setting. In our experiments, ADAM shows a better performance than SGD with momentum [62] and averaged SGD [56]. The hyper-parameters can be fine-tuned in practice, but our choice of parameters works reasonably well for several different examples. Note that we use the method described by [37] in order to initialize the weights. This initialization scheme is favorable, in particular because the output layer is high-dimensional.
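The training settings described above translate into a short PyTorch loop. The sketch below is illustrative rather than the authors' training script: the synthetic data, the layer widths, full-batch updates, the use of `nn.MSELoss` (a mean rather than a sum of squared errors), and the reading of the schedule as a decay every $100$ epochs are all assumptions. The weight initialization of [37] is not reproduced here; PyTorch defaults are used instead.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic training pairs (assumption): n samples, p sensors, m outputs.
n, p, m = 64, 5, 200
s = torch.randn(n, p)
x = torch.tanh(s @ torch.randn(p, m))   # targets depend nonlinearly on s

model = nn.Sequential(
    nn.Linear(p, 35), nn.ReLU(),
    nn.Linear(35, 40), nn.ReLU(),
    nn.Linear(40, m),
)

# Learning rate 1e-2 and weight decay 1e-4 as stated in the text.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
# Multiply the learning rate by 0.9 every 100 epochs (interpretation assumed).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.9)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(s), x)   # misfit between reconstruction and target
    loss.backward()
    optimizer.step()
    scheduler.step()              # one scheduler step per epoch
    losses.append(loss.item())

print(losses[0], losses[-1], scheduler.get_last_lr())
```

In `torch.optim.Adam`, `weight_decay` adds an $L^{2}$ penalty term to the gradient, which matches the description of weight decay as $L^{2}$ regularization.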

4 Empirical evaluation

We evaluate our methods on three classes of data. First, we consider a periodic flow behind a circular cylinder, as a canonical example of fluid flow. Then, we consider the weekly mean sea surface temperature (SST), as a second and more challenging example. Finally, the third and most challenging example we consider is a forced isotropic turbulence flow.

As discussed in Section 1, the shallow decoder requires that the training data represent the system, in the sense that they should comprise samples drawn from the same statistical distribution as the testing data. Indeed, this limitation is standard to data-driven methods, both for flow reconstruction and also more generally. Hence, we are mainly concerned with exploring reconstruction performance and generalizability for within sample prediction rather than for out of sample prediction tasks. In our third example, however, we demonstrate the limitations of the shallow decoder, illustrating difficulties that arise when one tries to extrapolate, rather than interpolate, the flow field. Figure 3 illustrates the difference between the two types of tasks.

In the first two example classes of data, the sensor information is a subset of the high-dimensional flow field, i.e., each row of the measurement operator $\bm{H}\in\mathbb{R}^{p\times m}$ has a single non-zero entry, located at the index of a sensor location. Letting $\mathcal{J}\in\left[1,m\right]^{p}\subset\mathbb{N}^{p}$ be the set of indices indexing the spatial location of the sensors, the measurement operator is such that

$$\bm{s}=\bm{H}\,\bm{x}=\bm{x}_{\mathcal{J}},$$

that is, the observations are simply point-wise measurements of the field of interest. In the above equation, $\bm{x}_{\mathcal{J}}$ is the restriction of $\bm{x}$ to its entries indexed by $\mathcal{J}$ . In this paper, no attempt is made to optimize the location of the sensors. In practical situations, they are often given or constrained by other considerations (wiring, intrusivity, manufacturing, etc.). We use random locations in our examples. The third example class of data demonstrates the SD using sub-gridscale measurements.
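The point-wise measurement operator can be written out explicitly. In the sketch below the sizes `m` and `p` and the random field are hypothetical, and indices are 0-based as is usual in Python. It builds $\bm{H}$ as a selection matrix and checks that applying it is the same as indexing the field at the sensor locations.

```python
import numpy as np

rng = np.random.default_rng(0)

m, p = 12, 4                       # hypothetical field size and sensor count
x = rng.standard_normal(m)         # stand-in for one flattened snapshot

# Random sensor locations (0-based indices into the flattened field).
J = np.sort(rng.choice(m, size=p, replace=False))

# Selection matrix: row j has a single 1 at column J[j].
H = np.zeros((p, m))
H[np.arange(p), J] = 1.0

s = H @ x
print(J, np.allclose(s, x[J]))
```

In practice one would index with `x[J]` directly and never form $\bm{H}$, which is mostly zeros.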

The error is quantified in terms of the normalized root-mean-square residual error

$$\text{NME}=\frac{\left\|\bm{x}-\bm{\widehat{x}}\right\|_{2}}{\left\|\bm{x}\right\|_{2}},$$

denoted in the following as “NME.” However, this measure can be misleading if the empirical mean is dominating. Hence, we consider also a more sensitive measure which quantifies the reconstruction accuracy of the deviations around the empirical mean. We define this measure as

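$$\text{NFE}=\frac{\left\|\bm{x}^{\prime}-\bm{\widehat{x}}^{\prime}\right\|_{2}}{\left\|\bm{x}^{\prime}\right\|_{2}},$$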

where $\bm{x}^{\prime}$ and $\bm{\widehat{x}}^{\prime}$ are the fluctuating parts around the empirical mean. In our experiments, we average the errors over $30$ runs for different sensor distributions.
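The difference between the two error measures is easiest to see on a field dominated by its mean. The sketch below assumes the errors are ratios of Euclidean norms, with the fluctuating parts obtained by subtracting the same empirical mean from the truth and from the estimate; the function names and the toy data are hypothetical.

```python
import numpy as np

def nme(x, x_hat):
    """Relative error of the full field (assumed form: ratio of 2-norms)."""
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

def nfe(x, x_hat, x_mean):
    """Relative error of the fluctuations around the empirical mean."""
    x_fluct = x - x_mean
    x_hat_fluct = x_hat - x_mean
    return np.linalg.norm(x_fluct - x_hat_fluct) / np.linalg.norm(x_fluct)

rng = np.random.default_rng(0)
x_mean = np.full(500, 10.0)                    # dominant mean field
x = x_mean + 0.1 * rng.standard_normal(500)    # truth = mean + small fluctuation
x_hat = x_mean.copy()                          # an estimate that returns only the mean

print("NME:", nme(x, x_hat))
print("NFE:", nfe(x, x_hat, x_mean))
```

An estimator that returns only the mean looks accurate under the first measure while capturing none of the fluctuations, which is why the second measure is the more informative one when the mean dominates.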

4.1 Fluid flow behind the cylinder

The first example we consider is the fluid flow behind a circular cylinder, at Reynolds number $100$, based on cylinder diameter, a canonical example in fluid dynamics [53]. The flow is characterized by a periodically shedding wake structure and exhibits smooth, large-scale patterns. A direct numerical simulation of the two-dimensional Navier-Stokes equations is performed using the immersed boundary projection method [63, 17]. In particular, we use the fast multidomain method [17], which simulates the flow on five nested grids of increasing size, with each grid consisting of $199\times 449$ grid points, covering a domain of $4\times 9$ cylinder diameters on the finest domain. We collect $151$ snapshots, sampled uniformly in time and covering several periods of vortex shedding. For the following experiment, we use cropped snapshots of dimension $199\times 384$ on the finest domain, as we omit the spatial domain upstream of the cylinder. Further, we split the dataset into a training and test set so that the training set comprises the first $100$ snapshots, while the remaining $51$ snapshots are used for validation. Note that different splittings (interpolation and extrapolation) yield nearly the same results since the flow is periodic.

4.1.1 Varying numbers of random structured point-wise sensor measurements

We investigate the performance of the shallow decoder using varying numbers of sensors. A realistic setting is considered in that the sensors can only be located on a solid surface. The retained configuration aims at reconstructing the entire vorticity flow field from information at the cylinder surface only. The results are averaged over different sensor distributions on the cylinder downstream-facing surface and are summarized in Table 1. Further, to contextualize the precision of the algorithms, we also state the standard deviation in parentheses.

The shallow decoder shows an excellent flow reconstruction performance compared to traditional methods. Indeed, the results show that very few sensors are already sufficient to get an accurate approximation. Further, we can see that the shallow decoder is insensitive to the sensor location, i.e., the variability of the performance is low when different sensor distributions on the cylinder surface are used. In stark contrast, this simple setup poses a challenge for the pod method without regularization, which is seen to be highly sensitive to the sensor configuration. This is expected since poorly located sensors lead to a large probability that the vorticity field $\bm{x}_{i}$ lies in the nullspace of $\bm{H}$, preventing its estimation, as discussed in Section 2. While regularization can improve the robustness slightly, the POD-based methods still require at least about $15$ sensors to provide accurate estimations for the high-dimensional flow field. (Here, we list results for the pod method with hard-threshold regularization and the pod plus method with ridge regularization. The number of retained components (hard-threshold) used for flow reconstruction is indicated by $k^{*}$, and the strength of ridge regularization is denoted by the parameter $\alpha$. See Appendix A for more details.) In contrast, the shallow decoder exhibits a good performance with as few as $5$ sensors. Note that the traditional methods could benefit from optimal sensor placement [48]; however, this is beyond the scope of this paper.

Figure 4 provides visual results for two specific sensor configurations using $5$ sensors. The second configuration is challenging for pod, which fails to provide an accurate reconstruction. pod plus provides a more accurate reconstruction of the flow field. The shallow decoder outperforms the traditional methods in both situations.

4.1.2 Non-linear sensor measurements

So far, the sensor information consisted of pointwise measurements of the local flow field so that the $j$ -th measurement is given by $\bm{s}^{\left(j\right)}=\bm{H}_{j}\,\bm{x}=\delta_{\tau_{j}}\left[\bm{x}\right]=\bm{x}^{\left(j\right)}$ , $j=1,\ldots,p$ , with $\delta_{\tau_{j}}$ a Dirac distribution centered at the location of the $j$ -th sensor and $\bm{s}^{\left(j\right)}$ and $\bm{x}^{\left(j\right)}$ the $j$ -th component of $\bm{s}$ and $\bm{x}$ respectively. We now consider nonlinear measurements to demonstrate the flexibility of the shallow decoder. Here, we consider the simple setting of squared sensor measurements: $\bm{s}^{\left(j\right)}=\left(\bm{x}\odot\bm{x}\right)^{\left(j\right)}$ , where $\odot$ denotes the Hadamard product. Table 2 provides a summary of the results, using $10$ sensors. The shallow decoder is agnostic to the functional form of the sensor measurements, and it achieves nearly the same performance as in the linear case above, i.e., the error for the test set increases less than $1\%$ compared to the linear case in Table 1.
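The squared-measurement setting can be mimicked in a few lines. The field and sensor indices below are hypothetical stand-ins. The sketch shows that the squared sensor values discard the sign of the field, so they cannot be written as $\bm{H}\,\bm{x}$ for any fixed linear operator.

```python
import numpy as np

rng = np.random.default_rng(0)

m, p = 100, 10                          # hypothetical field size, 10 sensors
x = rng.standard_normal(m)              # stand-in for one flattened snapshot
J = rng.choice(m, size=p, replace=False)

s_linear = x[J]                         # point-wise measurements
s_squared = (x * x)[J]                  # Hadamard square, then point-wise sampling

# x and -x give identical squared measurements: the sign is not observed.
print(np.allclose(s_squared, ((-x) * (-x))[J]))
```

A learned map from measurements to fields needs only pairs of measurements and snapshots for training, so it does not have to be told which of the two measurement models generated the data.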

4.1.3 Noisy sensor measurements

To investigate further the robustness and flexibility of the shallow decoder, we consider flow reconstruction in the presence of additive white noise. While this is not of concern when dealing with flow simulations, it is a realistic setting when dealing with flows obtained in experimental studies. Table 3 lists the results for both a high and low noise situation with linear measurements. By inspection, the shallow decoder outperforms classical techniques. In the high noise case, with a signal-to-noise ratio (SNR) of $10$, the average relative reconstruction error for the test set is about $27\%$ for the shallow decoder. For an SNR of $50$, the relative error is as low as $17\%$. Note that we here use an additional dropout layer (placed after the first fully-connected layer) to improve the robustness of the shallow decoder. In contrast, standard pod fails in both situations. Again, the pod plus method shows improved results over the standard pod. However, the visual results in Figure 5 show that the reconstruction quality of the shallow decoder is favorable. The shallow decoder shows a clear advantage and a denoising effect. Indeed, the reconstructed snapshots allow for a meaningful interpretation of the underlying structure.
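Additive white noise at a prescribed SNR can be generated as sketched below. The definition of SNR used here, the ratio of mean signal power to noise variance on a linear scale, is an assumption, since the convention used in the experiments is not stated in this section; the helper name `add_white_noise` is hypothetical.

```python
import numpy as np

def add_white_noise(s, snr, rng):
    """Add zero-mean Gaussian noise so that mean(s**2) / noise variance = snr.

    The linear power-ratio definition of SNR is an assumption.
    """
    signal_power = np.mean(s ** 2)
    sigma = np.sqrt(signal_power / snr)
    return s + sigma * rng.standard_normal(s.shape)

rng = np.random.default_rng(0)
s = np.sin(np.linspace(0.0, 20.0, 100_000))   # stand-in for sensor signals
s_noisy = add_white_noise(s, snr=10.0, rng=rng)

empirical_snr = np.mean(s ** 2) / np.mean((s_noisy - s) ** 2)
print(empirical_snr)
```

If the SNR is instead meant in decibels, the target ratio would be $10^{\mathrm{SNR}/10}$ rather than the SNR value itself.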

4.1.4 Summary of empirical results for the flow behind the cylinder

The empirical results show that the advantage of the shallow decoder compared to the traditional POD-based techniques is pronounced, even for a simple problem such as the flow behind the cylinder. It can be seen that the performance of the traditional techniques is patchy, i.e., the reconstruction quality is highly sensitive to the sensor location. While regularization can mitigate a poor sensor placement design, a relatively large number ($>15$) of sensors is required in order to achieve an accurate reconstruction performance. More challenging situations such as nonlinear measurements and sensor noise pose a challenge for the traditional techniques, while the shallow decoder is able to reconstruct dominant flow features in such situations. The computational demands required to train the shallow decoder are minimal, e.g., the time for training on a modern GPU remains below two minutes for this example.

4.2 Sea surface temperature using random point-wise measurements

The second example we consider is the more challenging sea surface temperature (SST) dataset. Complex ocean dynamics lead to rich flow phenomena, featuring interesting seasonal fluctuations. While the mean SST flow field is characterized by a periodic structure, the flow is non-stationary. The dataset consists of the weekly sea surface temperatures for the last 26 years, publicly available from the National Oceanic & Atmospheric Administration (NOAA). The data comprise $1483$ snapshots in time with spatial resolution of $180\times 360$. For the following experiments, we only consider $44,219$ measurements, by excluding measurements corresponding to the land masses. Further, we create a training set by selecting $1100$ snapshots at random, while the remaining snapshots are used for validation.

We consider the performance of the shallow decoder using varying numbers of random sensors scattered across the spatial domain. The results are summarized in Table 4. We observe a large discrepancy between the NME and NFE errors. This is because the long-term annual mean field accounts for the majority of the spatial structure of the field. Hence, the NME error is uninformative with respect to the performance of reconstruction methods. In terms of the NFE error, the POD-based reconstruction techniques are shown to fail to reconstruct the high-dimensional flow field using limited sensor measurements. In contrast, the shallow decoder demonstrates an excellent reconstruction performance using both $32$ and $64$ measurements. Figure 6 shows visual results to support these quantitative findings.

4.3 Turbulent flow using sub-gridscale measurements

The final example we consider is the velocity field of a turbulent isotropic flow. Unlike the previous examples, the isotropic turbulent flow is non-periodic in time and highly non-stationary. Thus, this dataset poses a challenging task. Here, we consider data from a forced isotropic turbulence flow generated with a direct numerical simulation using $1,024^{3}$ points in a triply periodic $\left[0,2\pi\right]^{3}$ domain. For the following experiments, we use $800$ snapshots for training and $200$ snapshots for validation. The data span about one large-eddy turnover time. The data are provided as part of the Johns Hopkins Turbulence Database [44].

If the sensor measurements $\bm{s}$ are acquired on a coarse but regular grid, then the reconstruction task may be considered as a super-resolution problem [67, 29, 13]. There are a number of direct applications of super-resolution in fluid mechanics centered around sub-gridscale modeling. Because many fluid flows are inherently multiscale, it may be prohibitively expensive to collect data that captures all spatial scales, especially for iterative optimization and real-time control [11]. Inferring small-scale flow structures below the spatial resolution available is an important task in large eddy simulation (LES), climate modeling, and particle image velocimetry (PIV), to name a few applications. Deep learning has recently been employed for super-resolution in fluid mechanics applications with promising results [30]. Note that our setting differs from the super-resolution problem. Here, we obtain first a low-resolution image by applying a mean filter to the high-dimensional snapshot. Then, we use a single sensor measurement per grid cell to form the inputs (illustrated in Figure 7(b)). In contrast, super-resolution uses the low-resolution image as input.

First, we consider the within sample prediction task. In this case, we obtain excellent results for the estimated high-dimensional flow fields, despite the challenging problem. Table 5 quantifies the performance for varying numbers of sub-gridscale measurements. In addition, Figure 7 provides some visual evidence for the good performance for this problem.

Next, we illustrate the limitation of the shallow decoder. Indeed, it is important to stress that the SD cannot be used for “out of sample prediction tasks” if the fluid flow is highly non-stationary. To illustrate this issue, Figure 8 shows three flow fields at different temporal locations. First, Figure 8(b) shows a test example, which is close in time to the training set. In this case, the SD is able to reconstruct the flow field with high accuracy. The reconstruction quality drops for snapshots which are further away in time, as shown in Figure 8(d). Finally, Figure 8(f) shows that reconstruction fails if the test example is far away from the training set in time, i.e., the flow field is not drawn from the same statistical distribution as the training examples are.

5 Discussion

The emergence of sensor networks for global monitoring (e.g., ocean and atmospheric monitoring) requires new mathematical techniques that are capable of maximally exploiting sensors for state estimation and forecasting. Emerging algorithms from the machine learning community can be integrated with many traditional scientific computing approaches to enhance sensor network capabilities. For many global monitoring applications, the placement of sensors can be prohibitively expensive, thus requiring learning techniques such as the one proposed here, which can exploit a reduction in the number of sensors while maintaining required performance characteristics.

To partially address this challenge, we proposed a shallow decoder with two hidden layers for the problem of flow reconstruction. The mathematical formulation presented is significantly different from what is commonly used in flow reconstruction problems, e.g., gappy interpolation with dominant POD modes. Indeed, our experiments demonstrate the enhanced robustness and accuracy of fluid flow field reconstruction by using our shallow decoder.

Future work aims to leverage the underlying laws of physics in flow problems to further improve the efficiency. In the context of flow reconstruction or, more generally, observation of a high-dimensional physical system, insights from the physics at play can be exploited [58]. In particular, the dynamics of many systems do indeed remain low-dimensional and the trajectory of their state vector lies close to a manifold whose dimension is significantly lower than the ambient dimension. Moreover, the features exploited from the shallow decoder network can also be integrated in reduced order models (ROMs) for forecasting predictions [9]. In many high-dimensional systems where ROMs are used, the ability to generate low-fidelity models that can be rapidly simulated has revolutionized our ability to model such complex systems, especially in application of complex flow fields. The ability to rapidly generate low-rank feature spaces alternative to POD generates new possibilities for ROMs using limited sampling and limited data. This aspect of the shallow decoder will be explored further in future work.

Acknowledgments

LM gratefully acknowledges the support of the French Agence Nationale pour la Recherche (ANR) and Direction Générale de l’Armement (DGA) via the FlowCon project (ANR-17-ASTR-0022). SLB acknowledges support from the Army Research Office (ARO W911NF-17-1-0422). JNK acknowledges support from the Air Force Office of Scientific Research (FA9550-19-1-0011). LM and JNK also acknowledge support from the Air Force Office of Scientific Research (FA9550-17-1-0329). MWM would like to acknowledge ARO, DARPA, NSF, and ONR for providing partial support for this work. We would also like to thank Kevin Carlberg for valuable discussions about flow reconstruction techniques.

Appendix A Hyper-parameter search for the POD based methods

In the following, we provide results of our hyper-parameter search for determining the optimal tuning parameters for flow reconstruction. We proceed by evaluating the reconstruction error of the pod and pod plus method for a plausible range of values. Here, we consider hard-threshold regularization for the pod method and ridge regularization for the pod plus method. We run $30$ trials of the experiment, where we use a unique sensor location configuration at each trial.
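The two kinds of regularization can be sketched on a toy problem. The sketch rests on several assumptions: hard thresholding is taken to mean a least-squares fit restricted to the first $k^{*}$ modes, ridge regularization is taken to mean a penalty $\alpha\left\|\bm{\nu}\right\|_{2}^{2}$ on the coefficients, and the synthetic data and function names are hypothetical. Whether this matches the pod and pod plus implementations in every detail is not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumption): fields spanned by k sine modes on m grid points.
m, n, k, p = 200, 60, 6, 10
grid = np.linspace(0.0, 2.0 * np.pi, m)
sines = np.stack([np.sin((j + 1) * grid) for j in range(k)], axis=1)
X = sines @ rng.standard_normal((k, n))
Phi = np.linalg.svd(X, full_matrices=False)[0][:, :k]    # POD modes

sensor_idx = rng.choice(m, size=p, replace=False)
x = X[:, 0]
s = x[sensor_idx]                                         # noise-free sensors

def pod_hard_threshold(Phi, sensor_idx, s, k_star):
    """Least-squares fit using only the first k_star modes."""
    A = Phi[sensor_idx, :k_star]
    nu = np.linalg.lstsq(A, s, rcond=None)[0]
    return Phi[:, :k_star] @ nu, nu

def pod_ridge(Phi, sensor_idx, s, alpha):
    """Ridge fit: nu = (A^T A + alpha I)^{-1} A^T s with A = H Phi."""
    A = Phi[sensor_idx, :]
    nu = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ s)
    return Phi @ nu, nu

x_full, nu_full = pod_hard_threshold(Phi, sensor_idx, s, k_star=k)
x_trunc, nu_trunc = pod_hard_threshold(Phi, sensor_idx, s, k_star=3)
x_ridge, nu_ridge = pod_ridge(Phi, sensor_idx, s, alpha=1e-2)

for name, x_hat in [("k* = 6", x_full), ("k* = 3", x_trunc), ("ridge", x_ridge)]:
    print(name, np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

Both $k^{*}$ and $\alpha$ trade bias against sensitivity to the sensor layout, which is why they are tuned per number of sensors in the search described above.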

Figure 9 shows the results for the fluid flow past the cylinder. First, we show the results for the pod plus method in (a). Regularizing the solution improves the reconstruction accuracy, and the effect of regularization on the reconstruction error is pronounced for an increasing number of sensors. (Note that, while the reconstruction error decreases with an increasing number of sensors, finding the optimal tuning parameter becomes more difficult.) Next, we show the results for pod with hard-threshold regularization in (b). It can be seen that the performance is on par with ridge regularization (plotted as a black dashed line), with hard-threshold regularization showing a lower variance than ridge regularization. In contrast, the shallow decoder, represented by a dashed red line, outperforms both the pod and the pod plus method. However, the performance gap between the POD-based methods and the shallow decoder is closing for an increased number of sensors. This is not surprising, since the flow past the cylinder represents a relatively simple problem where the pod method is known to provide good reconstruction results, given a sufficiently large number of sensors.

Figures 10 and 11 show the results for the noisy flow past the cylinder and for the sea surface temperature data. Again, it can be seen that ridge regularization and hard-threshold regularization perform on par, while the shallow decoder outperforms the POD-based methods.

Appendix B Singular spectrum analysis of reconstructed data

Here we provide additional results that show the singular value spectrum of the reconstructed training and test data. As reference we also show the spectrum of the ground truth data. Figure 12 shows the results for (a) the fluid flow behind the cylinder, (b) the sea surface data, and (c) the turbulent flow. The singular value spectrum of the reconstructed data helps us to compare the performance between the POD-based method and our shallow decoder.

For all problems that we consider, it can be seen that the shallow decoder captures more fine-scale information than the truncated POD-based method. Note that we consider the case where the training and test data are sampled from the same distribution.
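The kind of comparison described here can be reproduced on toy data. In the sketch below the random snapshot matrix and the truncation rank `k` are assumptions. It computes the singular value spectrum of a ground-truth matrix and of its rank-$k$ reconstruction, which stands in for a truncated POD-based estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 100, 40, 5
X = rng.standard_normal((m, n))            # stand-in for ground-truth snapshots

U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * S[:k]) @ Vt[:k, :]       # rank-k (truncated) reconstruction

sv_truth = S
sv_recon = np.linalg.svd(X_k, compute_uv=False)

print(sv_truth[:8])
print(sv_recon[:8])
```

The spectrum of the truncated reconstruction drops to numerical zero after the first $k$ values, whereas a reconstruction that retains fine-scale content would keep a longer tail closer to that of the ground truth.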

Appendix C Setup for our empirical evaluation

Here, we provide details about the concrete network architectures of the shallow decoder that are used for the different examples. The networks are implemented in Python using PyTorch, and research code for the flow behind the cylinder is available via https://github.com/erichson/ShallowDecoder. Tables 6–8 show the details. For each example, we use a similar architecture design. The difference is that we use a slightly wider design (more neurons per layer) for the SST dataset and the isotropic flow. This is because we use a larger number of sensors for these two problems, and thus we need to increase the capacity of the network. In each case, the learning rate is set to $10^{-2}$ with a scheduled decay rate of $0.3$. Further, we use a small amount of weight decay, $\lambda = 10^{-7}$, to regularize the network.
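The optimization settings above can be illustrated with a short PyTorch sketch. Only the learning rate ($10^{-2}$), the decay rate ($0.3$), and the weight decay ($10^{-7}$) are taken from the text. The layer widths, the choice of Adam as optimizer, the use of a step schedule, and the decay interval `decay_every` are assumptions made for illustration; the actual layer configurations are those listed in Tables 6–8 and in the linked repository.

```python
import torch
from torch import nn


def make_shallow_decoder(n_sensors, n_outputs, width=32):
    """A small fully connected decoder; the width is a hypothetical value."""
    return nn.Sequential(
        nn.Linear(n_sensors, width),
        nn.ReLU(),
        nn.Linear(width, width),
        nn.ReLU(),
        nn.Linear(width, n_outputs),
    )


def train(model, sensors, fields, epochs, lr=1e-2, decay=0.3,
          decay_every=100, weight_decay=1e-7):
    """Full-batch training loop (epochs >= 1) with a stepwise learning rate decay.

    Returns the final learning rate and the last training loss.
    """
    # Adam is an assumption here; the passage does not name the optimizer.
    optimizer = torch.optim.Adam(
        model.parameters(), lr=lr, weight_decay=weight_decay
    )
    # Multiply the learning rate by `decay` every `decay_every` epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=decay_every, gamma=decay
    )
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(sensors), fields)
        loss.backward()
        optimizer.step()
        scheduler.step()
    return optimizer.param_groups[0]["lr"], loss.item()
```

A wider network for problems with more sensors, as described above, corresponds to passing a larger `width` to `make_shallow_decoder` in this sketch.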

Bibliography

1. Adler and Öktem [2017] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.
2. Agostinelli et al. [2014] Forest Agostinelli, Matthew Hoffman, Peter Sadowski, and Pierre Baldi. Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830, 2014.
3. Allen-Zhu et al. [2019] Zeyuan Allen-Zhu, Yuanzhi Li, and Yingyu Liang. Learning and generalization in overparameterized neural networks, going beyond two layers. In Advances in Neural Information Processing Systems, pages 6155–6166, 2019.
4. Azencot et al. [2020] Omri Azencot, N Benjamin Erichson, Vanessa Lin, and Michael W Mahoney. Forecasting sequential data using consistent Koopman autoencoders. arXiv preprint arXiv:2003.02236, 2020.
5. Baraniuk [2007] Richard G Baraniuk. Compressive sensing. IEEE Signal Processing Magazine, 24(4):118–121, 2007.
6. Barrault et al. [2004] Maxime Barrault, Yvon Maday, Ngoc Cuong Nguyen, and Anthony T Patera. An “empirical interpolation” method: Application to efficient reduced-basis discretization of partial differential equations. Comptes Rendus Mathematique, 339(9):667–672, 2004.
7. Bartlett et al. [2017] Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240–6249, 2017.
8. Belkin et al. [2019] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019.