Sub-grid scale model classification and blending through deep learning

Romit Maulik; Omer San; Jamey D. Jacob; Christopher Crick

arXiv:1812.11949·physics.flu-dyn·May 15, 2019

Sub-grid scale model classification and blending through deep learning

Romit Maulik, Omer San, Jamey D. Jacob, Christopher Crick

PDF

TL;DR

This paper introduces a machine learning framework for classifying and blending turbulence models in large eddy simulations, enabling adaptive and hybrid modeling based on local flow conditions.

Contribution

It presents a novel approach to dynamically classify and blend turbulence models using deep learning, improving LES accuracy and stability for complex turbulent flows.

Findings

01

Effective classification of turbulence models based on flow features.

02

Successful blending of models for improved flow simulation.

03

Robust and stable closure achieved in Kraichnan turbulence test case.

Abstract

In this article we detail the use of machine learning for spatiotemporally dynamic turbulence model classification and hybridization for the large eddy simulations (LES) of turbulence. Our predictive framework is devised around the determination of local conditional probabilities for turbulence models that have varying underlying hypotheses. As a first deployment of this learning, we classify a point on our computational grid as that which requires the functional hypothesis, the structural hypothesis or no modeling at all. This ensures that the appropriate model is specified from \emph{a priori} knowledge and an efficient balance of model characteristics is obtained in a particular flow computation. In addition, we also utilize the conditional probability predictions of the same machine learning to blend turbulence models for another hybrid closure. Our test-case for the demonstration…

Tables2

Table 1. Table 1: Classification accuracy percentages for different grid-resolutions in a priori to illustrate how accurately our base learning can predict correct labels. Accuracies are seen to drop when resolutions are coarsened radically. However, some learning is retained as evidenced by accuracies greater than 33%.

Time	$N^{2} = 512^{2}$	$N^{2} = 256^{2}$	$N^{2} = 128^{2}$	$N^{2} = 64^{2}$	$N^{2} = 32^{2}$
$t = 1$	65.77	63.04	56.51	52.17	47.65
$t = 2$	60.89	60.47	61.02	55.62	41.99
$t = 3$	68.05	65.08	61.54	53.32	46.29
$t = 4$	63.93	66.04	60.24	48.33	48.54

Table 2. Table 2: Classification percentages in a priori and a posteriori . One can see deviation from trends due to numerical error accumulation (and greater utilization of closure classifications for subsequent stabilization).

Time	A priori			A posteriori
	AD	SM	No-model	AD	SM	No-model
$t = 1$	22.43	21.69	55.87	29.94	26.34	43.72
$t = 2$	22.31	21.08	56.60	29.17	25.37	45.45
$t = 3$	21.37	20.84	57.78	28.68	25.07	46.25
$t = 4$	19.49	22.56	57.94	28.45	25.38	46.17

Equations41

\frac{\partial ω}{\partial t} + J (ω, ψ) = \frac{1}{R e} \nabla^{2} ω, x, y \in [0, 2 π], t \in [0, 4],

\frac{\partial ω}{\partial t} + J (ω, ψ) = \frac{1}{R e} \nabla^{2} ω, x, y \in [0, 2 π], t \in [0, 4],

J (ω, ψ) = \frac{\partial ω}{\partial x} \frac{\partial ψ}{\partial y} - \frac{\partial ω}{\partial y} \frac{\partial ψ}{\partial x},

J (ω, ψ) = \frac{\partial ω}{\partial x} \frac{\partial ψ}{\partial y} - \frac{\partial ω}{\partial y} \frac{\partial ψ}{\partial x},

\nabla^{2} ψ = - ω .

\nabla^{2} ψ = - ω .

\frac{\partial ω ˉ}{\partial t} + J (\overset{ω}{ˉ}, \overset{ˉ}{ψ}) \nabla^{2} \overset{ˉ}{ψ} = \frac{1}{R e} \nabla^{2} \overset{ω}{ˉ} + Π, = - \overset{ω}{ˉ},

\frac{\partial ω ˉ}{\partial t} + J (\overset{ω}{ˉ}, \overset{ˉ}{ψ}) \nabla^{2} \overset{ˉ}{ψ} = \frac{1}{R e} \nabla^{2} \overset{ω}{ˉ} + Π, = - \overset{ω}{ˉ},

Π = J (\overset{ω}{ˉ}, \overset{ˉ}{ψ}) - \overline{J (ω, ψ)} .

Π = J (\overset{ω}{ˉ}, \overset{ˉ}{ψ}) - \overline{J (ω, ψ)} .

Π = ν_{e} \nabla^{2} \overset{ω}{ˉ}

Π = ν_{e} \nabla^{2} \overset{ω}{ˉ}

\displaystyle\begin{gathered}\nu_{e}=(C_{s}\delta)^{2}|\bar{S}|,\\ |\bar{S}|=\sqrt{4\Big{(}\frac{\partial^{2}\bar{\psi}}{\partial x\partial y}\Big{)}^{2}+\Big{(}\frac{\partial^{2}\bar{\psi}}{\partial x^{2}}-\frac{\partial^{2}\bar{\psi}}{\partial y^{2}}\Big{)}^{2}},\end{gathered}

\displaystyle\begin{gathered}\nu_{e}=(C_{s}\delta)^{2}|\bar{S}|,\\ |\bar{S}|=\sqrt{4\Big{(}\frac{\partial^{2}\bar{\psi}}{\partial x\partial y}\Big{)}^{2}+\Big{(}\frac{\partial^{2}\bar{\psi}}{\partial x^{2}}-\frac{\partial^{2}\bar{\psi}}{\partial y^{2}}\Big{)}^{2}},\end{gathered}

Π = J (\overset{ω}{ˉ}, \overset{ˉ}{ψ}) - J (ω^{*}, ψ^{*})

Π = J (\overset{ω}{ˉ}, \overset{ˉ}{ψ}) - J (ω^{*}, ψ^{*})

ν_{e}^{a} = \frac{Π}{\nabla ^{2} ω ˉ}

ν_{e}^{a} = \frac{Π}{\nabla ^{2} ω ˉ}

M : {p_{1}, p_{2}, \dots, p_{P}} \in R^{P} \to {q_{1}, q_{2}, \dots, q_{Q}} \in R^{Q} .

M : {p_{1}, p_{2}, \dots, p_{P}} \in R^{P} \to {q_{1}, q_{2}, \dots, q_{Q}} \in R^{Q} .

M : {p} \in R^{18} \to {P (q ∣ p)} \in R^{3},

M : {p} \in R^{18} \to {P (q ∣ p)} \in R^{3},

p_{i, j} = {\overset{ω}{ˉ}_{i, j}, \overset{ω}{ˉ}_{i, j + 1}, \overset{ω}{ˉ}_{i, j - 1}, \dots, \overset{ω}{ˉ}_{i - 1, j - 1}, \overset{ˉ}{ψ}_{i, j}, \overset{ˉ}{ψ}_{i, j + 1}, \overset{ˉ}{ψ}_{i, j - 1}, \dots, \overset{ˉ}{ψ}_{i - 1, j - 1}}, q_{i, j} = {P (Π_{i, j}^{k} ∣ p_{i, j})},

p_{i, j} = {\overset{ω}{ˉ}_{i, j}, \overset{ω}{ˉ}_{i, j + 1}, \overset{ω}{ˉ}_{i, j - 1}, \dots, \overset{ω}{ˉ}_{i - 1, j - 1}, \overset{ˉ}{ψ}_{i, j}, \overset{ˉ}{ψ}_{i, j + 1}, \overset{ˉ}{ψ}_{i, j - 1}, \dots, \overset{ˉ}{ψ}_{i - 1, j - 1}}, q_{i, j} = {P (Π_{i, j}^{k} ∣ p_{i, j})},

E (w) = - n = 1 \sum N k = 1 \sum K {t_{nk} lo g (y_{nk}) + (1 - t_{nk}) lo g (1 - y_{nk})},

E (w) = - n = 1 \sum N k = 1 \sum K {t_{nk} lo g (y_{nk}) + (1 - t_{nk}) lo g (1 - y_{nk})},

Π_{i, j}^{M L} = Π_{i, j}^{k} s.t. k argmax P (Π_{i, j}^{k} ∣ p_{i, j})

Π_{i, j}^{M L} = Π_{i, j}^{k} s.t. k argmax P (Π_{i, j}^{k} ∣ p_{i, j})

Π_{i, j}^{M L} = P (Π_{i, j}^{A D} ∣ p_{i, j}) Π_{i, j}^{A D} + P (Π_{i, j}^{S M} ∣ p_{i, j}) Π_{i, j}^{S M}

Π_{i, j}^{M L} = P (Π_{i, j}^{A D} ∣ p_{i, j}) Π_{i, j}^{A D} + P (Π_{i, j}^{S M} ∣ p_{i, j}) Π_{i, j}^{S M}

S_{ω} = ⟨ ∣ \overset{ω}{ˉ} (x+r) - \overset{ω}{ˉ} (x) ∣^{2} ⟩,

S_{ω} = ⟨ ∣ \overset{ω}{ˉ} (x+r) - \overset{ω}{ˉ} (x) ∣^{2} ⟩,

T K E = μ (u_{f}^{2} + v_{f}^{2}),

T K E = μ (u_{f}^{2} + v_{f}^{2}),

u_{f} = \overset{u}{ˉ} - μ (\overset{u}{ˉ})

u_{f} = \overset{u}{ˉ} - μ (\overset{u}{ˉ})

v_{f} = \overset{v}{ˉ} - μ (\overset{v}{ˉ}),

\overset{u}{ˉ}

\overset{u}{ˉ}

σ^{2} (\overset{ω}{ˉ}) = μ ((\overset{ω}{ˉ} - μ (\overset{ω}{ˉ}))^{2}) .

σ^{2} (\overset{ω}{ˉ}) = μ ((\overset{ω}{ˉ} - μ (\overset{ω}{ˉ}))^{2}) .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Sub-grid scale model classification and blending through deep learning

Romit Maulik\aff1

Omer San\aff1 \corresp

[email protected]

Jamey D. Jacob\aff1

Christopher Crick\aff2

\aff1 Mechanical & Aerospace Engineering, Oklahoma State University, Stillwater, OK 74078, USA \aff2 Computer Science, Oklahoma State University, Stillwater, OK 74078, USA

Abstract

In this article we detail the use of machine learning for spatiotemporally dynamic turbulence model classification and hybridization for the large eddy simulations (LES) of turbulence. Our predictive framework is devised around the determination of local conditional probabilities for turbulence models that have varying underlying hypotheses. As a first deployment of this learning, we classify a point on our computational grid as that which requires the functional hypothesis, the structural hypothesis or no modeling at all. This ensures that the appropriate model is specified from a priori knowledge and an efficient balance of model characteristics is obtained in a particular flow computation. In addition, we also utilize the conditional probability predictions of the same machine learning to blend turbulence models for another hybrid closure. Our test-case for the demonstration of this concept is given by Kraichnan turbulence which exhibits a strong interplay of enstrophy and energy cascades in the wave number domain. Our results indicate that the proposed methods lead to robust and stable closure and may potentially be used to combine the strengths of various models for complex flow phenomena prediction.

1 Introduction

Turbulence is an active area of research due to its significant impact on a diverse set of challenges such as those pertaining to the aerospace and geophysical communities. In recent decades, computational fluid dynamics (CFD) has proven to be useful for low-cost realizations of flow phenomena for critical decision making processes. However, CFD is still fairly limited in terms of accuracy due to the exceptional computational expense involved in high-fidelity simulations of turbulence. ‘True’ numerical experiments require the use of a direct numerical simulation (DNS) of the Navier-Stokes equations. DNS is only possible if a discretized domain can resolve all possible frequencies in a flow and is therefore out of reach of the vast majority of engineering and geophysical applications for the foreseeable future. Large eddy simulations (LES) (Sagaut, 2006) have proven to be a promising strategy for resolving a greater number of scales in a flow but require the specification of a model which represents the interactions of the higher frequencies with the mean flow. This sub-grid scale (SGS) model, also known as a closure, is usually specified in the form of an algebraic or differential equation and is generally flow category specific (Vreman, 2004). This imposes a caveat on the applicability of a SGS model if no a priori information of the flow category is known. As such, the basic premise of LES is extendable to many partial differential equation systems with quadratic non-linearities. In this paper, we explore the utilization of machine learning for dynamically inferring regions where a particular turbulence modeling hypothesis is applicable with the goal of improving predictive capabilities of turbulence dynamics for a wide range of problems.

The multi-scale nature of turbulence requires the use multiple modeling approximations for the higher wavenumbers which remain unsupported by computational degrees of freedom (a case for most flows of any practical interest). The procedure of modeling these smaller scales is often denoted closure due to insufficient knowledge about higher-order wavenumber interactions with the coarse-grained system (Berselli et al., 2006) and remains vital for the accurate computation of many applications (Hickel et al., 2014; Yu et al., 2016). From an LES point of view, the closure problem may be considered to be dominated by commutative errors in the calculation of the non-linear term as well as the defects associated with commutative errors stemming from the dynamic term. In this study, we focus on the former.

Explicit LES argues for the utilization of sub-grid models specified as algebraic or differential equations for the unresolved scales. These are built on an intuitive reasoning of how the losses of coarse-graining the Navier-Stokes equations may be incorporated into an LES deployment. Some of the most notable sub-grid closure strategies are those given by the linear eddy-viscosity hypothesis, which models the sub-grid stress tensor through the Bousinessq approximation. Within the context of the Navier-Stokes equations, it is generally accepted that the vorticity dominated smaller scales are dissipative (Kolmogorov, 1941) and therefore, most turbulence models seek to specify a sub-grid dissipation (Frisch, 1995). Many functional sub-grid models can be traced back to Smagorinsky (1963), where an effective eddy-viscosity was determined by an a priori specified mixing length and a $k^{-5/3}$ scaling recovery for the kinetic energy content in the wavenumber domain. Similar hypotheses have also been used for two-dimensional turbulence (often utilized as a test-bed for geophysical scenarios, for instance see McWilliams (1990); Tabeling (2002); Boffetta & Ecke (2012); Pearson et al. (2017); Pearson & Fox-Kemper (2018)), for approximating the $k^{-3}$ cascade and generally have their roots in dimensional analysis related to the cascade of enstrophy (Leith, 1968). These models may also be classified as functional due to the phenomenological nature of their deployment and comprise the bulk of explicit LES turbulence models used in practical deployments. Explicit LES closures may also be specified through the specification of a low-pass spatial filter to account for the unresolved scales (Bardina et al., 1980; Stolz & Adams, 1999; Layton & Lewandowski, 2003; Mathew et al., 2003; San & Vedula, 2018) where phenomenology is bypassed but ansatz are provided for the bulk dissipative nature of the smaller scales through the control of a characteristic filter-width. In either scenario, (i.e., whether structural or functional), the choice of the phenomenology (or dissipation control parameter) plays a key role in the successful calculation of accurate a posteriori statistics.

The past few years have seen a rapid increase in the use of machine learning for various scientificand engineering applications. For turbulence, some widely used strategies for prediction and inference include symbolic regression such as in Weatheritt & Sandberg (2016, 2017a, 2017b), where functional model-forms for Reynolds-averaged Navier-Stokes (RANS) deployments were generated through evolutionary optimization against high-fidelity data. Other techniques incorporating Bayesian ideologies have also been used, for instance in Xiao et al. (2016) where an iterative ensemble Kalman method was used to assimilate prior data for quantifying model form uncertainty in RANS models. In Wang et al. (2017b, a) and Wu et al. (2018), random-forest regressors were utilized for RANS turbulence-modeling given direct numerical simulation (DNS) data. In Singh & Duraisamy (2016) and Singh et al. (2017), an ANN was utilized to predict a non-dimensional correction factor in the Spalart-Allmaras turbulence model through a field-inversion process using experimental data. Bypassing functional formulations of a turbulence model was also studied from the RANS point of view by Tracey et al. (2015). Ling & Templeton (2015) utilized support vector machines, decision trees and random forest regressors for identifying regions of high RANS uncertainty. A deep-learning framework where Reynolds-stresses would be predicted in an invariant subspace was developed by Ling et al. (2016b). Machine learning of invariance properties has also been discussed in the context of turbulence modeling by Ling et al. (2016a). The reader is directed to a recent review by Duraisamy et al. (2018), for an excellent review of turbulence modeling using data-driven ideas.

As shown above, the use of data-driven ideologies and in particular artificial neural networks (ANNs) has generated significant interest in the turbulence modeling community for addressing long-standing challenges (also see Sotgiu et al. (2018); Zhu et al. (2019); Zhang et al. (2019) for recent examples). A multilayered ANN may be optimally trained to approximate any non-linear function (Hornik et al., 1989) and the large data sets involved in turbulence research couple with ever-improving computing capabilities has also motivated the study of ANN based learning. Within the context of LES (and associated with the scope of this paper) there are several investigations into sub-grid modeling using data-driven techniques. In an early study of the feasibility of using learning from DNS, Sarghini et al. (2003) deployed ANNs for estimating Smagorinsky model-form coefficients within a mixed sub-grid model for a turbulent channel flow. ANNs were also used for wall-modeling by Milano & Koumoutsakos (2002) where it was used to reconstruct the near wall field and compared to standard proper-orthogonal-decomposition techniques. An alternative to ANNs for sub-grid predictions was proposed by King et al. (2016) where a priori optimization was utilized to minimize the $L^{2}$ -error between true and modeled sub-grid quantities using a parameter-free Volterra series. Maulik & San (2017a) utilized an extreme-learning-machine (a variant of a single-layered ANN) to obtain maps between low-pass spatially filtered and deconvolved variables in an a priori sense. This had implications for the use of ANNs for turbulence modeling without model-form specification. A more in-depth investigation was recently undertaken by Fukami et al. (2018) where convolutional ANNs were utilized for reconstructing from downsampled snapshots of turbulence. Maulik et al. (2018) also deployed a data-driven convolutional and deconvolutional operation to obtain closure terms for two-dimensional turbulence. Gamahara & Hattori (2017) utilized ANNs for identifying correlations with grid-resolved quantities for an indirect method of model-form identification in turbulent channel flow. The study by Vollant et al. (2017) utilized ANNs in conjuction with optimal estimator theory to obtain functional forms for sub-grid stresses. In Beck et al. (2018), a variety of neural network architectures such as convolutional and recurrent neural networks are studied for predicting closure terms for decaying homogeneous isotropic turbulence. A least-squares based truncation is specified for stable deployments of their model-free closures. Model-free turbulence closures are also specified by Maulik et al. (2018, 2019) and Wang et al. (2018), where sub-grid scale stresses are learned directly from DNS data and deployed in a posteriori assessments. King et al. (2018) studied generative-adversarial networks and the LAT-NET Hennigh (2017) for a priori recovery of statistics such as the intermittency of turbulent fluctuations and spectral scaling.

While a large majority of the LES-based frameworks presented above utilize a least-squares error minimization technique for constructing maps to sub-grid stresses directly for theoretically optimal LES (Langford & Moser, 1999; Moser et al., 2009; LaBryer et al., 2015), this work is novel in that it utilizes sub-grid statistics (pre-computed from DNS data) to train a classifier. Our trained intelligence utilizes the most appropriate turbulence modeling modelling hypothesis (i.e., either structural or functional) from a priori experience to close the LES governing equations. It is also deployed to blend turbulence models linearly at each point during flow evolution for a novel hybrid closure. In this manner, we are able to co-deploy models having fundamentally different underlying hypothesis for turbulence parameterizations in a stable manner. This is similar to the study employed in Ling & Kurzawski (2017) where machine learning is utilized for adaptive error corrections in RANS deployments. In the rest of this article, we discuss the governing equations of decaying Kraichnan turbulence, introduce our machine learning architecture and its optimization and detail its a priori and a posteriori performance through statistical assessments.

2 Governing equations

We proceed by outlining our Kraichnan turbulence test-case which (alongwith quasigeostrophic turbulence) is an important prototype for geophysical flow-phenomenon with high aspect ratios and for which turbulence model research remains highly active (Pearson et al., 2017). The governing equations of motion for Kraichnan turbulence are given by the two-dimensional Navier-Stokes equations in a periodic domain. The non-dimensionalized version of these equations may be expressed in the vorticity ( $\omega$ ) and stream function ( $\psi$ ) formulation as (San & Staples, 2012),

[TABLE]

where we define the Jacobian (or the nonlinear term as)

[TABLE]

and the conservation of mass is enforced by

[TABLE]

A measure of multi-scale behavior in this system is given by the Reynolds number ( $Re$ ). A high value of $Re$ combined with a coarse-grid projection of these equations results in insufficient support for the finest structures in the flow evolution, leading to noise accumulation at grid cut-off and potential floating point overflow of the numerical evolution of this problem. A sufficiently coarse-grained representation of the governing equations introduced previously are given by the LES governing equations

[TABLE]

where $\Pi$ may be assumed to be the perfect closure given by

[TABLE]

When adequately simulated, the Kraichnan turbulence test cases results in the classical $k^{-3}$ scaling of the energy spectra (Kraichnan, 1967). In practice, this perfect estimation of loss is never available in a numerical deployment and must be estimated by either an algebraic or differential equation. We focus on two competing ideologies for estimating closure. The first is given by the functional hypothesis and may be expressed as

[TABLE]

where the Smagorinsky approximation to the eddy-viscosity $\nu_{e}$ is given by

[TABLE]

where it is very common to consider the filter length scale $\delta$ as the representative mesh size. A successful application of this closure necessitates a dynamic calculation of the Smagorinsky coefficient $C_{s}$ that requires the specification of a test-filter and a spatial-averaging for stabilized deployment. This approach is the well-known dynamic Smagorinsky (DS) closure (Germano et al., 1991) and its two-dimensional abstraction for Kraichnan turbulence has been presented by San (2014).

A competing ideology is given by the structural (or scale-similarity) hypothesis which assumes that the LES equations are projections of the Navier-Stokes equations to a smoother space where an inverse-filtering operator may be utilized to recover the finer scales that are lost. Mathematically,

[TABLE]

where $\omega^{*}$ and $\psi^{*}$ are approximately deconvolved variables obtained through an inverse-filtering procedure and a Gaussian-type filter kernel (given by the $G(a)=\tilde{a}$ ) is an approximation of the projection to the LES space. However, these techniques are limited due to the underlying assumption of isomorphism between the LES and the Navier-Stokes equations (Germano, 2015). In practice, this implies that structural hypotheses are appropriate only if finer structures are sufficiently well-resolved on a particular grid. As such, this diminishes their benefit for practical flows where grid cut-off wave numbers are generally much smaller than the largest wave number in the flow. The breakdown of structural closures manifests itself in the form of stability issues. For this reason, many successful closure deployments utilize linear combinations of structural and functional models (Habisreutinger et al., 2007). In this work, we implement approximate-deconvolution (AD) (Stolz & Adams, 1999) which utilizes an iterative application of the trapezoidal filter kernel for inversion of filtered grid-quantities and utilize three iterative resubstitutions to deconvolve our grid-resolved variables.

3 Machine learning

We now turn to procedure of utilizing DNS data for learning when to switch between one of three closure scenarios. Of these three options, two are given by the choice of the functional hypothesis and AD. The third option is that of a no-model scenario where our learning determines that closure-modeling is unnecessary. The third scenario is retained since there is a possibility for localized areas in a flow resolution to have adequate grid-support so that the contributions of the sub-grid scale become negligible. This switching between scenarios is spatio-temporally dynamic. Before proceeding, we note that the functional deployment eschews the dynamic procedure and simply sets a $C_{s}=1.0$ for the calculation of the eddy-viscosity as utilized in the standard Smagorinsky model (Smagorinsky, 1963) as is common in geophysical scenarios (Cushman-Roisin & Beckers, 2011). This ensures that the proposed framework remains computationally tractable, easily interpretable and deployable. We proceed by outlining our learning strategy through the utilization of DNS data. Five snapshots of DNS data at $Re=32000$ and at $2048^{2}$ degrees of freedom (from 40000 available snapshots) are utilized to compute the grid-filtered variables (herein denoted by FDNS) at $256^{2}$ degrees of freedom through the application of a spectral cut-off filter. Perfect closure values ( $\Pi$ ) are then obtained (the reader is directed to Maulik et al. (2019) for details related to the calculation of these quantities). Figure 1 visually quantifies the effect of the spectral domain filtering where the FDNS of a snapshot of vorticity is shown. We then introduce the a priori eddy-viscosity given by

[TABLE]

where all the terms on the right-hand side of the above equation are available through calculation from the DNS snapshots. The a priori eddy viscosity is centered at a value of zero (corresponding to a region where closure modeling is unnecessary) and has tails in the negative and positive directions (a hallmark of isotropic turbulence). A core component of the hypothesis in this work stems from the fact that structural hypotheses are not limited to positive eddy-viscosity predictions alone. The reader may note that models utilizing the functional hypothesis always lead to positive eddy-viscosities. The a priori eddy viscosities calculated from the DNS data are then projected onto a Gaussian distribution where values lying at a distance of 1% of the standard-deviation from the mean (of zero) are labeled as those requiring no closure (due to the low strength of the a priori eddy-viscosity). Values lying beyond this range are labeled as functional or structural, depending on if they are positive or negative, respectively. This information is encoded in one-hot labeling for a classification deployment and a corresponding schematic for this hypothesis segregation is shown in Figure 2. It is observed that the a large portion of the available data lies within the first standard deviation of the mean eddy-viscosity. This leads to the potential of turbulence modeling classification being considered from outlier detection point-of-view. A factor which motivates the choice of the Gaussian distribution is the nature of Kraichnan turbulence (which is isotropic in nature with Gaussian statistics). However, we note that machine learning algorithms are also capable of classifying data belonging to complex distributions and that this hypothesis segregation may be tuned for better accuracy. Also, the choice of the 1% hyper-parameter is also motivated by observing a posteriori training accuracies where it is noticed that a relatively simple architecture (mentioned next) is efficiently able to discern the varying hypothesis. Values greater than 1% for model delineation led to reduced learning accuracies indicating that a physical delineation potentially exists in this projection and categorization. Further study for adding complexity to the hypothesis segregation is necessary.

Each label for the a priori eddy-viscosity is also associated with an input kernel of grid-resolved quantities. This kernel is given by a stencil of 9 inputs of vorticity and stream function each (for a total of 18 input variables). These 9 inputs of each field are given by a query of the field quantity at a point on the coarse grid, the 4 adjacent points in each dimension and the 4 diagonally adjacent points. Each sample of our training data thus consists of 18 inputs of vorticity and stream function and outputs given by one-hot labels for the choice of closure modeling strategy. In this article, we have leveraged the fact that the mean of vorticity and streamfunction are both very close to zero and do not necessitate normalization. In addition, the non-dimensionalized formulation of the governing equations implies that our inputs are all dimensionless. However, we note that for practical deployments of any local-kernel based machine learning queries, grid-resolved quantities must be normalized and non-dimensionalized.

Mathematically, if our input vector p resides in a $P$ -dimensional space and our desired output q resides in a $Q$ -dimensional space, this framework establishes a map $\mathbb{M}$ as follows:

[TABLE]

Accordingly, the framework utilized in this article leads to the following relation:

[TABLE]

where our input and output spaces are given by

[TABLE]

where $i,j$ refer to the spatial indices on the coarse-grid (i.e., the point of deployment) and $k$ refers to the choice of closure scenario (i.e., structural, functional or no closure). We note here that the choice of the local stencil for ANN query reflects the discretization of the governing equations (with second-order accurate stencils requiring a $\pm 1$ query) and the use of the trapezoidal filter in AD. Also, note that our choice of input space is given by raw variable queries rather than derivatives (or other such engineered terms). This is motivated by an aversion to specify bias towards any particular quantity that may otherwise by learned implicitly by the network. However, we note that the classification workflow may benefit significantly from the inclusion of a feature engineering step prior to optimization. This is a subject of ongoing investigation.

Our optimal map $\mathbb{M}$ is then trained by the following loss-function

[TABLE]

where w are the tunable weights and biases of the network, $N$ is the total number of samples and $K=3$ is the total number of closure scenarios. Here, $t_{nk}$ refers to the target (or true) label of class $k$ and sample $n$ and $y_{nk}$ refers to its corresponding prediction. Note that one-hot encoding ensures that $t_{nk}$ values are always binary Bishop (2006). For reference, our architecture is trained using the open-source deep learning software Tensorflow and is optimized with the use of ADAM, a popular gradient-descent based optimizer.

Our learning architecture is given by a 5 hidden-layer deep neural network with 40 neurons each for calculating the conditional probabilities of the three closure scenarios pointwise in space and time. The hidden layer neurons employ a rectified-linear activation and the output-layer gives us softmax probabilities for the three classes. The scenario with the highest conditional probability is then deployed for model classification i.e.,

[TABLE]

where $\Pi^{ML}_{i,j}$ refers to the machine learning based turbulence model computation at a point. In the case of model blending, the conditional probabilities for closure scenarios are used to find a linear combination of the standard Smagorinsky and the AD closures. In other words,

[TABLE]

where $\Pi^{AD}_{i,j}$ and $\Pi^{SM}_{i,j}$ are AD and Smagorinsky predictions for the turbulence model at a point. We note that the same learning framework is deployed in these two conceptually different scenarios.

The framework is trained using the previously mentioned categorical cross-entropy error minimization for the one-hot encoded targets. A three-fold cross-validation is utilized with a grid search for the number of layers (between 1 to 8) and number of neurons (between 10 to 100 at intervals of 10) to arrive at the optimal architecture mentioned previously. This optimal network is then trained for 2000 epochs to arrive at a classification accuracy of 79% for training and approximately 68% accuracy for validation. Convergence in validation loss was observed at around 1500 epochs as shown in Figure 3. We note here that our validation data (amounting to one-third of the total training data set) was not exposed to the network during gradient calculation in the back-propagation based training procedure. Effectively, our learning is derived from two-thirds of the total training data while our best-model is chosen from that with the lowest validation loss. This is to ensure that the chances of network extrapolation are minimized. The optimal learning is then deployed into a posteriori evolution of the Kraichnan turbulence test case where a pointwise closure deployment is performed for a variety of test cases. We also note that our labeled data is pre-processed to ensure that an equal number of samples are available from each classification regime to prevent our learning from prioritizing one outcome over the other two.

4 Results

We proceed by examining the performance of our framework for various a posteriori deployments which act as a rigorous testing of our learning for both classification and blending. We remind the reader that a posteriori deployments of learning frameworks imply performance assessments in the presence of challenging numerical errors and represent the ultimate test of a data-driven framework. Briefly, the Kraichnan turbulence problem is specified by periodic boundary conditions on a rectangular domain and an initial condition which is given by an energy spectrum in wave number space. In this two-dimensional problem very fine scales are developed quickly and this leads to the classical $k^{-3}$ scaling of the kinetic energy spectra which is a characteristic of the cascade of enstrophy in two-dimensional turbulence. The turbulence then decays gradually over time and thus represents an unsteady closure modeling assessment for our proposed framework.

We assess the viability of the proposed framework through energy spectra calculations of various reduced-order deployments as well as vorticity structure functions obtained from the same. Time histories of the turbulent kinetic energy (denoted $TKE$ ) and the variance of vorticity (denoted $\sigma^{2}(\bar{\omega})$ ) are also plotted for forward deployments. Detailed explanations of the numerical schemes and energy spectra calculations utilized for this problem may be found in Maulik & San (2017b). Briefly, all our spatial numerical schemes are second-order accurate and our time-integration is third-order total-variation diminishing. Our vorticity structure function calculations are given by Grossmann & Mertens (1992):

[TABLE]

where the angle-brackets indicate ensemble averaging and x indicates a position on the grid with r being a certain distance from this location. Our turbulent kinetic energy is given by

[TABLE]

where $u_{f}$ and $v_{f}$ are fluctuating quantities given by

[TABLE]

and where $\mu(a)$ implies the spatial mean of a field variable $a$ . We note that the components of velocity $u,v$ are computed by second-order accurate central finite-difference implementations of

[TABLE]

In a similar manner the variance of vorticity, at each time step, is computed using

[TABLE]

In all the following assessments, the proposed framework is denoted as ML (and specified to be deployed as a classifier or a blender) and it is compared to the AD and DS models. We remind the reader that the framework utilizes the static Smagorinsky model (denoted SM) with $C_{s}=1.0$ within its formulation but is assessed against the DS approach. The reader may note that the value of $C_{s}=1.0$ proves over-dissipative for this particular test case as shown in Maulik et al. (2019).

4.1 Model classification

In this section, we deploy our learning framework as a classifier which spatio-temporally switches between three closure modeling hypotheses during flow-evolution. Figure 4 shows the performance of our proposed framework for the forward deployment of the Kraichnan turbulence problem in the form of energy spectra predictions at $t=4$ . For comparison, no-model results (denoted UNS), the DS method and AD are also shown along with DNS spectra. One can observe that the classifier balances the dissipative natures of the SM and AD hypothesis to obtain a performance similar to the that of the DS approach. While at the lower wavenumbers, the AD procedure seems to be more accurate in statistical capture, the higher wavenumbers are stabilized adequately by the classifier. We would like to note that that SM hypothesis with $C_{s}=1.0$ is highly dissipative and this results shows that the classifier avoids its deployment to a large degree for improved a posteriori performance. We clarify that for spectral cut-off filtering, FDNS spectra and DNS spectra are identical till the grid cut-off wave number (Maulik et al., 2018).

Figure 5 details vorticity structure function assessments in our domain where assessments with FDNS show that the proposed framework is adequately capable of stabilizing turbulence correlations at $t=4$ . We note that the structure functions are predicted more accurately by AD at low values of r whereas the proposed classifier behaves similar to a DS implementation thereby indicating a dynamic dissipation on the grid. It may be so that the adaptive dissipation prioritizes noise removal and thus introduces errors at low values of r as seen through stable structure functions at saturation (i.e., at higher values of r). A further assessment is deployed in the form of time-histories of TKE and $\sigma^{2}(\bar{\omega})$ as shown in Figure 6. Once again, the classifier is seen to have a varying trend in TKE predictions compared to the AD and DS techniques indicating varying dissipation strengths. The vorticity variance predictions are also seen to be balanced between that of the DS and AD models indicating the balance of dissipative tendencies.

We proceed by performing a thorough validation of our learning framework by assessing its performance for prediction tasks that it has not been exposed to in training. This is established by testing closure efficiency for a Reynolds number of 64000. We remind the reader that map optimization was performed solely for $Re=32000$ and this represents an additional validation of the learning. Kinetic energy spectra for this experiment are shown in Figure 7 where it is observed that the classifier performs in a very similar fashion to the $Re=32000$ test-case with AD performing more efficiently at the lower wavenumbers of the inertial range but the ML approach stabilizing high-wavenumber noise effectively. This indicates that the learning has generalized, atleast on the current degree of coarse-graining. We also perform additional assessments such as those shown in Figure 8 and Figure 9. The former shows the vorticity structure function trends for this out-of-training range learning assessment and the latter shows the time-histories of TKE and $\sigma^{2}(\bar{\omega})$ . Very similar trends for both these assessments are obtained when compared to the $Re=32000$ test-case with time varying trends in TKE and vorticity variance capture.

While the aforementioned test-cases validated the learning of the classifier on different control parameters (and flow evolutions) given by the Reynolds number. We proceed by assessing the performance and stability of the classifier on a reduced degree-of-freedom evolution given by $N^{2}=128^{2}$ . This test was to examine if the classifier could retain a viable learning for deployment on slightly different grid support. Figure 10 shows the kinetic energy spectra for a deployment at this reduced degree-of-freedom at a Reynolds number of 32000. It is observed that the proposed classifier is able to avoid inaccuracies related to AD’s lack of dissipation. Indeed, it is well-known that AD requires a sufficiently fine resolution in comparison to the eddy-viscosity hypothesis based models for appropriate utilization of their inverse-filtering (Germano, 2015; Guermond et al., 2004). A similar trend may also be observed in Figure 11 with the vorticity structure functions where once again the AD technique proves accurate at lower distances in comparison the the DS and the ML methods. The ML classifier however is slightly more accurate than the DS approach. The time-histories for TKE and vorticity variance, shown in Figure 12, display a greater amount of variation in the classification framework with TKE values oscillating but remaining close to the DNS results. It must be noted that the no-model and AD hypothesis prevent the classifier from going into a fully SM deployment which is highly dissipative. This explains the similarity with DS results in terms of spectra and vorticity-variance.

In addition to the test-case with a slightly reduced grid-resolution in comparison to training data generation. We also perform a grid-dependence check on the accuracy of our classification framework as shown in Table 1. We perform a hypothesis segregation (as introduced previously) to label all points on a coarse-grid with an optimal closure hypothesis and assess if the learning at $N^{2}=256^{2}$ is able to categorize them appropriately. It can be seen that accuracies around the same resolution as that of the training data are approximately similar to validation accuracy during network optimization. However, on intense coarse-graining, accuracies are seen to drop significantly. However, we note that even at the coarsest resolution of $N^{2}=32^{2}$ , accuracies greater than 33% indicate some form of learning retention.

We also determine the effect of network deployment in the presence of numerical errors as shown in table 2 where it can be seen that a significant difference in hypothesis choices are observed. In particular, the a posteriori deployment of the classifier is seen to utilize a greater proportion of the turbulence closure hypotheses, in comparison to the no-model ones. This may be considered as proof of the classifier detecting greater stabilization requirements due to numerical error build-up. It is observed that the AD approach shows a greater increase in deployment than SM. This may be to offset the rather large inaccuracies of the lower wavenumbers in the exceptionally dissipative SM approach. Understanding the nature of classifier adaptation in the presence of numerical errors is an interesting subject of future research that may aid in improved decision making frameworks. We complement the data in table 2 by outlining the classification percentages of different hypotheses plotted against time for our three a posteriori deployments in Figure 13. One may notice that the deployment of the framework at the coarser resolution of $N^{2}=128^{2}$ requires a higher degree of SM and AD classifications for successful stabilization. All experiments are seen to show a gradual increase in closure requirement as scale-separation grows with subsequent saturation in the gradual decay.

As a final qualitative analysis of our classifier, we plot a posterior contours from forward deployments at $N^{2}=256^{2}$ and $Re=32000$ . In Figure 14, vorticity contours from the ML, DS, AD and UNS simulations are shown to assess the stabilization effect of the different frameworks. The classifier can be seen to stabilize high-wavenumber noise adequately, in a manner similar to DS as previous statistics have reflected. The AD approach may be observed to be contaminated with noise that may potentially be harmful for long-time integration.

4.2 Model blending

In this section, we deploy our learning in a different manner by utilizing their outputs (i.e., the conditional probabilities of each hypothesis) as a pre-multiplier of the prediction of each modeling hypothesis. We utilize this formulation instead of the direct prediction of sub-grid contribution coefficients by observing that a greater degree of stability is imparted to the flow-evolution. Indeed, direct regression with sub-grid quantities has been seen to require a posteriori post-processing for stability (Maulik et al., 2018, 2019) due to energy accummulation in the super-grid (when negative eddy-viscosities are predicted effectively). We recognize (as a limitation), that the utilization of a conditional-probability outputs to linearly combine turbulence modeling predictions from different hypotheses digresses from the core idea of a categorical cross-entropy error minimization. However, as results shall show, the proposed method acts as an effective instrument for blending models in a posteriori with the requirement of any truncation for stability. We would also like to emphasize here that the same learning is applicable for both classification and blending. We perform a similar set of assessments as outlined in Section 4.1.

Figure 15 shows the performance of the blending formulation for a Reynolds number of 32000 and at $t=4$ with $N^{2}=256^{2}$ degrees-of-freedom with kinetic energy spectra. It is observed that the proposed procedure recovers a dissipative behavior that is very similar to the DS approach. It does this by balancing the coefficients of the AD and SM predictions which adapt to the dynamic dissipation requirement of the flow. Overall, it is observed that the framework behaves in a similar manner to the classifier presented previously with dissipation preventing the accummulation of high-wavenumber errors but causing a mismatch in the inertial range spectra capture. However, the dissipation is dynamic and it prevents the overwhelming damping of the SM deployment by balancing with the AD predictions adaptively. This is reflected in Figure 16 as well where the vorticity structure functions once again show that the AD method is more accurate at lower values of r but the blending allows for a prediction akin to the DS technique. Figure 17 shows the time-histories of the TKE and the $\sigma^{2}(\bar{\omega})$ for the proposed framework compared to DS, AD and UNS. The vorticity-variance shows a trend close to the DS approach as expected but the TKE trends are once again not uniform.

In a fashion similar to that employed in section 4.1, we deploy assessments of the blending method to out-of-training predictions for validation. We start with an a poseirori deployment at $Re=64000$ and $N^{2}=256^{2}$ degrees-of-freedom and observe that the learning is sufficiently generalizable. This is observed from Figure 18 where kinetic energy spectra show an aligned prediction to the previous test-case. Vorticity structure functions and time-histories, shown in Figures 19 and 20 respectively show a similar behavior to the one observed for $Re=32000$ . This implies that the learning, whether utilized as a classifier or a blending mechanism, is generalizable. We also deploy the blending framework at a different degree-of-freedom ( $N^{2}=128^{2}$ ) to assess it is stable to a slightly different grid support and trends similar to the classifier are observed wherein the framework focuses on dissipation to stabilize the higher wavenumbers in contrast with AD. This is observed in Figure 21 for kinetic energy spectra and Fig 22 for the vorticity structure functions. Figure 23 shows time-series quantities for this test-case with both TKE and vorticity-variance trends resembling the DS method closely. This also echoes with the performance of the classifier where a coarser-grid resolution led to a performance that was observed to be biased towards the eddy-viscosity hypothesis. However, further studies are necessary to quantify how the model orients itself to compensate for loss of grid-resolution or anisotropies in the flow configuration in a posteriori deployment.

To conclude this section we show qualitative results from the vorticty contours at the final time of the numerical experiments for our proposed framework and their benchmark counterparts in Figure 24. This examination gives us an intuition of the stabilization effect of the proposed framework and it is seen that the predictions are very closely aligned with the DS results. This, once again, validates our dynamic dissipation hypothesis.

5 Conclusions & significance

In this article we have proposed a novel data-driven strategy to dynamically assess the utility of a turbulence modeling hypothesis in an LES framework. This strategy is built on the hypothesis that DNS data may be utilized to assess areas where structural or functional models may be more appropriate in an LES deployment. Our hypothesis segregation and subsequent training culiminates in a learning that may deployed as a classifier of turbulence models at each point on the LES grid as well as a blending technique for balancing turbulence models with different dissipative strengths. When deployed as a classifier, our proposed framework may also predict a ‘no-model’ situation wherein no sub-grid source-term is deployed. When deployed as a blending mechanism, the learning linearly combines the AD and static Smagorinsky hypothesis premultiplied by their respective conditional probabilities to obtain another hybrid dissipation mechanism. Both frameworks utilize the same learning and are assessed through similar experiments in a posteriori.

We have rigorously assessed the deployment of our machine learning strategy through the utilization of a Kraichnan turbulence test case. Our assessments are made for Reynolds number values both within and outside that utilized in training to ensure that a generalizable turbulence closure has been developed. In addition, we have also assessed if the proposed closure can be deployed on a coarser grid than one it was trained for. The dissipative and scale-content capture of the proposed framework is compared to the AD and DS techniques through the use of kinetic-energy spectra, vorticity structure functions and time-histories of TKE and vorticity-variance showing a dynamic dissipation akin to the DS. In particular, the statistical fidelity of the data-driven frameworks is seen to be inferior to the AD technique, which provides better estimates of the kinetic energy spectra at lower wavenumbers and also provides most accurate estimates of the vorticity structure function. However, the focus on high-wavenumber noise attentuation leads to no grid cut-off error accummulation and the statistical results of the ML models are very similar to DS in all assessments. Also, it is observed that the data-driven closure (whether deployed as a classifier or a blending instrument) adequately captures the $k^{-3}$ scaling expected for the kinetic energy spectra for the Kraichnan turbulence case and attempts to strike an optimal balance between the dissipative functional kernel and the noise-prone structural kernel. This behavior is interesting as the model classifies solely between AD and the static Smagorinsky hypothesis indicating the extreme dissipation of the latter at $C_{s}=1.0$ is effectively alleviated by the spatiotemporal blending. Our closure, thus, attempts to blend the strengths of both modeling strategies to overcome their individual weakness while attempting to preserve trends from DNS.

In terms of future opportunities for this idea, the data-driven element of closure identification lends to the potential development of closures that may discern the physical characteristics of different flow scenarios. However, some challenges associated with progress in this research include considerations of invariance properties, which we have identified as a next step. While the computational costs of the proposed framework have not been studied in detail, an efficient deployment of the proposed framework would need graphical processing unit integration of any practical CFD simulation. The latter would lead to efficient learning queries since all the spatial domain information would be available to the common memory. Another future direction identified in this research is the exposure of different two-dimensional turbulence physics to the classification framework to identify if closure choices can ralso be influenced by the training data regime. Success in that regard would allow for ‘train and forget’ closures in problems that have unsteady physics that span fundamentally different turbulence modeling requirements.

Acknowledgement

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0019290. Disclaimer: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Bibliography65

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Bardina et al. (1980) Bardina, J., Ferziger, J. H. & Reynolds, W. C. 1980 Improved subgrid-scale models for large-eddy simulation. AIAA Pap. 1980-1357 .
2Beck et al. (2018) Beck, Andrea D, Flad, David G & Munz, Claus-Dieter 2018 Neural Networks for Data-Based Turbulence Models. ar Xiv preprint ar Xiv:1806.04482 .
3Berselli et al. (2006) Berselli, L. C., Iliescu, T. & Layton, W. J. 2006 Mathematics of large eddy simulation of turbulent flows . Springer-Verlag, New York.
4Bishop (2006) Bishop, Christopher M. 2006 Pattern Recognition and Machine Learning (Information Science and Statistics) . Springer-Verlag, Berling Heidelberg.
5Boffetta & Ecke (2012) Boffetta, Guido & Ecke, Robert E 2012 Two-dimensional turbulence. Annu. Rev. Fluid Mech, 44 , 427–451.
6Cushman-Roisin & Beckers (2011) Cushman-Roisin, Benoit & Beckers, Jean-Marie 2011 Introduction to geophysical fluid dynamics: physical and numerical aspects . Academic Press, Boston, MA.
7Duraisamy et al. (2018) Duraisamy, Karthik, Iaccarino, Gianluca & Xiao, Heng 2018 Turbulence modeling in the age of data. Annu. Rev. Fluid Mech. 51 , 357–377.
8Frisch (1995) Frisch, U. 1995 Turbulence . Cambridge University Press, Cambridge, U.K.