A General Probabilistic Approach for Quantitative Assessment of LES Combustion Models

Ross Johnson; Hao Wu; Matthias Ihme

arXiv:1702.05539·physics.data-an·June 6, 2017

TL;DR

This paper introduces the Wasserstein metric as a probabilistic, quantitative validation tool for LES combustion models, capable of evaluating multiple scalar quantities and identifying sources of model deviations.

Contribution

It generalizes the Wasserstein metric for turbulent reacting flows and demonstrates its effectiveness in validating LES combustion models against experimental data.

Findings

1. Wasserstein metric effectively evaluates multiple scalar quantities.

2. The method identifies boundary condition uncertainties and model deficiencies.

3. Application to various datasets shows versatility and robustness.

Department of Mechanical Engineering, Stanford University, Stanford, CA 94305

Abstract

The Wasserstein metric is introduced as a probabilistic method to enable quantitative evaluations of LES combustion models. The Wasserstein metric can directly be evaluated from scatter data or statistical results using probabilistic reconstruction against experimental data. The method is derived and generalized for turbulent reacting flows, and applied to validation tests involving the Sydney piloted jet flame. It is shown that the Wasserstein metric is an effective validation tool that extends to multiple scalar quantities, providing an objective and quantitative evaluation of the effects of model deficiencies and boundary conditions on the simulation accuracy. Several test cases are considered, beginning with a comparison of mixture-fraction results, and the subsequent extension to reactive scalars, including temperature and species mass fractions of $\ce{CO}$ and $\ce{CO2}$. To demonstrate the versatility of the proposed method in application to multiple datasets, the Wasserstein metric is applied to a series of different simulations that were contributed to the TNF-workshop. Analysis of the results allowed the identification of competing contributions to model deviations, arising from uncertainties in the boundary conditions and model deficiencies. These applications demonstrate that the Wasserstein metric constitutes an easily applicable mathematical tool that reduces multiscalar combustion data and large datasets into a scalar-valued quantitative measure.

keywords:

Wasserstein metric; Model validation; Statistical analysis; Quantitative model comparison; Large-eddy simulation

journal: Combust. Flame

1 Introduction
2 Methodology
2.1 Preliminaries: Monge-Kantorovich transportation problem
2.2 Wasserstein metric for discrete distributions
2.3 Non-parametric estimation of $W_{p}$ through empirical distributions
2.4 Statistically most likely reconstruction of distributions
2.5 Calculation procedure
2.6 Remarks on the interpretation and usage of the Wasserstein metric $W_{p}$
3 Configuration
3.1 Experimental setup
3.2 Computational setup and mathematical model
3.3 Statistical results and scatter data
4 Results for application of Wasserstein metric
4.1 Conserved scalar results
4.2 Multiscalar results
5 Quantitative comparison of different simulation results
6 Conclusions
A Wasserstein metric for general probability measures
B Sample code for the evaluation of the multidimensional Wasserstein metric

1 Introduction

A main challenge in the development of models for turbulent reacting flows is the objective and quantitative evaluation of the agreement between experiments and simulations. This challenge arises from the complexity of the flow-field data, involving different thermo-chemical and hydrodynamic quantities that are typically provided from measurements of temperature, speciation, and velocity. These data are obtained from various diagnostic techniques, including nonintrusive methods such as laser spectroscopy and particle image velocimetry, or intrusive techniques such as exhaust-gas sampling or thermocouple measurements [1, 2, 3, 4]. Rather than being measured directly, physical quantities are typically inferred from measured signals, which introduces several correction factors and uncertainties [5]. Depending on the experimental technique, these measurements are generated from single-point data, line measurements, line-of-sight absorption, or multidimensional imaging at acquisition rates ranging from single-shot to high-repetition-rate measurements to resolve turbulent dynamics [6]. The data are commonly processed in the form of statistical results from Favre and Reynolds averaging, conditional data, probability density functions, and scatter data, all of which provide important information for model evaluation.

Significant progress has been made in simulating turbulent flows. This progress can largely be attributed to adopting high-fidelity large-eddy simulation (LES) for the prediction of unsteady turbulent flows, and establishing forums for collaborative comparisons of benchmark flames that are supported by comprehensive databases [7].

In spite of the increasing popularity of LES, the validation of combustion models follows previous steady-state RANS-approaches, comparing statistical moments (typically mean and root-mean-square) and conditional data along axial and radial locations in the flame. Qualitative comparisons of scatter data are commonly employed to examine whether a particular combustion model is able to represent certain combustion-physical processes in composition space that, for instance, are associated with extinction, equilibrium composition, or mixing conditions. In other investigations, error measures were constructed by weighting moments and scalar quantities for evaluating the sensitivity of model coefficients in subgrid models and for uncertainty quantification [8, 9]. It is not uncommon that models match particular measurement quantities (such as temperature or major species products) in certain regions of the flame, while mispredicting the same quantities in other regions or showing disagreements for other flow-field data at the same location. Further, the comparison of individual scalar quantities makes it difficult to consider dependencies and identify correlations between flow-field quantities. Faced with this dilemma, such comparisons often only provide an inconclusive assessment of the model performance, and limit a quantitative comparison among different modeling approaches. Therefore, a need arises to develop a metric that enables a quantitative assessment of combustion models, fulfilling the following requirements:

1. Provide a single metric for quantitative model evaluation;

2. Combine single and multiple scalar quantities in the validation metric, including temperature, mixture fraction, species mass fractions, and velocity;

3. Incorporate scatter data from single-shot, high-speed, and simultaneous measurements, and enable the utilization of statistical data;

4. Enable the consideration of dependencies between measurement quantities; and

5. Ensure that the metric satisfies the conditions of non-negativity, identity, symmetry, and the triangle inequality.

To address these requirements, the objective of this work is to introduce the probabilistic Wasserstein distance [10] as a metric for the quantitative evaluation of combustion models. Complementing currently employed comparison techniques, this metric possesses the following attractive properties. First, this metric directly utilizes the abundance of data from unsteady simulation techniques and high-repetition-rate measurement methods. Rather than considering lower-order statistical moments, this metric is formulated in distribution space. As such, it is directly applicable to scatter data that are obtained from transient simulations and high-speed measurements, without the need for data reduction to low-order statistical moments. Second, this metric is able to synthesize multidimensional data into a scalar-valued quantity, thereby aggregating model discrepancies for individual quantities. Third, the resulting metric utilizes a normalization, thereby enabling the objective comparison of different simulation approaches. Fourth, this method is directly applicable to sample data that are generated from scatter plots, instantaneous simulation results, or reconstructed from statistical results, and enables the consideration of conditional and multiscalar data. Fifth, the Wasserstein metric is equipped with the essential properties of metric spaces. Finally, this metric is a companion tool to previously established methods for validation of LES and instantaneous CFD-simulations.

The remainder of this manuscript is structured as follows. Section 2 formally introduces the Wasserstein metric and builds a mathematical foundation for this method. This method is demonstrated in application to the Sydney piloted jet flame with inhomogeneous inlet, and experimental configuration and simulation setup are described in Section 3. Results are presented in Section 4. To introduce the metric, we first consider a scalar distribution of mixture fraction and establish a quantitative comparison of experiments and simulations against presumed closure models. This is then extended to multiscalar data, involving temperature and species mass fractions. Subsequently, we examine the utility of applying the Wasserstein metric to statistical results, rather than scatter data, while Section 5 explores how the Wasserstein metric could add value for directly comparing multiple simulations to assess LES closure models. This work finishes in Section 6 with conclusions.

2 Methodology

In this section, the methodology of quantifying the dissimilarity between two multivariate distributions (joint PDFs) is discussed. Since multivariate distributions are commonly used to represent the thermo-chemical states in turbulent flows, this method is useful for the quantitative assessment of differences between a numerical simulation and experimental data of turbulent flames.

The dissimilarity between two points in the thermo-chemical state space can typically be quantified by their Euclidean distance. In the case of a 1D state space, e.g. temperature, the Euclidean distance is simply the absolute value of the difference. This metric serves the purpose well for the comparison between two deterministic measurements, which can indeed be fully represented by points in the state space. However, the Euclidean distance falls short when measurements are made on random data, e.g. states in a turbulent flow, in which case a single data point can no longer represent the measurement and must be replaced by a distribution of possible outcomes. Consequently, the dissimilarity between two random variables must be quantified by the “distance” between the corresponding probability distributions.

Among the many definitions of the “distance” between distributions [11, 12], the Wasserstein metric (also known as the Mallows metric) is of interest in this study. Many other methods of quantifying the difference between two distributions are either designed for equality testing (e.g. the Kolmogorov-Smirnov test) or fail to satisfy the metric properties (e.g. the Kullback-Leibler divergence). Interested readers are referred to the survey by Gibbs and Su [11] for a detailed comparison between the Wasserstein metric and other candidates. The Wasserstein metric represents a natural extension of the Euclidean distance, which is recovered by the Wasserstein metric when each distribution reduces to a Dirac delta function. As such, this metric provides a measure of the difference between distributions that resembles the classical idea of “distance”. The resulting value can be judged and interpreted in a similar fashion to the arithmetic difference between two deterministic quantities.
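The reduction to the Euclidean distance can be checked with the discrete definition of $W_{p}$ given in Sec. 2.2. As a worked special case, assume that both distributions are concentrated on single points, here called $\mathbf{a}$ and $\mathbf{b}$ (symbols introduced only for this illustration):

```latex
f(\mathbf{x}) = \delta(\mathbf{x}-\mathbf{a}), \quad
g(\mathbf{y}) = \delta(\mathbf{y}-\mathbf{b})
\;\Longrightarrow\;
\gamma_{11} = 1
\;\Longrightarrow\;
W_{p}(f,\,g) = \big(\gamma_{11}\,|\mathbf{a}-\mathbf{b}|^{p}\big)^{1/p} = |\mathbf{a}-\mathbf{b}| .
```

The only admissible transport plan moves the entire unit mass from $\mathbf{a}$ to $\mathbf{b}$, so no minimization remains and every order $p$ returns the same value.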

The Wasserstein metric has been used as a tool for measuring the “distance” between distributions or histograms in the context of content-based image retrieval [12], hand-gesture recognition [13], and analysis of 3D surfaces [14], etc. These applications were first proposed by Rubner et al. [12], under the name of the Earth Mover’s Distance (EMD), which is in fact the Wasserstein metric of order one, $W_{1}$ . More recent applications favor the $2^{\text{nd}}$ Wasserstein metric by way of Brenier’s theorem [14, 15], which shows the uniqueness of the optimal solution for the squared-distance and its connection to differential geometry through which more efficient algorithms have been constructed.

This section is primarily concerned with the Wasserstein metric for discrete distributions in the Euclidean space. This allows us to focus on the concepts that are directly related to the practical calculation and estimation of this metric, and avoid the usage of measure theory without creating much ambiguity. The definition of the Wasserstein metric for general probability measures with more formal treatment of the probability theory is discussed in Appendix A. The interested reader is also referred to books by Villani [16], surveys by Urbas [17], and the more recent lectures by McCann and Guillen [18] for further details.

2.1 Preliminaries: Monge-Kantorovich transportation problem

The Wasserstein metric is motivated by the classical optimal transportation problem first proposed by Monge in 1781 [19]. The optimal transportation problem of Monge considers the most efficient transportation of ore from $n$ mines to $n$ factories, each of which produces and consumes one unit of ore, respectively. These mines and factories form two finite point sets in the Euclidean space, which are denoted by ${\cal{M}}$ and ${\cal{F}}$ . The cost of transporting one unit of ore from mine $x_{i}\in{\cal{M}}$ to factory $y_{j}\in{\cal{F}}$ is denoted by $c_{ij}$ , which is chosen by Monge to be the Euclidean distance. The transport plan can be expressed in the form of an $n\times n$ matrix $\Gamma$ , of which the element $\gamma_{ij}$ represents the amount of ore transported from $x_{i}$ to $y_{j}$ . The central problem in optimal transportation is to find the transport plan that minimizes the total cost of transportation, which is the sum of the costs over the $n\times n$ available routes between mines and factories, i.e., $\sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}c_{ij}$ . The original problem formulated by Monge requires that the ore of each mine be transported in its entirety to a single factory. Therefore, $\Gamma$ is constrained to be a permutation matrix, whose elements can only take binary values of 0 or 1. Formally, the Monge transport problem can be expressed as

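$$\min_{\Gamma}\;\sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}c_{ij}\quad\text{subject to}\quad\sum_{i=1}^{n}\gamma_{ij}=\sum_{j=1}^{n}\gamma_{ij}=1,\quad\gamma_{ij}\in\{0,\,1\}\,.\tag{1}$$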

The transport problem in Monge’s formulation can be solved relatively efficiently by the Hungarian algorithm proposed by Kuhn in 1955 [20].
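As a small illustration of Monge's formulation, the sketch below enumerates every permutation for a tiny problem and keeps the cheapest one. The function name `monge_brute_force` and the coordinates are assumptions made for this example; the exhaustive search scales as $n!$ and merely stands in for the Hungarian algorithm, which would be used in practice.

```python
from itertools import permutations
from math import dist


def monge_brute_force(mines, factories):
    """Return (min_cost, assignment) for Monge's problem by exhaustive search.

    assignment[i] = j means that the ore of mine i is shipped to factory j.
    Illustrative only: the cost grows as n!, so this is usable for tiny n.
    """
    n = len(mines)
    if n != len(factories):
        raise ValueError("Monge's problem needs as many mines as factories")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Each permutation corresponds to one permutation matrix Gamma.
        cost = sum(dist(mines[i], factories[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


# Hypothetical 2D coordinates of three mines and three factories.
mines = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
factories = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
cost, plan = monge_brute_force(mines, factories)
print(cost, plan)
```

In this example every mine has a factory at unit distance directly above it, so the cheapest plan pairs each mine with that factory.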

In 1942, Kantorovich reformulated Monge’s problem by relaxing the requirement that all the ore from a given mine goes to a single factory [21, 22]. As a result, the transport plan is modified from being a permutation matrix to a doubly stochastic matrix. With this, Monge’s problem is written in the following form:

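$$\min_{\Gamma}\;\sum_{i=1}^{n}\sum_{j=1}^{n}\gamma_{ij}c_{ij}\quad\text{subject to}\quad\sum_{i=1}^{n}\gamma_{ij}=\sum_{j=1}^{n}\gamma_{ij}=1,\quad\gamma_{ij}\geq 0\,.\tag{2}$$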

This relaxation eliminates the difficulties of Monge’s formulation in terms of obtaining certain desirable mathematical properties, such as existence and uniqueness of the optimal transport plan. The Monge-Kantorovich transportation problem in Eq. 2 can be solved as a general linear programming problem, although more efficient algorithms exist that exploit special structures of the problem.

2.2 Wasserstein metric for discrete distributions

The Monge-Kantorovich problem in Eq. 2 not only gives the solution to the optimal transport problem, but also provides a way of quantifying the dissimilarity between the distributions of the mines and the factories by evaluating the optimal transport cost (normalized by total mass). The Wasserstein metric [23] follows directly from this observation.

Consider replacing the mines and factories by two general discrete distributions, whose probability mass functions are:

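$$f(x)=\sum_{i=1}^{n}f_{i}\,\delta(x-x_{i}),\qquad g(y)=\sum_{j=1}^{n^{\prime}}g_{j}\,\delta(y-y_{j}),\tag{3}$$

where $\sum_{i=1}^{n}f_{i}=\sum_{j=1}^{n^{\prime}}g_{j}=1$ and $\delta(\cdot)$ denotes the Kronecker delta function:

$$\delta(x)=\begin{cases}1&\text{if }x=0,\\0&\text{otherwise}.\end{cases}\tag{4}$$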

The unit cost of transport between $x_{i}$ and $y_{j}$ is defined to be the $p^{\text{th}}$ power of the Euclidean distance, i.e. $c_{ij}=|x_{i}-y_{j}|^{p}$ . In addition, the “mass” of probabilities $f_{i}$ and $g_{j}$ is no longer restricted to be unity, and the possible outcomes of the distributions, i.e. $x_{i}$ and $y_{j}$ , are points in a Euclidean space whose dimension is not restricted.

The $p^{\text{th}}$ Wasserstein metric between discrete distributions $f$ and $g$ is defined to be the $p^{\text{th}}$ root of the optimal transport cost for the corresponding Monge-Kantorovich problem:

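$$W_{p}(f,\,g)=\bigg(\min_{\Gamma}\sum_{i=1}^{n}\sum_{j=1}^{n^{\prime}}\gamma_{ij}\,|x_{i}-y_{j}|^{p}\bigg)^{1/p}\quad\text{subject to}\quad\sum_{j=1}^{n^{\prime}}\gamma_{ij}=f_{i},\quad\sum_{i=1}^{n}\gamma_{ij}=g_{j},\quad\gamma_{ij}\geq 0\,.\tag{5}$$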

The constraints in Eq. 5 ensure that the total mass transported from $x_{i}$ and the total mass transported to $y_{j}$ match $f_{i}$ and $g_{j}$ , respectively.
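The discrete definition of $W_{p}$ maps directly onto a linear program. The sketch below is one possible way to evaluate it with SciPy's general-purpose `linprog` solver; the function name `wasserstein_discrete` and the two-point example are assumptions for illustration, and this is not the solver used in the present study.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_discrete(x, f, y, g, p=2):
    """p-th Wasserstein metric between two discrete distributions.

    x: n support points (scalars or d-vectors) with weights f summing to one
    y: n' support points (scalars or d-vectors) with weights g summing to one
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)  # m plays the role of n'
    # Unit costs c_ij = |x_i - y_j|^p (Euclidean distance to the power p).
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2) ** p
    # Marginal constraints on the flattened plan gamma (index i * m + j).
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0  # sum_j gamma_ij = f_i
    for j in range(m):
        a_eq[n + j, j::m] = 1.0           # sum_i gamma_ij = g_j
    b_eq = np.concatenate([f, g])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Guard against tiny negative round-off before taking the p-th root.
    return max(res.fun, 0.0) ** (1.0 / p)


# Hypothetical example: half of the mass has to move from 1 to 2.
w1 = wasserstein_discrete([0.0, 1.0], [0.5, 0.5], [0.0, 2.0], [0.5, 0.5], p=1)
w2 = wasserstein_discrete([0.0, 1.0], [0.5, 0.5], [0.0, 2.0], [0.5, 0.5], p=2)
print(w1, w2)
```

The dense constraint matrix has $(n+n^{\prime})\times nn^{\prime}$ entries, so this form is only practical for small sample sizes; dedicated min-cost flow solvers avoid that storage.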

The Wasserstein metric for continuous distributions is presented in Appendix A. The estimation and calculation for the continuous case, including higher-dimensional problems, are discussed in the next section.

2.3 Non-parametric estimation of $W_{p}$ through empirical distributions

Two major difficulties arise in obtaining the Wasserstein metric of two multivariate distributions of thermo-chemical states. One is that the multivariate distributions are not readily available from either experiments or simulations. Instead, a series of samples drawn from these distributions is provided. The other is that there is no easy way of calculating $W_{p}$ for continuous multivariate distributions, especially for those of high dimensions. To overcome these problems, a non-parametric estimation of $W_{p}$ is devised using empirical distributions.

An empirical distribution is a random discrete distribution formed by a sequence of independent samples drawn from a given distribution of interest. Let $(\mathcal{X}_{1},\,\ldots,\,\mathcal{X}_{n})$ be a set of $n$ independent random samples obtained from a continuous multivariate distribution $f$ . The empirical (or fine-grained [24]) distribution ${\widehat{f}}$ is defined to be

$$\widehat{f}_{n}(x)=\frac{1}{n}\sum_{i=1}^{n}\delta(x-\mathcal{X}_{i}),\tag{6}$$

which is a discrete distribution with equal weights. The empirical distribution is random as it depends on the random samples, $\mathcal{X}_{i}$ . The samples may be obtained from experimental measurements or generated from a given distribution using, for instance, an acceptance-rejection method [25]. Calculation of $W_{p}$ between two empirical distributions is identical to the method discussed in Sec. 2.2. Its procedure and cost are independent of the dimensionality of the distributions.

The empirical distribution converges to the original distribution. Most important in the context of this study is the convergence in the Wasserstein metric [26]. More specifically, the Wasserstein metric between empirical and original distributions converges to zero in probability. Given the metric property of $W_{p}$ :

$$\big|W_{p}(\widehat{f}_{n},\,\widehat{g}_{n^{\prime}})-W_{p}(f,\,g)\big|\leq W_{p}(\widehat{f}_{n},\,f)+W_{p}(\widehat{g}_{n^{\prime}},\,g),\tag{7}$$

such convergence ensures that the Wasserstein metric between two empirical distributions also converges in probability to that of the actual distributions. In addition, the mean rates of convergence for empirical distributions have also been established [27, 28], giving the upper bound on the expectation, $E\big(W_{p}^{p}(\widehat{f}_{n},\,f)\big)$, as $n$ increases. For a similar reason, the convergence rate for the non-parametric estimation follows. The exact rates depend on the dimensionality and regularity conditions of the distributions. For details on the convergence rate for more general cases, the interested reader is referred to the work by Fournier and Guillin [28].

In the case of $d$ -dimensional distributions that have sufficiently many moments, we have:

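$$E\big(W_{2}^{2}(\widehat{f}_{n},\,f)\big)\leq C\times\begin{cases}n^{-1/2}&\text{if }d<4,\\n^{-1/2}\log(1+n)&\text{if }d=4,\\n^{-2/d}&\text{if }d>4,\end{cases}\tag{8}$$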

where the value of $C$ depends on the distribution $f$ and is independent of $n$ . By combining this result with Eq. 7, we obtain the following convergence rate for the non-parametric estimation of $W_{2}$ :

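$$E\big(W_{2}(\widehat{f}_{n},\,\widehat{g}_{n^{\prime}})-W_{2}(f,\,g)\big)^{2}\leq C\times\begin{cases}n_{*}^{-1/2}&\text{if }d<4,\\n_{*}^{-1/2}\log(1+n_{*})&\text{if }d=4,\\n_{*}^{-2/d}&\text{if }d>4,\end{cases}\tag{9}$$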

where $n_{*}=\min(n,\,n^{\prime})$ . With this, all necessary results that ensure the soundness of using $W_{p}({\widehat{f}}_{n},{\widehat{g}}_{n^{\prime}})$ as an estimator for $W_{p}(f,g)$ are presented.
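To make the practical consequence of these rates concrete, the bound on the mean-square error of the estimator can be converted into a bound on the root-mean-square (rms) error. The arithmetic below is a worked illustration only; the constant $C$ is unknown and the $d=4$ case with its logarithmic factor is omitted.

```latex
\sqrt{E\big(W_{2}(\widehat{f}_{n},\,\widehat{g}_{n^{\prime}})-W_{2}(f,\,g)\big)^{2}}
\;\leq\;
\sqrt{C}\times
\begin{cases}
n_{*}^{-1/4} & \text{if } d<4,\\
n_{*}^{-1/d} & \text{if } d>4,
\end{cases}
\qquad\Longrightarrow\qquad
\text{halving the bound requires}\;
\begin{cases}
2^{4}=16 \text{ times more samples} & \text{if } d<4,\\
2^{d} \text{ times more samples} & \text{if } d>4 \;\;(\text{e.g. } 2^{8}=256 \text{ for } d=8).
\end{cases}
```

The guaranteed rate therefore deteriorates with the number of scalars that are compared jointly, which is worth keeping in mind when choosing sample sizes for multiscalar comparisons.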

In addition to the rate of convergence, statistical inference and further quantification of uncertainty for the non-parametric estimation of $W_{p}$ can also be performed. Specifically, the magnitude of the uncertainty in Eq. 9 and the corresponding confidence interval can be estimated via the bootstrap method [29, 30]; the m-out-of-n bootstrap [31] has been shown to be consistent for the Wasserstein metric [32, 33].
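The resampling step of an m-out-of-n bootstrap can be sketched as follows. This is a simplified illustration under several assumptions: the data are one-dimensional with equal sample sizes, so that $W_{2}$ between empirical distributions follows from pairing sorted samples; the names `w2_1d` and `m_out_of_n_replicates` are hypothetical; and the rescaling needed to turn the spread of the replicates into a formal confidence interval, as treated in [31, 32, 33], is not shown.

```python
import random


def w2_1d(xs, ys):
    """W_2 between two equal-size 1D empirical distributions (sorted pairing)."""
    if len(xs) != len(ys):
        raise ValueError("equal sample sizes are assumed in this sketch")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum((a - b) ** 2 for a, b in pairs) / len(xs)) ** 0.5


def m_out_of_n_replicates(xs, ys, m, n_boot=200, seed=0):
    """Recompute W_2 on n_boot resamples of size m < n drawn with replacement."""
    rng = random.Random(seed)
    return [
        w2_1d(rng.choices(xs, k=m), rng.choices(ys, k=m))
        for _ in range(n_boot)
    ]


# Hypothetical data: two Gaussian samples whose means differ by one.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(400)]
ys = [rng.gauss(1.0, 1.0) for _ in range(400)]
reps = sorted(m_out_of_n_replicates(xs, ys, m=100))
# Approximate 5th and 95th percentiles of the unscaled replicates.
low, high = reps[int(0.05 * len(reps))], reps[int(0.95 * len(reps)) - 1]
print(w2_1d(xs, ys), low, high)
```

The replicates are computed on fewer samples than the original estimate, so their raw spread overstates the uncertainty of the full-sample value until the appropriate scaling is applied.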

2.4 Statistically most likely reconstruction of distributions

So far, the evaluation of the Wasserstein metric using sample data as empirical distribution function has been described. However, the usage of the Wasserstein metric is not limited to results reported in such fashion. In the following, the computation of the Wasserstein metric from statistical results will be discussed. This versatility is important to the metric, given the fact that the conventional practice of reporting only the statistics, predominantly first and second moments, is still the prevailing one.

The procedure of applying the Wasserstein metric to statistical results is to first reconstruct the multivariate distribution from statistical results. An empirical distribution is then sampled to compute the Wasserstein metric following the method described in Sec. 2.5. The reconstruction of a continuous PDF from a set of known statistical models can be performed using the concept of the statistically most likely distribution (SMLD) [34, 35, 36]. The SMLD of a $d$ -dimensional random variable is defined to be the distribution that maximizes the relative entropy, given a prior distribution $g(\mathbf{x})$ , under a given set of constraints:

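$$f_{\text{SML}}(\mathbf{x})=\arg\max_{f}\bigg\{-\int_{\mathbf{R}^{d}}f(\mathbf{x})\ln\frac{f(\mathbf{x})}{g(\mathbf{x})}\,d\mathbf{x}\bigg\}\quad\text{subject to}\quad\int_{\mathbf{R}^{d}}f(\mathbf{x})\,d\mathbf{x}=1,\quad\int_{\mathbf{R}^{d}}\mathbf{T}(\mathbf{x})f(\mathbf{x})\,d\mathbf{x}=\overline{\mathbf{t}}\,.\tag{10}$$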

Here, $\mathbf{T}(\mathbf{x})$ is the set of statistical moments that are selected as constraints, and $\overline{\mathbf{t}}$ is the vector of corresponding values obtained from the data. The type of the so obtained distribution is dictated by the form of the constraints [37]. For instance, if the multivariate distribution is constructed using only the first and second moments with a uniform prior, a multivariate normal distribution is obtained. In addition, if only the marginal second moments are given while the cross moments are not, the obtained multivariate normal distributions are uncorrelated. More specifically, if

$$\mathbf{T}(\mathbf{x})=\big(x_{1},\,\ldots,\,x_{d},\,x_{1}^{2},\,\ldots,\,x_{d}^{2}\big),\tag{11}$$

the SMLD becomes

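$$f_{\text{SML}}(\mathbf{x})=\prod_{k=1}^{d}\frac{1}{\sqrt{2\pi\sigma_{k}^{2}}}\exp\bigg\{-\frac{(x_{k}-\mu_{k})^{2}}{2\sigma_{k}^{2}}\bigg\},\tag{12}$$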

where $\bm{\mu}_{k}=\overline{\mathbf{x}}_{k}$ and $\bm{\sigma}_{k}^{2}=\overline{\mathbf{x}^{2}_{k}}-\overline{\mathbf{x}_{k}}^{2}$ .
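Sampling from this uncorrelated normal reconstruction only requires the marginal first and second moments. The sketch below shows one way to do this with the Python standard library; the function name `sample_uncorrelated_smld` and the numerical values of the moments are assumptions chosen for illustration, not data from the present study.

```python
import random


def sample_uncorrelated_smld(means, second_moments, n, seed=0):
    """Draw n samples from the product of normals fixed by marginal moments.

    means[k] is the mean of component k and second_moments[k] its raw second
    moment, so that sigma_k^2 = second_moments[k] - means[k]**2.
    """
    rng = random.Random(seed)
    sigmas = []
    for mu, m2 in zip(means, second_moments):
        var = m2 - mu * mu
        if var < 0.0:
            raise ValueError("second moment is inconsistent with the mean")
        sigmas.append(var ** 0.5)
    # Components are sampled independently because no cross moments are given.
    return [
        tuple(rng.gauss(mu, s) for mu, s in zip(means, sigmas))
        for _ in range(n)
    ]


# Hypothetical statistics: mixture fraction and temperature [K] at one probe.
means = [0.3, 1500.0]
second_moments = [0.3 ** 2 + 0.05 ** 2, 1500.0 ** 2 + 200.0 ** 2]
samples = sample_uncorrelated_smld(means, second_moments, n=5000)
mean_z = sum(s[0] for s in samples) / len(samples)
mean_t = sum(s[1] for s in samples) / len(samples)
print(mean_z, mean_t)
```

A normal reconstruction does not enforce physical bounds, so individual samples of a bounded scalar such as mixture fraction may fall outside $[0,\,1]$.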

After obtaining the SMLDs, the sets of samples can be drawn from them and the Wasserstein metric can be directly calculated. In addition, the metric property implies the following inequality

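$$\big|W_{p}(f_{\text{SML}},\,g_{\text{SML}})-W_{p}(f,\,g)\big|\leq W_{p}(f_{\text{SML}},\,f)+W_{p}(g_{\text{SML}},\,g)\,.\tag{13}$$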

In other words, the difference between the Wasserstein metric of the actual distributions, $f$ and $g$ , and that of the reconstructed counterparts, $f_{\text{SML}}$ and $g_{\text{SML}}$ , is bounded by the errors of the SMLD reconstructions, which themselves can be quantified by Wasserstein metrics. In the current study, the set of constraints is limited to the marginal first and second moments, which are typically reported in the literature. More accurate reconstructions can be obtained by including higher-order and cross moments. In addition, statistics of other forms may also be considered. For instance, $\beta$ -distributions can be recovered for constraints in the form of $\overline{\ln(x)}$ and $\overline{\ln(1-x)}$ , which is potentially more appropriate for conserved scalars such as mixture fraction.
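The connection between logarithmic constraints and the $\beta$ -distribution can be sketched in a few lines. The derivation below assumes a uniform prior on $[0,\,1]$ and introduces Lagrange multipliers $\lambda_{0}$, $\lambda_{1}$, $\lambda_{2}$ that do not appear elsewhere in the text:

```latex
\max_{f}\;-\int_{0}^{1} f(x)\ln f(x)\,dx
\quad\text{subject to}\quad
\int_{0}^{1} f\,dx = 1,\quad
\int_{0}^{1}\ln(x)\,f\,dx=\overline{\ln(x)},\quad
\int_{0}^{1}\ln(1-x)\,f\,dx=\overline{\ln(1-x)}
\\[4pt]
\Longrightarrow\quad
f(x)=\exp\{\lambda_{0}+\lambda_{1}\ln(x)+\lambda_{2}\ln(1-x)\}
\;\propto\; x^{\lambda_{1}}(1-x)^{\lambda_{2}} .
```

Identifying $a=\lambda_{1}+1$ and $b=\lambda_{2}+1$ gives the familiar form $x^{a-1}(1-x)^{b-1}$, with $\lambda_{0}$ fixed by the normalization constraint.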

2.5 Calculation procedure

In the present study, the $2^{\text{nd}}$ Wasserstein metric is used. The computation of the metric can be realized by any general-purpose linear programming tool. For the present analysis, the program by Pele and Werman [38] is used. It calculates the Wasserstein metric as a min-cost flow problem (a special case of linear programming) using the successive shortest path algorithm [39]. Pseudocode for the corresponding algorithm is given in Alg. 1, and Fig. 1 provides an illustrative example for the evaluation of the Wasserstein metric. The source code for the evaluation of the Wasserstein metric is provided in Appendix B.

In this context, it is important to note that the input data to the Wasserstein metric are normalized to enable a direct comparison and a physical interpretation of the results. A natural choice is to normalize each sample-space variable by its respective standard deviation, computed from the reference data set (for instance, the experimental measurements).

Suppose we have two sets of data with sample sizes of $n$ and $n^{\prime}$ , respectively. Each sample represents a point in the thermo-chemical (sub)-space, e.g. $\mathbf{x}=[Z,T,Y_{\ce{H2O}},\ldots]$ . The empirical distribution can be constructed from each data set following Eq. 6, where $f_{i}=1/{n}$ and $g_{j}=1/{n^{\prime}}$ . The Wasserstein metric is then computed following the definition in Eq. 5. The worst-case time complexity of the algorithm is $\mathcal{O}\left((n+n^{\prime})^{3}\log(n+n^{\prime})\right)$ . Note that the dimension of the thermo-chemical (sub)-space affects only the pair-wise distance between data points but not the definition or calculation of the metric.
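The steps above (normalization by the reference standard deviation, pair-wise distances, optimal transport) can be sketched compactly for the special case $n=n^{\prime}$. With equal weights on both sides an optimal plan can be chosen as a permutation, so the sketch swaps in SciPy's `linear_sum_assignment` in place of the min-cost flow code used in the present study; the function name `normalized_w2` and the synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def normalized_w2(sim, ref):
    """W_2 between two equal-size sample sets after normalization.

    sim, ref: arrays of shape (n, d); each row is one thermo-chemical state.
    Every column is divided by the standard deviation of the reference data.
    """
    sim = np.asarray(sim, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if sim.shape != ref.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")
    scale = ref.std(axis=0)
    sim_n, ref_n = sim / scale, ref / scale
    # Squared Euclidean distance between every pair of normalized samples.
    cost = ((sim_n[:, None, :] - ref_n[None, :, :]) ** 2).sum(axis=2)
    # With equal weights 1/n on both sides, an optimal plan is a permutation.
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())


# Hypothetical reference data: (mixture fraction, temperature [K]) samples.
rng = np.random.default_rng(0)
ref = rng.normal([0.3, 1500.0], [0.05, 200.0], size=(200, 2))
# Synthetic "simulation": the reference shifted by half a standard deviation in Z.
sim = ref + np.array([0.5, 0.0]) * ref.std(axis=0)
w2 = normalized_w2(sim, ref)
print(w2)
```

Because the synthetic simulation data are an exact translation of the reference samples, the result equals the normalized shift of 0.5 up to round-off, which matches the interpretation of the metric in units of standard deviations.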

2.6 Remarks on the interpretation and usage of the Wasserstein metric $W_{p}$

The Wasserstein metric is a natural extension of the Euclidean distance to statistical distributions. It enables the comparison between two multi-dimensional distributions via a single metric while taking all information presented by the distributions into consideration. The definition of $W_{p}$ in Eq. 5 can be viewed as the weighted average of the pair-wise distances between samples of the two distributions. In the case of one-dimensional distributions, the obtained value of the metric shares the same unit as the sample data. For instance, if two distributions of temperature are considered, the corresponding $W_{p}$ in units of kelvin can be interpreted as the average difference between the values of temperature from the two distributions. In the case of multi-dimensional distributions, each dimension is normalized before pair-wise distances are calculated. The choice of the normalization method is application-specific. In this study, the marginal standard deviation is chosen. The resulting $W_{p}$ represents the averaged difference, expressed in multiples of the marginal standard deviations, between samples from the two distributions. As such, a value of $W_{p}=0.5$ can be interpreted as a difference between simulation and experimental data at the level of $0.5$ standard deviations. Although not considered in this study, additional turbulence-relevant information, such as space-time correlation, can be factored in via the extension of the phase space for the PDFs as performed by Muskulus and Verduyn-Lunel [40].
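The interpretation in physical units can be illustrated with the standard one-dimensional quantile representation of $W_{p}$. As a worked example, assume that the simulated temperature distribution $g$ is the measured distribution $f$ shifted by a constant offset $\Delta T$ (a symbol introduced here for illustration), with $F^{-1}$ and $G^{-1}$ the respective inverse cumulative distribution functions:

```latex
g(T) = f(T-\Delta T)
\;\Longrightarrow\;
G^{-1}(u) = F^{-1}(u) + \Delta T
\;\Longrightarrow\;
W_{p}(f,\,g) = \bigg(\int_{0}^{1}\big|F^{-1}(u)-G^{-1}(u)\big|^{p}\,du\bigg)^{1/p} = |\Delta T| .
```

A uniform offset of, say, $\Delta T=50$ K thus yields $W_{p}=50$ K for every order $p$, independent of the shape of the distribution.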

Comparable samples need to be drawn from numerical simulation and experimental data to ensure a consistent comparison between the two distributions. This can be achieved by matching the sampling locations and frequencies between the two sources of data. Furthermore, the experimental uncertainty may also be factored in by either the convolution of simulation data with error distributions or the Bayes deconvolution of the experimental data [41, 42].
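One way to picture the convolution of simulation data with an error distribution is to perturb each simulated sample with a random draw from that distribution. The sketch below assumes a zero-mean Gaussian error with a known standard deviation `sigma`; both the Gaussian form and the function name `convolve_with_gaussian_error` are assumptions made for illustration, not details taken from Refs. [41, 42].

```python
import random

def convolve_with_gaussian_error(samples, sigma, seed=0):
    """Add an independent zero-mean Gaussian error draw to every sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [x + rng.gauss(0.0, sigma) for x in samples]

# Hypothetical simulated temperatures in kelvin and an assumed 20 K uncertainty.
simulated_T = [1480.0, 1620.0, 1755.0, 1900.0]
blurred_T = convolve_with_gaussian_error(simulated_T, sigma=20.0)
unchanged_T = convolve_with_gaussian_error(simulated_T, sigma=0.0)
```

The perturbed samples then represent the distribution that an instrument with this uncertainty would report, which can be compared directly against the measured samples.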

3 Configuration

In the present study, the utility of the Wasserstein metric is evaluated in an application to simulation results of a piloted turbulent jet flame with inhomogeneous inlets. The experimental configuration is described in the next section, the computational setup is summarized in Sec. 3.2, and statistical simulation results that are utilized in evaluating the Wasserstein metric are presented in Sec. 3.3.

3.1 Experimental setup

The piloted turbulent jet flame with inhomogeneous inlets considered in this work was experimentally investigated by Meares et al. [43, 44] and Barlow et al. [45]. The burner is schematically shown in Fig. 2. The reactants are supplied through an injector that consists of three tubes; the inner fuel-supply tube with a diameter of 4 mm is recessed by a length $L_{r}$ with respect to the burner exit plane. This inner tube is placed inside an outer tube (with a diameter of $D_{J}=7.5$ mm) that supplies ambient air; depending on the recess height, varying levels of mixture inhomogeneity can be achieved. The flame is stabilized by a pilot stream, exiting through an annulus with an outer diameter of 18 mm. The burner is placed inside a wind tunnel, providing co-flowing air at a bulk velocity of 15 m/s.

The present study considers the operating condition FJ-5GP-Lr75-57. Methane fuel is supplied through the inner tube (FJ). The pilot flame is a gas mixture consisting of five components (5GP: $\ce{C_{2}H_{2}}$ , $\ce{H_{2}}$ , $\ce{CO_{2}}$ , $\ce{N_{2}}$ , and air, chosen to match the product gas composition and equilibrium temperature of a stoichiometric \ce{CH4}/air mixture). The bulk velocity of the unburned pilot mixture is $3.72$ m/s. The inner tube of the fuel stream is recessed from the jet exit plane by $L_{r}=75\,\text{mm}$ , and the bulk jet velocity is set to 57 m/s (Lr75-57), corresponding to $50\%$ of the experimentally measured blow-off velocity. The recess results in a partially premixed reactant-gas mixture, which is relevant to modern gas turbine applications [46, 47].

3.2 Computational setup and mathematical model

The computational domain is discretized using a three-dimensional structured mesh in cylindrical coordinates, and includes the upstream portion of the burner to represent the partial mixing of reactants and flame stabilization. The computational domain extends to $20D_{J}$ in the axial direction and $15D_{J}$ in the radial direction, and is discretized using 1.6 million control volumes. Inflow conditions for the fuel/air jet, the pilot, and the coflowing air stream are obtained from separate simulations. The pilot flame is treated by prescribing the scalar profile from the corresponding chemistry table, with the mixture stoichiometry, temperature, and mass flow rate representing the experimental setting. An improved flame stability is experimentally observed with the inhomogeneous inlet condition. This can be attributed to the upstream premixed combustion of the near-stoichiometric fluid in the jet reacting with the hot pilot. Local extinction and re-ignition were found not to be relevant under these operating conditions [44].

To model the turbulent reacting flow-field, a flamelet/progress variable (FPV) model is employed [48, 49], in which the thermo-chemical quantities are expressed in terms of a reaction-transport manifold that is constructed from the solution of steady-state non-premixed flamelet equations [50]. This model requires the solution of transport equations for the filtered mixture fraction, residual mixture fraction variance, and filtered reaction progress variable. These modeled equations take the following form:

[TABLE]

in which the turbulent fluxes are modeled by a gradient transport assumption [51], and the residual scalar dissipation rate ${\widetilde{\chi}}^{\rm{res}}_{Z}$ is evaluated using spectral arguments [52]. With the solution of Eqs. 14, all thermo-chemical quantities are then expressed in terms of ${\widetilde{Z}},{\widetilde{Z^{\prime\prime 2}}},$ and ${\widetilde{C}}$ , and a presumed PDF-closure is used to model the turbulence-chemistry interaction. For this, the marginal PDF for mixture fraction is described by a presumed $\beta$ -PDF, and the conditional PDF of the reaction progress variable is modeled as a Dirac-delta function.
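For a presumed $\beta$ -PDF, the two shape parameters follow from the filtered mixture fraction and its residual variance by matching the first two moments. The sketch below shows this standard moment relation; the function name `beta_parameters` and the input values are assumptions introduced for illustration and are not taken from the solver used in this work.

```python
def beta_parameters(z_mean, z_var):
    """Shape parameters (a, b) of a beta-PDF with the given mean and variance."""
    if not 0.0 < z_mean < 1.0:
        raise ValueError("mean mixture fraction must lie strictly inside (0, 1)")
    max_var = z_mean * (1.0 - z_mean)  # variance of a fully unmixed (bimodal) state
    if not 0.0 < z_var < max_var:
        raise ValueError("variance must lie strictly inside (0, mean * (1 - mean))")
    factor = max_var / z_var - 1.0
    return z_mean * factor, (1.0 - z_mean) * factor

# Hypothetical filtered mean and residual variance of the mixture fraction.
a, b = beta_parameters(0.5, 0.05)
```

With these inputs the relation gives $a=b=2$ , a symmetric distribution centered at $Z=0.5$ . As the variance approaches its upper bound, both parameters tend to zero and the PDF approaches two delta peaks at $Z=0$ and $Z=1$ .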

A recently performed analysis of the model compliance showed that the FPV-approach provides only an incomplete description of the interaction between the partially premixed mixture and the hot pilot, and exhibits discrepancies in the prediction of carbon monoxide [53]. Therefore, the present simulation is intended for the purpose of demonstrating the merit of employing the Wasserstein metric as a quantitative validation measure and to identify discrepancies of the model through direct comparisons against experiments. Extended flamelet models have been developed to describe the complex flame topology and turbulence-chemistry interaction appearing in this configuration [54, 55, 56, 57], and the performance of other models will be examined in Sec. 5.

3.3 Statistical results and scatter data

Before we examine the Wasserstein metric, this section compares statistical results from simulations and experiments. The simulated data presented in this section are not expected to replicate the experimental results perfectly. Instead, the data serve to determine where in the flame and for which species the model behaves well, as well as regions in which the model could be improved. Quantitative results using the Wasserstein validation metric will be presented in Sec. 4, and these results should match the data interpretation developed in this section.

Comparisons between radial profiles of experimental data and simulations are provided in Fig. 3. The comparisons are made at four distinct axial locations ( $x/D_{J}=\{1,5,10,15\}$ ), with four scalar quantities, which include mixture fraction ( $Z$ ), temperature ( $T$ ), and species mass fractions of \ce{CO2} and \ce{CO}. The solid lines represent simulation results, while the symbols correspond to experimental measurements. In the following, these four scalars are used as quantities of interest for the Wasserstein metric to embody the accuracy of the simulations in modeling mixing, heat release, fuel conversion, and emissions. Radial profiles for mixture fraction and temperature are in overall good agreement with measurements. Discrepancies are largely confined to regions in the jet core and shear-layer, where simulations underpredict scalar mixing. A shift in the peak location for temperature and \ce{CO2} mass fraction profiles at the intermediate axial locations, $x/D_{J}=\{5,10\}$ , is apparent, which can be related to discrepancies in the mixing profile. Results for mean \ce{CO} profiles are presented in the last row of Fig. 3, showing that the simulation underpredicts this intermediate product, which can be attributed to shortcomings of the FPV-combustion model [53].

Scatter data and mixture-fraction conditioned data from experiments and simulations are shown in Fig. 4. These data are sampled at a subset of the axial locations, using the same four quantities of interest: $Z$ , $T$ , and mass fractions of \ce{CO2} and \ce{CO}. Scatter data are frequently examined to assess the agreement of the thermo-chemical state space that is accessed by the model and experiment. While this direct comparison provides insight about shifts in the composition profiles, as seen for mass fractions of \ce{CO} and \ce{CO2}, such comparisons are mostly of qualitative nature. By utilizing the Wasserstein metric, these scatter data and statistical results will be used in the next section to obtain an objective measure for the quantitative assessment of the agreement between measurements and simulations.

4 Results for application of Wasserstein metric

To introduce the Wasserstein metric as a quantitative validation tool, in the following we consider two test cases. The first test case, presented in Sec. 4.1, focuses on the analysis of single-scalar experimental results in which mixture fraction data are considered at individual points in the flame. Previous work has shown that the mixture fraction can reasonably be approximated by a $\beta$ -distribution [58], and this test case examines this premise by applying the Wasserstein metric to experimental data and a modeled $\beta$ -distribution that is obtained from a maximum likelihood estimation (MLE) of the measurements. This one-dimensional test case is intended to present the capabilities of the Wasserstein metric in a simplified context.

The second test, presented in Sec. 4.2, considers the quantitative validation of LES modeling results against experimental data. For this, the Wasserstein metric will be employed to incorporate multiple thermo-chemical quantities, including $Z$ , $T$ , $Y_{\ce{CO2}}$ , and $Y_{\ce{CO}}$ , thereby contracting information about the model accuracy for predicting mixing, fuel conversion, and emissions into a single validation measure. Several locations in the flame will be considered to evaluate potential model deficiencies, demonstrating the merit of the Wasserstein metric as a multidimensional validation tool.

In these test cases, scatter data and empirical distributions that are reconstructed from statistical moment information, using the method presented in Sec. 2.4, are considered to examine the accuracy of both methods.

4.1 Conserved scalar results

Previous investigations have shown that the evolution of conserved scalars in two-stream systems can be approximated by a $\beta$ -distribution [58], and a Dirichlet-distribution as a multivariate generalization of the $\beta$ -distribution provides a description of turbulent scalar mixing in multistream flows [54]. This section examines the accuracy of representing the conserved scalar by a presumed PDF using the Wasserstein metric as a quantitative metric. To simplify the analysis, this study focuses on data collected at the axial locations introduced previously ( $x/D_{J}=\{1,5,10,15\}$ ) and a set of four radial positions located at $r/D_{J}=\{0,0.5,0.85,1.2\}$ . The axial positions are spaced uniformly, whereas the radial positions represent key locations in the burner geometry. Specifically, these four radial locations correspond approximately to the center of the fuel stream, the outer edge of the air stream, the midpoint of the pilot, and the outer edge of the pilot, respectively. Although these sixteen measurement locations were chosen for the analysis, it is possible to apply the Wasserstein metric at any location in the flame and generate similar results.

PDFs for mixture fraction, represented by the histograms in Fig. 5, provide the experimental distribution for all of the points of interest. Superimposed over each plot is a maximum likelihood estimation of this data using the $\beta$ -distribution [59], whose probability distribution function reads

$$f(x;\,a,\,b)=\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,x^{a-1}(1-x)^{b-1}, \tag{15}$$

where $\Gamma(\cdot)$ is the gamma function. Given $n$ independent samples drawn from a $\beta$ -distribution, with values $x_{1},\,x_{2},\,\ldots,\,x_{n}$ , the method of maximum likelihood estimates the parameters $a$ and $b$ by finding the arguments that maximize the logarithmic likelihood function,

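$$\ln\mathcal{L}(a,\,b\,|\,x_{1},\ldots,x_{n})=n\left[\ln\Gamma(a+b)-\ln\Gamma(a)-\ln\Gamma(b)\right]+(a-1)\sum_{i=1}^{n}\ln x_{i}+(b-1)\sum_{i=1}^{n}\ln(1-x_{i}). \tag{16}$$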

Although the best-fit $\beta$ -distribution provides a reasonable approximation for the data, there still are noticeable differences between the experimental data and the MLE-fit. These differences are quantitatively expressed through the Wasserstein metric, and the computed values are reported at the top of each histogram in Fig. 5.
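The maximum likelihood fit can be illustrated with a short sketch. It evaluates the logarithmic likelihood of the $\beta$ -distribution in its standard form and maximizes it by a coarse grid search, which stands in for the numerical optimizer one would use in practice. The function names, the grid resolution, and the synthetic samples (drawn from a $\beta$ -distribution with $a=2$ and $b=5$ instead of measured mixture fractions) are all assumptions made for this example.

```python
import math
import random

def beta_log_likelihood(a, b, n, sum_log_x, sum_log_1mx):
    """Log-likelihood of n samples under a beta-distribution with parameters (a, b)."""
    return (n * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
            + (a - 1.0) * sum_log_x + (b - 1.0) * sum_log_1mx)

def fit_beta_grid(samples, step=0.25, upper=10.0):
    """Coarse grid search for the (a, b) pair that maximizes the log-likelihood."""
    n = len(samples)
    # The likelihood depends on the data only through these two sums.
    sum_log_x = sum(math.log(x) for x in samples)
    sum_log_1mx = sum(math.log(1.0 - x) for x in samples)
    grid = [step * k for k in range(1, int(upper / step) + 1)]
    return max(
        ((a, b) for a in grid for b in grid),
        key=lambda ab: beta_log_likelihood(ab[0], ab[1], n, sum_log_x, sum_log_1mx),
    )

# Synthetic stand-in for measured mixture-fraction samples.
rng = random.Random(1)
synthetic_z = [rng.betavariate(2.0, 5.0) for _ in range(2000)]
a_hat, b_hat = fit_beta_grid(synthetic_z)
```

The recovered parameters should lie close to the values used to generate the samples, limited by the grid spacing and the finite sample size.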

The quantitative evaluation of the Wasserstein metric shows that the largest deviations between experimental and presumed distributions occur within the fuel jet near the nozzle exit. These quantitative results are corroborated by an inspection of the histograms. The largest deviations between the MLE $\beta$ -distribution and measurements occur near the nozzle inlet, corresponding to regions of high turbulence and strong mixing. Note that the magnitude of the Wasserstein metric, $W_{2}(Z)$ , is provided in natural units of mixture fraction, which is bounded to the interval $[0,1]$ . As such, the metric provides an integral physical interpretation of the differences between both distributions.

In the next step, we employ the Wasserstein metric to evaluate the accuracy of the presumed $\beta$ -distribution along several radial profiles. For this, a total of 86 uniformly spaced points are considered along the four axial measurement stations used previously. At each point, a $\beta$ -distribution is constructed from the measurements using the maximum likelihood estimate, from which calculations of the Wasserstein metric are performed subsequently. Results are presented in Fig. 6, and they show that the agreement of the $\beta$ -distribution with the experimentally determined PDFs improves with increasing axial and radial distance. This observation corroborates the findings from the point-wise analysis in Fig. 5, and agrees with the physical expectation that with increasing downstream distance the mixture composition approaches a homogeneous state. The low absolute error values of $W_{2}<0.02$ compared to the $[0,1]$ range of mixture fraction show that, overall, the $\beta$ -distribution provides an adequate representation of the experimentally determined mixture fraction data.
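For a single scalar such as the mixture fraction, the metric between two equally sized sample sets reduces to a comparison of sorted values, since the optimal plan in one dimension pairs the $i$ -th smallest sample of one set with the $i$ -th smallest of the other. The following sketch shows this shortcut; the function name `wasserstein_1d` and the numbers are hypothetical, and the equal-size restriction is an assumption made to keep the example short.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p between two equal-size one-dimensional samples with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the optimal transport plan matches order statistics.
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(xs)) ** (1.0 / p)

# Hypothetical mixture-fraction samples and a model that is biased by 0.02.
measured_z = [0.10, 0.25, 0.40, 0.55]
modeled_z = [z + 0.02 for z in measured_z]
w2_z = wasserstein_1d(measured_z, modeled_z)
```

Here the result is 0.02 in units of mixture fraction, matching the imposed bias. To compare measurements against a fitted $\beta$ -distribution in the same way, one could draw an equal number of samples from the fitted distribution, for example with `random.betavariate`.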

While this test case affirms that the mixture fraction PDF follows a $\beta$ -distribution, it also demonstrates three key traits of the Wasserstein metric. First, the quantitative nature of the Wasserstein metric allows for direct comparisons between several distributions, simultaneously. For example, it is much easier to compare the four distributions at $x/D_{J}=1$ using the Wasserstein metric results, as opposed to analyzing their differences in Fig. 5 directly. Second, the Wasserstein metric provides a comparison in distribution space. It therefore contains information about all moments, and is not limited to low-order moments such as mean and root-mean-square quantities. Finally, the Wasserstein metric is applicable to any location in the flow, thereby providing fine-grained information about the simulation accuracy, model deficiencies in predicting certain scalar quantities, and the impact of inconsistencies of boundary conditions on the simulation. These three fundamental features will be emphasized and built upon as Sec. 4.2 considers the multidimensional validation of an LES combustion model using experimental data.

4.2 Multiscalar results

Having provided a comparison for the Wasserstein metric applied to mixture fraction as a single scalar, this second test case will demonstrate the application of the Wasserstein metric to multiple scalars in the form of joint scalar distributions. This property allows for multiple, simultaneous error calculations, which provide a multifaceted, quantitative validation. Here, the Wasserstein metric is used to compare simulation results with experimental data for the piloted jet flame with inhomogeneous inlets as discussed in Sec. 3.

To quantitatively assess a combustion simulation, a multiscalar Wasserstein metric can be evaluated that takes into account $d$ scalar quantities. In the present cases, four scalar quantities are considered, namely mixture fraction, temperature, and species mass fractions of \ce{CO2} and \ce{CO}, and evaluations are performed at different axial locations in the jet flame.

The $W_{2}$ -metric at $x/D_{J}=10$ for three two-scalar cases, $Z$ - $T$ , $Z$ - $Y_{\ce{CO_{2}}}$ , and $Z$ - $Y_{\ce{CO}}$ , is shown in Fig. 7. Values of the $W_{2}$ -metric of similar magnitude and radial profile are found in the cases of $Z$ - $T$ and $Z$ - $Y_{\ce{CO_{2}}}$ , in which there is little discrepancy between the results obtained from the scatter data and SMLD reconstruction. The $Z$ - $Y_{\ce{CO}}$ case exhibits a much higher level of $W_{2}$ . There is also a greater difference between the values obtained from the two sources of experimental results. We then examine how the $W_{2}$ -metric is affected by increasing the number of scalars that are included in its evaluation. For this, we consider results at the same axial location of $x/D_{J}=10$ , and results are presented in Fig. 8. The result of $Z$ - $Y_{\ce{CO_{2}}}$ is repeated for clarity. Since the Wasserstein metric measures differences in distribution, it allows for a direct evaluation of how differences arising from uncertainties in boundary conditions or modeling errors manifest in the flow field. The values of the Wasserstein metric increase as more quantities are considered (from left to right in Fig. 8). This is to be expected as the pair-wise distances become larger with the inclusion of additional dimensions.
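The growth of the metric with the number of scalars can be made explicit with a short argument. It assumes a Euclidean pair-wise distance in the normalized space and uses notation introduced only for this remark: $\|\cdot\|_{d}$ is the distance computed from the first $d$ scalars, and $\gamma^{*}$ is an optimal transport plan of the $(d+1)$ -scalar problem, which is also admissible for the $d$ -scalar problem because the sample weights are unchanged.

```latex
\begin{aligned}
\|\mathbf{x}_{i}-\mathbf{y}_{j}\|_{d+1}^{2}
  &= \|\mathbf{x}_{i}-\mathbf{y}_{j}\|_{d}^{2} + \left(x_{i,d+1}-y_{j,d+1}\right)^{2}
  \;\ge\; \|\mathbf{x}_{i}-\mathbf{y}_{j}\|_{d}^{2}, \\
\left.W_{p}^{p}\right|_{d+1}
  &= \sum_{i,j}\gamma^{*}_{ij}\,\|\mathbf{x}_{i}-\mathbf{y}_{j}\|_{d+1}^{p}
  \;\ge\; \sum_{i,j}\gamma^{*}_{ij}\,\|\mathbf{x}_{i}-\mathbf{y}_{j}\|_{d}^{p}
  \;\ge\; \left.W_{p}^{p}\right|_{d}.
\end{aligned}
```

The last inequality holds because the $d$ -scalar metric is the minimum over all admissible plans. Adding a scalar therefore never decreases $W_{p}$ , provided the normalization of the existing scalars is left unchanged.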

A comparison of the Wasserstein metric, evaluated from the SMLD-reconstruction, shows that this statistical reconstruction technique qualitatively and quantitatively captures the results obtained from the scatter data. However, the main deviations of the SMLD-reconstruction arise when including $Y_{\ce{CO}}$ , which indicates deficiencies of the uncorrelated normal distribution in representing the actual joint distribution including $\ce{CO}$ and possible strong correlations between $\ce{CO}$ and other quantities.

Having built the multiscalar Wasserstein metric, results for calculations at different axial locations in the flame are displayed in Fig. 9. The figure on the left represents a $W_{2}$ -comparison made using scatter data, while the figure on the right represents a comparison made using SMLD-reconstructed data. Although scatter data provide a more representative conclusion, SMLD data, reconstructed from mean and variance data, may be more practical given the data that are typically reported from simulation results. Since the data have been normalized by the standard deviation, as explained in Sec. 2.5, the error metric is unitless. This property, along with the stacking of error contributions, allows for simple comparisons between species, as well as comparisons between axial locations. The stacking represents error contributions that are computed based on each scalar’s contribution to the pair-wise distances and the transport matrix $\Gamma$ obtained from the four-scalar Wasserstein metric. It can be seen that the main contributions to the deviations arise from the mixture fraction at the jet core, and approximately equal contributions from temperature and \ce{CO} mass fraction in the shear layer. This information can be used to guide corrections in the boundary conditions and the selection of the combustion model. Since the transport matrices $\Gamma$ are different between cases of different dimensionalities, no direct correspondence can be established between the stacking in Fig. 9 and the lower dimensional results in Fig. 8. Note also that by increasing the number of scalar quantities in the evaluation of the Wasserstein metric, correlations and scalar interdependencies are taken into account.
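One plausible way to obtain such a stacking is sketched below. For $p=2$ and a Euclidean distance, the squared metric is a sum over scalars, so each scalar receives a well-defined share of $W_{2}^{2}$ once the transport matrix is known. Distributing $W_{2}$ itself in proportion to these shares is an assumption of this sketch, as are the function name `scalar_contributions` and the toy data; the exact convention used for Fig. 9 is not reproduced here.

```python
def scalar_contributions(gamma, xs, ys):
    """Split W_2^2 = sum_ij gamma_ij * |x_i - y_j|^2 into one term per scalar."""
    d = len(xs[0])
    parts = [0.0] * d
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if gamma[i][j] == 0.0:
                continue  # no mass is moved between this pair of samples
            for k in range(d):
                parts[k] += gamma[i][j] * (x[k] - y[k]) ** 2
    total = sum(parts)
    w2 = total ** 0.5
    # Assumption: bar segments share W_2 in proportion to the squared contributions.
    stacked = [w2 * c / total for c in parts] if total > 0.0 else parts
    return w2, stacked

# Toy two-scalar data (already normalized) with a given transport plan.
xs = [(0.0, 0.0), (1.0, 1.0)]
ys = [(0.3, 0.0), (1.3, 1.4)]
gamma = [[0.5, 0.0], [0.0, 0.5]]
w2, stacked = scalar_contributions(gamma, xs, ys)
```

In this toy case the first scalar accounts for 0.09 and the second for 0.08 of $W_{2}^{2}=0.17$ , so the first segment of the stacked bar is slightly taller.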

Figure 9 reveals the sources and locations of model error. For example, considering $x/D_{J}=10$ , most of the error is concentrated in the region between $0.5\leq r/D_{J}\leq 1$ . The largest contributors to this error are $Y_{\ce{CO}}$ and $T$ . There is additional, but less significant, error in the central jet region from $Z$ , which can be attributed to deficiencies in the boundary conditions. Finally, the model error drops off from all sources on the edge of the domain. Similar analyses could be conducted for the other axial locations, but this example demonstrates the usefulness of the Wasserstein metric as a validation tool. Information from these calculations could be used to identify model limitations, isolate regions where mesh refinement is needed, and where further measurements are required.

Three further advantages of using the Wasserstein metric in combustion validation are outlined below. One of the method’s benefits is that it isolates sensitivity to boundary conditions. The Wasserstein metric calculations for $x/D_{J}=1$ provide some indication of the error at the boundary conditions. As results downstream are examined, it can be seen that the error features from the inlet are convecting and diminishing. One example of this behavior is the mixture fraction error in the jet region, which is a significant source of error for $x/D_{J}=1$ , but a minor source of error downstream at $x/D_{J}=15$ . By comparing slices at the inlet and slices downstream, one can identify how boundary conditions introduce errors to the combustion model. Although some error features are diffusing, others are forming downstream.

A second benefit to this approach is the detection of modeling errors not arising from boundary conditions. These new peaks arise from deficiencies in the combustion model. For instance, the error for $Y_{\ce{CO}}$ , just inside $r/D_{J}=1$ is modest at the inlet, peaks sharply at $x/D_{J}=5$ , and reduces at $x/D_{J}=10$ , before peaking sharply again at $x/D_{J}=15$ . This error represents a deficiency in the CO modeling, and it could be targeted for improvement in future versions of the combustion model. In this way, the Wasserstein metric helps identify regions of modeling error and highlights potential areas for model improvement.

Finally, a third benefit to the Wasserstein metric validation approach is the seamless transition to multiscalar quantities and the contraction of multidimensional information into a single scalar value. Any number of additional species could be added to these plots, without a significant increase in computational cost. The resulting higher-dimensional calculations would offer an even more detailed comparison of the simulated and experimental data, and provide more insight into boundary condition and combustion modeling error. Condensing multiple validation plots into a compact metric makes the validation process easier to interpret, and provides greater understanding about the effectiveness of combustion models.

5 Quantitative comparison of different simulation results

In this section, we seek to demonstrate the capability of the Wasserstein metric as a tool for quantitative comparisons of simulation results that are generated using different modeling strategies. The direct application of this metric has the potential for eliminating the need for subjective model assessments. It can also provide a direct evaluation of strengths and weaknesses of certain modeling approaches and guide the need for further experimental data to constrain modeling approaches.

In this study, we resort to experimental and computational data of the piloted turbulent burner with inhomogeneous inlets that were collectively reported at the 13th International Workshop on Measurement and Computation of Turbulent Nonpremixed Flames (TNF) [60]. In keeping with the TNF spirit of open scientific collaboration, and acknowledging that results reported at this venue are a work-in-progress, we removed any reference to the original authorship in the following presentation. In this context, we would like to emphasize that the present investigation is not intended to provide any judgment about a particular modeling strategy. Such an endeavor requires a concerted community effort, in which the Wasserstein metric proposed herein could serve as a quantitative measure. It is noted that several of these TNF-contributions have been extended since, and we would like to refer to publications [61, 62, 53, 63, 64, 65, 66], which provide further details on the validation and analysis of simulation results, as well as descriptions of combustion models and computational setups of individual groups.

In this study, we concentrate on the flame configuration FJ-5GP-Lr75-57 that was discussed in Sec. 3.1 and was selected as a target configuration at the TNF-workshop. The following analysis concentrates on comparisons of scatter plots of mixture fraction, temperature, and species mass fractions of \ce{CO2} and \ce{CO} at four axial locations $x/D_{J}=\{1,5,10,15\}$ .

In order to apply the Wasserstein metric to this extensive set of modeling results, we proceeded as follows. First, we downsampled the scatter data by randomly selecting 5000 points, each point containing information for $Z,T,Y_{\ce{CO2}},$ and $Y_{\ce{CO}}$ . This downsampling procedure was performed to achieve reasonable runtimes. The chosen number of samples was found to be appropriate for representing the simulation results without loss of information, and a $5\%$ level of uncertainty is estimated for the $W_{p}$ of multiscalar cases. Next, each data set was normalized by the experimental standard deviation, as set forth in Sec. 2.5. Subsequently, the multiscalar Wasserstein metric was evaluated from each data set, and results are presented in Fig. 10.
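The downsampling step can be sketched as a random selection without replacement. The function name `downsample`, the fixed seed, and the synthetic scatter data below are assumptions for illustration; only the target of 5000 points per data set is taken from the text.

```python
import random

def downsample(points, k=5000, seed=0):
    """Randomly select k joint samples without replacement (all of them if fewer exist)."""
    if len(points) <= k:
        return list(points)
    # Each selected point keeps all of its scalars together, preserving correlations.
    return random.Random(seed).sample(points, k)

# Synthetic stand-in for scatter data: tuples of (Z, T, Y_CO2, Y_CO).
scatter = [(i / 20000.0, 300.0 + 0.08 * i, 1e-6 * i, 5e-7 * i) for i in range(20000)]
subset = downsample(scatter)
```

Because whole rows are selected, the joint structure among $Z$ , $T$ , $Y_{\ce{CO2}}$ , and $Y_{\ce{CO}}$ is retained in the reduced set. Repeating the selection with different seeds gives a simple way to estimate the sampling uncertainty of the resulting $W_{p}$ .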

From this direct comparison, the following observations can be drawn. First, the multiscalar Wasserstein metric, $W_{2}(Z,T,Y_{\ce{CO2}},Y_{\ce{CO}})$ , shows cumulative contributions of each scalar quantity to the overall error that is invoked by the model prediction. As discussed in Sec. 2.2, the analysis is performed using properly non-dimensionalized quantities, so that the $W_{2}$ -metric provides a direct representation of the error; a numerical value of $W_{2}=1$ corresponds to an error of one standard deviation with respect to the measurement. From this follows the second observation that this decomposition of the $W_{2}$ -metric provides direct information about relative contributions of each quantity that is included in its construction. Since the Wasserstein metric is constructed from joint scatter data, it takes into account correlations among scalar quantities, and can therefore be employed in isolating causalities in model predictions. For instance, it can be seen that simulations from Institutions 1 and 2 exhibit a relatively small error contribution from mixture fraction and temperature, but a larger contribution from CO mass fraction. In contrast, Institutions 3 and 4 show a bias towards larger errors in the mixture fraction data, whereas Institutions 9 and 10 show an approximately equally distributed contribution from all four quantities to the $W_{2}$ -metric. Such quantitative information can be useful in isolating model deficiencies and guiding the formulation of model extensions.

A third observation can be drawn by considering the axial evolution of the $W_{2}$ -metric. Specifically, from Fig. 10 it can be seen that most simulations show a reduction in the error with increasing axial distance. This trend can be explained by the equilibration of the combustion, reduction of the mixing, and decay of the turbulence, which is typical for canonical jet flames. In turn, significant deviations from this anticipated trend can hint at potential deficiencies in the model setup, or physically interesting combustion behavior (such as local extinction, multistream mixing, flame lifting, or local vortex break-down) that requires further experimental investigation, refinement of the modeling procedure, improved mesh resolution, or adjustments of boundary conditions. As an example, the simulation from Institution 5 in Fig. 10 shows significant deviation of the mixture fraction at the second measurement location ( $x/D_{J}=5$ ), suggesting that results at this measurement location require further analysis.

The fourth observation arises from the utility of the Wasserstein metric in separating contributions from boundary conditions and combustion-model formulations. For instance, dominant discrepancies at the first measurement location, $x/D_{J}=1$ , can most likely be attributed to uncertainties in the boundary conditions. Contribution of different combustion models can then be tested to examine model developments, and guide model selections for simulating particular flame configurations. Simulation results reported from this comparative study show a rather wide variation in model accuracy, ranging between $W_{2}=0.8$ for the most accurate simulation to values of $W_{2}=3$ , with largest errors occurring at the first reported measurement station.

These results show that the application of the Wasserstein metric to a large set of simulation data provides for a quantitative assessment of different simulation results. The practical utility of this metric lies in the direct assessment of the model performance and in tracking the model convergence as continuous improvements on the model formulation, boundary conditions, and measurements are provided. With regard to providing a verbal evaluation of the accuracy of a particular model, the Wasserstein metric can be utilized as a practically useful model performance index for model ranking [67]. The Wasserstein metric provides a way of ranking models, which we believe is an initial step towards advancing progress in combustion modeling.

6 Conclusions

This manuscript addresses the need for the validation of numerical simulations by considering multiple scalar quantities (velocity, temperature, composition), different data presentations (statistical results, scatter data, and conditional results), and various data acquisitions (pointwise, 1D, and planar measurements at different spatial locations). To this end, the Wasserstein metric is proposed as a general formulation to facilitate quantitative and objective comparisons between measurements and simulations. This metric is a probabilistic measure and is applicable to empirical distributions that are generated from scatter data or statistical results using probabilistic reconstruction.

The Wasserstein metric was derived for turbulent-combustion applications and essential convergence properties were examined. The resulting method was demonstrated in application to LES of a partially premixed turbulent jet flame, and used to categorize errors arising from deficiencies in the specification of boundary conditions and intrinsic limitations of combustion models. This investigation was followed by applying the Wasserstein metric to different simulations that were contributed to the TNF-workshop in order to demonstrate the versatility of this method in establishing an objective evaluation of large data sets.

In building upon previous statistical validation approaches, the Wasserstein metric offers greater insight by condensing multiscalar differences between data into a single error measure. The Wasserstein metric gives an accurate indication of errors over the whole flame, and its interpretation can assist in identifying causalities of discrepancies in numerical simulation, arising from boundary conditions, complex flow-field structures, or model limitations. The method is easy to apply, and its major benefit lies in the seamless extension to multiscalar analyses, thereby taking into account interdependencies of combustion-physical processes. When applied to combustion validation, the Wasserstein metric can concisely communicate shortcomings of current models, and assist in the development of more accurate LES combustion models.

Acknowledgments

Financial support through NASA’s Transformational Tools and Technologies Program with Award No. NNX15AV04A is gratefully acknowledged. Resources supporting this work were provided by the NASA High-End Computing (HEC) Program through the NASA Advanced Supercomputing (NAS) Division at Ames Research Center and the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. We would like to thank Benoit Fiorina, Mélody Cailler, and contributors to the 2016 TNF Workshop for permitting us to use the computational data presented in Fig. 10 for analysis. To acknowledge that these results were a work-in-progress and honor TNF-policy, we anonymized individual contributions.

Appendix A Wasserstein metric for general probability measures

Let $(M,\,d)$ denote a complete separable metric space equipped with a distance function $d$, on which two probability measures $\mu$ and $\nu$ with finite $p^{\text{th}}$ moments are defined. Following Kantorovich’s formulation, the $p^{\text{th}}$ Wasserstein metric between $\mu$ and $\nu$ is defined as

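$$W_{p}(\mu,\,\nu)=\left(\inf_{\gamma\in\Gamma(\mu,\,\nu)}\int_{M\times M}d(x,\,y)^{p}\,\mathrm{d}\gamma(x,\,y)\right)^{1/p},\tag{17}$$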

where $\Gamma(\mu,\,\nu)$ is a set of admissible measures $\gamma$ on $M\times M$ , whose marginals are $\mu$ and $\nu$ .

Specifically for continuous distributions defined on the Euclidean space with $M=\mathbf{R}^{d}$ and $d(x,\,y)=|x-y|$ , the measures $\mu$ and $\nu$ can be represented by their probability density functions denoted by $f$ and $g$ . Thus, we can rewrite the definition in Eq. 17 as

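$$W_{p}(f,\,g)=\left(\inf_{h\in G(f,\,g)}\int_{\mathbf{R}^{d}}\int_{\mathbf{R}^{d}}|x-y|^{p}\,h(x,\,y)\,\mathrm{d}x\,\mathrm{d}y\right)^{1/p},\tag{18}$$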

where $G(f,\,g)$ is a set of joint probability density functions, whose marginal density functions satisfy

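$$\int_{\mathbf{R}^{d}}h(x,\,y)\,\mathrm{d}y=f(x),\tag{19}$$

$$\int_{\mathbf{R}^{d}}h(x,\,y)\,\mathrm{d}x=g(y).\tag{20}$$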

The formulation in Eq. 18 of the Wasserstein metric entails two different interpretations. The first interpretation is very close to the origin of the optimal transportation problem. If each distribution is viewed as a pile of “dirt” distributed in the Euclidean space according to the probability, the metric is the minimum amount of work required to turn one pile into the other. The transport plan $h(x,\,y)$ represents the density of mass to be transported from $x$ to $y$ . The second interpretation views $h(x,\,y)$ as the joint distribution of $x$ and $y$ whose marginal distributions match $f$ and $g$ . The Wasserstein metric is the minimal expectation of the distance between $x$ and $y$ among all such distributions.
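As a worked illustration of the second interpretation (a sketch that is not part of the original derivation), consider the assumed special case in which $g$ is a rigid shift of $f$ by a vector $a\in\mathbf{R}^{d}$, with $p\geq 1$. Moving every point by $a$ gives an upper bound on the transport cost, and Jensen's inequality applied to any admissible $h$ gives a matching lower bound, so the metric reduces to the shift distance $|a|$. The shift plan itself is concentrated on $y=x+a$ and is therefore approached only as a limit of joint densities.

```latex
% Assumptions (for illustration only): g(y) = f(y - a), p >= 1,
% and f has a finite mean (implied by the finite p-th moment).
\begin{align*}
% Upper bound: transport every point x to x + a.
W_{p}(f,\,g)^{p} &\le \int_{\mathbf{R}^{d}} |x-(x+a)|^{p}\, f(x)\,\mathrm{d}x = |a|^{p},\\
% Lower bound: Jensen's inequality for any h with marginals f and g.
\iint |x-y|^{p}\, h(x,\,y)\,\mathrm{d}x\,\mathrm{d}y
  &\ge \left|\iint (y-x)\, h(x,\,y)\,\mathrm{d}x\,\mathrm{d}y\right|^{p}
   = \left|\int y\, g(y)\,\mathrm{d}y - \int x\, f(x)\,\mathrm{d}x\right|^{p}
   = |a|^{p},\\
% Both bounds coincide.
\Rightarrow\quad W_{p}(f,\,g) &= |a| .
\end{align*}
```

In this case the metric carries the units of the underlying variable and measures how far the distribution has been displaced, independent of its shape.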

For the special case of one-dimensional distributions on the real line, the Wasserstein metric possesses many useful properties [68, 69]. Let $F$ and $G$ be the cumulative distribution functions of the one-dimensional distributions $\mu$ and $\nu$, and let $F^{-1}$ and $G^{-1}$ denote their corresponding inverses. The Wasserstein metric can then be written in explicit form as

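$$W_{p}(\mu,\,\nu)=\left(\int_{0}^{1}\left|F^{-1}(t)-G^{-1}(t)\right|^{p}\,\mathrm{d}t\right)^{1/p}.\tag{21}$$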

Furthermore, when $\mu$ and $\nu$ are marginal empirical distributions with the same number of samples, the relationship in Eq. 21 can be further simplified as

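$$W_{p}(\mu,\,\nu)=\left(\frac{1}{n}\sum_{i=1}^{n}\left|x^{*}_{i}-y^{*}_{i}\right|^{p}\right)^{1/p},\tag{22}$$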

where $x^{*}_{i}$ and $y^{*}_{i}$ are $x_{i}$ and $y_{i}$ in sorted order.
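The sorted-sample form lends itself to a very short implementation. The following is a minimal Python sketch, assuming two one-dimensional samples of equal size with equal weight $1/n$ per sample; the function name `wasserstein_1d` is hypothetical and is not taken from the authors' code.

```python
def wasserstein_1d(x, y, p=2):
    """p-th Wasserstein metric between two equal-size 1-D samples.

    Assumes every sample carries the same weight 1/n, so the optimal
    transport plan pairs the i-th smallest x with the i-th smallest y.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    # Mean p-th power of the distance between matched order statistics.
    total = sum(abs(a - b) ** p for a, b in zip(xs, ys))
    return (total / n) ** (1.0 / p)


if __name__ == "__main__":
    # Two samples that differ by a constant shift of 1.
    print(wasserstein_1d([2.0, 0.0, 1.0], [3.0, 1.0, 2.0]))
```

Because the samples are sorted internally, the result does not depend on the order in which the data are supplied. The cost is dominated by the two sorts.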

Appendix B Sample code for the evaluation of the multidimensional Wasserstein metric

Sample code is provided for three different test cases, involving the evaluation of the Wasserstein metric for a single scalar quantity ( $Z$ ), joint scalars ( $Z$ - $T$ ), and multiple scalars ( $Z$ - $T$ - $Y_{\ce{CO2}}$ - $Y_{\ce{CO}}$ ). The code, available at https://github.com/IhmeGroup/WassersteinMetricSample, is easily adaptable to other conditions. The implementation of the sample code mirrors the procedure laid out in Section 2.5. The script sampleWasserstein.m is the main program, while calcW2.m is the function that calculates the Wasserstein metric.

Inputs to calcW2.m include experimental and simulated data samples for the Sydney piloted jet flame, at $x/D_{J}=10$ , $r/D_{J}=0.6$ , as well as a list of comparison species. Outputs include the calculated Wasserstein metric ( $W_{2}$ ), transport matrix, and data visualizations (for one-dimensional and two-dimensional cases, only). Additional cases can be calculated by modifying the species_index input, where the available species include $Z$ , $T$ , $Y_{\ce{CO_{2}}}$ , and $Y_{\ce{CO}}$ .
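For readers without access to MATLAB, the idea behind the multidimensional evaluation can be sketched in a few lines of Python. This is not a port of calcW2.m, whose internals are not reproduced here: it assumes equal-size samples with equal weights, uses an unscaled Euclidean distance, and replaces a proper transport solver with a brute-force search over all pairings. The function name `wasserstein_assignment` is hypothetical.

```python
import math
from itertools import permutations


def wasserstein_assignment(xs, ys, p=2):
    """W_p between two equal-size multidimensional samples.

    Brute-force illustration: with equal weights 1/n the optimal plan
    is a one-to-one pairing, so all n! pairings are enumerated.
    Only practical for very small n.
    Returns (W_p, pairing), where pairing[i] is the index of the y
    sample matched to xs[i].
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # Pairwise cost: Euclidean distance raised to the p-th power.
    cost = [[math.dist(x, y) ** p for y in ys] for x in xs]

    def total(pairing):
        return sum(cost[i][j] for i, j in enumerate(pairing))

    best = min(permutations(range(n)), key=total)
    return (total(best) / n) ** (1.0 / p), best


if __name__ == "__main__":
    sim = [(0.0, 0.0), (1.0, 0.0)]  # hypothetical two-scalar samples
    exp = [(1.0, 1.0), (0.0, 1.0)]
    w2, pairing = wasserstein_assignment(sim, exp)
    print(w2, pairing)
```

The returned pairing plays the role of the transport matrix in compact form. For realistic sample sizes the enumeration would have to be replaced by a linear-programming or assignment solver, and scalars with different units would need to be scaled before distances are computed.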

Bibliography

[1] M. V. Heitor and A. L. N. Moreira. Thermocouples and sample probes for combustion studies. Prog. Energy Combust. Sci., 19:259–278, 1993.
[2] A. C. Eckbreth. Laser Diagnostics for Combustion Temperature and Species. Gordon and Breach, 1996.
[3] K. Kohse-Höinghaus, R. S. Barlow, M. Aldén, and J. Wolfrum. Combustion at the focus: laser diagnostics and control. Proc. Combust. Inst., 30(1):89–123, 2005.
[4] R. S. Barlow. Laser diagnostics and their interplay with computations to understand turbulent combustion. Proc. Combust. Inst., 31:49–75, 2007.
[5] R. S. Barlow, J. H. Frank, A. N. Karpetis, and J.-Y. Chen. Piloted methane/air jet flames: Transport effects and aspects of scalar structure. Combust. Flame, 143:433–449, 2005.
[6] M. Aldén, J. Bood, Z. Li, and M. Richter. Visualization and understanding of combustion processes using spatially and temporally resolved laser diagnostic techniques. Proc. Combust. Inst., 33(1):69–97, 2011.
[7] R. S. Barlow, 1996. Web site for the International Workshop on Measurement and Computation of Turbulent Nonpremixed Flames (TNF), http://www.ca.sandia.gov/TNF/abstract.html.
[8] A. M. Kempf, B. J. Geurts, and J. C. Oefelein. Error analysis of large-eddy simulation of the turbulent non-premixed Sydney bluff-body flame. Combust. Flame, 158(12):2408–2419, 2011.
