Embedded Ridge Approximations

Chun Yui Wong; Pranay Seshadri; Geoffrey Parks; Mark Girolami

arXiv:1907.07037·math.NA·August 19, 2020

Embedded Ridge Approximations

Chun Yui Wong, Pranay Seshadri, Geoffrey Parks, Mark Girolami

PDF

TL;DR

This paper introduces embedded ridge approximations for scalar fields in differential-equation models, enabling efficient estimation of quantities of interest by exploiting local influence and anisotropy, with theoretical and simulation validation.

Contribution

It formalizes embedded ridge approximations for scalar fields, connecting them with active subspaces, and develops algorithms for reduced order modeling based on ridge profiles at nodes.

Findings

01

Embedded ridge approximations effectively estimate global quantities of interest.

02

Algorithms recover field quantities at nodes using limited ridge profile data.

03

The approach demonstrates favorable theoretical properties and practical performance.

Abstract

Many quantities of interest (qois) arising from differential-equation-centric models can be resolved into functions of scalar fields. Examples of such qois include the lift over an airfoil or the displacement of a loaded structure; examples of corresponding fields are the static pressure field in a computational fluid dynamics solution, and the strain field in the finite element elasticity analysis. These scalar fields are evaluated at each node within a discretised computational domain. In certain scenarios, the field at a certain node is only weakly influenced by far-field perturbations; it is likely to be strongly governed by local perturbations, which in turn can be caused by uncertainties in the geometry. One can interpret this as a strong anisotropy of the field with respect to uncertainties in prescribed inputs. We exploit this notion of localised scalar-field influence for…

Equations120

f (x) = g (W^{T} x) .

f (x) = g (W^{T} x) .

f (x) \approx g (W^{T} x),

f (x) \approx g (W^{T} x),

C (f) = \int_{D} \nabla f (x) \nabla f (x)^{T} ρ (x) d x = E [\nabla f (x) \nabla f (x)^{T}],

C (f) = \int_{D} \nabla f (x) \nabla f (x)^{T} ρ (x) d x = E [\nabla f (x) \nabla f (x)^{T}],

g (W^{T} x) = E [f (x) ∣ W^{T} x] .

g (W^{T} x) = E [f (x) ∣ W^{T} x] .

h (x) = \int_{D} ω (s) f (x, s) d s,

h (x) = \int_{D} ω (s) f (x, s) d s,

\nabla_{x} h (x) = \int_{D} ω (s) \nabla_{x} f (x, s) d s .

\nabla_{x} h (x) = \int_{D} ω (s) \nabla_{x} f (x, s) d s .

f (x, s) \approx g_{s} (W_{s}^{T} x),

f (x, s) \approx g_{s} (W_{s}^{T} x),

\nabla_{x} f (x, s) \approx \frac{\partial g _{s} ( W _{s}^{T} x )}{\partial x} = W_{s} \nabla g_{s} (W_{s}^{T} x),

\nabla_{x} f (x, s) \approx \frac{\partial g _{s} ( W _{s}^{T} x )}{\partial x} = W_{s} \nabla g_{s} (W_{s}^{T} x),

\nabla_{x} h (x) \approx \int_{D} ω (s) W_{s} \nabla g_{s} (W_{s}^{T} x) d s .

\nabla_{x} h (x) \approx \int_{D} ω (s) W_{s} \nabla g_{s} (W_{s}^{T} x) d s .

\nabla_{x} h (x) \approx i = 1 \sum N ω_{i} W_{s_{i}} \nabla g_{s_{i}} (W_{s_{i}}^{T} x), = i = 1 \sum N ω_{i} W_{i} \nabla g_{i} (W_{i}^{T} x),

\nabla_{x} h (x) \approx i = 1 \sum N ω_{i} W_{s_{i}} \nabla g_{s_{i}} (W_{s_{i}}^{T} x), = i = 1 \sum N ω_{i} W_{i} \nabla g_{i} (W_{i}^{T} x),

C (h) := E [\nabla_{x} h (x) \nabla_{x} h (x)^{T}] \approx E (i = 1 \sum N ω_{i} W_{i} \nabla g_{i} (W_{i}^{T} x)) (j = 1 \sum N ω_{j} W_{j} \nabla g_{j} (W_{j}^{T} x))^{T} = i = 1 \sum N j = 1 \sum N ω_{i} ω_{j} W_{i} E [\nabla g_{i} \nabla g_{j}^{T}] W_{j}^{T} .

C (h) := E [\nabla_{x} h (x) \nabla_{x} h (x)^{T}] \approx E (i = 1 \sum N ω_{i} W_{i} \nabla g_{i} (W_{i}^{T} x)) (j = 1 \sum N ω_{j} W_{j} \nabla g_{j} (W_{j}^{T} x))^{T} = i = 1 \sum N j = 1 \sum N ω_{i} ω_{j} W_{i} E [\nabla g_{i} \nabla g_{j}^{T}] W_{j}^{T} .

f (x) = f_{1} (x) ⋮ f_{N} (x) = g_{1} (W_{1}^{T} x) ⋮ g_{N} (W_{N}^{T} x),

f (x) = f_{1} (x) ⋮ f_{N} (x) = g_{1} (W_{1}^{T} x) ⋮ g_{N} (W_{N}^{T} x),

H (f) = E [J (x) R J (x)^{T}] = \int_{D} J (x) R J (x)^{T} ρ (x) d x,

H (f) = E [J (x) R J (x)^{T}] = \int_{D} J (x) R J (x)^{T} ρ (x) d x,

J (x) = [\frac{\partial f _{1}}{\partial x}, \dots, \frac{\partial f _{N}}{\partial x}],

J (x) = [\frac{\partial f _{1}}{\partial x}, \dots, \frac{\partial f _{N}}{\partial x}],

R = ω ω^{T}, where ω = (ω_{1}, ω_{2}, \dots, ω_{N})^{T},

R = ω ω^{T}, where ω = (ω_{1}, ω_{2}, \dots, ω_{N})^{T},

h (x) = \int_{D} ω (s) f (x, s) d s \approx ω^{T} f (x) .

h (x) = \int_{D} ω (s) f (x, s) d s \approx ω^{T} f (x) .

{(W_{i}^{T} x^{(1)}, f_{i}^{(1)}), \dots, (W_{i}^{T} x^{(M)}, f_{i}^{(M)})}

{(W_{i}^{T} x^{(1)}, f_{i}^{(1)}), \dots, (W_{i}^{T} x^{(M)}, f_{i}^{(M)})}

C (h) = \frac{1}{M} m = 1 \sum M J (x^{(m)}) R J (x^{(m)})^{T}

C (h) = \frac{1}{M} m = 1 \sum M J (x^{(m)}) R J (x^{(m)})^{T}

R = ω ω^{T} .

R = ω ω^{T} .

{(U^{T} x^{(1)}, ω^{T} f^{(1)}), \dots, (U^{T} x^{(M)}, ω^{T} f^{(M)})}

{(U^{T} x^{(1)}, ω^{T} f^{(1)}), \dots, (U^{T} x^{(M)}, ω^{T} f^{(M)})}

M \geq \frac{2 L ^{2} lo g ( 2 r )}{ϵ ^{2} E [ \nabla g _{i} \nabla g _{j}^{T} ] _{2}},

M \geq \frac{2 L ^{2} lo g ( 2 r )}{ϵ ^{2} E [ \nabla g _{i} \nabla g _{j}^{T} ] _{2}},

W_{i} - W_{i}_{2} \leq η_{M} .

W_{i} - W_{i}_{2} \leq η_{M} .

E C (h) - C (h)_{2} \leq L^{2} C (2 η_{M} + ϵ + ϵ^{2}),

E C (h) - C (h)_{2} \leq L^{2} C (2 η_{M} + ϵ + ϵ^{2}),

d i s t (S_{1}, S_{2}) = W_{1} W_{1}^{T} - W_{2} W_{2}^{T}_{2},

d i s t (S_{1}, S_{2}) = W_{1} W_{1}^{T} - W_{2} W_{2}^{T}_{2},

g (W^{T} x) = g (W^{T} x) + ((W - W)^{T} x)^{T} \nabla_{u} g (W^{T} x) \nabla g (u) ∣_{u = W^{T} x} + h.o.t.

g (W^{T} x) = g (W^{T} x) + ((W - W)^{T} x)^{T} \nabla_{u} g (W^{T} x) \nabla g (u) ∣_{u = W^{T} x} + h.o.t.

E [(g (W^{T} x) - g (W^{T} x))^{2}] \approx ϵ = E [(x^{T} (W - W) \nabla_{u} g (W^{T} x))^{2}] .

E [(g (W^{T} x) - g (W^{T} x))^{2}] \approx ϵ = E [(x^{T} (W - W) \nabla_{u} g (W^{T} x))^{2}] .

ϵ \leq G^{2} σ_{x}^{2} i = 1 \sum r (2 - 2 cos (θ_{r})),

ϵ \leq G^{2} σ_{x}^{2} i = 1 \sum r (2 - 2 cos (θ_{r})),

h (x) = [235] f_{1} (x) f_{2} (x) f_{3} (x)

h (x) = [235] f_{1} (x) f_{2} (x) f_{3} (x)

f_{1} (x)

f_{1} (x)

f_{2} (x)

f_{3} (x)

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Embedded Ridge Approximations111© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Chun Yui Wong222PhD student, Department of Engineering, University of Cambridge.

[email protected]

Pranay Seshadri333Research Fellow, Department of Mathematics (Statistics Section), Imperial College London.

Geoffrey T. Parks444Reader, Department of Engineering, University of Cambridge.

Mark Girolami 555Sir Kirby Laing Chair of Civil Engineering, Department of Engineering, University of Cambridge, and Strategic Director, The Alan Turing Institute.

Abstract

Many quantities of interest (qois) arising from differential-equation-centric models can be resolved into functions of scalar fields. Examples of such qois include the lift over an airfoil or the displacement of a loaded structure; examples of corresponding fields are the static pressure field in a computational fluid dynamics solution, and the strain field in the finite element elasticity analysis. These scalar fields are evaluated at each node within a discretised computational domain. In certain scenarios, the field at a certain node is only weakly influenced by far-field perturbations; it is likely to be strongly governed by local perturbations, which in turn can be caused by uncertainties in the geometry. One can interpret this as a strong anisotropy of the field with respect to uncertainties in prescribed inputs. We exploit this notion of localised scalar-field influence for approximating global qois, which often are integrals of certain field quantities. We formalise our ideas by assigning ridge approximations for the field at select nodes. This embedded ridge approximation has favorable theoretical properties for approximating a global qoi in terms of the reduced number of computational evaluations required. Parallels are drawn between our proposed approach, active subspaces and vector-valued dimension reduction. Additionally, we study the ridge directions of adjacent nodes and devise algorithms that can recover field quantities at selected nodes, when storing the ridge profiles at a subset of nodes—paving the way for novel reduced order modeling strategies. Our paper offers analytical and simulation-based examples that expose different facets of embedded ridge approximations.

keywords:

Ridge approximation, vector-valued functions, dimension reduction , active subspaces , minimum average variance estimation

1 Introduction

The governing physics in many engineering problems is described by a system of partial differential equations (PDEs). These equations can be solved by suitable discretisation methods such as finite element and finite volume methods, where scalar fields—e.g. pressures, temperatures, strains—are computed at each node over the PDE domain. One can interpret these scalar fields as vector-valued functions, conditioned upon certain boundary conditions and geometry parameters. Here each value of the output vector corresponds to the scalar field quantity at a specific node of the domain. Integrals of these scalar field variables typically constitute output qois in uncertainty quantification studies. For instance, when propagating uncertainties in the Mach number and angle of attack of flow over an airfoil, one is interested in quantifying the moments of the lift and drag coefficients [1]; both lift and drag coefficients are surface integrals of the static pressure field around the airfoil [2, Ch. 1]. In studying the impact of uncertainties in leakage flows in a compressor [3], one is interested in quantifying the moments of isentropic efficiency, which can be expressed as an integral of the pressure and temperature ratios [4, p. 7]. Qois that are integrals of such scalar fields are prevalent well beyond computational fluid dynamics (CFD). For instance, the displacement of a structure is the integral of the strain field [5, sec. 2.1], which allows us to analyse linear elastic displacement problems in areas including soil mechanics [6, sec. 6.4] and machine component design [7, sec. 5.2] using the strain field.

Uncertainty quantification studies typically require a design of experiment, where the governing PDE model is evaluated under different inputs. The number of times the model is evaluated depends on the dimension of the input space and the non-linearity of the scalar qois. In this paper, we explore a deviation from this paradigm. Rather than storing scalar qois for each model evaluation, we explore whether one can reduce the number of model evaluations if one stores select scalar fields. More specifically, we want to design emulators for the scalar fields themselves and then integrate the emulators to obtain the desired qois.

So, why should we focus on scalar fields? Consider the following examples. Over an airfoil in subsonic flow, the static pressure of a point near the leading edge is unlikely to be strongly affected by small geometric perturbations far downstream. In a similar vein, the local deflection of a structure is unlikely to be affected by small local changes in elastic properties far away from the point of measurement. We refer to this property of physical scalar fields as localisation—the output at each node is only influenced by perturbations local to the node. Mathematically, a parameterised scalar field can be expressed as a function $f(\mathbf{x},\mathbf{s})$ , where $\mathbf{x}$ denotes the perturbation parameters and $\mathbf{s}$ the spatial location. We say that the field is localised if for each $\mathbf{s}$ , the scalar field depends mainly on a smaller subset of $\mathbf{x}$ localised within a region666Localisation here refers to the fact that the physical perturbations influencing each node are contained spatially. This is in contrast with physical features that are localised spatially, such as shock waves and phase boundaries. The latter refer to a localised set of output nodes, but the dependence of these nodal values on the input need not be localised.. Provided that this smaller set of variables can be found, the curse of dimensionality can be abated and the number of experiments required for approximating the field can be greatly reduced. A related scalar field property is smoothness, which implies a certain degree of continuity in the variation of the field and its derivatives. Smoothness can be defined with respect to the parameters $\mathbf{x}$ and/or the spatial location $\mathbf{s}$ . In this paper, we assume smoothness with respect to $\mathbf{x}$ , such that smooth functions can be used to form approximation models. Furthermore, we show that smoothness in $\mathbf{s}$ can be leveraged to form compressed representations of scalar fields, facilitating the reconstruction of scalar fields from approximation.

To construct low-dimensional approximations of fields, we draw ideas from ridge functions [8]. A function $f:\mathbb{R}^{d}\rightarrow\mathbb{R}$ whose variation is entirely contained within a subspace described by ran( $\bm{W}$ ), where $\bm{W}\in\mathbb{R}^{d\times r}$ has orthonormal columns and $\text{ran}(\cdot)$ denotes the column space, is called a generalised ridge function [8]. That is, it can be expressed as

[TABLE]

In this paper, we will refer to these functions simply as ridge functions for brevity. Many physical qois are characterised by anisotropy in the input domain—i.e. they vary strongly only within a subspace—such as the localised output components described in the previous paragraph. These qois can be well-approximated by ridge functions, and the process of finding these ridge functions is known as ridge approximation. Namely, we find $\bm{W}\in\mathbb{R}^{d\times r}$ with orthonormal columns and $g:\mathbb{R}^{r}\rightarrow\mathbb{R}$ such that

[TABLE]

where the approximation can be formulated by minimising the mean squared error (MSE) over the input domain [9, sec. 3.1]. When given a ridge approximation, significant computational run-time savings can be achieved by working with $g\left(\bm{W}^{T}\mathbf{x}\right)$ as an emulator for $f(\mathbf{x})$ if $r\ll d$ , where we have effectively reduced the dimension of our problem from $d$ to $r$ . We refer to the span of the columns of $\bm{W}$ as ridge directions, and $g$ as the ridge profile. As $\bm{W}$ takes the $d$ -dimensional input to an $r$ -dimensional projection, we also call the column space of $\bm{W}$ the dimension-reducing subspace.

Numerous methods for ridge approximation have been proposed in the literature. Central to this paper are strategies based on analysis of the average outer product of the gradient with itself, which we will call the gradient covariance matrix of $f$ . Assuming that $f$ is Lipschitz continuous with bounded first derivatives, this matrix is defined as

[TABLE]

where $\mathcal{D}$ is the input domain, and $\rho:\mathbb{R}^{d}\rightarrow\mathbb{R}$ is a function that integrates to unity and is strictly positive within $\mathcal{D}$ . We assume that all entries in this matrix are finite. Interpreting $\rho$ as a probability density function (PDF) over the input domain, we can replace the integral by the expectation over $\mathbf{x}$ , which we treat as a random variable777In the following, unless specified otherwise, all expectations are computed with respect to $\rho(\mathbf{x})$ .. Samarov [10] and Constantine et al. [11, 9] analyse the eigendecomposition of this matrix and show that, if the variation of the function outside the span of eigenvectors corresponding to large eigenvalues is small, then a suitable choice for the ridge directions is the leading eigenvectors of the gradient covariance matrix. The subspace spanned by these leading eigenvectors is termed the active subspace of $f$ by Constantine et al. [9]. In a recent paper by Zahm and co-authors [12], an extension to vector-valued functions is proposed. In Section 2 we will explore the properties of their vector gradient covariance matrix.

In practice, gradients of the qoi are often difficult to evaluate; for example, computational models may be based on legacy codes without automatic differentiation capabilities. In the absence of gradient information, other methods of ridge approximation have been proposed. Fornasier et al. [13] and Tyagi and Cevher [14] describe methods to recover low-dimensional ridge structures with finite differencing through compressed sensing and low-rank recovery, respectively. Hokanson and Constantine [15] describe an algorithm to form a polynomial ridge approximation from fewer data than required for a full response surface by solving a non-linear least squares problem with variable projection (VP) [16]. Constantine et al. [17] and Eftekhari et al. [18] describe methods to estimate the ridge directions based on finite differences. In addition, Glaws et al. [19] draw an analogy between ridge recovery and sufficient dimension reduction (SDR). In SDR, the goal is to find the minimal subspace, described by the column span of a matrix $\bm{W}$ , with which we can establish conditional independence between a set of covariates $\mathbf{x}$ and the response $y$ in a regression setting. That is, we seek a subspace described by $\bm{W}$ such that $y\perp\mathbf{x}$ given $\bm{W}^{T}\mathbf{x}$ . This central subspace is intimately linked to the ridge directions [19, thm. 2], which implies that regression-based methods within the study of SDR can be applied to ridge approximation (see Section 2.2 in [20] for further discussion). Examples of such methods include Sliced Inverse Regression (SIR) [21], Sliced Average Variance Estimation (SAVE) [22] and Minimum Average Variance Estimation (MAVE) [23]. The former two techniques are based on inverse regression and the latter based on forward regression; given a set of predictor/response pairs $\{\mathbf{x}_{i},y_{i}\}_{i=1}^{M}$ , forward regression aims to estimate statistics of the distribution of the response given covariates ( $y|\mathbf{x}$ ), while inverse regression aims to characterise $\mathbf{x}|y$ .

Given the ridge directions $\bm{W}$ , the ridge profile $g$ that minimises the mean squared approximation error $\mathbb{E}\left[\left(f(\mathbf{x})-\widetilde{g}\left(\bm{W}^{T}\mathbf{x}\right)\right)^{2}\right]$ over $\widetilde{g}$ can be shown analytically to be [9, 12]

[TABLE]

Several approaches for approximating this average have been proposed, including the use of Gaussian processes [20] and multivariate orthogonal polynomials [24].

In this paper, we build upon these ideas and introduce an embedded ridge approximation to connect the ridge approximation of desired qois with ridge approximations of their constituent scalar fields. Loosely stated, an embedded ridge approximation is based on vector-valued functions, where each component can be approximated by a ridge function—computed for scalar fields at select nodes within the computational domain. Leveraging gradient-free techniques, a surrogate model for the scalar field is constructed via ridge approximations at each node. When provided with certain structural assumptions about the field—including localisation—we show that it is possible to reduce the number of model evaluations for computing the dimension-reducing subspace of a related qoi, by exploiting these nodal ridge approximations.

The rest of the paper is structured as follows: in Section 2, we describe the concept of an embedded ridge function, and provide an algorithm supported by theoretical analysis that leverages this embedded structure to find the dimension-reducing subspace of a scalar qoi based on an underlying spatial field. In Section 3, we leverage the similarity between neighbouring output components to motivate a method of efficiently storing the array of ridge directions associated with each output component, and provide some algorithms for this process. In Section 4, we use analytical and numerical examples to illustrate the algorithms proposed in this paper.

Notation. We denote the approximation of a quantity with a hat. For instance, a finite-sample estimate of a matrix $\bm{H}$ is denoted as $\widehat{\bm{H}}$ and a response surface for $g$ fitted with finitely many samples is denoted $\widehat{g}$ . A general perturbation of a quantity is denoted with a tilde. For instance, a perturbation of $\mathbf{w}$ is denoted $\widetilde{\mathbf{w}}$ .

2 Embedded ridge approximation

Consider the scalar field $f(\mathbf{x},\mathbf{s})$ where $\mathbf{x}\in\mathcal{D}\subset\mathbb{R}^{d}$ parameterises our model of interest and $\mathbf{s}\in\mathbb{R}^{K}$ is a variable that denotes the spatial location in $K$ -dimensional space. We place the following assumptions on this field:

the input space $\mathcal{D}$ is endowed with a probability density $\rho(\mathbf{x})$ ; 2. 2.

the field is square integrable with respect to the probability density $\rho(\mathbf{x})$ and Lipschitz continuous with bounded and square integrable first partial derivatives with respect to $\mathbf{x}$ for all $\mathbf{s}$ .

An example of $f(\mathbf{x},\mathbf{s})$ is the pressure within a computational domain—characterised by geometry parameters or boundary conditions $\mathbf{x}$ and a probe location $\mathbf{s}$ . Our goal is to study dimension-reducing subspaces induced by the function

[TABLE]

where $\omega:\mathbb{R}^{K}\rightarrow\mathbb{R}$ is a weight function. In other words, $h(\mathbf{x})$ represents a weighted average of the scalar field $f(\mathbf{x},\mathbf{s})$ , and is the relevant qoi. If $f(\mathbf{x},\mathbf{s})$ is the pressure distribution, then one can think of $h(\mathbf{x})$ as the lift coefficient or drag coefficient, for instance. Assuming that the partial derivative of the integrand is bounded independent of $\mathbf{x}$ and $\mathbf{s}$ , we can write

[TABLE]

Now, let us assume that $f(\mathbf{x},\mathbf{s})$ can be approximated by a ridge function of the form

[TABLE]

where $\bm{W}_{\mathbf{s}}\in\mathbb{R}^{d\times r_{\mathbf{s}}}$ , where $r_{\mathbf{s}}$ identifies the dimension of the subspace spanned by the orthonormal columns of $\bm{W}_{\mathbf{s}}$ . Note that $g_{\mathbf{s}}$ , $\bm{W}_{\mathbf{s}}$ and $r_{\mathbf{s}}$ will, in general, depend on $\mathbf{s}$ . The gradient of $f$ is then given by

[TABLE]

noting that $\nabla g_{\mathbf{s}}\left(\bm{W}_{\mathbf{s}}^{T}\mathbf{x}\right):=\text{d}g_{\mathbf{s}}(\mathbf{u})/\text{d}\mathbf{u}|_{\mathbf{u}=\bm{W}_{\mathbf{s}}^{T}\mathbf{x}}$ . Thus,

[TABLE]

In practice, we can approximate this gradient with an $N$ -point quadrature rule $\left\{{\mathbf{s}}_{i},\omega_{i}\right\}_{i=1}^{N}$ , with quadrature points ${\mathbf{s}}_{i}$ and quadrature weights $\omega_{i}$ , from which we arrive at the approximation

[TABLE]

where we have expressed $g_{{\mathbf{s}}_{i}}$ as $g_{i}$ and $\bm{W}_{{\mathbf{s}}_{i}}$ as $\bm{W}_{i}$ for notational convenience. Then, from (3) we can approximate the scalar covariance matrix of $h(\mathbf{x})$ as

[TABLE]

The fact that (11) is expressed in terms of the ridge parameters $\bm{W}_{i}$ and $g_{i}$ is noteworthy. Given $\bm{C}(h)$ , we can find the dimension-reducing subspace of $h$ simply via an eigendecomposition, and this equation informs us that it is possible to shift the computational burden from evaluating the gradients $\nabla_{\mathbf{x}}h$ to the estimation of the ridge parameters $\bm{W}_{i}$ and $g_{i}$ . In the absence of automatic differentiation or adjoint solvers, the former may require finite differences or the use of a surrogate model, whose cost of formulation suffers from the curse of dimensionality. Supposing that the scalar field $f(\mathbf{x},{\mathbf{s}})$ is localised (see Section 1), at a fixed location ${\mathbf{s}}_{i}$ the dependence of $f(\mathbf{x},{\mathbf{s}}_{i})$ on $\mathbf{x}$ is likely to be highly anisotropic, depending mainly on the parameters that impact nodes adjacent to ${\mathbf{s}}_{i}$ . This implies that the number of ridge directions at each location is likely to be small, and $g_{i}$ will be a low-dimensional function. For many methods of ridge approximation, this implies that the amount of simulation data required for this step is reduced for a given approximation accuracy. This brings us to the central idea of this paper: instead of directly calculating the dimension-reducing subspace of $h$ , we leverage the decomposition of $h$ into its composite scalar field, whose dimension-reducing subspaces are likely to be inexpensive to compute. Using evaluations of the field, gradient-free strategies for ridge approximations—such as VP and MAVE, mentioned in Section 1—can be used to furnish ridge approximations at each node of the scalar field. Then, from these component subspaces, we assemble the scalar gradient covariance of $h$ , where the required gradients are calculated using approximations formed at the nodes via (10).

2.1 Interpretation as vector-valued dimension reduction

We assume that there exists a set of quadrature points ${\mathbf{s}}_{i}$ (domain nodes) to evaluate the scalar field $f(\mathbf{x},{\mathbf{s}})$ that allows us to formulate the ridge approximation (7) easily. In practice, this set of points is fixed by the computational domain and its associated mesh. We argue that reducing the dimensionality of the qoi $h$ is facilitated by the formulation (11). We can treat the scalar field evaluated at the prescribed positions as a vector-valued function $\mathbf{f}(\mathbf{x})$ , whose components $f_{i}(\mathbf{x}):=f(\mathbf{x},{\mathbf{s}}_{i})$ exhibit ridge approximations.

Definition 2.1 (Embedded ridge function).

Let $\mathbf{f}$ be a vector-valued function $\mathbf{f}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{N}$ with components $f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),...,f_{N}(\mathbf{x})$ , where each $f_{i}:\mathbb{R}^{d}\rightarrow\mathbb{R}$ . Such a function $\mathbf{f}$ is called an embedded ridge function if it satisfies

[TABLE]

where the $i$ -th element of the vector, $f_{i}\left(\mathbf{x}\right)$ , is a ridge function of the form $g_{i}\left(\bm{W}_{i}^{T}\mathbf{x}\right)$ , for $i=1,\ldots,N$ . Here all subspace matrices $\bm{W}_{i}$ have the same number of rows $d$ , but may have different numbers of columns $r_{i}$ . It is assumed that the components of $\mathbf{x}$ are independent under the input measure $\rho$ .

An approximation of a vector-valued function using embedded ridge functions of the form (12) is called an embedded ridge approximation. There are parallels between an embedded ridge approximation and vector-valued dimension reduction. In [12], the authors introduce a vector gradient covariance matrix, analogous to the scalar form given in (3).

Definition 2.2 (Vector gradient covariance matrix).

We define the vector gradient covariance matrix $\bm{H}(\mathbf{f})\in\mathbb{R}^{d\times d}$ as

[TABLE]

where $\bm{R}\in\mathbb{R}^{N\times N}$ is a symmetric positive semi-definite matrix of weights, and

[TABLE]

where $\bm{J}\in\mathbb{R}^{d\times N}$ is the Jacobian matrix.

Observe that for an embedded ridge function, by setting

[TABLE]

we can recover the final line of (11). Thus, the embedded ridge approximation can be constructed by calculating the vector gradient covariance matrix of the discretised weighted scalar field. Note that, although this vector gradient covariance matrix coincides with the formulation in [12], our focus is on a weighted average of the underlying field instead of the vector-valued function in itself.

2.2 Algorithm for ridge computation

Equipped with a localised scalar field, we can design a procedure for the embedded ridge approximation of a qoi. Here our ridge function has the form

[TABLE]

In Section 2.2, we identify $N$ sets of ridge directions—one for every component of the vector $\mathbf{f}$ —assuming the absence of gradient information. We then fit ridge profiles $g_{i}$ to obtain nodal ridge approximations that we use for computing the scalar gradient covariance matrix for $h$ .

In the process of embedded ridge approximation, a ridge approximation for each node of the field is computed, exposing the rich structure endowed by the localisation of the field, thus enabling a reduction in the number of required training samples. These approximations can be used to form surrogate models of any qoi derived from the field, by replacing field evaluations by evaluations of the embedded ridges. However, we note that, in many situations, it is valuable to deduce a dimension-reducing subspace for the derived qoi, because the subspace facilitates many more tasks beyond surrogate modelling. Examples include optimisation [25, 26], visualisation [27] (if the subspace is one- or two-dimensional), sensitivity analysis [28] and discovery of physical insights [29]. The ability to visualise the variation of the qoi is especially important in the design process to easily gauge whether an approximation model can be trusted. Thus, the subspace computed from step 7 onwards in Section 2.2 plays an important role in the embedded ridge approximation approach.

A localised scalar field contains nodes that are well-approximated with low-dimensional ridge subspaces. These subspaces are usually of lower dimensionality than the ridge subspace of derived qois, implying that each $\bm{W}_{i}$ usually has fewer columns than $\bm{U}$ in Section 2.2. This gives the embedded ridge approximation approach an advantage—since low-dimensional ridge functions can be synthesised into a less easily found surrogate for qois. In what follows, we quantify this notion by studying the error on the gradient covariance matrix $\bm{C}(h)$ via the estimate $\widehat{\bm{C}}(h)$ computed in (18). We establish a bound on the expected norm difference $\mathbb{E}\left\lVert\bm{C}(h)-\widehat{\bm{C}}(h)\right\rVert_{2}$ with the matrix Bernstein inequality [30], where the matrix norm $\left\lVert\cdot\right\rVert_{2}$ yields the largest singular value of the argument. The accuracy of nodal ridge approximation is modelled with a quantity $\eta_{M}$ dependent on $M$ .999A better measure of this error can be defined using the more general subspace distance (22), which is basis-agnostic. However, it can be shown, in a similar manner to Lemma B.1, that a basis can be found such that a small subspace distance is equivalent to the present bound in this section. For ease in exposition, we also assume that the component ridge dimension $r$ is constant in space, so $\bm{W}_{i}\in\mathbb{R}^{d\times r}$ for all $i$ .

Theorem 2.1.

Assume that $\left\lVert\nabla g_{i}\right\rVert_{2}\leq L$ for all $1\leq i\leq N$ ,

[TABLE]

for all $1\leq i,j\leq N$ , and

[TABLE]

Then,

[TABLE]

where $C:=\sum_{ij}|\omega_{i}\omega_{j}|$ .

Proof.

See Appendix A. ∎

There are several important remarks to make regarding (21). It should be clear that the number of samples $M$ required scales as a function of $r$ instead of $d$ . This encapsulates the advantage brought about by considering the locality of the scalar field. In Section 4 we provide numerical studies that illustrate this. It is possible to derive a bound related to the subspace error of $\widehat{\bm{U}}$ in Section 2.2 as a corollary to Theorem 2.1. This involves steps very similar to Lemma 3.9 and Corollary 3.10 in [11], which are based on Corollary 8.1.11 in [31].

3 Efficient storage of embedded ridge approximations: Ridge compression

Scalar field quantities are propagated through a PDE domain node-by-node. It is therefore very likely that neighbouring nodes—depending on the overall resolution of the mesh—will have similar values of these quantities. More specifically, one can think of these quantities as being strongly correlated with their neighbours—i.e. the underlying scalar field is smooth spatially. When approximating each component $f_{i}$ as a ridge function, this correlation can be interpreted as similarity in both the ridge directions $\bm{W}_{i}$ and the ridge profile $g_{i}$ . In this section, we assume that the number of ridge directions for each $\bm{W}_{i}$ is constant, and equal to $r$ , for simplicity.

3.1 A perturbation bound on the mean squared error

Given a ridge function $g(\bm{W}^{T}\mathbf{x})$ , consider a perturbation of the dimension-reducing subspace from $\mathcal{S}=\text{ran}(\bm{W})$ to $\widetilde{\mathcal{S}}$ . This perturbation is quantified by the subspace distance, defined as

[TABLE]

for two subspaces $\mathcal{S}_{1}$ and $\mathcal{S}_{2}$ of $\mathbb{R}^{d}$ . Here, $\bm{W}_{1}$ and $\bm{W}_{2}$ are two matrices with orthonormal columns such that $\mathcal{S}_{1}=\text{ran}(\bm{W}_{1})$ and $\mathcal{S}_{2}=\text{ran}(\bm{W}_{2})$ respectively. The goal is to characterise the error incurred by the perturbation in the dimension-reducing subspace. For a basis matrix $\widetilde{\bm{W}}$ of $\widetilde{\mathcal{S}}$ , we can form a Taylor expansion

[TABLE]

If the subspace perturbation is small enough, higher-order terms (h.o.t.) can be neglected and the mean squared error can be approximated as

[TABLE]

Clearly, the quantity $\epsilon$ depends on the specification of basis matrices, which are not fixed for given subspaces. In the following theorem, we show that it is possible to select basis matrices that allow $\epsilon$ to be bounded by a function of the perturbation distance.

Theorem 3.1.

Let $\mathcal{S}=\text{ran}(\bm{W})$ , and $\widetilde{\mathcal{S}}$ be a perturbation of $\mathcal{S}$ . Assume that the square of the gradient is bounded as $\nabla_{\mathbf{u}}g^{T}\nabla_{\mathbf{u}}g\leq G^{2}$ and $\mathbb{E}[\mathbf{x}\mathbf{x}^{T}]=\sigma_{x}^{2}\bm{I}_{d}$ (i.e. inputs are independent and identically distributed). Then, if $dist(\mathcal{S},\widetilde{\mathcal{S}})\leq\sin(\theta_{r})$ , we can pick $\bm{W},\widetilde{\bm{W}}\in\mathbb{R}^{d\times r}$ where $\mathcal{S}=\text{ran}(\bm{W})$ and $\widetilde{\mathcal{S}}=\text{ran}(\widetilde{\bm{W}})$ such that

[TABLE]

where $\epsilon$ is the first-order approximation to the mean squared error (24).

Proof.

See Appendix B. ∎

Theorem 3.1 establishes a stability bound on the approximation error of a ridge function with a small perturbation of the associated subspace. Given the Lipschitz continuity of the underlying field with respect to the spatial domain, it is reasonable to assume that neighbouring nodes are determined by ridge directions that are closely related to each other. This motivates the proposal of algorithms to compress the representation of an embedded ridge function by approximating the ridge directions of some nodes as a function of their neighbours. After compression, only a fraction of the original ridge directions need to be stored.

In passing, we note that the actual approximation error incurred via compression can be smaller than suggested by Theorem 3.1, since the change in the ridge profile $g$ as a result of the perturbation in the subspace is not accounted for. In practice, after approximating the subspace by a perturbed version of its original value, the ridge profile can be refitted to data projected to the new subspace, minimising the MSE in the process. The new error can be smaller than simply applying $g$ to the data projected to the new subspace without changing the compression level.

3.2 Ridge compression and recovery

Given a priori knowledge of the relationship between neighbouring ridge directions, we can avoid storing the ridge directions for all output components. This is useful for re-creating the PDE-scalar field from the selected nodes. To this goal, we propose the ridge compression and recovery algorithms. The former allows us to retain only a subset of suitably subsampled output nodes (components of the vector $\mathbf{f}$ ); the latter recovers the remaining nodes from these subsamples.

Our algorithm for ridge compression is detailed in Section 3.2, where each removed component will be reconstructed by the average of two of its closest neighbours. Given an embedded ridge approximation and the number of components that need to be removed $k$ , we iterate through all the nodes $N$ and identify two neighbours for each node. These neighbours are identified based on the smallest subspace distance (see (22)) between successive nodes; see steps 6 and 7. In step 7, we require that the second closest neighbour must be closer to the candidate to be removed than the first neighbour in step 6. If this step is not enforced, the average between the neighbours is a poor approximation of the removed candidate. Following this, in step 10, we sort the candidates for removal by considering the sum of the distances of the removal candidates to their two neighbours. Removing a candidate with smaller total distance is prioritised over removing one with larger total distance. From step 11 onwards, we attempt to remove the candidates, according to the order determined in step 10. The recovery algorithm (Section 3.2) reconstructs the missing components based on the list of nearest neighbours (the output from Section 3.2). We only consider the case where $r=1$ here, permitting us to easily estimate the missing node’s ridge subspace as a linear combination of the neighbouring components.

In the ridge compression algorithm, once a node is marked as one of the neighbours of a removed node, it can no longer be removed. This sets a hard limit on how many nodes can be removed before all stored nodes are marked. One way to circumvent this difficulty is to apply the compression and recovery algorithms recursively. Figure 1 illustrates this idea. At each compression stage, up to $S$ components are compressed, where $S$ can be set by the user. After this, the remaining components are fed to the next stage as input to remove up to a further $S$ components and so on. To recover the removed components, the recovery algorithm is applied stagewise, similarly to the compression process, in the reverse direction. Note that upon passing the remaining components to the next stage, even though some of the remaining components are neighbours to removed components in the previous stage, it is possible to remove them in the next stage. This is because their ridge directions are not required until the corresponding stage in the recovery process, when these components will have been reconstructed by the previous recovery stage. In this way, the degree of compression can be increased.

Both the ridge compression and recovery algorithms presented in this section are greedy algorithms, which may therefore not result in the storage configuration that globally minimises the distance between the missing components and their neighbours. However, to determine the globally optimal solution requires a combinatorial search over every storage configuration, which is computationally prohibitive.

Note that the compression problem can be interpreted as a clustering task, where cluster centres are retained and all other ridges can be recovered by identifying each with the closest cluster or a linear combination of the two closest centres. Operating with a non-Euclidean metric defined by the subspace distance, clustering algorithms such as $k$ -medoids can be used. Section 3.2 describes an algorithm for clustering ridge directions using $k$ -medoids, based on its implementation in [32]. The corresponding recovery algorithm can be similar to Section 3.2. Alternatively, each removed component can be replaced by its nearest medoid. In Section 4.3, our algorithm is compared with $k$ -medoids for compressing the flow field of a CFD simulation case study.

4 Numerical examples

In this section, we illustrate the embedded ridge approximation approach with an analytical example and a CFD example.

4.1 Analytical example

Consider the function

[TABLE]

where

[TABLE]

defined over the domain $\mathcal{D}=[-1,1]^{10}$ where the inputs $\mathbf{x}$ are independent and have uniform marginals. Note that $h(\mathbf{x})$ is an exact ridge function with three ridge directions spanned by the columns of $\bm{U}=[\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3}]$ . We draw $\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3}$ as random vectors with unit Euclidean norm and compare the recovered ridge directions to the drawn vectors using the subspace distance (see (22)). In this example, polynomial variable projection (VP) is used for finding ridge directions, where the polynomials have a maximum total degree of 7. For the optimisation loop inside the algorithm for VP (see [15, Algorithm 4.1]), we set the convergence criterion to be when the subspace distance between the ridge directions of the previous and current iterations is smaller than $10^{-7}$ . Our implementation of this algorithm can be found in the Effective Quadratures open-source library [33] (https://www.effective-quadratures.org/).

For embedded ridge approximation, we use VP to estimate the ridge directions for each component function $f_{1}(\mathbf{x}),f_{2}(\mathbf{x})$ and $f_{3}(\mathbf{x})$ , and then calculate the first three leading eigenvectors of the vector gradient covariance matrix of $\mathbf{f}(\mathbf{x})=[f_{1}(\mathbf{x}),f_{2}(\mathbf{x}),f_{3}(\mathbf{x})]^{T}$ . The weights are set as $\bm{\omega}=[2,3,5]^{T}$ to find an estimate of the dimension-reducing subspace of $h(\mathbf{x})$ . For direct ridge approximation, we use VP to estimate the three-dimensional dimension-reducing subspace of $h(\mathbf{x})$ directly. We vary the number of observations used for each method and examine the subspace distance between the recovered directions and the true directions. We note that the results are binary—we either get a small subspace distance from successful recovery or a large subspace error from failure in recovery. Thus, we plot the probability of successful recovery—where the subspace distance is below 0.005—across 40 trials on the left of Figure 2. This plot shows that recovery using embedded ridge approximations is more stable and requires fewer observations than direct ridge approximation for a given recovery probability.

To achieve successful recovery of $\bm{U}$ from the embedded ridge approximation, we need to be able to successfully recover the ridge directions in each individual function, as reflected from the right plot of Figure 2. Interestingly, despite the need to successfully find three sets of ridge directions concurrently, the probability of recovery is still significantly higher for the embedded ridge approximation method. This is because the optimisation over three-dimensional subspaces required in the direct method is much more challenging than their one-dimensional counterparts required in the embedded method (see Table 3 in [15]).

4.2 Shape design of the NACA0012

We apply the embedded ridge function approximation algorithm (see Section 2.2) to the shape design of the NACA0012 airfoil. The shape deformation of the baseline NACA0012 profile is parameterised using $d=50$ Hicks-Henne bump functions around the airfoil, and the variation in the surface pressure profile is measured. We fix an entry Mach number of 0.3 (subsonic) and an angle of attack of 1.25°, with free-stream temperature and pressure at 273.15 K and 101325 Pa, respectively. The pressure profile is solved using the compressible Euler flow solver in the open source CFD suite $SU^{2}$ [34]. The coefficients of lift and drag121212Ignoring skin friction and assuming a unit reference area. are known to be linear functions of the pressure around the airfoil [2, Ch. 1], given by

[TABLE]

where the integral is evaluated around the airfoil surface, spatially parameterised by $s$ . The input variable $\mathbf{x}\in\mathbb{R}^{d}$ contains the Hicks-Henne bump amplitudes; $\mathbf{n}$ is the surface normal, $\mathbf{k}$ the direction perpendicular to the flow, and $\mathbf{j}$ the direction parallel to the flow (see Figure 3). In the normalising factors, $\rho_{0}$ is the free-stream density, and $v_{\infty}$ is the free-stream speed. We discretise this problem by considering $N=200$ measurements of pressure around the airfoil, resulting in the vector-valued function $\mathbf{p}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{N}$ representing the surface pressure profile. Note that the approximation in the second line of both expressions comes not only from the discretisation but also from the assumption that $\mathbf{n}$ is independent of $\mathbf{x}$ —a good approximation when the geometric perturbations are small. Under this approximation, the coefficients of lift and drag can then be expressed as linear functions of the components of $\mathbf{p(x)}$ .

As the flow is entirely subsonic and inviscid, we expect the bumps to have a strongly local influence. Hence, the pressure profile $\mathbf{p(x)}$ is well-approximated by an embedded ridge function. This motivates the following approach to estimate $C_{l}$ and $C_{d}$ , which applies the steps in Section 2.2 assuming each node is approximated by a one-dimensional ridge function.

Using a gradient-free computational strategy, estimate the leading ridge direction $\widehat{\mathbf{w}}_{i}$ for each $p_{i}(\mathbf{x})$ . 2. 2.

Fit a low-dimensional surrogate using this leading mode for each $p_{i}$ . That is, we seek

[TABLE]

for the $i$ -th component of $\mathbf{p}$ . We use univariate orthogonal polynomials for the profiles $\widehat{g}_{i}(\cdot)$ . 3. 3.

We can compute the elements of the Jacobian via these ridge approximations:

[TABLE]

where $\widehat{w}_{ji}$ is the $i$ -th element of $\widehat{\mathbf{w}}_{j}$ . Gradients here are furnished by the polynomial approximation analytically. 4. 4.

Compute the gradient covariance matrix with (18) by substituting $\bm{\omega}_{l}$ and $\bm{\omega}_{d}$ , from which we can compute the dimension-reducing subspaces and form ridge approximations of the scalar qois (the coefficients of lift and drag respectively).

4.2.1 Results

To apply the embedded ridge approximation approach, we will fit a one-dimensional ridge function for each component of the surface pressure profile, $p_{i}$ , and use a quadratic ridge profile for each component. Three gradient-free dimension-reducing strategies to find the ridge subspaces at each component are studied:

Fitting global linear models for each node, and taking the ridge direction as the normalised parameters of the linear model [11, Algorithm 1.3]—see Appendix C for further details. Note that the ridge profiles are still quadratic; the linear models are only used to find the ridge directions. This will be referred to as “Embedded linear”. 2. 2.

As above, but using quadratic polynomial VP only for nodes close to the leading edge, noting that pressure variation near the leading edge tends to be non-linear. The ridge subspace remains one-dimensional for all nodes. This will be referred to as “Embedded VP”. 3. 3.

As above, but using MAVE [23] only for nodes close to the leading edge to extract a one-dimensional ridge subspace. Then, a quadratic polynomial is fitted in this one-dimensional subspace for these nodes. This will be referred to as “Embedded MAVE”.

The implementation for VP is the same as in Section 4.1. For MAVE, we adapt the R code from Hang and Xia [35]. A brief exposition on the MAVE method is provided in Appendix D. Once the nodal ridge approximations are formulated, the ridge approximations for the qois $C_{l}$ and $C_{d}$ are furnished via Section 2.2.

The embedded ridge approximation approach is compared with the direct ridge approximation approach, where observations for $C_{l}$ and $C_{d}$ are used to find a ridge approximation directly and without the use of gradients. For the direct approach, three dimension-reducing strategies are studied—VP, MAVE and the linear model. For both the embedded and direct approaches, one dimension is used for the ridge approximation of $C_{l}$ , and two dimensions for $C_{d}$ . Note that the linear model used in the direct approach is unable to estimate more than one dimension, so only one is used for $C_{d}$ in this case.131313It is possible to construct surrogate models for $C_{l}$ and $C_{d}$ based on nodal ridge approximations alone, as noted in Section 2. However, as noted in the same section, the ridge subspace has utility on its own. Moreover, in this case study, evaluating the ridge approximation for the qois sets up a better comparison with direct ridge approximations.

In Figure 4, we plot the MSE of the surrogate model fitted using the dimension-reducing subspaces resulting from embedded and direct ridge approximation. The MSE of approximating the qoi $h(\mathbf{x})$ with $\widehat{h}(\mathbf{x})$ is evaluated as

[TABLE]

where $\mathbf{y}^{(j)}\in\mathbb{R}^{d}$ are verification samples drawn independently from data used to train the response surfaces. The ridge approximation of $h(\mathbf{x})$ evaluated at $\mathbf{y}^{(j)}$ is denoted $\widehat{h}(\mathbf{y}^{(j)})=\widehat{g}_{i}\left(\widehat{\bm{U}}^{T}\mathbf{y}^{(j)}\right)$ , and $\sigma_{h}^{2}$ is the sample variance of $h(\mathbf{y})$ across all $M^{\prime}$ verification samples.

It is shown that using embedded ridge approximation reduces the MSE compared to direct estimation when the number of samples is limited. The errors for embedded VP and MAVE approximately reach convergence in 300 observations for $C_{l}$ , and 400 observations for $C_{d}$ . Although the linear models (in both the direct and embedded cases) suffice for estimating $C_{l}$ , for functions with stronger non-linear dependencies such as $C_{d}$ , the linear model is shown to have a larger error compared to VP and MAVE. We also note that the use of embedded ridge approximation permits us to extend the capability of linear models to estimate more than one mode in the scalar qois, improving its performance as seen on the right of Figure 4.

4.3 Sparse storage of NACA0012 pressure field

In this subsection, we demonstrate the application of the ridge compression and recovery algorithms described in Section 3 on the estimation of the pressure field around the NACA0012 airfoil. The flow conditions and perturbation variables are set exactly the same as in the previous subsection. In the flow solution, the computational domain is discretised into $N=5233$ nodes. It is observed that each node in the flow field can be well-approximated by a one-dimensional ridge function. We examine the efficacy of our storage and recovery algorithms by selecting a subset of the components to store and attempting to recover the missing components from the stored ones.

We remove a range of numbers of components using the ridge compression algorithm (Section 3.2) applied recursively, where in each stage at most $S=500$ components are removed. Then, we reconstruct the missing ridge directions using the ridge recovery algorithm (Section 3.2), again applied recursively. The reconstruction quality is evaluated using the average normalised MSE $\epsilon_{R}$ , defined as

[TABLE]

where $N^{\prime}$ is the number of recovered components, and other variables are defined similarly as before. For comparison, the $k$ -medoids clustering algorithm (Section 3.2) is run under the same settings, with the same recovery algorithm (Section 3.2). In addition, a random deletion strategy is run, where the removed nodes are selected randomly, and missing modes are recovered by substituting the nearest neighbour in terms of subspace distance.

In Figure 5, the MSE averaged across all recovered components is plotted for the three methods: the ridge compression algorithm, $k$ -medoids clustering and random deletion. The plot shows that applying compression recursively allows recovery of missing ridge subspaces with greater accuracy than the other methods up to approximately 4700 components, which covers almost all of the nodes.

In Figure 6 and Figure 7, the $C_{p}$ profile on the surface of the airfoil and the entire flow field are compared for two cases: full CFD results and the reconstruction after removal of 3000 nodes using the compression algorithm respectively. The plots are for an airfoil geometry which was not used in the computation of the embedded ridge approximation. It can be seen that the pressure field is approximated well. Near the leading edge where large pressure variations are present, the pressure is also well-estimated. Figure 8 shows the locations of the removed nodes at different levels of compression. Nodes at the far-field are prioritised for removal, and, at high compression levels, nodes near the leading and trailing edges tend to be retained.

5 Conclusions

In this paper we introduce the notion of constructing ridge approximations over localised scalar fields, aimed at improved simulation-centric dimension reduction. Our ideas are born from a simple observation: in many PDE-based models, the scalar field at a certain node is only weakly influenced by far-field perturbations. It is more likely to be governed by locally induced perturbations—caused by variations in local boundary conditions or geometry. By interpreting global scalar qois as integrals of these scalar-field quantities, we hypothesise and demonstrate that constructing ridge approximations over individual scalar field nodes instead and then integrating them can reduce the number of computational evaluations required.

Acknowledgements

CYW acknowledges financial support from the Cambridge Trust, Jesus College, Cambridge, and the Alan Turing Institute. PS was funded through a Rolls-Royce postdoctoral fellowship. The authors would like to thank the anonymous reviewer for useful feedback, which helped improve the manuscript.

Appendix A Proof of Theorem 2.1

By the matrix Bernstein inequality, it can be shown that (21) implies that [30, Remark 6.5]

[TABLE]

We can then write

[TABLE]

where

[TABLE]

Note that

[TABLE]

because $\nabla g_{i}^{(m)}$ are identically distributed copies of $\nabla g_{i}$ and whose norms are upper bounded by $L$ . Using the triangle inequality and sub-multiplicativity of the norm, we get

[TABLE]

from which the theorem follows.

Appendix B Proof of Theorem 3.1

First, we prove a lemma establishing a connection between perturbation of subspaces and a perturbation in the associated basis matrices.

Lemma B.1.

Let $\mathcal{S}$ be an $r$ -dimensional subspace of $\mathbb{R}^{d}$ , and $\widetilde{\mathcal{S}}$ be an equidimensional perturbation of $\mathcal{S}$ . Then, $dist(\mathcal{S},\widetilde{\mathcal{S}})$ is bounded above if and only if there exists $\bm{W},\widetilde{\bm{W}}\in\mathbb{R}^{d\times r}$ with orthonormal columns such that $\mathbf{w}_{i}^{T}\widetilde{\mathbf{w}}_{i}$ is bounded below for all $i=1,...,r$ , where $\mathbf{w}_{i}$ is the $i$ -th column of $\bm{W}_{i}$ (and similarly for perturbed quantities).

Proof.

( $\Rightarrow$ ) Choose $\{\mathbf{w}_{1},...,\mathbf{w}_{r}\}$ and $\{\widetilde{\mathbf{w}}_{1},...,\widetilde{\mathbf{w}}_{r}\}$ to be a set of principal vectors of $\mathcal{S}$ and $\widetilde{\mathcal{S}}$ respectively. Then, by construction, we have

[TABLE]

where $\theta_{i}$ is the $i$ -th principal angle, with $0\leq\theta_{1}\leq...\leq\theta_{r}\leq\pi/2$ . Thus, $\mathbf{w}_{i}^{T}\widetilde{\mathbf{w}}_{i}$ is bounded below by $\cos(\theta_{r})$ , the cosine of the largest principal angle. The result follows from the fact that $\cos(\theta_{r})=\sqrt{1-dist(\mathcal{S},\widetilde{\mathcal{S}})^{2}}$ [31, Section 6.4.3].

( $\Leftarrow$ ) Note that

[TABLE]

for any $\bm{W},\widetilde{\bm{W}}\in\mathbb{R}^{d\times r}$ whose columns define orthonormal bases for $\mathcal{S}$ and $\widetilde{\mathcal{S}}$ respectively. That is, the distance does not depend on the basis chosen for the orthogonal projector. Then, we can write

[TABLE]

Hence, if $\mathbf{w}_{i}^{T}\widetilde{\mathbf{w}}_{i}\geq\cos(\theta_{r})$ then

[TABLE]

∎

Now, applying the Cauchy-Schwarz inequality to $\left(\mathbf{x}^{T}(\widetilde{\bm{W}}-\bm{W})\nabla_{\mathbf{u}}g(\bm{W}^{T}\mathbf{x})\right)^{2}$ in (24) yields the following:

[TABLE]

Let $\bm{A}=\widetilde{\bm{W}}-\bm{W}$ , $\mathbf{y}=\bm{A}^{T}\mathbf{x}$ and $\mathbf{a}_{i}$ be the $i$ -th column of $\bm{A}$ . Then,

[TABLE]

However, we have $\mathbf{a}_{i}=\widetilde{\mathbf{w}}_{i}-\mathbf{w}_{i}$ . So,

[TABLE]

where we have used the fact that $\widetilde{\mathbf{w}}_{i}^{T}\widetilde{\mathbf{w}}_{i}=\mathbf{w}_{i}^{T}\mathbf{w}_{i}=1$ . So, substituting (39) and (40) into (38), we have:

[TABLE]

Applying the sufficient part of Lemma B.1 completes the proof.

Appendix C Global linear models

The coefficients of a global linear model of a qoi give a rough estimation of the leading dimension-reducing subspace direction. If the qoi varies largely linearly in the input domain, and it is known that the dimension-reducing subspace is one-dimensional, this heuristic will find the required subspace at a low cost. This method is described with Algorithm 1.3 in [11], which we reproduce below.

Draw input/output pairs $\left(\mathbf{x}^{(m)},f\left(\mathbf{x}^{(m)}\right)\right)_{m=1}^{M}$ where $M=\alpha d$ with a certain oversampling $\alpha>1$ . 2. 2.

Solve the following least squares problem

[TABLE]

where the $m$ -th row of $\bm{X}$ is $\mathbf{x}^{(m)}$ and $y_{m}=f(\mathbf{x}^{(m)})$ . 3. 3.

Take the leading ridge direction to be $\mathbf{w}/\left\lVert\mathbf{w}\right\rVert_{2}$ .

Appendix D MAVE

The Minimum Average Variance Estimation (MAVE) method for sufficient dimension reduction is proposed by Xia et al. [23]. Given input/output pairs $(\mathbf{x}^{(m)},y^{(m)})_{m=1}^{M}$ , it aims to find the dimension-reducing subspace spanned by columns of $\bm{W}$ , which is the solution to the following optimisation over matrices with orthogonal columns:

[TABLE]

The first-order Taylor expansion about an input $\mathbf{x}^{(0)}$ of the expectation within the parentheses is

[TABLE]

Hence, the MAVE procedure minimises the following approximation to (43) as an alternating weighted least squares problem:

[TABLE]

where the weights $\omega_{ij}$ are determined through a normalised kernel function $K_{h}$ evaluated at $(\mathbf{x}_{i},\mathbf{x}_{j})$ , namely

[TABLE]

The procedure iterates with

•

fixing $a_{j},\mathbf{b}_{j}$ and optimizing with respect to $\bm{W}$ , and

•

fixing $\bm{W}$ and optimizing with respect to $a_{j},\mathbf{b}_{j}$ .

The minimiser for both steps can be expressed analytically, with details in, e.g., [36, Ch. 11].

Bibliography36

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] J. A. S. Witteveen, A. Doostan, R. Pec̆nik, G. Iaccarino, Uncertainty quantification of the transonic flow around the RAE 2822 airfoil, Stanford University Centre of Turbulence Research, Annual Research Briefs (2009) 12.
2[2] J. D. Anderson, Fundamentals of Aerodynamics, 5th Edition, Mc Graw-Hill Education, New York, 2010.
3[3] P. Seshadri, G. T. Parks, S. Shahpar, Leakage Uncertainties in Compressors: The Case of Rotor 37 , Journal of Propulsion and Power 31 (1) (2015) 456–466. doi:10.2514/1.B 35039 . URL https://doi.org/10.2514/1.B 35039 · doi ↗
4[4] J. D. Denton, Loss Mechanisms in Turbomachines , in: Volume 2: Combustion and Fuels; Oil and Gas Applications; Cycle Innovations; Heat Transfer; Electric Power; Industrial and Cogeneration; Ceramics; Structures and Dynamics; Controls, Diagnostics and Instrumentation; IGTI Scholar Award, ASME, Cincinnati, Ohio, USA, 1993, p. V 002T 14A 001. doi:10.1115/93-GT-435 . URL http://proceedings.asmedigitalcollection.asme.org/proceeding.aspx?doi=10.1115/93-GT-435 · doi ↗
5[5] C. T. Sun, Mechanics of Aircraft Structures, 2nd Edition, John Wiley & Sons, Hoboken, N.J, 2006.
6[6] G. Blatman, B. Sudret, Adaptive sparse polynomial chaos expansion based on least angle regression , Journal of Computational Physics 230 (6) (2011) 2345–2367. doi:10.1016/j.jcp.2010.12.021 . URL https://linkinghub.elsevier.com/retrieve/pii/S 0021999110006856 · doi ↗
7[7] R. R. Lam, O. Zahm, Y. M. Marzouk, K. E. Willcox, Multifidelity Dimension Reduction via Active Subspaces , SIAM Journal on Scientific Computing 42 (2) (2020) A 929–A 956, publisher: Society for Industrial and Applied Mathematics. doi:10.1137/18M 1214123 . URL https://epubs.siam.org/doi/abs/10.1137/18M 1214123 · doi ↗
8[8] A. Pinkus, Ridge Functions, 1st Edition, Cambridge University Press, Cambridge, 2015.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

**Embedded Ridge Approximations111© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ **

Abstract

keywords:

1 Introduction

2 Embedded ridge approximation

2.1 Interpretation as vector-valued dimension reduction

Definition 2.1** (Embedded ridge function).**

Definition 2.2** (Vector gradient covariance matrix).**

2.2 Algorithm for ridge computation

Theorem 2.1**.**

Proof.

3 Efficient storage of embedded ridge approximations: Ridge compression

3.1 A perturbation bound on the mean squared error

Theorem 3.1**.**

Proof.

3.2 Ridge compression and recovery

4 Numerical examples

4.1 Analytical example

4.2 Shape design of the NACA0012

4.2.1 Results

4.3 Sparse storage of NACA0012 pressure field

5 Conclusions

Acknowledgements

Appendix A Proof of Theorem 2.1

Appendix B Proof of Theorem 3.1

Lemma B.1**.**

Proof.

Appendix C Global linear models

Appendix D MAVE

Embedded Ridge Approximations111© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Definition 2.1 (Embedded ridge function).

Definition 2.2 (Vector gradient covariance matrix).

Theorem 2.1.

Theorem 3.1.

Lemma B.1.