Multivariate approximation of functions on irregular domains by weighted least-squares methods

Giovanni Migliorati

arXiv:1907.12304·math.NA·April 6, 2020


TL;DR

This paper develops weighted least-squares algorithms for approximating functions on irregular domains, enabling stable, quasi-optimal estimations with computational efficiency even without explicit basis functions.

Contribution

It introduces a method to construct stable weighted least-squares estimators on irregular domains using surrogate bases, extending previous work to more complex geometries.

Findings

01

Stable estimators achieved with m ~ n log n function evaluations.

02

Surrogate basis construction depends on Christoffel function of domain and space.

03

Numerical experiments confirm theoretical stability and accuracy.

Abstract

We propose and analyse numerical algorithms based on weighted least squares for the approximation of a real-valued function on a general bounded domain $\Omega\subset\mathbb{R}^{d}$ . Given any $n$ -dimensional approximation space $V_{n}\subset L^{2}(\Omega)$ , the analysis in [6] shows the existence of stable and optimally converging weighted least-squares estimators, using a number of function evaluations $m$ of the order $n\ln n$ . When an $L^{2}(\Omega)$ -orthonormal basis of $V_{n}$ is available in analytic form, such estimators can be constructed using the algorithms described in [6, Section 5]. If the basis also has product form, then these algorithms have computational complexity linear in $d$ and $m$ . In this paper we show that, when $\Omega$ is an irregular domain such that the analytic form of an $L^{2}(\Omega)$ -orthonormal basis is not available, stable and quasi-optimally weighted…


Equations

$$\mathbb{E}\left(\|u-u_{n}\|_{L^{2}(\Omega,\mu)}^{2}\right)\leq\left(1+\frac{C_{2}(1+\varepsilon n)n}{m}\right)\inf_{v\in V_{n}}\|u-v\|_{L^{2}(\Omega,\mu)}^{2}+\frac{C_{\infty}(1+\varepsilon n)n}{m}\inf_{v\in V_{n}}\|u-v\|_{L^{\infty}(\Omega)}^{2}+\textrm{trunc}.$$

$$P_{n}u:=\operatorname*{argmin}_{v\in V_{n}}\|u-v\|,$$

$$k_{n}(y):=\sum_{j=1}^{n}|L_{j}(y)|^{2},\qquad w(y):=\frac{n}{k_{n}(y)},\qquad y\in\Omega,$$

$$d\sigma_{n}:=w^{-1}\,d\mu=n^{-1}\sum_{j=1}^{n}L_{j}^{2}\,d\mu.$$

$$\langle u,v\rangle_{m}:=\frac{1}{m}\sum_{i=1}^{m}w(y^{i})\,u(y^{i})\,v(y^{i}),\qquad u,v\in L^{2}(\Omega,\mu).$$

$$u_{W}^{*}:=\sum_{j=1}^{n}a_{j}L_{j}=\operatorname*{argmin}_{v\in V_{n}}\|u-v\|_{m},$$

$$Ga=b,$$

$$\|A\|:=\max_{\|x\|_{\ell^{2}}=1}\|Ax\|_{\ell^{2}},\qquad\kappa(A):=\frac{\lambda_{\max}(A^{\top}A)}{\lambda_{\min}(A^{\top}A)},$$

$$\|u\|_{L^{\infty}(\Omega)}\leq\eta.$$

$$u_{T}^{*}:=T_{\eta}\circ u_{W}^{*}.$$

$$m\geq\frac{n}{\xi(\delta)}\ln\left(\frac{2n}{\alpha}\right)$$

$$\Pr\left(\|G-I\|>\delta\right)\leq\alpha,$$

$$\mathbb{E}\left(\|u-u_{T}^{*}\|^{2}\right)\leq\left(1+\frac{4}{\xi(\delta)\ln(2n/\alpha)}\right)e_{n}(u)^{2}+4\alpha\eta^{2}.$$

$$\langle u,v\rangle_{\widetilde{m}}:=\frac{1}{\widetilde{m}}\sum_{i=1}^{\widetilde{m}}u(\widetilde{y}^{i})\,v(\widetilde{y}^{i}),\qquad u,v\in L^{2}(\Omega,\mu),$$

$$\sum_{j,k=1}^{n}\left|\langle\widetilde{L}_{j},\widetilde{L}_{k}\rangle_{\widetilde{m}}-\delta_{jk}\right|^{2}\leq\varepsilon^{2}.$$

$$k_{n}(y):=\sum_{j=1}^{n}|L_{j}(y)|^{2},\qquad w(y):=\frac{\gamma}{k_{n}(y)},\qquad y\in\Omega,$$

$$d\sigma_{n}:=w^{-1}\,d\mu=\frac{1}{\gamma}\sum_{j=1}^{n}L_{j}^{2}\,d\mu,$$

$$u_{W}:=P_{n}^{m}u:=\operatorname*{argmin}_{v\in V_{n}}\|u-v\|_{m}.$$

$$\widetilde{G}a=\widetilde{b},$$

$$u_{T}:=T_{\eta}\circ u_{W}.$$

$$\mathcal{Z}_{\varepsilon}:=\left\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega\,:\,\sum_{j,k=1}^{n}\left|\langle\widetilde{L}_{j},\widetilde{L}_{k}\rangle_{\widetilde{m}}-\delta_{jk}\right|^{2}\leq\varepsilon^{2}\right\},$$

$$\mathcal{W}_{\Omega}:=\left\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega\,:\,\textrm{span}(\widetilde{L}_{1},\ldots,\widetilde{L}_{n})=V_{n}\right\}.$$

$$K_{n}=K_{n}(\Omega):=\sup_{y\in\Omega}k_{n}(y),$$

$$K_{n}\leq n^{2}.$$

$$\mathcal{N}_{\widetilde{\delta}}:=\left\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega\ \textrm{s.t.}\ \bigcap_{u\in V_{n}}\left\{(1-\widetilde{\delta})\|u\|^{2}\leq\|u\|_{\widetilde{m}}^{2}\leq(1+\widetilde{\delta})\|u\|^{2}\right\}\right\}.$$

$$\widetilde{m}\geq\frac{K_{n}}{\xi(\widetilde{\delta})}\ln\left(\frac{2n}{\alpha}\right),$$

$$K_{n}\leq\lambda^{-1}n^{2},$$

$$\Pr\left(\|\widetilde{G}-I\|\geq\delta+\varepsilon\right)\leq\alpha+\beta;$$

$$\mathbb{E}\left(\|u-u_{T}\|^{2}\right)\leq\left(1+\tau_{2}(n)\right)e_{n}(u)^{2}+\tau_{\infty}(n)\,e_{n}^{\infty}(u)^{2}+8\eta^{2}(\alpha+\beta),$$

$$\tau_{2}(n):=\frac{1+\varepsilon(n+1)}{1-\delta}\,\frac{\delta^{2}(1+\varepsilon)}{4(1-\delta-\varepsilon)^{2}\ln(2n/\alpha)},\qquad\tau_{\infty}(n):=\frac{1+\varepsilon(n+1)}{1-\delta}\,\frac{\xi(\delta)(1+\varepsilon)\,n}{(1-\delta-\varepsilon)^{2}K_{n}\ln(2n/\alpha)}.$$


Full text

Multivariate approximation of functions on irregular domains

by weighted least-squares methods

Giovanni Migliorati Sorbonne Université, UPMC Univ Paris 06, CNRS, UMR 7598, Laboratoire Jacques-Louis Lions, 4, place Jussieu 75005, Paris, France. email: [email protected]

Abstract

We propose and analyse numerical algorithms based on weighted least squares for the approximation of a bounded real-valued function on a general bounded domain $\Omega\subset\mathbb{R}^{d}$ . Given any $n$ -dimensional approximation space $V_{n}\subset L^{2}(\Omega)$ , the analysis in [6] shows the existence of stable and optimally converging weighted least-squares estimators, using a number of function evaluations $m$ of the order $n\ln n$ . When an $L^{2}(\Omega)$ -orthonormal basis of $V_{n}$ is available in analytic form, such estimators can be constructed using the algorithms described in [6, Section 5]. If the basis also has product form, then these algorithms have computational complexity linear in $d$ and $m$ . In this paper we show that, when $\Omega$ is an irregular domain such that the analytic form of an $L^{2}(\Omega)$ -orthonormal basis is not available, stable and quasi-optimally weighted least-squares estimators can still be constructed from $V_{n}$ , again with $m$ of the order $n\ln n$ , but using a suitable surrogate basis of $V_{n}$ orthonormal in a discrete sense. The computational cost for the calculation of the surrogate basis depends on the Christoffel function of $\Omega$ and $V_{n}$ . Numerical results validating our analysis are presented.

1 Introduction and overview of the paper

Approximating an unknown function from its pointwise evaluations is a classical problem in mathematics. Interpolation and least squares are two approaches to such a problem, see e.g. [7, 13]. In this paper, we develop and analyse numerical methods based on least squares for the approximation of a bounded function $u:\Omega\to\mathbb{R}$ on a general bounded domain $\Omega\subset\mathbb{R}^{d}$ in any dimension $d$ , which can be a challenging task due to the curse of dimensionality. Approximation takes place in $L^{2}(\Omega,\mu)$ , the space of square-integrable functions with respect to $\mu:=\mu(\Omega)$ , the uniform probability measure on $\Omega$ . Given a finite $n$ -dimensional linear space $V_{n}\subset L^{2}(\Omega,\mu)$ , projection-type numerical methods select $u_{n}\in V_{n}$ that minimizes the approximation error of $u$ in $V_{n}$ . Standard least squares are an example of such numerical methods: they construct $u_{n}$ from pointwise evaluations of $u$ at $m>n$ iid random samples from $\mu$ . An important point in the analysis of least squares concerns how large $m$ has to be, compared to $n$ , to ensure stability and good approximation properties of the estimator $u_{n}$ .

Recent works [9, 10, 6] have identified weighted least-squares methods as a promising approach for approximation in arbitrary dimension $d$ . In any domain $\Omega\subseteq\mathbb{R}^{d}$ and with any finite-dimensional space $V_{n}\subset L^{2}(\Omega,\mu)$ , it was shown in [6] that weighted least-squares estimators $u_{n}\in V_{n}$ are stable and optimally converging in expectation, when the $m$ evaluations of $u$ are taken at iid random samples from a suitable probability measure $\sigma_{n}=\sigma_{n}(\Omega)$ that depends on $V_{n}$ and $\mu$ , and with $m$ being only linearly proportional to $n$ up to a logarithmic term, and independent of the ambient dimension $d$ . This result is recalled in Theorem 1.

For the computation of $u_{n}$ with the above guarantees, the analytic expression of an $L^{2}(\Omega,\mu)$ -orthonormal basis $(L_{j})_{j\geq 1}$ is needed. If this is available, then one can generate the random samples from $\sigma_{n}$ and construct $u_{n}$ as described in [6], and as recalled in Section 2 as well. Moreover, if the orthonormal basis has product form, like e.g. when $\Omega$ is a product domain, then the numerical methods developed in [6] generate random samples from $\sigma_{n}$ at a computational cost that scales linearly in both $d$ and $m$ .

In general, when $\Omega$ is an irregular domain, the analytic expression of an $L^{2}(\Omega,\mu)$ -orthonormal basis is not known. Hence a suitable surrogate basis $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ of $V_{n}$ is needed, that replaces $L_{1},\ldots,L_{n}$ and at the same time retains orthogonality with respect to some scalar product easy to evaluate on any domain $\Omega$ . In this setting, a convenient choice is to orthonormalize $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ using a discrete scalar product with iid random samples from $\mu$ . We denote by $\widetilde{u}_{n}\in V_{n}$ the new weighted least-squares estimator of $u$ computed using the basis $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ , that differs from $u_{n}$ whose computation uses $L_{1},\ldots,L_{n}$ . In the present paper we show that the estimator $\widetilde{u}_{n}$ can be constructed on general domains $\Omega$ and that:

•

$\widetilde{u}_{n}$ is stable with high probability, quasi-optimally converging in expectation, and uses a number $m$ of evaluations of $u$ only linearly proportional to $n$ up to a logarithmic term, and independent of $d$ ;

•

the numerical construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ requires $\widetilde{m}$ iid random samples from $\mu$ and the QR factorisation of $\widetilde{m}$ -by- $n$ matrices, where $\widetilde{m}$ scales as the $L^{\infty}(\Omega)$ norm of the reciprocal of the Christoffel function of $V_{n}$ on $\Omega$ . Such a construction does not use any evaluation of $u$ , nor does it require the knowledge of $L_{1},\ldots,L_{n}$ .

The novel stability and convergence results for the estimator $\widetilde{u}_{n}$ are stated in Theorem 3, whose proof uses previous results from [5, 6] and the matrix Bernstein inequality. The convergence estimate reads as follows, where we use the $L^{2}(\Omega,\mu)$ and $L^{\infty}(\Omega)$ best approximation errors of $u$ in $V_{n}$ , a parameter $\varepsilon\geq 0$ related to the construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ , two unnamed constants $C_{2},C_{\infty}>0$ , and omit the technical details on the truncation of the estimator:

[TABLE]

The parameters $m$ and $\widetilde{m}$ essentially scale linearly and superlinearly in $n$ , respectively.

The term $\varepsilon n$ arises from the missing discrete orthogonality of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ , that occurs in any orthonormalisation process due to numerical cancellation. When $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ are constructed by Householder QR factorisation, $\varepsilon$ provably does not exceed $\epsilon_{M}\widetilde{m}n^{3/2}$ where $\epsilon_{M}\approx 10^{-16}$ is the machine precision of arithmetic calculations, and thus the term $\varepsilon n$ is completely negligible for wide ranges of $n$ and $\widetilde{m}$ .

The construction of the estimator $\widetilde{u}_{n}$ uses $m$ evaluations of $u$ at iid random samples drawn from a surrogate discrete probability measure $\widetilde{\sigma}_{n}=\widetilde{\sigma}_{n}(\Omega)$ that emulates $\sigma_{n}$ , and that depends on $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ and $\mu$ . The random samples from $\widetilde{\sigma}_{n}$ can be generated by subsampling the QR factorisation of a suitable matrix that depends on $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ . Similar applications of QR factorisation have been used for the computation of Fekete points [19], and for the construction of randomised quadratures [16].

In [1] a different method based on SVD truncation has been analysed, for the same purpose of function approximation on irregular domains. For that method, error estimates similar to (1.1) have been obtained in [1], but requiring a number $m$ of function evaluations that scales superlinearly in $n$ , and using a different best approximation error that depends on the SVD truncation parameter.

In [2] a method similar to our forthcoming Algorithm 1 has been proposed, and its convergence in probability has been analysed assuming exact discrete orthonormality of the surrogate basis, i.e. assuming $\varepsilon=0$ .

The structure of our paper is the following: in Section 2 we recall from [6] some results on approximation by weighted least-squares methods, and describe the additional challenges encountered when applying such methods to irregular domains. In Section 2.1 we state Theorem 3. Its proof is postponed to Section 2.2. In Section 2.3 we describe the construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ . In Section 3 we propose two algorithms (Algorithm 1 and Algorithm 2) that compute the weighted least-squares estimator $\widetilde{u}_{n}$ . Section 4 contains some numerical tests that validate our analysis. In Section 5 we draw some conclusions.

2 Weighted least-squares approximation on irregular domains

Given a bounded domain $\Omega\subset\mathbb{R}^{d}$ , we consider the problem of approximating a bounded function $u:\Omega\to\mathbb{R}$ from its pointwise evaluations at independent random samples uniformly distributed over $\Omega$ . Without loss of generality we suppose that $\Omega\subseteq B:=[-1,1]^{d}$ . Denote with $\mu=\mu(\Omega)$ the uniform probability measure on $\Omega$ , and with $(L_{j})_{j\geq 1}$ an orthonormal basis of $L^{2}(\Omega,\mu)$ , where the $L^{2}(\Omega,\mu)$ norm is denoted as $\|u\|:=\sqrt{\langle u,u\rangle}$ and $\langle u,v\rangle:=\int_{\Omega}uv\,d\mu$ for any $u,v\in L^{2}(\Omega,\mu)$ . The Euclidean norm in $\mathbb{R}^{n}$ is indicated with $\ell^{2}$ . For any $n\geq 1$ , we denote by $V_{n}:=\textrm{span}(L_{1},\ldots,L_{n})\subset L^{2}(\Omega,\mu)$ an $n$ -dimensional approximation space, and assume that $V_{n}$ contains the constant functions. We define the $L^{2}(\Omega,\mu)$ -projection of $u$ on $V_{n}$ as

$$P_{n}u:=\operatorname*{argmin}_{v\in V_{n}}\|u-v\|, \tag{2.2}$$

and denote by $e_{n}(u):=\|u-P_{n}u\|$ the $L^{2}(\Omega,\mu)$ best approximation error of $u$ in $V_{n}$ . We also denote by $e_{n}^{\infty}(u):=\inf_{v\in V_{n}}\|u-v\|_{L^{\infty}(\Omega)}$ the best approximation error in $L^{\infty}$ . Using $L_{1},\ldots,L_{n}$ , we define the functions

$$k_{n}(y):=\sum_{j=1}^{n}|L_{j}(y)|^{2},\qquad w(y):=\frac{n}{k_{n}(y)},\qquad y\in\Omega,$$

and the probability measure $\sigma_{n}$ on $\Omega$ as

$$d\sigma_{n}:=w^{-1}\,d\mu=n^{-1}\sum_{j=1}^{n}L_{j}^{2}\,d\mu. \tag{2.3}$$

When $V_{n}$ is the total degree polynomial space, $k_{n}^{-1}$ is known as the Christoffel function, see e.g. [17]. For any choice $y^{1},\ldots,y^{m}\in\Omega$ of $m$ points, we introduce the scalar product

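$$\langle u,v\rangle_{m}:=\frac{1}{m}\sum_{i=1}^{m}w(y^{i})\,u(y^{i})\,v(y^{i}),\qquad u,v\in L^{2}(\Omega,\mu). \tag{2.4}$$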

In general the exact projection (2.2) cannot be computed. This motivates the interest in the discrete least-squares approach, where the $L^{2}(\Omega,\mu)$ norm in (2.2) is replaced by the seminorm induced on $L^{2}(\Omega,\mu)$ by the scalar product (2.4). Define the weighted least-squares estimator

$$u_{W}^{*}:=\sum_{j=1}^{n}a_{j}L_{j}=\operatorname*{argmin}_{v\in V_{n}}\|u-v\|_{m},$$

that can be computed from the minimal $\ell^{2}$ -norm solution $a=(a_{1},\ldots,a_{n})^{\top}\in\mathbb{R}^{n}$ to the normal equations

$$Ga=b, \tag{2.6}$$

where the Grammian matrix $G\in\mathbb{R}^{n\times n}$ is defined component-wise as $G_{jk}:=\langle L_{j},L_{k}\rangle_{m}$ and the right-hand side $b\in\mathbb{R}^{n}$ has components $b_{j}=\langle L_{j},u\rangle_{m}$ . Throughout the paper, $I\in\mathbb{R}^{n\times n}$ denotes the identity matrix, and

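$$\|A\|:=\max_{\|x\|_{\ell^{2}}=1}\|Ax\|_{\ell^{2}},\qquad\kappa(A):=\frac{\lambda_{\max}(A^{\top}A)}{\lambda_{\min}(A^{\top}A)},$$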

denotes the spectral norm, respectively the condition number, of any matrix $A\in\mathbb{R}^{m\times n}$ . For any $\delta\geq 0$ define $\xi(\delta):=(1+\delta)\ln(1+\delta)-\delta>0$ , that can be sandwiched as $(2\ln(2)-1)\delta^{2}\leq\xi(\delta)\leq\delta^{2}/2$ when $\delta\in[0,1]$ . As in [6], we suppose that $u\in L^{2}(\Omega,\mu)$ satisfies a uniform bound with some known $\eta>0$ :

$$\|u\|_{L^{\infty}(\Omega)}\leq\eta. \tag{2.7}$$

We then introduce the truncation operator $z\mapsto T_{\eta}(z):=\textrm{sign}(z)\min\{|z|,\eta\}$ , and define the truncated estimator

$$u_{T}^{*}:=T_{\eta}\circ u_{W}^{*}.$$
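The function $\xi$ and the truncation operator $T_{\eta}$ are simple enough to check numerically. The following is a minimal sketch, assuming NumPy; the helper names `xi` and `truncate` are illustrative and do not come from the paper. It verifies the sandwich bound for $\xi(\delta)$ on a grid in $(0,1]$ and applies $T_{\eta}$ entrywise.

```python
import numpy as np

def xi(delta):
    """xi(delta) = (1 + delta) * ln(1 + delta) - delta."""
    delta = np.asarray(delta, dtype=float)
    return (1.0 + delta) * np.log1p(delta) - delta

def truncate(z, eta):
    """T_eta(z) = sign(z) * min(|z|, eta), applied entrywise."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.minimum(np.abs(z), eta)

# Check (2 ln 2 - 1) delta^2 <= xi(delta) <= delta^2 / 2 on a grid in (0, 1].
deltas = np.linspace(0.01, 1.0, 100)
lower = (2.0 * np.log(2.0) - 1.0) * deltas**2
upper = deltas**2 / 2.0
values = xi(deltas)
tol = 1e-12  # allowance for round-off; the lower bound is attained at delta = 1
sandwich_holds = bool(np.all(lower <= values + tol) and np.all(values <= upper + tol))
print("sandwich bound holds on the grid:", sandwich_holds)

# Values outside [-eta, eta] are clipped to the nearest endpoint.
print(truncate([-3.0, 0.5, 2.0], eta=1.0))
```

The lower bound is an equality at $\delta=1$, which is why a small round-off tolerance is used in the comparison.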

The following result was proven in [6, Theorem 2.1 and Corollary 2.2], in a slightly different form (here we rewrite it with $\alpha=2m^{-r}$ , where $r>0$ is the same parameter as in [6]).

Theorem 1.

In any dimension $d$ , for any domain $\Omega\subseteq\mathbb{R}^{d}$ and any $\alpha,\delta\in(0,1)$ , if

$$m\geq\frac{n}{\xi(\delta)}\ln\left(\frac{2n}{\alpha}\right)$$

and $y^{1},\ldots,y^{m}\in\Omega$ are $m$ iid random samples from $\sigma_{n}$ then

$$\Pr\left(\|G-I\|>\delta\right)\leq\alpha,$$

and if $u\in L^{2}(\Omega,\mu)$ satisfies (2.7) then the estimator $u_{T}^{*}$ satisfies

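$$\mathbb{E}\left(\|u-u_{T}^{*}\|^{2}\right)\leq\left(1+\frac{4}{\xi(\delta)\ln(2n/\alpha)}\right)e_{n}(u)^{2}+4\alpha\eta^{2}.$$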

The above result holds with any bounded or unbounded domain $\Omega$ in any dimension, and in general approximation spaces $V_{n}$ . In practice, the computation of the estimator $u_{T}^{*}$ requires the analytic expression of an $L^{2}(\Omega,\mu)$ -orthonormal basis $L_{1},\ldots,L_{n}$ , for the generation of the random samples from (2.3) and for the construction of $G$ and $b$ in (2.6). When $\Omega=[-1,1]^{d}$ , many $L^{2}(\Omega,\mu)$ -orthonormal bases can be constructed by tensorization, e.g. tensorized Legendre polynomials or tensorized wavelets. Other examples are available when $\Omega$ has a symmetric structure, e.g. spherical harmonics on the sphere $\Omega=\{y\in\mathbb{R}^{d}\,:\,\|y\|_{\ell^{2}}=1\}$ .

In general, when $\Omega$ is an irregular domain, the analytic expression of the $L_{j}$ is not known. This introduces additional challenges in the development and analysis of projection-type numerical methods for approximation on irregular domains. In principle, candidate replacements of $L_{1},\ldots,L_{n}$ are functions $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}\in V_{n}$ not necessarily orthonormal in $L^{2}(\Omega,\mu)$ that satisfy the following prescriptions:

P1)

$\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ be orthonormal w.r.t. a discrete scalar product, that can be easily evaluated with any domain $\Omega$ , in contrast to the $L^{2}(\Omega,\mu)$ scalar product that requires integration over $\Omega$ ;

P2)

$\textrm{span}(\widetilde{L}_{1},\ldots,\widetilde{L}_{n})=V_{n}$ , since our goal is the approximation of $u$ in the space $V_{n}$ .

We now introduce some tools useful for the numerical construction of the basis $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ . Let $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega$ be $\widetilde{m}$ iid random samples from $\mu$ , and define the discrete scalar product

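$$\langle u,v\rangle_{\widetilde{m}}:=\frac{1}{\widetilde{m}}\sum_{i=1}^{\widetilde{m}}u(\widetilde{y}^{i})\,v(\widetilde{y}^{i}),\qquad u,v\in L^{2}(\Omega,\mu), \tag{2.8}$$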

and $\|u\|_{\widetilde{m}}:=\sqrt{\langle u,u\rangle_{\widetilde{m}}}$ . For any $\varepsilon\geq 0$ , we say that $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ are $\varepsilon$ -orthonormal if

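$$\sum_{j,k=1}^{n}\left|\langle\widetilde{L}_{j},\widetilde{L}_{k}\rangle_{\widetilde{m}}-\delta_{jk}\right|^{2}\leq\varepsilon^{2}. \tag{2.9}$$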

Orthonormalisation algorithms, e.g. Gram-Schmidt-type or factorisation-type algorithms, try to construct a set $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}\in V_{n}$ of functions orthonormal w.r.t. (2.8), but they suffer from loss of orthogonality due to numerical cancellation. As a consequence, the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ constructed by any such numerical method are only $\varepsilon$ -orthonormal for some $\varepsilon>0$ , and prescription P1 can be fulfilled with the scalar product (2.8) up to a (hopefully small) loss of orthogonality quantified by (2.9).
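The loss of orthogonality can be measured directly. The sketch below is an illustration under stated assumptions, not the algorithm of Section 2.3: the domain is taken to be the unit disk, $V_{n}$ the bivariate polynomials of total degree at most 5 spanned by monomials $\varphi_{j}$, and the orthonormalisation is done with `numpy.linalg.qr` (which calls LAPACK's Householder-based QR). The helpers `sample_disk` and `monomials` are hypothetical names introduced here. The surrogate basis values at the samples are the columns of $\sqrt{\widetilde{m}}\,Q$, and the defect in the $\varepsilon$-orthonormality condition is the Frobenius norm of the discrete Gram matrix minus the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_disk(count, rng):
    """Draw `count` iid uniform points in the unit disk by rejection from [-1, 1]^2."""
    points = np.empty((0, 2))
    while points.shape[0] < count:
        cand = rng.uniform(-1.0, 1.0, size=(2 * count, 2))
        points = np.vstack([points, cand[np.sum(cand**2, axis=1) <= 1.0]])
    return points[:count]

def monomials(y, degree):
    """Evaluate the monomials y1^a * y2^b with a + b <= degree at the rows of y."""
    cols = [y[:, 0]**a * y[:, 1]**b
            for a in range(degree + 1) for b in range(degree + 1 - a)]
    return np.column_stack(cols)

m_tilde, degree = 2000, 5
y_tilde = sample_disk(m_tilde, rng)      # iid samples from the uniform measure mu
Phi = monomials(y_tilde, degree)         # m_tilde-by-n matrix with entries phi_j(y~^i)
n = Phi.shape[1]

# Reduced QR factorisation Phi = Q R; Q has orthonormal columns up to round-off.
Q, R = np.linalg.qr(Phi, mode="reduced")
L_tilde = np.sqrt(m_tilde) * Q           # values of the surrogate basis at the samples

# Discrete Gram matrix with entries <L~_j, L~_k>_{m~}, and its Frobenius distance to I.
gram = L_tilde.T @ L_tilde / m_tilde
eps = np.linalg.norm(gram - np.eye(n), ord="fro")
print("n =", n, " measured epsilon =", eps)
```

In exact arithmetic the measured value would be zero; in floating point it is small but positive, which is the situation the $\varepsilon$-orthonormality condition is meant to capture.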

Let us consider prescription P2. Define $\widetilde{V}_{n}:=\textrm{span}(\widetilde{L}_{1},\ldots,\widetilde{L}_{n})$ and denote with $\varphi_{1},\ldots,\varphi_{n}\in V_{n}$ a collection of $n$ functions such that $\textrm{span}(\varphi_{1},\ldots,\varphi_{n})=V_{n}$ . These functions need not be orthonormal to a scalar product, but only linearly independent. Orthonormalisation algorithms construct each $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ as linear combinations of $\varphi_{1},\ldots,\varphi_{n}$ , ensuring $\widetilde{V}_{n}\subseteq V_{n}$ . The coefficients of the linear combinations are computed from evaluations of $\varphi_{1},\ldots,\varphi_{n}$ at $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ . Although linearly independent, the $\varphi_{i}$ and $\varphi_{j}$ with $i\neq j$ could be indistinguishable when evaluated at $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ , and when this happens the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ do not span the whole $V_{n}$ . Due to randomness in the $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ , in general spaces $V_{n}$ one can ensure P2 only with large probability. When $V_{n}$ is a polynomial space, in Section 2.3 we show that P2 can be ensured with probability one.

For the time being we suppose that an $\varepsilon$ -orthonormal basis $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ is available for some $\varepsilon>0$ . A concrete algorithm for the construction of such a basis is described in Section 2.3, together with suitable bounds for $\varepsilon$ . Using $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ , define the functions

[TABLE]

where $\gamma>0$ is a normalisation term defined later. Consider the set $\widetilde{\Omega}:=\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\}\subset\Omega$ containing $\widetilde{m}$ iid random samples from $\mu$ , and define the discrete uniform probability measure $\widetilde{\mu}$ on $\widetilde{\Omega}$ (i.e. $\widetilde{\mu}(\widetilde{y}^{i})=\widetilde{m}^{-1}$ for all $i=1,\ldots,\widetilde{m}$ ) and the probability measure $\widetilde{\sigma}_{n}$ on $\widetilde{\Omega}$ as

[TABLE]

with $\gamma:=\widetilde{m}^{-1}\sum_{i=1}^{\widetilde{m}}\sum_{j=1}^{n}(\widetilde{L}_{j}(\widetilde{y}^{i}))^{2}=\sum_{j=1}^{n}\langle\widetilde{L}_{j},\widetilde{L}_{j}\rangle_{\widetilde{m}}$ .

Let $y^{1},\ldots,y^{m}\in\Omega$ be $m$ iid random samples from $\widetilde{\sigma}_{n}$ . Using these random samples and the scalar product (2.4) with the weight $w$ chosen as in (2.10), we define the Grammian matrix $\widetilde{G}\in\mathbb{R}^{n\times n}$ with components $\widetilde{G}_{jk}:=\langle\widetilde{L}_{j},\widetilde{L}_{k}\rangle_{m}$ , and the vector $\widetilde{b}\in\mathbb{R}^{n}$ with components $\widetilde{b}_{j}=\langle\widetilde{L}_{j},u\rangle_{m}$ . We now introduce the discrete projection $P_{n}^{m}$ on $\widetilde{V}_{n}$ and the weighted least-squares estimator $u_{W}$ as

[TABLE]

The estimator $u_{W}$ can be computed by solving the normal equations

$$\widetilde{G}a=\widetilde{b},$$

whose solution $a=(a_{1},\ldots,a_{n})^{\top}\in\mathbb{R}^{n}$ provides the coefficients of the expansion $u_{W}=\sum_{j=1}^{n}a_{j}\widetilde{L}_{j}$ . Denote with $u_{T}$ the truncated estimator

$$u_{T}:=T_{\eta}\circ u_{W}.$$
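The construction of $u_{W}$ and $u_{T}$ can be sketched end to end. The code below is a minimal illustration under assumptions that are not taken from the paper: the domain is the unit disk, $V_{n}$ is the space of bivariate polynomials of total degree at most 4, the target `u` is a polynomial in $V_{n}$ (so that recovery should be exact up to round-off), and the helper names are hypothetical. It follows the definitions in the text: $\gamma$ is the mean over the samples $\widetilde{y}^{i}$ of $\sum_{j}\widetilde{L}_{j}(\widetilde{y}^{i})^{2}$, the discrete measure $\widetilde{\sigma}_{n}$ assigns to $\widetilde{y}^{i}$ a probability proportional to that sum, and the weight is $\gamma$ divided by the same sum.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_disk(count, rng):
    """Uniform samples in the unit disk by rejection from [-1, 1]^2."""
    pts = np.empty((0, 2))
    while pts.shape[0] < count:
        cand = rng.uniform(-1.0, 1.0, size=(2 * count, 2))
        pts = np.vstack([pts, cand[np.sum(cand**2, axis=1) <= 1.0]])
    return pts[:count]

def monomials(y, degree):
    """Monomials y1^a * y2^b with a + b <= degree, evaluated at the rows of y."""
    return np.column_stack([y[:, 0]**a * y[:, 1]**b
                            for a in range(degree + 1)
                            for b in range(degree + 1 - a)])

def u(y):
    """Illustrative target in V_n, bounded by 3 on the disk."""
    return 1.0 + y[:, 0] - 2.0 * y[:, 0] * y[:, 1]

eta = 4.0                                  # assumed uniform bound on |u|
m_tilde, m, degree = 5000, 1500, 4

# Surrogate basis: orthonormalise the monomials at m_tilde uniform samples.
y_tilde = sample_disk(m_tilde, rng)
Q, _ = np.linalg.qr(monomials(y_tilde, degree), mode="reduced")
L_tilde = np.sqrt(m_tilde) * Q
n = L_tilde.shape[1]

# Discrete sampling measure on {y~^1, ..., y~^m_tilde} and normalisation gamma.
k_tilde = np.sum(L_tilde**2, axis=1)
gamma = k_tilde.mean()
sigma_tilde = k_tilde / (gamma * m_tilde)
sigma_tilde = sigma_tilde / sigma_tilde.sum()   # guard against round-off

# Draw m iid samples from sigma_tilde, then assemble G~ and b~ with weights gamma / k~.
idx = rng.choice(m_tilde, size=m, replace=True, p=sigma_tilde)
w = gamma / k_tilde[idx]
A = L_tilde[idx]
G = (A * w[:, None]).T @ A / m
b = (A * w[:, None]).T @ u(y_tilde[idx]) / m

a = np.linalg.solve(G, b)                  # coefficients of u_W in the surrogate basis
u_W = L_tilde @ a                          # u_W evaluated at all y~^i
u_T = np.clip(u_W, -eta, eta)              # truncation T_eta

cond_G = np.linalg.cond(G)
max_err = np.max(np.abs(u_T - u(y_tilde)))
print("cond(G~) =", cond_G, " max error at the samples =", max_err)
```

Only the $m$ sampled points require evaluations of $u$; the evaluation of `u` at all $\widetilde{y}^{i}$ in the last lines is used solely to measure the error in this illustration.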

Define $\Omega_{m}:=\overbrace{\Omega\times\cdots\times\Omega}^{m\textrm{ times}}$ . Throughout the paper, all the probability events belong to the Borel $\sigma$ -algebra $\mathcal{B}(\Omega_{m+\widetilde{m}})$ and $\Pr$ denotes the probability measure $(\otimes^{m}d\widetilde{\sigma}_{n})\otimes(\otimes^{\widetilde{m}}d\mu)$ on $\Omega_{m+\widetilde{m}}$ . The only exceptions are in Theorem 1 that uses $\mathcal{B}(\Omega_{m})$ and $\Pr$ as $\otimes^{m}d\sigma_{n}$ on $\Omega_{m}$ , and in the forthcoming Theorem 2 that uses $\mathcal{B}(\Omega_{\widetilde{m}})$ and $\Pr$ as $\otimes^{\widetilde{m}}d\mu$ on $\Omega_{\widetilde{m}}$ .

The following probability events are related to the construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ satisfying P1 and P2:

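$$\mathcal{Z}_{\varepsilon}:=\left\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega\,:\,\sum_{j,k=1}^{n}\left|\langle\widetilde{L}_{j},\widetilde{L}_{k}\rangle_{\widetilde{m}}-\delta_{jk}\right|^{2}\leq\varepsilon^{2}\right\},$$

$$\mathcal{W}_{\Omega}:=\left\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega\,:\,\textrm{span}(\widetilde{L}_{1},\ldots,\widetilde{L}_{n})=V_{n}\right\}.$$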

In the notation $\mathcal{W}_{\Omega}$ , the subscript points out the dependence on $\Omega$ in the construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ , further discussed in Section 2.3. Notice that both events $\mathcal{Z}_{\varepsilon}$ and $\mathcal{W}_{\Omega}$ do not depend on $y^{1},\ldots,y^{m}$ .

We define the quantity

$$K_{n}=K_{n}(\Omega):=\sup_{y\in\Omega}k_{n}(y),$$

that in general depends on $\Omega$ and $V_{n}$ . Thanks to the inclusion $L^{\infty}(\Omega)\subset L^{2}(\Omega,\mu)$ on any $\Omega$ bounded, the lower bound $K_{n}\geq n$ holds for any $n\geq 1$ . When $\Omega=[-1,1]^{d}$ and $V_{n}$ is a downward closed polynomial space, we also have the following upper bound from [4] for any $n\geq 1$ :

$$K_{n}\leq n^{2}. \tag{2.14}$$

For any $\widetilde{\delta}\in[0,1]$ , define the probability event:

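$$\mathcal{N}_{\widetilde{\delta}}:=\left\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega\ \textrm{s.t.}\ \bigcap_{u\in V_{n}}\left\{(1-\widetilde{\delta})\|u\|^{2}\leq\|u\|_{\widetilde{m}}^{2}\leq(1+\widetilde{\delta})\|u\|^{2}\right\}\right\}.$$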

The next result was proven in [5] in a slightly different form, that we rewrite with $\alpha=2m^{-r}$ , as in Theorem 1.

Theorem 2.

In any dimension $d$ , for any bounded $\Omega\subset\mathbb{R}^{d}$ , for any $\alpha>0$ , $\widetilde{\delta}\in(0,1)$ and $n\geq 1$ , if

$$\widetilde{m}\geq\frac{K_{n}}{\xi(\widetilde{\delta})}\ln\left(\frac{2n}{\alpha}\right),$$

and $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ are iid random samples from $\mu$ then $\Pr\left(\mathcal{N}_{\widetilde{\delta}}\right)>1-\alpha$ .

It has been observed in [1] that the upper bound (2.14) can be generalised to bounded domains with the so-called $\lambda$ -rectangle property, i.e. $\Omega$ has the $\lambda$ -rectangle property if $\exists\lambda\in(0,1)$ such that $\Omega=\bigcup_{R\in\mathcal{R}}R$ , where $\mathcal{R}$ is the set of (possibly overlapping) hyperrectangles $R\subseteq\Omega$ such that $\inf_{R\in\mathcal{R}}\textrm{Vol}(R)=\lambda\textrm{Vol}(\Omega)$ .

If the domain $\Omega$ has the $\lambda$ -rectangle property and $V_{n}$ is a downward closed polynomial space then

$$K_{n}\leq\lambda^{-1}n^{2},$$

see [1, Theorem 6.6]. Simple domains that do not have the $\lambda$ -rectangle property are e.g. the simplex and the ball. When $\Omega$ is a convex or starlike domain and $V_{n}$ is a total degree polynomial space, asymptotic upper bounds for $K_{n}$ are available e.g. in [3, 14, 15, 22], see also [18] for estimates of $K_{n}$ when $d=2$ . With more general domains $\Omega$ and/or approximation spaces $V_{n}$ , finding upper bounds for $K_{n}(\Omega)$ is an open problem.

2.1 Main results

This section contains Theorem 3 and the analysis of a numerical algorithm that constructs $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ . Theorem 3 states conditions ensuring that with large probability $\widetilde{G}$ stays close to the identity matrix in spectral norm, and that the estimator $u_{T}$ quasi-optimally converges in expectation, when the $\widetilde{L}_{j}$ are $\varepsilon$ -orthonormal. Theorem 3 applies in general to any orthonormalisation algorithm. Its proof is postponed to Section 2.2. In Theorem 3 we assume that $\Pr(\mathcal{Z}_{\varepsilon}\cap\mathcal{W}_{\Omega})\geq 1-\beta$ for some $\beta\in[0,\frac{1}{2})$ . This assumption means that, with probability at least $1-\beta$ , the chosen orthonormalisation algorithm can construct $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ that are $\varepsilon$ -orthonormal and span the whole $V_{n}$ , using $\widetilde{m}$ random samples $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ . In this respect, $\beta=\beta(\varepsilon,\Omega,\widetilde{m})$ represents the failure probability of the orthonormalisation algorithm. In some settings $\beta$ is known from the analysis, see Section 2.3, and if not, in any case, it can be numerically estimated for the given domain $\Omega$ and threshold $\varepsilon$ .

In Section 2.3 we discuss an orthonormalisation algorithm based on Householder QR factorisation, which constructs $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}\in V_{n}$ provably $\varepsilon$ -orthonormal with $\varepsilon\approx\epsilon_{M}\widetilde{m}n^{3/2}$ , and achieves $\beta=0$ when $V_{n}$ is a multivariate polynomial space. Corollary 1 contains the application of Theorem 3 to such an algorithm.

Theorem 3.

In any dimension $d$ , for any bounded domain $\Omega\subset B$ , for any $\alpha,\beta\in[0,\frac{1}{2})$ , $\varepsilon\in[0,1)$ , $\delta\in(0,1-\varepsilon)$ , $\widetilde{\delta}\in(0,1)$ and $n\geq 1$ , if the following conditions hold true

i)

$m\geq\dfrac{4n(1+\varepsilon)}{\delta^{2}}\ln\left(\dfrac{2n}{\alpha}\right)$ ,

ii)

$\widetilde{m}\geq\dfrac{K_{n}}{\xi(\widetilde{\delta})}\ln\left(\dfrac{2n}{\alpha}\right)$ ,

iii)

$\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\stackrel{\textrm{iid}}{\sim}\mu$ ,

iv)

$y^{1},\ldots,y^{m}\stackrel{\textrm{iid}}{\sim}\widetilde{\sigma}_{n}$ ,

v)

$\Pr\left(\mathcal{Z}_{\varepsilon}\cap\mathcal{W}_{\Omega}\right)\geq 1-\beta$ ,

then

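I)

the matrix $\widetilde{G}$ satisfies

$$\Pr\left(\|\widetilde{G}-I\|\geq\delta+\varepsilon\right)\leq\alpha+\beta; \tag{2.16}$$

II)

if $u\in L^{2}(\Omega,\mu)$ satisfies (2.7) then the estimator $u_{T}$ satisfies

$$\mathbb{E}\left(\|u-u_{T}\|^{2}\right)\leq\left(1+\tau_{2}(n)\right)e_{n}(u)^{2}+\tau_{\infty}(n)\,e_{n}^{\infty}(u)^{2}+8\eta^{2}(\alpha+\beta), \tag{2.17}$$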

where

[TABLE]

Remark 1 (Comparison with Theorem 1).

Theorem 1 and Theorem 3 prove that $G$ and $\widetilde{G}$ are well-conditioned, respectively, when $m$ is of the order $n\ln n$ , but with differently distributed random samples. In the proof of (2.16), $\widetilde{m}$ does not need to satisfy ii), and only needs to ensure a large probability of the event $\mathcal{Z}_{\varepsilon}\cap\mathcal{W}_{\Omega}$ in v). Condition ii) is needed for the proof of (2.17).

The convergence estimates in Theorem 1 and Theorem 3 differ due to the term $\varepsilon n$ , whose presence is discussed in Remark 2, and due to the $L^{\infty}$ -best approximation error, whose coefficient satisfies $\tau_{\infty}(n)\leq\tau_{2}(n)$ for any $n\geq 1$ such that $K_{n}\geq 2n$ . If $\widetilde{m}$ satisfies ii) with $K_{n}$ replaced by $\max\{K_{n},n^{2}\}$ , then $\tau_{\infty}$ decays to zero as $\varepsilon/\ln n$ .
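To get a feeling for the sizes involved, conditions i) and ii) of Theorem 3 can be evaluated for concrete parameters. The sketch below is only an illustration: the function name `sample_sizes` is hypothetical, the parameter values are arbitrary, and $K_{n}$ is replaced by the upper bound $n^{2}$, which the text states for $\Omega=[-1,1]^{d}$ with downward closed polynomial spaces; on other domains $K_{n}$ may differ.

```python
import math

def xi(delta):
    """xi(delta) = (1 + delta) * ln(1 + delta) - delta."""
    return (1.0 + delta) * math.log1p(delta) - delta

def sample_sizes(n, K_n, alpha, delta, delta_tilde, eps=0.0):
    """Smallest integers m and m_tilde satisfying conditions i) and ii)."""
    log_term = math.log(2.0 * n / alpha)
    m = math.ceil(4.0 * n * (1.0 + eps) / delta**2 * log_term)
    m_tilde = math.ceil(K_n / xi(delta_tilde) * log_term)
    return m, m_tilde

n = 100
# Assumption for illustration: K_n is replaced by its upper bound n^2.
m, m_tilde = sample_sizes(n, K_n=n**2, alpha=0.01, delta=0.5, delta_tilde=0.5)
print("m =", m, " m_tilde =", m_tilde)
```

With these choices $m$ grows like $n\ln n$ while $\widetilde{m}$ grows like $n^{2}\ln n$, matching the statement that $m$ and $\widetilde{m}$ scale linearly and superlinearly in $n$ up to logarithmic factors. Only $m$ counts evaluations of $u$; the $\widetilde{m}$ samples are used for the surrogate basis alone.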

Remark 2 (Missing orthogonality of the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ ).

In the proof of (2.17), the additional term $n\varepsilon$ in (2.28) arises from the fact that $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ are only $\varepsilon$ -orthonormal with $\varepsilon>0$ . The term $n\varepsilon$ propagates to $\tau_{2}$ and $\tau_{\infty}$ in (2.17), and is harmless as long as $\varepsilon$ remains small. This is the case for wide ranges of $n$ and $\widetilde{m}$ since $\varepsilon$ provably does not exceed $\epsilon_{M}\widetilde{m}n^{3/2}$ and $\epsilon_{M}\approx 10^{-16}$ , see Section 2.3. For example, if $\widetilde{m}=10^{6}$ and $n=10^{3}$ then $\varepsilon\approx 10^{-6}$ . The numerical tests in Section 4 show that even lower values of $\varepsilon$ can be taken, of the order $10^{-12}$ .
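The arithmetic behind this example is short enough to check directly. The snippet below is a sketch that uses the value $\epsilon_{M}\approx 10^{-16}$ quoted in the text (not a value queried from the floating-point environment) and evaluates the bound $\epsilon_{M}\widetilde{m}n^{3/2}$ together with the resulting size of the term $\varepsilon n$.

```python
eps_machine = 1e-16            # machine precision as quoted in the text
m_tilde, n = 10**6, 10**3

eps_bound = eps_machine * m_tilde * n**1.5   # bound on epsilon for Householder QR
eps_times_n = eps_bound * n                  # size of the term epsilon * n

print("epsilon bound:", eps_bound)
print("epsilon * n:", eps_times_n)
```

Since $n^{3/2}=10^{4.5}\approx 3.2\cdot 10^{4}$, the bound is a few units of $10^{-6}$, consistent with the order of magnitude stated in the remark, and $\varepsilon n$ is then of the order $10^{-3}$.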

If $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ are assumed $\varepsilon$ -orthonormal with $\varepsilon=0$ then, by Parseval’s identity, (2.28) simplifies to

[TABLE]

and the same proof of item II) gives (2.17) with

[TABLE]

being strictly decreasing functions that tend to zero as $n\to+\infty$ . Notice that $K_{n}\geq n$ .

2.2 Proofs and intermediate results

Given two events $X,Y$ such that $\Pr(Y)>0$ , we denote by $\Pr(X|Y):=\Pr(X\cap Y)/\Pr(Y)$ the conditional probability of $X$ given $Y$ .

Proof of item I) in Theorem 3.

For convenience we define the events $\mathcal{A}_{\varepsilon,\Omega}:=\mathcal{Z}_{\varepsilon}\cap\mathcal{W}_{\Omega}$ , $\mathcal{B}_{\delta,\varepsilon}:=\{\|\widetilde{G}-I\|<\delta+\varepsilon\}$ , $\mathcal{C}_{\delta}:=\{\|\widetilde{G}-\mathbb{E}(\widetilde{G})\|<\delta\}$ and $\mathcal{D}_{\varepsilon}:=\{\|\mathbb{E}(\widetilde{G})-I\|\leq\varepsilon\}$ . The expectation is on the $y^{1},\ldots,y^{m}$ , for given $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ . Indeed $\|\widetilde{G}-I\|\leq\|\widetilde{G}-\mathbb{E}(\widetilde{G})\|+\|\mathbb{E}(\widetilde{G})-I\|$ implies $\mathcal{C}_{\delta}\cap\mathcal{D}_{\varepsilon}\subseteq\mathcal{B}_{\delta,\varepsilon}$ , and hence

[TABLE]

Using in sequence the definition of $\widetilde{G}$ , linearity of expectation, iv) and (2.11) we obtain

[TABLE]

for any $j,k=1,\ldots,n$ . On the event $\mathcal{A}_{\varepsilon,\Omega}$ for any $n\geq 1$ and $\varepsilon\in[0,1)$ we have

[TABLE]

As a consequence of the above bound

[TABLE]

From Lemma 2, under conditions i) and iv) it holds that

[TABLE]

Using (2.22) and (2.21), since $\Pr(\mathcal{C}_{\delta}^{C}\cup\mathcal{D}_{\varepsilon}^{C}|\mathcal{A}_{\varepsilon,\Omega})\leq\Pr(\mathcal{C}_{\delta}^{C}|\mathcal{A}_{\varepsilon,\Omega})+\Pr(\mathcal{D}_{\varepsilon}^{C}|\mathcal{A}_{\varepsilon,\Omega})\leq\alpha$ we obtain

[TABLE]

Finally using in sequence (2.18), (2.23) and v) gives

[TABLE]

∎

Lemma 1.

On the event $\mathcal{Z}_{\varepsilon}$ the following holds:

[TABLE]

Proof.

The expression on the right-hand side below is equivalent to (2.24):

[TABLE]

For the proof of (2.25) take $k=j$ in (2.24) and then sum $j$ from $1$ to $n$ . ∎

The following result from [20] is a consequence of the Bernstein inequality for self-adjoint matrices.

Theorem 4.

Let $A\in\mathbb{R}^{n\times n}$ be a fixed matrix. Construct a symmetric random matrix $H\in\mathbb{R}^{n\times n}$ that satisfies

[TABLE]

Compute the per-sample second moment $m_{2}(H)=\|\mathbb{E}(H^{\top}H)\|$ . Form the matrix sampling estimator

[TABLE]

Then for all $\delta\geq 0$ the estimator satisfies

[TABLE]

In the next lemma we apply Theorem 4 on the event $\mathcal{A}_{\varepsilon,\Omega}=\mathcal{Z}_{\varepsilon}\cap\mathcal{W}_{\Omega}$ and with the fixed matrix $A=\mathbb{E}(\widetilde{G})$ , where the expectation is taken over $y^{1},\ldots,y^{m}$ for given $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ .

Lemma 2.

For any $\alpha\in(0,1)$ , $\varepsilon\in[0,1)$ and $n\geq 1$ , under conditions i) and iv) it holds that

[TABLE]

Proof.

We define the random matrix $H=H(y)$ whose components are

[TABLE]

and $y$ is distributed as $\widetilde{\sigma}_{n}$ . Using iv), define $H^{i}=H(y^{i})$ for $i=1,\ldots,m$ as $m$ copies of the random matrix $H$ . Notice that, from iv), on the event $\mathcal{A}_{\varepsilon,\Omega}$ the $H^{1},\ldots,H^{m}$ are mutually independent. They also satisfy

[TABLE]

From linearity of expectation, condition iv) and (2.19) we obtain $\mathbb{E}(H_{jk})=\mathbb{E}(\widetilde{G}_{jk})=\langle\widetilde{L}_{j},\widetilde{L}_{k}\rangle_{\widetilde{m}}$ . For any $n\geq 1$ and $\varepsilon\in[0,1)$ , from (2.20) on the event $\mathcal{A}_{\varepsilon,\Omega}$ we have $\|\mathbb{E}(H)-I\|=\|\mathbb{E}(\widetilde{G})-I\|\leq\varepsilon$ , and this is equivalent to

[TABLE]

Notice that, from the expression of $w$ in (2.10),

[TABLE]

and therefore $H^{\top}H=\gamma H$ . Define now $m_{2}(H):=\|\mathbb{E}(H^{\top}H)\|=\gamma\|\mathbb{E}(H)\|$ . Thanks to the previous bounds

[TABLE]

Since $H$ is a rank-one matrix,

[TABLE]

Finally, on the event $\mathcal{A}_{\varepsilon,\Omega}$ , we apply Theorem 4 with the fixed matrix $\mathbb{E}(\widetilde{G})$ . On the event $\mathcal{A}_{\varepsilon,\Omega}$ the parameter $\gamma$ satisfies the uniform bound (2.25), and we obtain

[TABLE]

If condition i) holds true, since

[TABLE]

we obtain the thesis. ∎

Proof of item II) in Theorem 3.

The proof of the error estimate proceeds in the same way as the analogous proof of [6, Theorem 2.1], with some differences due to the missing orthogonality of the $\widetilde{L}_{k}$ .

From Theorem 2 under ii) it holds $\Pr\left(\mathcal{N}_{\widetilde{\delta}}\right)>1-\alpha$ . Since $\Pr(\mathcal{N}_{\widetilde{\delta}}^{C}\cup\mathcal{A}_{\varepsilon,\Omega}^{C})\leq\Pr(\mathcal{N}_{\widetilde{\delta}}^{C})+\Pr(\mathcal{A}_{\varepsilon,\Omega}^{C})\leq\alpha+\beta$ we obtain

[TABLE]

Define $\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega}:=\mathcal{B}_{\delta,\varepsilon}\cap\mathcal{N}_{\widetilde{\delta}}\cap\mathcal{A}_{\varepsilon,\Omega}$ . Combining (2.26) and item I) it holds that $\Pr(\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega})>1-2\alpha-2\beta$ . On the event $\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega}^{C}$ it holds $\|u-u_{T}\|\leq\|u\|+\|u_{T}\|\leq 2\eta$ . Since $|u(y)-u_{T}(y)|\leq|u(y)-u_{W}(y)|$ for all $y\in\Omega$ , we also have $\|u-u_{T}\|\leq\|u-u_{W}\|$ . Denoting $g:=u-P_{n}u$ , on the event $\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega}$ it holds that

[TABLE]

where we have used that $g$ is orthogonal to $V_{n}$ , that $\textrm{span}(\widetilde{L}_{1},\ldots,\widetilde{L}_{n})=V_{n}$ , and that $P_{n}^{m}P_{n}u=P_{n}u$ . We expand $P_{n}^{m}g=\sum_{j=1}^{n}a_{j}\widetilde{L}_{j}$ over the $\widetilde{L}_{j}$ , with $a=(a_{j})_{j=1,\ldots,n}$ being the solution to $\widetilde{G}a=\widetilde{h}$ and $\widetilde{h}:=(\langle g,\widetilde{L}_{k}\rangle_{m})_{k=1,\ldots,n}$ .

Using in sequence the norm equivalence in the event $\mathcal{N}_{\widetilde{\delta}}$ , Lemma 1, $2a_{j}a_{k}\leq a_{j}^{2}+a_{k}^{2}$ , we obtain

[TABLE]

Thus replacing (2.28) in (2.27) provides the bound

[TABLE]

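On the event $\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega}$ we have $\|\widetilde{G}-I\|<\delta+\varepsilon$ by definition of $\mathcal{B}_{\delta,\varepsilon}$ , so the smallest eigenvalue of the symmetric matrix $\widetilde{G}$ is larger than $1-\delta-\varepsilon$ , and hence $\|\widetilde{G}^{-1}\|\leq(1-\delta-\varepsilon)^{-1}$ . Since $a=\widetilde{G}^{-1}\widetilde{h}$ we have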

[TABLE]

Taking the total expectation over $y^{1},\ldots,y^{m},\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ and using $\Pr(\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega}^{C})\leq 2(\alpha+\beta)$ gives

[TABLE]

Denote by $\mathbb{E}_{\widetilde{y}}$ the expectation over $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ and by $\mathbb{E}_{y}$ the expectation over $y^{1},\ldots,y^{m}$ . For the second term above, using the independence of the random samples we have

[TABLE]

Summing term I over $k$ gives

[TABLE]

We now show that term III is equal to zero. On the event $\mathcal{I}_{\delta,\widetilde{\delta},\varepsilon,\Omega}$ for any $k=1,\ldots,n$ and any $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ it holds that $\widetilde{L}_{k}\in V_{n}$ , and therefore

[TABLE]

where the function $\overline{L}_{k}=\overline{L}_{k}(\widetilde{y}^{i})=\mathbb{E}_{\widetilde{y}^{j}}(\widetilde{L}_{k}(\widetilde{y}^{i})g(\widetilde{y}^{j})\widetilde{L}_{k}(\widetilde{y}^{j}))$ is obtained as an average over $\widetilde{y}^{j}$ of functions in $V_{n}$ , i.e. $\widetilde{L}_{k}(\widetilde{y}^{i})$ , multiplied by real-valued random variables, i.e. $g(\widetilde{y}^{j})\widetilde{L}_{k}(\widetilde{y}^{j})$ . Therefore $\overline{L}_{k}$ does not depend on $\widetilde{y}^{j}$ and $\overline{L}_{k}\in V_{n}$ . Hence for any $k=1,\ldots,n$ the integral in the last line vanishes because $\overline{L}_{k}$ is orthogonal to $g$ .

For term IV, from Lemma 1 we obtain

[TABLE]

Summing term II over $k$ and using Lemma 1 gives

[TABLE]

Finally

[TABLE]

and combining with ii) and i) gives (2.17). ∎

2.3 Construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ with QR factorisation

In this section we use Householder QR factorisation (hereafter HQRf) for the construction of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ . Let $\varphi_{1},\ldots,\varphi_{n}\in V_{n}$ be $n$ linearly independent functions. Using the $\widetilde{m}$ random samples in (2.8), we introduce the matrix $W\in\mathbb{R}^{\widetilde{m}\times n}$ defined component-wise as $W_{jk}:=\varphi_{k}(\widetilde{y}^{j})$ for $j=1,\ldots,\widetilde{m}$ and $k=1,\ldots,n$ .

Recall the following result on HQRf, see e.g. [21, Theorem 4.24]: if $W$ has full rank, then it can be written uniquely in the form $W=QR$ , where the columns of $Q\in\mathbb{R}^{\widetilde{m}\times n}$ form an orthonormal basis of the column space of $W$ , and $R\in\mathbb{R}^{n\times n}$ is an upper triangular matrix with positive diagonal elements. Hence we can take

[TABLE]

and the factor $\sqrt{\widetilde{m}}$ makes the $\widetilde{L}_{k}$ orthonormal with respect to (2.8), while the columns of $Q$ are orthonormal with respect to the Euclidean scalar product in $\mathbb{R}^{\widetilde{m}}$ . For any $k=1,\ldots,n$ the analytic expression of $\widetilde{L}_{k}$ is given as a linear combination of $\varphi_{1},\ldots,\varphi_{k}$ by

[TABLE]

where for any $k=1,\ldots,n$ the vector $(\ell_{1}^{k},\ldots,\ell_{n}^{k})^{\top}\in\mathbb{R}^{n}$ is the solution to the linear system

[TABLE]

and $(e^{k})_{k=1,\ldots,n}$ is the standard basis of $\mathbb{R}^{n}$ , i.e. $e^{k}:=(e^{k}_{1},\ldots,e^{k}_{n})^{\top}\in\mathbb{R}^{n}$ and $e^{k}_{j}:=\delta_{jk}$ for any $j,k=1,\ldots,n$ .
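The construction can be illustrated on a toy case. The sketch below is not the implementation used in the paper. It assumes a one-dimensional interval in place of $\Omega$ , the monomials as $\varphi_{1},\ldots,\varphi_{n}$ , and a small pure-Python Householder QR written only for this example (`householder_qr`, `back_substitution` and `L_tilde` are hypothetical names). Since $W=QR$ gives $Q=WR^{-1}$ , the coefficients of $\widetilde{L}_{k}$ in the basis $\varphi_{1},\ldots,\varphi_{n}$ are $\sqrt{\widetilde{m}}$ times the $k$ th column of $R^{-1}$ . The sketch computes them by a triangular solve with $R$ , and its indexing conventions may differ from those of (2.30)-(2.31).

```python
import math
import random


def householder_qr(W):
    """Thin Householder QR of an m-by-n list of lists W (m >= n, full rank).
    Returns Q (m-by-n) and R (n-by-n, upper triangular, positive diagonal)."""
    m, n = len(W), len(W[0])
    A = [row[:] for row in W]
    reflectors = []
    for k in range(n):
        x = [A[i][k] for i in range(k, m)]
        alpha = -math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        reflectors.append(v)
        for j in range(k, n):  # apply I - 2 v v^T to the trailing block
            dot = sum(v[i] * A[k + i][j] for i in range(m - k))
            for i in range(m - k):
                A[k + i][j] -= 2.0 * dot * v[i]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(m)]
    for k in reversed(range(n)):  # accumulate Q = H_0 H_1 ... H_{n-1} [I; 0]
        v = reflectors[k]
        for j in range(n):
            dot = sum(v[i] * Q[k + i][j] for i in range(m - k))
            for i in range(m - k):
                Q[k + i][j] -= 2.0 * dot * v[i]
    R = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):  # flip signs so that the diagonal of R is positive
        if R[k][k] < 0.0:
            R[k] = [-t for t in R[k]]
            for i in range(m):
                Q[i][k] = -Q[i][k]
    return Q, R


def back_substitution(R, b):
    """Solve R x = b for an upper triangular matrix R."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / R[i][i]
    return x


random.seed(0)
m_tilde, n = 200, 4
samples = [random.uniform(-1.0, 1.0) for _ in range(m_tilde)]
W = [[y ** k for k in range(n)] for y in samples]  # W[j][k] = phi_k(y_j)

Q, R = householder_qr(W)
scale = math.sqrt(m_tilde)
# coeffs[k][j] is the coefficient of phi_j in the expansion of L_tilde_k.
coeffs = [back_substitution(R, [scale if j == k else 0.0 for j in range(n)])
          for k in range(n)]


def L_tilde(k, y):
    """Evaluate the k-th surrogate basis function at the point y."""
    return sum(coeffs[k][j] * y ** j for j in range(n))


# Gram matrix with respect to the discrete inner product on the samples.
gram = [[sum(L_tilde(j, y) * L_tilde(k, y) for y in samples) / m_tilde
         for k in range(n)] for j in range(n)]
defect = max(abs(gram[j][k] - (1.0 if j == k else 0.0))
             for j in range(n) for k in range(n))
print(f"max |<L_j, L_k> - delta_jk| = {defect:.1e}")
```

In this example the coefficient vector of $\widetilde{L}_{k}$ has nonzero entries only for $\varphi_{1},\ldots,\varphi_{k}$ , and the printed defect plays the role of $\varepsilon$ in the $\varepsilon$ -orthonormality of the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ .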

The result above shows that if $\textrm{rank}(W)=n$ then the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ constructed by (2.30) satisfy P2. Conversely, if $\textrm{rank}(W)<n$ then the linear system (2.31) is singular, and P2 does not hold. Depending on the space $V_{n}$ and on the localisation of the supports of $\varphi_{1},\ldots,\varphi_{n}$ , two situations can occur:

•

$\varphi_{1},\ldots,\varphi_{n}$ are globally supported functions on $\Omega$ . When $V_{n}$ is a multivariate polynomial space $V_{n}=V_{\Lambda}:=\textrm{span}\{y^{\nu}\,:\,\nu\in\Lambda,y\in\Omega\}$ supported on a downward closed index set $\Lambda\subset\mathbb{N}^{d}_{0}$ with $n=\#(\Lambda)$ , one can choose $\varphi_{1},\ldots,\varphi_{n}$ as the tensorized monomial basis. In one dimension, whenever at least $n$ of the $\widetilde{m}$ samples are distinct, the Vandermonde matrix $W$ has full rank. The same holds in higher dimension, provided that at least $n$ of the $\widetilde{m}$ samples do not fall on any polynomial surface supported on $\Lambda$ . In both cases, the probability that $\textrm{rank}(W)<n$ is formally zero, and also completely negligible when considering the numerical rank of $W$ , since from ii) $\widetilde{m}$ is of the order $K_{n}\ln n\geq n\ln n$ .

•

$\varphi_{1},\ldots,\varphi_{n}$ are locally supported functions on $\Omega$ . In this case, the matrix $W$ is rank deficient whenever $\exists j\in[1,\ldots,n]\,:\,\textrm{supp}(\varphi_{j})\cap\{\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\}=\emptyset$ . The probability of such events is not zero, and can be calculated as a function of the size of $\textrm{supp}(\varphi_{j})$ . Moreover, it might not be small if some of the $\varphi_{j}$ have very localized support and $d$ is large.

We now show that $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ in (2.29) satisfy P1 with $\varepsilon$ not exceeding $\epsilon_{M}\widetilde{m}n^{3/2}$ , where $\epsilon_{M}\approx 10^{-16}$ is the machine precision. From (2.29) we obtain

[TABLE]

showing that $\varepsilon$ -orthonormality of the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ is related to the loss of orthogonality of the matrix $Q$ due to numerical cancellation. The right-hand side in (2.32) can be estimated using classical results on backward error analysis for HQRf, like [21, Theorem 1.5] or [11, Theorem 19.4]. Using such results (see e.g. [21, page 266]) upper bounds for the orthogonality error of $Q$ take the form

[TABLE]

where $\varphi=\varphi(n,\widetilde{m})$ is a slowly growing function of $n$ and $\widetilde{m}$ . In particular [11, Theorem 19.4] shows that $\varphi(n,\widetilde{m})\epsilon_{M}=cn\widetilde{m}\epsilon_{M}(1-cn\widetilde{m}\epsilon_{M})^{-1}$ with $c$ being a small numerical constant depending on the floating-point arithmetic. Hence $\|Q^{\top}Q-I\|_{F}\lesssim\epsilon_{M}\widetilde{m}n^{3/2}$ from (2.33), and thanks to (2.32) the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ constructed by (2.29)–(2.30) are provably $\varepsilon$ -orthonormal with $\varepsilon\approx\epsilon_{M}\widetilde{m}n^{3/2}$ .

We now discuss the robustness of the construction of $\widetilde{L}_{k}$ to ill-conditioning of $W$ . The matrix $W$ can be ill-conditioned, depending on the chosen basis $\varphi_{1},\ldots,\varphi_{n}$ for the given domain $\Omega$ . As a remarkable property of HQRf, the error bound (2.33) does not depend on $\kappa(W)$ , ensuring $\varepsilon$ -orthonormality of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ from (2.29) despite the ill-conditioning of $W$ . The matrix $R$ inherits the same ill-conditioning of $W$ , because $Q^{\top}Q\approx I$ and therefore $\kappa(W)\approx\kappa(R)$ . Nonetheless, the linear system with matrix $R^{\top}$ in (2.31) can be solved with high accuracy by forward substitution, see [12]. Hence both P1 and P2 can be ensured also when $W$ is ill-conditioned.

The following corollary of Theorem 3 is an immediate consequence of the above results on QR factorisation.

Corollary 1.

Given $\varphi_{1},\ldots,\varphi_{n}\in V_{n}$ linearly independent, and given $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ as in Theorem 3, let $W\in\mathbb{R}^{\widetilde{m}\times n}$ be the matrix with components $W_{ij}=\varphi_{j}(\widetilde{y}^{i})$ , and let $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ be constructed from $QR=W$ , the Householder QR factorisation of $W$ . Under the same assumptions of Theorem 3 but with item v) replaced by

v bis) $\Pr\left(\{\textrm{rank}(W)=n\}\cap\{\|Q^{\top}Q-I\|_{F}\leq\varepsilon\}\right)\geq 1-\beta$ ,

the conclusions of Theorem 3 in item I) and item II) hold true.

For given $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ and $\varphi_{1},\ldots,\varphi_{n}$ one can check directly whether the event $\{\textrm{rank}(W)=n\}\cap\{\|Q^{\top}Q-I\|_{F}\leq\varepsilon\}$ in Corollary 1 occurs, and thus its probability $1-\beta$ can be numerically estimated from the matrices $W$ and $Q$ . If $\varepsilon\approx\epsilon_{M}\widetilde{m}n^{3/2}$ then the inclusion $\{\textrm{rank}(W)=n\}\subseteq\mathcal{Z}_{\varepsilon}\cap\mathcal{W}_{\Omega}$ holds, and it is sufficient to check only the rank of $W$ . If $V_{n}$ is a multivariate polynomial space and $\varepsilon\approx\epsilon_{M}\widetilde{m}n^{3/2}$ then $\beta=0$ .

Before closing the section, we discuss the choice of the functions $\varphi_{1},\ldots,\varphi_{n}$ , which plays an important role in the numerical stability of the algorithm. The components $\ell_{1}^{k},\ldots,\ell_{n}^{k}$ of the solution to (2.31) satisfy

[TABLE]

and might attain large values, e.g. due to possible bad scaling of the diagonal elements of $R$ . Large values of $\ell_{1}^{k},\ldots,\ell_{n}^{k}$ in (2.30) reflect a poor choice of $\varphi_{1},\ldots,\varphi_{n}$ to represent $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ on the given domain $\Omega$ . Indeed $R$ can always be made sufficiently close to the identity matrix if $\varphi_{1},\ldots,\varphi_{n}$ are chosen sufficiently close (in the $L^{2}(\Omega,\mu)$ sense) to $L_{1},\ldots,L_{n}$ . Unfortunately $L_{1},\ldots,L_{n}$ are unknown if $\Omega$ is irregular. In the absence of a priori information on $L_{1},\ldots,L_{n}$ , we now show how to ensure that the $|\ell_{j}^{k}|$ in (2.30) are not too large, by adapting $\varphi_{1},\ldots,\varphi_{n}$ to the given domain $\Omega$ . To this aim, in Section 3.2 we propose an algorithm that first rescales each $\varphi_{j}$ as $\widetilde{\varphi}_{j}:=\rho_{j,\Omega}\varphi_{j}$ , where the factor $\rho_{j,\Omega}>0$ depends on the domain $\Omega$ , and then computes the HQR factorisation $\widetilde{Q}\widetilde{R}=\widetilde{W}$ of the matrix $\widetilde{W}\in\mathbb{R}^{\widetilde{m}\times n}$ with components $\widetilde{W}_{ij}=\widetilde{\varphi}_{j}(\widetilde{y}^{i})$ . The crucial point is that the algorithm chooses $\rho_{j,\Omega}$ in such a way that all the diagonal elements of $\widetilde{R}$ are equal to one. Using $\widetilde{R}$ the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ can be obtained as

[TABLE]

by solving the linear system

[TABLE]

with forward substitution for any $k=1,\ldots,n$ . Denote by $N\in\mathbb{R}^{n\times n}$ the strictly upper triangular part of $\widetilde{R}$ , which is a nilpotent matrix of index at most $n$ . Thanks to the structure of $\widetilde{R}=I+N$ , using Neumann series we can write $\widetilde{R}^{-1}=I+\sum_{s=1}^{n-1}(-1)^{s}N^{s}$ . From (2.36) it holds $\widetilde{\ell}_{j}^{k}=(\widetilde{R}^{-1})_{jk}$ for all $j,k=1,\ldots,n$ . Therefore the coefficients $\widetilde{\ell}_{j}^{k}$ satisfy the safer bounds

[TABLE]

that do not depend on the scaling of the diagonal elements of $R$ . In practice the right-hand side of (2.37) exhibits only a slow growth w.r.t. $n$ thanks to the alternating sign in the summation and to $N$ being nilpotent. Indeed, since $N$ is strictly upper triangular, $N^{s}$ has at most $(n-s)(n-s+1)/2$ nonzero components for any $s=1,\ldots,n$ . The algorithmic construction of the $\rho_{j,\Omega}$ is discussed in Section 3.2. It uses HQRf of suitable incremental updates of the matrix $W$ . Notice that each $\widetilde{\varphi}_{j}$ is obtained by rescaling $\varphi_{j}$ , and therefore $\textrm{rank}(W)=\textrm{rank}(\widetilde{W})$ .
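The Neumann series identity $\widetilde{R}^{-1}=I+\sum_{s=1}^{n-1}(-1)^{s}N^{s}$ used above can be verified on a small example. The following sketch uses an arbitrary $3\times 3$ upper triangular integer matrix with ones on the diagonal, chosen only for illustration, and hypothetical helper names `matmul` and `neumann_inverse`.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(R):
    """Invert R = I + N, with N strictly upper triangular, via the finite
    Neumann series I + sum_{s=1}^{n-1} (-1)^s N^s."""
    n = len(R)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    N = [[R[i][j] - identity[i][j] for j in range(n)] for i in range(n)]
    inverse = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for s in range(1, n):
        power = matmul(power, N)  # power now holds N^s
        sign = -1 if s % 2 else 1
        inverse = [[inverse[i][j] + sign * power[i][j] for j in range(n)]
                   for i in range(n)]
    return inverse


R_example = [[1, 2, -1],
             [0, 1, 3],
             [0, 0, 1]]
R_inverse = neumann_inverse(R_example)
print(R_inverse)
print(matmul(R_example, R_inverse))
```

Because $N^{n}=0$ the series terminates after $n-1$ terms, so the inverse is obtained exactly in integer arithmetic for this example.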

In Section 3 we describe two numerical algorithms that compute the estimator $u_{T}$ , and their implementation. Both algorithms satisfy the theoretical guarantees of Corollary 1. The difference between the two algorithms is in the computation of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ . The first algorithm computes $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ from (2.30) by solving (2.31), directly using any chosen $\varphi_{1},\ldots,\varphi_{n}$ . The second algorithm computes $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ from (2.35) by solving (2.36), adapting the chosen $\varphi_{1},\ldots,\varphi_{n}$ to the domain $\Omega$ . Both algorithms rely on the HQRf of $\widetilde{m}$ -by- $n$ matrices whose cost is proportional to $\widetilde{m}n^{2}$ . The second algorithm is numerically more stable thanks to (2.37), but also computationally more demanding.

3 Description of the algorithms

This section describes the numerical algorithms and their implementation. We start by describing the first algorithm. Given the domain $\Omega$ , the function $u$ , the space $V_{n}$ , the linearly independent functions $\varphi_{1},\ldots,\varphi_{n}\in V_{n}$ , the threshold $\varepsilon$ and the bound $\eta$ , the main tasks for the approximation of $u$ by the weighted least-squares estimator $u_{T}$ are the following, in the same sequential order:

Algorithm 1:

computes the estimator $u_{T}$ using the given $\varphi_{1},\ldots,\varphi_{n}$ .

Step 1:

generate $\widetilde{m}$ random samples $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\stackrel{{\scriptstyle\textrm{iid}}}{{\sim}}\mu$ ;

Step 2:

construct the matrix $W\in\mathbb{R}^{\widetilde{m}\times n}$ with components $W_{jk}:=\varphi_{k}(\widetilde{y}^{j})$ ;

Test 1:

IF $\textrm{rank}(W)<n$ THEN set $u_{T}\equiv 0$ and goto Step 9; ELSE continue;

Step 3:

rescale all the columns of $W$ such that $\|\varphi_{k}\|_{\widetilde{m}}=1$ (and keep track of the scaling factors);

Step 4:

compute $QR=W$ , the Householder QR factorisation of $W$ ;

Test 2:

IF $\|Q^{\top}Q-I\|_{F}>\varepsilon$ THEN set $u_{T}\equiv 0$ and goto Step 9; ELSE continue;

Step 5:

construct $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ from (2.30) by solving the linear system (2.31);

Step 6:

generate $m$ random samples $y^{1},\ldots,y^{m}\stackrel{{\scriptstyle\textrm{iid}}}{{\sim}}\widetilde{\sigma}_{n}$ ;

Step 7:

evaluate $u(y^{1}),\ldots,u(y^{m})$ ;

Step 8:

compute the estimator $u_{W}$ of $u$ by solving the normal equations and set $u_{T}=T_{\eta}\circ u_{W}$ ;

Step 9:

return $u_{T}$ .

The algorithms for the generation of the random samples at Steps 1 and 6 are presented in Section 3.1. The algorithm that computes the $\widetilde{L}_{k}$ at Steps 2, 3, 4 and 5 is discussed in Section 2.3. The construction of the normal equations at Step 8 is described in Section 3.3. The main purpose of Test 1 and Test 2 is to avoid wasting computational resources at the following steps, and in particular at Step 7. We now discuss the failure probabilities of each test. The failure probability of Test 1 depends on the localisation properties of the supports of $\varphi_{1},\ldots,\varphi_{n}$ , as discussed in Section 2.3. Whenever Test 1 fails, one can restart the algorithm from Step 1 with the same $\varphi_{1},\ldots,\varphi_{n}$ or with a different choice. Concerning Test 2, the analysis of the orthogonality error in Section 2.3 shows that, if $\textrm{rank}(W)=n$ and $\varepsilon\approx\epsilon_{M}\widetilde{m}n^{3/2}$ , then the failure probability of Test 2 is zero. This condition is only sufficient: for example in all the numerical tests in Section 4 the failure probability is zero with $\varepsilon=10^{-12}$ .

The second algorithm, Algorithm 2, is similar to Algorithm 1. The differences are in the computation of $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ at Steps 3, 4 and 5. The algorithm ADAPT at Step 3 performs several orthonormalisation sweeps combined with suitable rescaling of the columns of $W$ , as described in Section 3.2. At Step 8, the construction of the normal equations again follows Section 3.3 but using the QR factorisation $\widetilde{Q}\widetilde{R}=\widetilde{W}$ of the matrix $\widetilde{W}$ .

Algorithm 2:

computes the estimator $u_{T}$ adapting the given $\varphi_{1},\ldots,\varphi_{n}$ to $\Omega$ .

Step 1:

generate $\widetilde{m}$ random samples $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\stackrel{{\scriptstyle\textrm{iid}}}{{\sim}}\mu$ ;

Step 2:

construct the matrix $W\in\mathbb{R}^{\widetilde{m}\times n}$ with components $W_{jk}:=\varphi_{k}(\widetilde{y}^{j})$ ;

Test 1:

IF $\textrm{rank}(W)<n$ THEN set $u_{T}\equiv 0$ and goto Step 9; ELSE continue;

Step 3:

compute the matrix $\widetilde{W}=\textrm{ADAPT}(W)$ ;

Step 4:

compute $\widetilde{Q}\widetilde{R}=\widetilde{W}$ , the Householder QR factorisation of $\widetilde{W}$ ;

Test 2:

IF $\|\widetilde{Q}^{\top}\widetilde{Q}-I\|_{F}>\varepsilon$ THEN set $u_{T}\equiv 0$ and goto Step 9; ELSE continue;

Step 5:

construct $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ from (2.35) by solving the linear system (2.36);

Step 6:

generate $m$ random samples $y^{1},\ldots,y^{m}\stackrel{{\scriptstyle\textrm{iid}}}{{\sim}}\widetilde{\sigma}_{n}$ ;

Step 7:

evaluate $u(y^{1}),\ldots,u(y^{m})$ ;

Step 8:

compute the estimator $u_{W}$ of $u$ by solving the normal equations and set $u_{T}=T_{\eta}\circ u_{W}$ ;

Step 9:

return $u_{T}$ .

3.1 Generation of the random samples

The following sampling algorithms can be used, see e.g. [8]. Independent random samples from $\mu$ on $\Omega\subseteq B=[-1,1]^{d}$ can be generated by rejection sampling. First step: draw iid random samples $\widetilde{y}^{1},\widetilde{y}^{2},\ldots$ from $\mu(B)$ , the uniform probability measure on $B$ . Second step: accept any random sample $\widetilde{y}^{i}$ drawn at the first step as a random sample from $\mu(\Omega)$ whenever $\widetilde{y}^{i}\in\Omega$ , and reject it otherwise. On average, the fraction of accepted random samples equals $\lambda(\Omega)/\lambda(B)$ , where $\lambda(\cdot)$ denotes the Lebesgue measure. When $\lambda(\Omega)$ is small compared to $\lambda(B)=2^{d}$ , or when $d$ is large, the algorithm above suffers from the curse of dimensionality. For less general domains $\Omega$ , e.g. polytopes or convex bodies, alternative MCMC sampling algorithms like hit and run or random walk can be used.
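The two steps of rejection sampling can be sketched as follows. The example assumes $d=2$ and takes as $\Omega$ the annular set $\{y\in B\,:\,\frac{1}{4}\leq\|y\|_{\ell^{2}}\leq 1\}$ considered in Section 4.2. The names `in_annulus` and `rejection_sample` are hypothetical and only serve the illustration.

```python
import random


def in_annulus(y):
    """Indicator of the annulus 1/4 <= ||y||_2 <= 1 in the plane."""
    r2 = y[0] ** 2 + y[1] ** 2
    return 0.0625 <= r2 <= 1.0


def rejection_sample(indicator, count, d, rng):
    """Draw `count` uniform samples on {y in [-1,1]^d : indicator(y)}.
    Returns the accepted samples and the number of proposals used."""
    accepted, proposed = [], 0
    while len(accepted) < count:
        y = tuple(rng.uniform(-1.0, 1.0) for _ in range(d))  # uniform on B
        proposed += 1
        if indicator(y):  # keep the proposal only if it falls in the domain
            accepted.append(y)
    return accepted, proposed


rng = random.Random(0)
samples, proposed = rejection_sample(in_annulus, 2000, 2, rng)
acceptance_rate = len(samples) / proposed
print(f"empirical acceptance rate: {acceptance_rate:.3f}")
```

For this domain the expected acceptance rate is $\lambda(\Omega)/\lambda(B)=\pi(1-\frac{1}{16})/4\approx 0.74$ , and the empirical rate printed above should be close to this value.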

Independent random samples $y^{1},\ldots,y^{m}$ from the discrete distribution $\widetilde{\sigma}_{n}$ can be generated, for example, by inverse transform sampling. In this case, the computational cost for drawing one sample from $\widetilde{\sigma}_{n}$ is $\mathcal{O}(\ln(\widetilde{m}))$ when using binary search, or $\mathcal{O}(1)$ when using the alias method, which however requires an additional cost for the preparation of the alias table.
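A minimal sketch of inverse transform sampling with binary search is given below. The probability vector is a small placeholder and not the actual distribution $\widetilde{\sigma}_{n}$ , whose weights are defined in Section 2. The function name `make_inverse_transform_sampler` is an assumption of this example.

```python
import bisect
import itertools
import random


def make_inverse_transform_sampler(probabilities, rng):
    """Return a function drawing indices i with probability proportional to
    probabilities[i], using binary search on the cumulative sums."""
    cumulative = list(itertools.accumulate(probabilities))
    total = cumulative[-1]
    last = len(cumulative) - 1

    def draw():
        u = rng.random() * total
        # First index whose cumulative sum exceeds u, found in O(log m_tilde).
        # The min() guards against u being rounded up to the total mass.
        return min(bisect.bisect_right(cumulative, u), last)

    return draw


rng = random.Random(1)
probabilities = [0.1, 0.2, 0.3, 0.4]  # placeholder discrete distribution
draw = make_inverse_transform_sampler(probabilities, rng)
indices = [draw() for _ in range(20000)]  # iid draws, with replacement
frequencies = [indices.count(i) / len(indices)
               for i in range(len(probabilities))]
print(frequencies)
```

The cumulative sums are computed once, so each draw only costs one binary search, and the empirical frequencies approach the prescribed probabilities as the number of draws grows.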

3.2 Adapting $\varphi_{1},\ldots,\varphi_{n}$ to the domain $\Omega$

The algorithm ADAPT takes as input $W\in\mathbb{R}^{\widetilde{m}\times n}$ with components $W_{ij}=\varphi_{j}(\widetilde{y}^{i})$ and produces as output $\widetilde{W}\in\mathbb{R}^{\widetilde{m}\times n}$ with components $\widetilde{W}_{ij}=\widetilde{\varphi}_{j}(\widetilde{y}^{i})$ such that the matrix $\widetilde{R}$ in the Householder QR factorisation $\widetilde{Q}\widetilde{R}=\widetilde{W}$ of $\widetilde{W}$ has all diagonal elements equal to one. Each $\widetilde{\varphi}_{j}$ is constructed as $\widetilde{\varphi}_{j}=\rho_{j,\Omega}\,\varphi_{j}$ , rescaling $\varphi_{j}$ by a factor $\rho_{j,\Omega}>0$ that depends on $\Omega$ . At the first iteration, with $j=1$ , $\widetilde{W}$ is initialized as the first column of $W$ renormalised. At iteration $j=2,\ldots,n$ , the algorithm creates an auxiliary matrix $Z\in\mathbb{R}^{\widetilde{m}\times j}$ by juxtaposition of $\widetilde{W}\in\mathbb{R}^{\widetilde{m}\times(j-1)}$ with the $j$ th renormalised column of $W$ . Then the QR factorisation of $Z$ is computed. Finally, the matrix $\widetilde{W}$ is updated again by juxtaposition of $\widetilde{W}$ with the $j$ th column of $W$ but this time rescaled by an appropriately chosen factor that produces $\widetilde{R}_{jj}=1$ in the matrix $\widetilde{R}$ such that $\widetilde{Q}\widetilde{R}=\widetilde{W}$ . Notice that the rescaling operation when multiplying $\varphi_{j}$ by $\rho_{j,\Omega}$ corresponds to a simple renormalisation of $\varphi_{j}$ in $\ell^{2}$ only when $j=1$ , due to the additional term $|\widetilde{R}_{jj}|^{-1}$ when $j\geq 2$ . For convenience, in the description of the algorithm we denote by $W(:,j)$ the $j$ th column of $W$ , and we denote by $[A|b]\in\mathbb{R}^{\widetilde{m}\times(k+1)}$ the juxtaposition of any matrix $A\in\mathbb{R}^{\widetilde{m}\times k}$ with any vector $b\in\mathbb{R}^{\widetilde{m}}$ .
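The iterations described above can be sketched in Python as follows. This is a reconstruction from the prose description and not the listing of ADAPT itself. It assumes that the rescaling factor at iteration $j$ is $\rho_{j,\Omega}=(\|W(:,j)\|_{\ell^{2}}\,|R_{jj}|)^{-1}$ , with $R_{jj}$ the last diagonal element in the QR factorisation of $Z$ . It uses a small pure-Python Householder routine (`householder_r`, a hypothetical helper returning only the triangular factor) and monomials on an interval as a stand-in for $\varphi_{1},\ldots,\varphi_{n}$ on $\Omega$ .

```python
import math
import random


def householder_r(W):
    """Upper triangular factor R (n-by-n) of the Householder QR of the
    m-by-n matrix W (list of lists, m >= n, full rank)."""
    m, n = len(W), len(W[0])
    A = [row[:] for row in W]
    for k in range(n):
        x = [A[i][k] for i in range(k, m)]
        alpha = -math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        for j in range(k, n):  # apply the reflector I - 2 v v^T
            dot = sum(v[i] * A[k + i][j] for i in range(m - k))
            for i in range(m - k):
                A[k + i][j] -= 2.0 * dot * v[i]
    return [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def adapt(W):
    """Rescale the columns of W so that the R factor of the result has
    diagonal elements of modulus one. Returns the rescaled matrix and
    the list of scaling factors rho_j."""
    m, n = len(W), len(W[0])
    W_adapted = [[] for _ in range(m)]
    rho = []
    for j in range(n):
        norm_j = math.sqrt(sum(W[i][j] ** 2 for i in range(m)))
        # Z = [W_adapted | renormalised j-th column of W]
        Z = [W_adapted[i] + [W[i][j] / norm_j] for i in range(m)]
        r_jj = abs(householder_r(Z)[j][j])
        factor = 1.0 / (norm_j * r_jj)  # equals 1 / norm_j when j = 0
        rho.append(factor)
        W_adapted = [W_adapted[i] + [factor * W[i][j]] for i in range(m)]
    return W_adapted, rho


random.seed(0)
points = [random.uniform(-1.0, 1.0) for _ in range(100)]
W = [[y ** k for k in range(5)] for y in points]  # monomial basis
W_adapted, rho = adapt(W)
R_adapted = householder_r(W_adapted)
diagonal = [abs(R_adapted[j][j]) for j in range(5)]
print(diagonal)
```

The sketch recomputes a full factorisation of $Z$ at every iteration for clarity. Scaling a column of a matrix by a positive factor scales the corresponding column of its triangular factor by the same amount, which is why dividing by $|R_{jj}|$ yields a unit diagonal element.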

3.3 Computation of the weighted least-squares estimator

The estimator $u_{W}$ can be calculated by solving the normal equations (2.13). The matrix $\widetilde{G}$ can be rewritten as $\widetilde{G}=D^{\top}D/m$ , where $D\in\mathbb{R}^{m\times n}$ is a matrix obtained by subsampling and reweighting the rows of the matrix $Q$ introduced in Section 2.3, as we now describe. After sampling the $y^{1},\ldots,y^{m}$ among the $\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ , we can build a deterministic function $\mathcal{S}:[1,\ldots,m]\to[1,\ldots,\widetilde{m}]$ such that $y^{i}=\widetilde{y}^{\mathcal{S}(i)}$ for any $i=1,\ldots,m$ . Using the function $\mathcal{S}$ and (2.29) we can build $D$ as

[TABLE]

The right-hand side $\widetilde{b}$ of (2.13) can be calculated component-wise as

[TABLE]

It is worth mentioning that the random samples $y^{1},\ldots,y^{m}$ in Theorem 3 are drawn from $\widetilde{\sigma}_{n}$ with replacement. This preserves independence, which is needed in the proof of Lemma 2 when using the Bernstein inequality. As an alternative, one can draw $y^{1},\ldots,y^{m}$ again from $\widetilde{\sigma}_{n}$ but without replacement. The corresponding function $\mathcal{S}$ is injective, and this avoids multiple occurrences of the same row in the matrix $D$ . However the generated $y^{1},\ldots,y^{m}$ are no longer independent, and one cannot invoke Theorem 4. Nevertheless, such an approach is interesting because random samples generated without replacement can better concentrate around their mean than those generated with replacement.

4 Numerical examples with polynomial spaces

In this section the weighted least-squares estimator $u_{T}$ of $u$ on $V_{n}$ is computed by Algorithm 2, as described in Section 3. The functions $\varphi_{1},\ldots,\varphi_{n}$ are chosen as the tensorized monomial basis supported on the given polynomial space. When reporting the numerical results, we mainly focus on the stability of the estimator and on its approximation error. The stability is quantified by the condition number $\kappa(\widetilde{G})$ , and from item I) of Theorem 3, $\|\widetilde{G}-I\|\leq\delta+\varepsilon$ implies $\kappa(\widetilde{G})\leq(1+\delta+\varepsilon)/(1-\delta-\varepsilon)$ . In all the numerical tests in this section, the $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ constructed by Householder QR factorisation are always $\varepsilon$ -orthonormal with values of $\varepsilon$ less than $10^{-12}$ .

We now describe the numerical estimation of the error $\mathbb{E}(\|u-u_{T}\|)$ in Theorem 3. Denote by $\Omega_{CV}\subset\Omega$ a set of $m_{CV}$ iid random samples uniformly distributed on $\Omega$ , chosen once and for all. For any draw of $y^{1},\ldots,y^{m},\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}\in\Omega$ the approximation error is estimated as

[TABLE]

The error in expectation is then estimated as a Monte Carlo average by

[TABLE]

with the average $\mathbb{E}_{MC}^{r}$ being over $r$ independent draws of the random samples $y^{1},\ldots,y^{m},\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ from their respective distributions. In the following numerical tests we choose $m_{CV}=10^{5}$ and $r=100$ .

As illustrative examples in dimension $d=2$ , we choose $\Omega$ as a Swiss cheese set, i.e. a compact set with holes, or the Mandelbrot set, or the annular set. With all the aforementioned domains $\Omega$ , upper bounds for $K_{n}(\Omega)$ are not known. For the choice of $\widetilde{m}$ , we define a parameter $\theta=\theta(n,\Omega)$ depending on $\Omega$ and $n$ , and then take $\widetilde{m}=\lceil\theta n\ln n\rceil$ . In all the numerical tests, choosing $m=\lceil 4n\ln n\rceil$ and any $\theta\geq 1$ largely suffices to maintain the condition number safely bounded as $\kappa(\widetilde{G})\leq 10$ . As discussed in Remark 1, the choice of $\theta$ is important for the accuracy of $u_{T}$ . Unless otherwise specified, we empirically choose $\theta=200$ .

4.1 Example with a smooth function on a domain with holes

Define $\Omega:=H\setminus\{E_{1}\cup E_{2}\}$ , where $H:=\textrm{Conv}(S)$ is the convex hull of the point set

[TABLE]

$E_{1}$ is a standard ellipse centered in $(-0.2,-0.3)^{\top}$ with semiaxes of length $0.15$ and $0.15/\sqrt{2}$ , and $E_{2}$ is a standard ellipse centered in $(0.2,0.2)^{\top}$ with semiaxes of length $0.2$ and $0.2/\sqrt{2}$ . The geometry of $\Omega$ is shown in Figure 2. We consider the function

[TABLE]

The space $V_{n}$ is chosen as the polynomial space supported on the index set $\Lambda=\Lambda_{TD}^{d,k}:=\{\nu\in\mathbb{N}_{0}^{d}\,:\,\|\nu\|_{\ell^{1}}\leq k\}$ , a.k.a. the total degree polynomial space of order $k$ , whose dimension equals $n=\dim(V_{n})=\#(\Lambda_{TD}^{d,k})=\binom{d+k}{k}$ . Figure 1 shows the error $\mathbb{E}_{MC}^{r}(\|u-u_{T}\|_{CV})$ and condition number $\kappa(\widetilde{G})$ when $m=\lceil n\ln n\rceil$ or $m=\lceil 4n\ln n\rceil$ , and $\widetilde{m}=\lceil 200n\ln n\rceil$ . The error decreases exponentially w.r.t. $k$ , and $\widetilde{G}$ remains well-conditioned even when choosing $m=\lceil n\ln n\rceil$ . Figure 2 shows one shot of the random samples $y^{1},\ldots,y^{m},\widetilde{y}^{1},\ldots,\widetilde{y}^{\widetilde{m}}$ (left figure) and two realizations of the pointwise error $y\mapsto|u(y)-u_{T}(y)|$ for $y\in\Omega$ (center and right figures).
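The dimension of the total degree space and the sample sizes used in the tests follow directly from the formulas above. The sketch below evaluates $n=\binom{d+k}{k}$ , $m=\lceil 4n\ln n\rceil$ and $\widetilde{m}=\lceil\theta n\ln n\rceil$ with $\theta=200$ . The function names and the choice $d=2$ , $k=10$ are assumptions made only for the illustration.

```python
import math


def total_degree_dim(d: int, k: int) -> int:
    """Dimension of the total degree polynomial space of order k in d variables."""
    return math.comb(d + k, k)


def sample_sizes(n: int, theta: float = 200.0, c: float = 4.0):
    """Return (m, m_tilde) = (ceil(c n ln n), ceil(theta n ln n)) for n >= 2."""
    base = n * math.log(n)
    return math.ceil(c * base), math.ceil(theta * base)


n = total_degree_dim(2, 10)
m, m_tilde = sample_sizes(n)
print(n, m, m_tilde)
```

For $d=2$ and $k=10$ this gives $n=66$ , so that $\widetilde{m}$ is about fifty times larger than $m$ , as dictated by the ratio $\theta/4$ .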

4.2 Comparison with examples from the literature

The following two examples are taken from [1]. Consider the function

[TABLE]

when $\Omega$ is the Mandelbrot set displayed in Figure 4-right, or the function

[TABLE]

when $\Omega=\{y\in B\,:\,\frac{1}{4}\leq\|y\|_{\ell^{2}}\leq 1\}$ is the annular set displayed in Figure 5-right. With both functions, the space $V_{n}$ is chosen as the polynomial space supported on the hyperbolic cross index set of order $k$ defined as $\Lambda=\Lambda_{HC}^{d,k}:=\{\nu\in\mathbb{N}_{0}^{d}\,:\,\prod_{j=1}^{d}(\nu_{j}+1)\leq k+1\}$ .
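The hyperbolic cross index set can be enumerated in the same brute-force way. The sketch below is only illustrative (the name `hyperbolic_cross_set` is not from the paper) and is practical only for small $d$ , since it scans the full tensor grid $\{0,\ldots,k\}^{d}$ .

```python
from itertools import product
from math import prod


def hyperbolic_cross_set(d, k):
    """Enumerate all nu in N_0^d with prod_j (nu_j + 1) <= k + 1.

    Each component satisfies nu_j <= k, so scanning {0,...,k}^d is enough.
    """
    return [
        nu
        for nu in product(range(k + 1), repeat=d)
        if prod(v + 1 for v in nu) <= k + 1
    ]


if __name__ == "__main__":
    for k in (3, 19, 43):
        print(k, len(hyperbolic_cross_set(2, k)))
```

In dimension $d=2$ this enumeration gives $n=176$ for $k=43$ , which is the value of $n$ quoted in the discussion of Figure 4.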

Figure 3 shows the error and condition number for the example with the function (4.40) on the Mandelbrot set. When choosing $m=\lceil 4n\ln n\rceil$ and $\widetilde{m}=\lceil 200n\ln n\rceil$ , the error in Figure 3 decreases exponentially w.r.t. $k$ up to $k=19$ , and then exhibits increasing variability and a suboptimal convergence rate for $k>19$ . This is due to an underestimation of $K_{n}(\Omega)$ when choosing $\theta=200$ for the given domain. Taking a larger $\theta=2000$ restores the exponential convergence of the error, at least for $k$ up to $57$ . In Figure 4-left we report the same results as in Figure 3-left but with $n$ on the abscissa. Figure 4-right shows one realization of the pointwise error $y\mapsto\log_{10}|u(y)-u_{T}(y)|$ on $\Omega$ , obtained from the simulation in Figure 4-left when $n=176$ ; the maximum error over $\Omega$ is of the order $10^{-8}$ .

Figure 5-left shows the error for the example with the nonsmooth function (4.41) on the annular set, with $m=\lceil 4n\ln n\rceil$ and $\widetilde{m}=\lceil 200n\ln n\rceil$ . The corresponding results for the condition number are the same as the blue data in Figure 3-right, since both examples use the same polynomial space. The error in Figure 5-left decreases algebraically w.r.t. $n$ . One realization of the error is shown in Figure 5-right: the maximum error over $\Omega$ equals $0.45$ and is attained along the discontinuities of $u$ on the Cartesian axes. The error in Figure 5-left does not manifest any instability, in contrast to the error in [1, Figure 5], which was obtained for the same test case but with the different method proposed there. In general, the error of the estimator $u_{T}$ is not affected by the distance of $\Omega$ from the boundary of $B$ , even when $\Omega$ touches $\partial B$ , as in this example.
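For a domain such as this annulus, which is known only through a membership test, points can be drawn by rejection from the bounding box. The sketch below assumes that $B=[-1,1]^{2}$ and that the target measure is uniform on $\Omega$ ; both are assumptions made for illustration, and the names `in_annulus` and `sample_uniform_annulus` are hypothetical.

```python
import math
import random


def in_annulus(y, r_in=0.25, r_out=1.0):
    """Membership test for Omega = {y : r_in <= ||y||_2 <= r_out}."""
    r = math.hypot(y[0], y[1])
    return r_in <= r <= r_out


def sample_uniform_annulus(count, rng):
    """Draw `count` points uniformly in the annulus by rejection from B.

    Assumption: the bounding box is B = [-1, 1]^2. Candidates are uniform
    in B and are kept only when they fall inside Omega.
    """
    samples = []
    while len(samples) < count:
        y = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if in_annulus(y):
            samples.append(y)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    points = sample_uniform_annulus(1000, rng)
    print(len(points), all(in_annulus(p) for p in points))
```

Under these assumptions the acceptance probability is the area ratio $|\Omega|/|B|=\pi(1-1/16)/4\approx 0.74$ , so the annulus touching $\partial B$ poses no difficulty for the sampling itself.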

5 Conclusions and perspectives

We have developed and analysed numerical algorithms for the construction of weighted least-squares estimators in any $n$ -dimensional space $V_{n}\subset L^{2}(\Omega,\mu)$ defined on a general bounded domain $\Omega$ , when an explicit $L^{2}(\Omega,\mu)$ -orthonormal basis is not available. The estimator is stable with high probability, quasi-optimally converging in expectation, and uses a number of function evaluations $m$ of the order $n\ln n$ . The calculation of the estimator requires the numerical construction of a discretely orthonormal surrogate basis $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ of $V_{n}$ , at a computational cost that depends on the Christoffel function of $\Omega$ and $V_{n}$ .

The results in Theorem 3 apply to any general orthonormalisation algorithm that can construct an $\varepsilon$ -orthonormal surrogate basis $\widetilde{L}_{1},\ldots,\widetilde{L}_{n}$ for $V_{n}$ with some probability $1-\beta$ . When using the Householder QR factorisation and $V_{n}$ is a multivariate polynomial space, $\varepsilon$ is provably tiny for $n$ up to thousands, and $\beta=0$ .
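The idea of building a discretely orthonormal surrogate basis with a QR factorisation can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses products of unnormalised Legendre polynomials on $B=[-1,1]^{2}$ as the starting basis, uniform points in the annular domain of Section 4.2, and NumPy's `numpy.linalg.qr`, which relies on LAPACK's Householder-based routines. The function name `surrogate_basis_qr` is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legvander


def surrogate_basis_qr(points, index_set):
    """Orthonormalise a polynomial basis w.r.t. the discrete inner product.

    points    : (M, 2) array of points in Omega
    index_set : list of multi-indices (a, b)
    Returns (L, R): L[i, j] is the j-th surrogate basis function evaluated at
    the i-th point, and R is the triangular factor such that phi = L @ R.
    """
    M = points.shape[0]
    kmax = max(max(nu) for nu in index_set)
    P1 = legvander(points[:, 0], kmax)  # P1[i, a] = P_a(first coordinate)
    P2 = legvander(points[:, 1], kmax)  # P2[i, b] = P_b(second coordinate)
    # Column j holds phi_j(y^i) / sqrt(M) for the j-th multi-index.
    A = np.column_stack([P1[:, a] * P2[:, b] for a, b in index_set]) / np.sqrt(M)
    Q, R = np.linalg.qr(A)  # reduced QR factorisation
    return np.sqrt(M) * Q, R


# Assumed setting: uniform points in the annulus 1/4 <= ||y|| <= 1 inside B.
rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(4000, 2))
radius = np.linalg.norm(candidates, axis=1)
pts = candidates[(radius >= 0.25) & (radius <= 1.0)]

index_set = [(a, b) for a in range(4) for b in range(4) if a + b <= 3]
L, R = surrogate_basis_qr(pts, index_set)

# Discrete Gram matrix of the surrogate basis: identity up to round-off.
gram = L.T @ L / pts.shape[0]
print(np.max(np.abs(gram - np.eye(len(index_set)))))
```

The Gram matrix here is orthonormal only with respect to the discrete inner product on the sampled points; how close the surrogate basis is to being $L^{2}(\Omega,\mu)$ -orthonormal depends on the number of points, which is the role played by $\widetilde{m}$ in the paper.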

An important point in the numerical construction of the surrogate basis is the robustness to the ill-conditioning that arises from the lack of knowledge of an $L^{2}(\Omega,\mu)$ -orthonormal basis. The algorithms proposed in this paper are extremely robust to such ill-conditioning, and compute weighted least-squares estimators that are numerically stable and accurate with all the functions and domains tested.

As a final remark, the whole analysis in this paper immediately applies to the adaptive setting, using nested sequences of approximation spaces $V^{1}\subset\cdots\subset V^{k}\subset L^{2}(\Omega,\mu)$ rather than a single a priori given approximation space.

Bibliography

[1] B. Adcock, D. Huybrechs: Approximating smooth, multivariate functions on irregular domains, arXiv:1802.00602
[2] B. Adcock, J. M. Cardenas: Near-optimal sampling strategies for multivariate function approximation on general domains, arXiv:1908.01249
[3] L. Bos: Asymptotics for the Christoffel function for Jacobi like weights on a ball in $\mathbb{R}^{m}$, New Zealand J. Math., 23:99–109, 1994.
[4] A. Chkifa, A. Cohen, G. Migliorati, F. Nobile, R. Tempone: Discrete least squares polynomial approximation with random evaluations - application to parametric and stochastic elliptic PDEs, ESAIM Math. Model. Numer. Anal., 49(3):815–837, 2015.
[5] A. Cohen, M. A. Davenport, D. Leviatan: On the stability and accuracy of least squares approximations, Found. Comput. Math., 13:819–834, 2013.
[6] A. Cohen, G. Migliorati: Optimal weighted least-squares methods, SMAI Journal of Computational Mathematics, 3:181–203, 2017.
[7] P. J. Davis: Interpolation and approximation, Dover, 1975.
[8] L. Devroye: Non-Uniform Random Variate Generation, Springer, 1986.
