Worst-case optimal approximation with increasingly flat Gaussian kernels

Toni Karvonen; Simo Särkkä

arXiv:1906.02096·math.NA·January 10, 2020

TL;DR

This paper investigates the optimal approximation of positive linear functionals in Gaussian kernel-induced spaces, revealing convergence to polynomial and Gaussian quadrature methods as the kernel becomes flatter.

Contribution

It introduces a new perspective on approximation with flat Gaussian kernels and generalizes the interpolation problem, including optimal point selection and convergence analysis.

Findings

1. Convergence to polynomial methods with fixed points.

2. Extension to optimal point selection leading to Gaussian quadrature.

3. Explicit characterization of the RKHS via damped polynomials.

Abstract

We study worst-case optimal approximation of positive linear functionals in reproducing kernel Hilbert spaces induced by increasingly flat Gaussian kernels. This provides a new perspective and some generalisations to the problem of interpolation with increasingly flat radial basis functions. When the evaluation points are fixed and unisolvent, we show that the worst-case optimal method converges to a polynomial method. In an additional one-dimensional extension, we allow also the points to be selected optimally and show that in this case convergence is to the unique Gaussian quadrature type method that achieves the maximal polynomial degree of exactness. The proofs are based on an explicit characterisation of the reproducing kernel Hilbert space of the Gaussian kernel in terms of exponentially damped polynomials.

Equations

K_{\ell}(x,x^{\prime})=\Phi\bigg{(}\frac{\mathinner{\lVert x-x^{\prime}\rVert}_{2}}{\ell}\bigg{)}

s_{\ell,f,X}(x)=\sum_{n=1}^{N}f(x_{n})u_{\ell,n}(x),

\begin{bmatrix}K_{\ell}(x_{1},x_{1})&\cdots&K_{\ell}(x_{1},x_{N})\\ \vdots&\ddots&\vdots\\ K_{\ell}(x_{N},x_{1})&\cdots&K_{\ell}(x_{N},x_{N})\end{bmatrix}\begin{bmatrix}u_{\ell,1}(x)\\ \vdots\\ u_{\ell,N}(x)\end{bmatrix}=\begin{bmatrix}K_{\ell}(x,x_{1})\\ \vdots\\ K_{\ell}(x,x_{N})\end{bmatrix}

K_{\ell}(x,x^{\prime})=\exp\bigg{(}\!-\frac{\mathinner{\lVert x-x^{\prime}\rVert}_{2}^{2}}{2\ell^{2}}\bigg{)}

L_{x}[f]=f(x)\quad\text{ and }\quad L_{\mu}[f]=\int_{\Omega}f\,\mathrm{d}\mu\quad\text{ for a Borel measure }\mu\text{ on }\Omega,

Q_{X}(w)[f]=\sum_{n=1}^{N}w(n)f(x_{n})\approx L[f].

e_{\ell}\big{(}Q_{X}(w)\big{)}=\sup_{\mathinner{\lVert f\rVert}_{\mathcal{H}(K_{\ell})}\leq 1}\,\mathinner{\!\biggl{\lvert}L[f]-\sum_{n=1}^{N}w(n)f(x_{n})\biggr{\rvert}}.

w_{\ell}^{*}=\operatorname*{arg\,min}_{w\in\mathbb{R}^{N}}e_{\ell}\big(Q_{X}(w)\big)\quad\text{ and }\quad e_{\ell}\big(Q_{X}(w_{\ell}^{*})\big)=\inf_{w\in\mathbb{R}^{N}}e_{\ell}\big(Q_{X}(w)\big).

\begin{bmatrix}K_{\ell}(x_{1},x_{1})&\cdots&K_{\ell}(x_{1},x_{N})\\ \vdots&\ddots&\vdots\\ K_{\ell}(x_{N},x_{1})&\cdots&K_{\ell}(x_{N},x_{N})\end{bmatrix}\begin{bmatrix}w_{\ell}^{*}(1)\\ \vdots\\ w_{\ell}^{*}(N)\end{bmatrix}=\begin{bmatrix}L[K_{\ell}(\cdot,x_{1})]\\ \vdots\\ L[K_{\ell}(\cdot,x_{N})]\end{bmatrix}.

Q_{X}(w_{\ell}^{*})=L[s_{\ell,f,X}]=\sum_{n=1}^{N}f(x_{n})L[u_{\ell,n}].

f(x)=\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}\sum_{\alpha\in\mathbb{N}_{0}^{d}}f_{\alpha}x^{\alpha}\quad\text{ such that }\quad\lVert f\rVert_{\mathcal{H}(K_{\ell})}^{2}=\sum_{\alpha\in\mathbb{N}_{0}^{d}}\ell^{2\lvert\alpha\rvert}\alpha!f_{\alpha}^{2}<\infty,

\bigg{\{}\frac{1}{\ell^{\mathinner{\lvert\alpha\rvert}}\sqrt{\alpha!}}\operatorname{e}^{-\mathinner{\lVert x\rVert}_{2}^{2}/(2\ell^{2})}x^{\alpha}\bigg{\}}_{\alpha\in\mathbb{N}_{0}^{d}}

\Pi_{m}=\operatorname{span}\{x^{\alpha}\,\colon\,\alpha\in\mathbb{N}_{0}^{d},\ \lvert\alpha\rvert\leq m\}.

N=\#X=\dim\Pi_{m}=\binom{m+d}{d}=\frac{(m+d)!}{d!\,m!}

P_{\Pi}=\begin{bmatrix}x_{1}^{\alpha_{1}}&\cdots&x_{1}^{\alpha_{N}}\\ \vdots&\ddots&\vdots\\ x_{N}^{\alpha_{1}}&\cdots&x_{N}^{\alpha_{N}}\end{bmatrix},

\phi_{\alpha}^{\ell}(x)=\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}x^{\alpha},

P_{\phi,\ell}=\begin{bmatrix}\phi_{\alpha_{1}}^{\ell}(x_{1})&\cdots&\phi_{\alpha_{N}}^{\ell}(x_{1})\\ \vdots&\ddots&\vdots\\ \phi_{\alpha_{1}}^{\ell}(x_{N})&\cdots&\phi_{\alpha_{N}}^{\ell}(x_{N})\end{bmatrix}

\mathinner{\!\bigl{\lvert}L[\phi_{\alpha}^{\ell}]-Q_{X}(w_{\ell})[\phi_{\alpha}^{\ell}]\bigr{\rvert}}\to 0\quad\text{ for every }\quad\mathinner{\lvert\alpha\rvert}\leq m,

\lVert L_{\Pi}-P_{\Pi}^{\mathsf{T}}w_{\ell}\rVert_{2}\leq\lVert L_{\Pi}-L_{\phi,\ell}\rVert_{2}+\lVert L_{\phi,\ell}-P_{\phi,\ell}^{\mathsf{T}}w_{\ell}\rVert_{2}+\lVert P_{\phi,\ell}^{\mathsf{T}}w_{\ell}-P_{\Pi}^{\mathsf{T}}w_{\ell}\rVert_{2},

L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}\frac{\mathinner{\lvert a_{\alpha}\rvert}}{\ell_{0}^{\mathinner{\lvert\alpha\rvert}-(m+1)}\sqrt{\alpha!}}\mathinner{\lvert x^{\alpha}\rvert}\Bigg{]}\leq C_{L}<\infty

\lim_{\ell\to\infty}w_{\ell}^{*}=w_{\text{\tiny{pol}}}\quad\text{ and }\quad e_{\ell}\big{(}Q_{X}(w_{\ell}^{*})\big{)}=\mathcal{O}\big{(}\ell^{-(m+1)}\big{)},

g_{\alpha}(x)=\frac{1}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}x^{\alpha}=\frac{1}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\phi_{\alpha}^{\ell}(x).

\frac{1}{\ell^{\mathinner{\lvert\alpha\rvert}}\sqrt{\alpha!}}\mathinner{\!\bigl{\lvert}L[\phi_{\alpha}^{\ell}]-Q_{X}(w_{\ell}^{*})[\phi_{\alpha}^{\ell}]\bigr{\rvert}}=\mathinner{\!\bigl{\lvert}L[g_{\alpha}]-Q_{X}(w_{\ell}^{*})[g_{\alpha}]\bigr{\rvert}}\leq e_{\ell}\big{(}Q_{X}(w_{\ell}^{*})\big{)}.

\begin{split}e_{\ell}\big{(}&Q_{X}(w_{\phi,\ell})\big{)}\\ &=\sup_{\mathinner{\lVert f\rVert}_{\mathcal{H}(K_{\ell})}\leq 1}\,\mathinner{\!\Biggl{\lvert}L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg{]}-Q_{X}(w_{\phi,\ell})\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg{]}\Biggr{\rvert}}\\ &\leq\sup_{\mathinner{\lVert f\rVert}_{\mathcal{H}(K_{\ell})}\leq 1}L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}\mathinner{\lvert f_{\alpha}\rvert}\mathinner{\lvert\phi_{\alpha}^{\ell}\rvert}\Bigg{]}+\sup_{\mathinner{\lVert f\rVert}_{\mathcal{H}(K_{\ell})}\leq 1}\,\mathinner{\!\Biggl{\lvert}Q_{X}(w_{\phi,\ell})\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg{]}\Biggr{\rvert}},\end{split}

\begin{split}\sup_{\mathinner{\lVert f\rVert}_{\mathcal{H}(K_{\ell})}\leq 1}L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}\mathinner{\lvert f_{\alpha}\rvert}\mathinner{\lvert\phi_{\alpha}^{\ell}\rvert}\Bigg{]}&\leq L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}\frac{\mathinner{\lvert a_{\alpha}\rvert}}{\ell^{\mathinner{\lvert\alpha\rvert}}\sqrt{\alpha!}}\mathinner{\lvert\phi_{\alpha}^{\ell}\rvert}\Bigg{]}\\ &\leq\ell^{-(m+1)}L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}\frac{\mathinner{\lvert a_{\alpha}\rvert}}{\ell_{0}^{\mathinner{\lvert\alpha\rvert}-(m+1)}\sqrt{\alpha!}}\mathinner{\lvert\phi_{\alpha}^{\ell}\rvert}\Bigg{]}\\ &\leq\ell^{-(m+1)}L\Bigg{[}\sum_{\mathinner{\lvert\alpha\rvert}\geq m+1}\frac{\mathinner{\lvert a_{\alpha}\rvert}}{\ell_{0}^{\mathinner{\lvert\alpha\rvert}-(m+1)}\sqrt{\alpha!}}\mathinner{\lvert x^{\alpha}\rvert}\Bigg{]}\\ &\leq C_{L}\ell^{-(m+1)}\end{split}

\max_{n=1,\ldots,N}\lvert\phi_{\alpha}^{\ell}(x_{n})\rvert\leq\max_{n=1,\ldots,N}\lvert x_{n}^{\alpha}\rvert\leq C_{X}

\begin{split}\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\,\Biggl\lvert Q_{X}(w_{\phi,\ell})\Biggl[\sum_{\lvert\alpha\rvert\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Biggr]\Biggr\rvert&\leq\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\sum_{\lvert\alpha\rvert\geq m+1}\lvert f_{\alpha}\rvert\lvert\phi_{\alpha}^{\ell}(x_{n})\rvert\\ &\leq\ell^{-(m+1)}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\sum_{\lvert\alpha\rvert\geq m+1}\frac{\lvert a_{\alpha}\rvert}{\ell^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\lvert\phi_{\alpha}^{\ell}(x_{n})\rvert\\ &\leq\ell^{-(m+1)}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\sum_{\lvert\alpha\rvert\geq m+1}\frac{C_{X}}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\\ &\leq\ell^{-(m+1)}\bigg(\sup_{\ell\geq\ell_{0}}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\bigg)\sum_{\lvert\alpha\rvert\geq m+1}\frac{C_{X}}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\\ &=:C_{Q}\ell^{-(m+1)}\end{split}

e_{\ell}\big{(}Q_{X}(w_{\phi,\ell})\big{)}\leq(C_{L}+C_{Q})\ell^{-(m+1)}\eqqcolon C\ell^{-(m+1)}

Full text

Department of Electrical Engineering and Automation, Aalto University, Espoo, Finland

Keywords: Worst-case analysis · Reproducing kernel Hilbert spaces · Gaussian kernel · Gaussian quadrature

1 Introduction

Most popular kernels used in scattered data approximation [Fasshauer and McCourt, 2015, Wendland, 2005] and Gaussian process regression [Rasmussen and Williams, 2006] are isotropic (i.e., radial basis functions), depending only on the Euclidean distance $\mathinner{\lVert\cdot\rVert}_{2}$ between the points:

$$K_{\ell}(x,x^{\prime})=\Phi\bigg(\frac{\lVert x-x^{\prime}\rVert_{2}}{\ell}\bigg)\tag{1}$$

for a continuous positive-definite function $\Phi\colon[0,\infty)\to\mathbb{R}$ and a length-scale parameter $\ell>0$. Given any function $f\colon\mathbb{R}^{d}\to\mathbb{R}$ evaluated at distinct points ${X=\{x_{1},\ldots,x_{N}\}\subset\mathbb{R}^{d}}$, such a kernel can be used to construct a unique kernel interpolant based on the translates $\{K_{\ell}(\cdot,x_{n})\}_{n=1}^{N}$. The kernel interpolant is

$$s_{\ell,f,X}(x)=\sum_{n=1}^{N}f(x_{n})u_{\ell,n}(x),\tag{2}$$

where $u_{\ell,n}$ are the Lagrange cardinal functions that solve the linear system

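$$\begin{bmatrix}K_{\ell}(x_{1},x_{1})&\cdots&K_{\ell}(x_{1},x_{N})\\ \vdots&\ddots&\vdots\\ K_{\ell}(x_{N},x_{1})&\cdots&K_{\ell}(x_{N},x_{N})\end{bmatrix}\begin{bmatrix}u_{\ell,1}(x)\\ \vdots\\ u_{\ell,N}(x)\end{bmatrix}=\begin{bmatrix}K_{\ell}(x,x_{1})\\ \vdots\\ K_{\ell}(x,x_{N})\end{bmatrix}\tag{3}$$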

and satisfy $u_{\ell,n}(x_{m})=\delta_{nm}$ . Uniqueness of the solution for each $x\in\mathbb{R}^{d}$ is guaranteed by positive-definiteness of the matrix on the left-hand side of this system.

When $\ell\to\infty$, the kernel $K_{\ell}$ becomes increasingly flat and the linear system (3) increasingly ill-conditioned. (Note that most of the literature we cite parametrises the kernel in terms of the inverse length-scale $\varepsilon=1/\ell$ and accordingly considers the case $\varepsilon\to 0$.) Nevertheless, the corresponding kernel interpolant is typically well-behaved at this limit. Starting with the work of Driscoll and Fornberg [2002], it has been shown that a certain unisolvency assumption on $X$ implies that the kernel interpolant converges to (i) a polynomial interpolant if the kernel is infinitely smooth [Driscoll and Fornberg, 2002, Fornberg et al., 2004, Larsson and Fornberg, 2005, Lee et al., 2007, Schaback, 2005, 2008] or (ii) a polyharmonic spline interpolant if the kernel is finitely smooth [Lee et al., 2014, Song et al., 2012]. Further generalisations appear in [Lee et al., 2015]. The former case covers kernels such as Gaussians, multiquadrics, and inverse multiquadrics while the latter applies to, for example, Matérn kernels and Wendland’s functions. Among the most interesting of these results is the one by Schaback [2005] who proved that the interpolant at the increasingly flat limit of the Gaussian kernel

$$K_{\ell}(x,x^{\prime})=\exp\bigg(-\frac{\lVert x-x^{\prime}\rVert_{2}^{2}}{2\ell^{2}}\bigg)\tag{4}$$

exists regardless of the geometry of $X$ and coincides with the de Boor and Ron polynomial interpolant [de Boor, 1994, de Boor and Ron, 1992]. Furthermore, numerical ill-conditioning for large $\ell$, mentioned above, has necessitated the development of techniques for stable evaluation of the kernel interpolant [Cavoretto et al., 2015, Fasshauer and McCourt, 2012, Fornberg et al., 2013, Wright and Fornberg, 2017]. Increasingly flat kernels have also been discussed independently in the literature on the use of Gaussian processes for numerical integration [Minka, 2000, O’Hagan, 1991, Särkkä et al., 2016], albeit accompanied only with non-rigorous arguments. Even though the intuition that the lowest degree terms in the Taylor expansion of the kernel dominate construction of the interpolant as $\ell\to\infty$ and that this ought to imply convergence to a polynomial interpolant is quite clear, this is not always translated into transparent proofs.

The purpose of this article is to generalise the aforementioned results on flat limits of kernel interpolants to worst-case optimal approximation of general positive linear functionals in the reproducing kernel Hilbert space (RKHS) of the Gaussian kernel (4). That such generalisations are possible is perhaps not surprising; it is rather the simple proof technique made possible by the worst-case framework and an explicit characterisation [Minh, 2010] of the Gaussian RKHS that we find the most interesting aspect of the present work.

1.1 Worst-case optimal approximation

Let $\Omega$ be a subset of $\mathbb{R}^{d}$ with a non-empty interior and $L\colon C(\Omega)\to\mathbb{R}$ a positive linear functional acting on continuous real-valued functions defined on $\Omega$ and satisfying ${L[\mathinner{\lvert p\rvert}]<\infty}$ for every polynomial $p$ on $\Omega$ . The functionals most often discussed in this article are the point evaluation and the integration functionals

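$$L_{x}[f]=f(x)\quad\text{ and }\quad L_{\mu}[f]=\int_{\Omega}f\,\mathrm{d}\mu\quad\text{ for a Borel measure }\mu\text{ on }\Omega,\tag{5}$$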

respectively. Derivative evaluation functionals $L_{x}^{(n)}[f]=f^{(n)}(x)$ are also often considered. A cubature rule (quadrature if $d=1$ ) $Q_{X}(w)\colon C(\Omega)\to\mathbb{R}$ with the distinct points $X=\{x_{1},\ldots,x_{N}\}\subset\Omega$ and weights $w=(w(1),\ldots,w(N))\in\mathbb{R}^{N}$ is a weighted approximation to $L$ of the form

$$Q_{X}(w)[f]=\sum_{n=1}^{N}w(n)f(x_{n})\approx L[f].\tag{6}$$

When restricted to $\Omega\times\Omega$, the positive-definite kernel $K_{\ell}$ in (1) induces a unique reproducing kernel Hilbert space $\mathcal{H}(K_{\ell})\subset C(\Omega)$ where the reproducing property $\langle f,K_{\ell}(\cdot,x)\rangle_{\mathcal{H}(K_{\ell})}=f(x)$ holds for every $x\in\Omega$ and $f\in\mathcal{H}(K_{\ell})$. With minor modifications everything in this section holds also when the kernel is not isotropic. Because the kernel is isotropic, $L[K_{\ell}(x,x)]\leq L[\Phi(0)]<\infty$ by the assumption that $L[p]$ is finite if $p$ is a polynomial. This guarantees that $L[K_{\ell}(\cdot,x)]\in\mathcal{H}(K_{\ell})$ for any $x\in\Omega$ and consequently that $L[f]<\infty$ for any $f\in\mathcal{H}(K_{\ell})$.

The worst-case error $e_{\ell}(Q_{X}(w))$ of the cubature rule (6) in $\mathcal{H}(K_{\ell})$ is

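$$e_{\ell}\big(Q_{X}(w)\big)=\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\,\biggl\lvert L[f]-\sum_{n=1}^{N}w(n)f(x_{n})\biggr\rvert.\tag{7}$$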

Given a fixed set of distinct points, we are interested in the kernel cubature rule $Q_{X}(w_{\ell}^{*})$ whose weights are chosen so as to minimise the worst-case error:

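$$w_{\ell}^{*}=\operatorname*{arg\,min}_{w\in\mathbb{R}^{N}}e_{\ell}\big(Q_{X}(w)\big)\quad\text{ and }\quad e_{\ell}\big(Q_{X}(w_{\ell}^{*})\big)=\inf_{w\in\mathbb{R}^{N}}e_{\ell}\big(Q_{X}(w)\big).$$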

These weights are unique and available as the solution to the linear system [Oettershagen, 2017, Section 3.2]

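$$\begin{bmatrix}K_{\ell}(x_{1},x_{1})&\cdots&K_{\ell}(x_{1},x_{N})\\ \vdots&\ddots&\vdots\\ K_{\ell}(x_{N},x_{1})&\cdots&K_{\ell}(x_{N},x_{N})\end{bmatrix}\begin{bmatrix}w_{\ell}^{*}(1)\\ \vdots\\ w_{\ell}^{*}(N)\end{bmatrix}=\begin{bmatrix}L[K_{\ell}(\cdot,x_{1})]\\ \vdots\\ L[K_{\ell}(\cdot,x_{N})]\end{bmatrix}.\tag{8}$$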

Although our notation does not make this explicit, the weights obviously depend on the linear functional $L$ and the evaluation points $X$ . For each $x\in\mathbb{R}^{d}$ , the kernel interpolant $s_{\ell,f,X}(x)$ now arises as the kernel cubature rule for approximation of the point evaluation functional $L_{x}$ in (5) and the Lagrange functions are $u_{\ell,n}(x)=w_{\ell}^{*}(n)$ . In this case the worst-case error coincides with the power function [Schaback, 1993]. For an arbitrary $L$ , the kernel cubature rule can be obtained by applying $L$ to the kernel interpolant:

$$Q_{X}(w_{\ell}^{*})=L[s_{\ell,f,X}]=\sum_{n=1}^{N}f(x_{n})L[u_{\ell,n}].$$

That is, the weights are $w_{\ell}^{*}(n)=L[u_{\ell,n}]$ .
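As a minimal sketch of how the weights $w_{\ell}^{*}$ can be obtained from the linear system (8), the following Python snippet treats the one-dimensional case with the integration functional $L[f]=\int_{0}^{1}f(x)\,\mathrm{d}x$. The point set, the length-scale, and the helper names (`gauss_kernel`, `kernel_mean`, `solve`, `kernel_weights`) are illustrative assumptions and do not come from the article; the closed form of $L[K_{\ell}(\cdot,y)]$ is the standard error-function expression for a Gaussian integrated over an interval.

```python
import math

def gauss_kernel(x, y, ell):
    """Gaussian kernel K_ell(x, y) = exp(-(x - y)^2 / (2 ell^2)) for d = 1."""
    return math.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def kernel_mean(y, ell):
    """L[K_ell(., y)] for the assumed functional L[f] = integral of f over [0, 1]."""
    s = math.sqrt(2.0) * ell
    return ell * math.sqrt(math.pi / 2.0) * (math.erf((1.0 - y) / s) + math.erf(y / s))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kernel_weights(points, ell):
    """Weights solving the system (8) for the given points and length-scale."""
    K = [[gauss_kernel(xi, xj, ell) for xj in points] for xi in points]
    z = [kernel_mean(xj, ell) for xj in points]
    return solve(K, z)

points = [0.1, 0.4, 0.7, 0.9]   # hypothetical evaluation points in [0, 1]
ell = 0.5                       # hypothetical length-scale
weights = kernel_weights(points, ell)

# The rule integrates every kernel translate K_ell(., x_m) exactly.
residuals = [
    sum(w * gauss_kernel(xn, xm, ell) for w, xn in zip(weights, points)) - kernel_mean(xm, ell)
    for xm in points
]
```

The residuals vanish up to rounding because the system (8) is precisely the requirement that the rule be exact for the translates $K_{\ell}(\cdot,x_{m})$. For large $\ell$ the matrix becomes ill-conditioned, so this direct solve is only meant for moderate length-scales.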

1.2 Contributions

Recall that we only consider the Gaussian kernel (4). This article contains two theoretical main contributions:

• In Section 2 we prove that if $X$ is unisolvent with respect to a full polynomial space $\Pi_{m}$ and $N=\dim\Pi_{m}$, then $Q_{X}(w_{\ell}^{*})$ converges (as $\ell\to\infty$) to the unique cubature rule $Q_{X}(w_{\text{\tiny{pol}}})$ that satisfies ${Q_{X}(w_{\text{\tiny{pol}}})[p]=L[p]}$ for every polynomial $p$ of degree at most $m$. This result, contained in Theorem 2.4 and Corollary 2.5, is a generalisation for arbitrary positive linear functionals of the interpolation results cited earlier. If $\Omega$ is bounded, the results hold for any positive linear functional satisfying the mild assumptions imposed earlier. However, boundedness of $\Omega$ is not necessary: at the end of Section 2 we supply an example involving integration over $\mathbb{R}^{d}$ with respect to the Gaussian measure.

• In Section 3 we present a generalisation, based on a theorem of Barrow [1978], for optimal kernel quadrature rules [Oettershagen, 2017, Chapter 5] that have both their points and weights selected so as to minimise the worst-case error. The result, Theorem 3.4, states that such rules, if unique, converge to the $N$-point Gaussian quadrature rule for the functional $L$, which is the unique quadrature rule $Q_{X_{\text{\tiny{G}}}}(w_{\text{\tiny{G}}})$ such that $Q_{X_{\text{\tiny{G}}}}(w_{\text{\tiny{G}}})[p]=L[p]$ for every polynomial $p$ of degree at most $2N-1$. This partially settles a conjecture posed by O’Hagan [1991, Section 3.3], and further discussed in [Minka, 2000, Särkkä et al., 2016], on convergence of optimal kernel quadrature rules to Gaussian quadrature rules.

Some generalisations for other kernels and cubature rules of more general form than (6) are briefly discussed in Section 4.
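To make the degree of exactness $2N-1$ of a Gaussian quadrature rule concrete, here is a small sketch for $N=2$, assuming that $L$ is Lebesgue integration over $[0,1]$ (so that the rule is the two-point Gauss–Legendre rule mapped to $[0,1]$). The variable names are illustrative only.

```python
import math

# Two-point Gauss-Legendre rule, mapped from [-1, 1] to [0, 1].
nodes = [0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0)]
weights = [0.5, 0.5]

def rule(f):
    """Apply the quadrature rule Q[f] = sum_n w(n) f(x_n)."""
    return sum(w * f(x) for w, x in zip(weights, nodes))

def exact_moment(k):
    """Integral of x^k over [0, 1]."""
    return 1.0 / (k + 1)

# Absolute errors for the monomials x^0, ..., x^4.
errors = [abs(rule(lambda x, k=k: x ** k) - exact_moment(k)) for k in range(5)]
```

The first four errors are zero up to rounding (degrees $0$ to $2N-1=3$), while the error for $x^{4}$ is not, which is the sense in which the degree of exactness is maximal.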

2 Fixed points

The following theorem, which provides a characterisation of the RKHS of the Gaussian kernel (4), is the central tool of this article. This result is due to Steinwart et al. [2006] and Minh [2010]; see also [Steinwart and Christmann, 2008, Section 4.4] and [De Marchi and Schaback, 2009, Example 3]. In this theorem (and the remainder of the article) $\mathbb{N}_{0}^{d}$ stands for the collection of $d$-dimensional non-negative multi-indices: $\mathbb{N}_{0}^{d}=\{(\alpha_{1},\ldots,\alpha_{d})\in\mathbb{R}^{d}\,\colon\,\alpha_{1},\ldots,\alpha_{d}\in\mathbb{N}_{0}\}$. The absolute value and factorial of $\alpha\in\mathbb{N}_{0}^{d}$ are $\mathinner{\lvert\alpha\rvert}=\alpha_{1}+\cdots+\alpha_{d}$ and $\alpha!=\alpha_{1}!\times\cdots\times\alpha_{d}!$.

Theorem 2.1 (Steinwart 2006; Minh 2010)

Let $\Omega$ be a subset of $\mathbb{R}^{d}$ with a non-empty interior. Then the RKHS $\mathcal{H}(K_{\ell})$ induced by the Gaussian kernel (4) with length-scale $\ell>0$ consists of the functions

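$$f(x)=\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}\sum_{\alpha\in\mathbb{N}_{0}^{d}}f_{\alpha}x^{\alpha}\quad\text{ such that }\quad\lVert f\rVert_{\mathcal{H}(K_{\ell})}^{2}=\sum_{\alpha\in\mathbb{N}_{0}^{d}}\ell^{2\lvert\alpha\rvert}\alpha!f_{\alpha}^{2}<\infty,\tag{9}$$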

where convergence is absolute. Its inner product is $\langle f,g\rangle_{\mathcal{H}(K_{\ell})}=\sum_{\alpha\in\mathbb{N}_{0}^{d}}\ell^{2\mathinner{\lvert\alpha\rvert}}\alpha!f_{\alpha}g_{\alpha}$ . Furthermore, the collection

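$$\bigg\{\frac{1}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}x^{\alpha}\bigg\}_{\alpha\in\mathbb{N}_{0}^{d}}\tag{10}$$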

of functions forms an orthonormal basis of $\mathcal{H}(K_{\ell})$ .

Two crucial implications of this theorem are that $\mathcal{H}(K_{\ell})$ consists of functions expressible as series of exponentially damped polynomials, the damping effect vanishing as $\ell\to\infty$ , and that, due to the terms $\ell^{2\mathinner{\lvert\alpha\rvert}}$ appearing in the RKHS norm, the high-degree terms contribute the most to the norm. Consequently, the worst-case error (7), taking into account only functions of at most unit norm, is dominated by low-degree terms when $\ell$ is large. The rest of this section formalises this intuition.
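A quick numerical sanity check of Theorem 2.1 in one dimension: since the functions (10) form an orthonormal basis, the kernel must equal the sum of their products, $K_{\ell}(x,y)=\sum_{k\geq 0}e_{k}(x)e_{k}(y)$, which here reduces to the exponential series $\operatorname{e}^{xy/\ell^{2}}=\sum_{k}(xy/\ell^{2})^{k}/k!$. The sketch below compares a truncated sum with the kernel; the test inputs, the truncation level, and the function names are arbitrary illustrative choices.

```python
import math

def gauss_kernel(x, y, ell):
    """Gaussian kernel K_ell(x, y) = exp(-(x - y)^2 / (2 ell^2)) for d = 1."""
    return math.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def basis(k, x, ell):
    """k-th orthonormal basis function of (10) for d = 1."""
    damping = math.exp(-x ** 2 / (2.0 * ell ** 2))
    return damping * x ** k / (ell ** k * math.sqrt(math.factorial(k)))

def truncated_kernel(x, y, ell, terms=40):
    """Partial sum of the basis expansion of the kernel."""
    return sum(basis(k, x, ell) * basis(k, y, ell) for k in range(terms))

x, y, ell = 0.3, -0.8, 1.5
gap = abs(truncated_kernel(x, y, ell) - gauss_kernel(x, y, ell))
```

The factor $\ell^{-k}$ in each basis function also illustrates the remark above: for large $\ell$, a unit-norm function can only carry a small coefficient on high-degree monomials.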

Let $\Pi_{m}\subset C(\Omega)$ stand for the space of $d$ -variate polynomials of degree at most $m\in\mathbb{N}_{0}$ :

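$$\Pi_{m}=\operatorname{span}\{x^{\alpha}\,\colon\,\alpha\in\mathbb{N}_{0}^{d},\ \lvert\alpha\rvert\leq m\}.$$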

In this section we assume that the point set $X\subset\Omega\subset\mathbb{R}^{d}$ is $\Pi_{m}$ -unisolvent. That is,

$$N=\#X=\dim\Pi_{m}=\binom{m+d}{d}=\frac{(m+d)!}{d!\,m!}$$

and the zero function is the only element of $\Pi_{m}$ that vanishes on $X$ . This is equivalent to non-singularity of the (generalised) Vandermonde matrix

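$$P_{\Pi}=\begin{bmatrix}x_{1}^{\alpha_{1}}&\cdots&x_{1}^{\alpha_{N}}\\ \vdots&\ddots&\vdots\\ x_{N}^{\alpha_{1}}&\cdots&x_{N}^{\alpha_{N}}\end{bmatrix},$$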

where $\{\alpha_{1},\ldots,\alpha_{N}\}=\{\alpha\in\mathbb{N}_{0}^{d}\,\colon\,\mathinner{\lvert\alpha\rvert}\leq m\}\subset\mathbb{N}_{0}^{d}$. It follows that there is a unique polynomial cubature rule $Q_{X}(w_{\text{\tiny{pol}}})$ such that $Q_{X}(w_{\text{\tiny{pol}}})[p]=L[p]<\infty$ for every $p\in\Pi_{m}$. Its weights solve the linear system $P_{\Pi}^{\mathsf{T}}w_{\text{\tiny{pol}}}=L_{\Pi}$ of $N$ equations, where the $N$-vector $L_{\Pi}$ has the elements $[L_{\Pi}]_{n}=L[x^{\alpha_{n}}]$. In this section we prove that the worst-case optimal weights $w_{\ell}^{*}$ for the Gaussian kernel (4) converge to $w_{\text{\tiny{pol}}}$ as $\ell\to\infty$.
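The system $P_{\Pi}^{\mathsf{T}}w_{\text{\tiny{pol}}}=L_{\Pi}$ can be illustrated with a small sketch under assumed inputs: $d=1$, $m=2$, the points $X=\{0,1/2,1\}$, and $L[f]=\int_{0}^{1}f(x)\,\mathrm{d}x$, so that $[L_{\Pi}]_{j}=1/(j+1)$. None of these choices, nor the helper `solve`, is taken from the article.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

points = [0.0, 0.5, 1.0]   # assumed Pi_2-unisolvent point set (d = 1)
m = 2

# Vandermonde matrix: rows index points, columns index monomial degrees.
P = [[x ** j for j in range(m + 1)] for x in points]
P_T = [[P[n][j] for n in range(len(points))] for j in range(m + 1)]

# Moments L[x^j] of the assumed functional (integration over [0, 1]).
L_Pi = [1.0 / (j + 1) for j in range(m + 1)]

w_pol = solve(P_T, L_Pi)
```

For this particular setup the solution is the classical Simpson rule with weights $(1/6,\,2/3,\,1/6)$.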

Define then

$$\phi_{\alpha}^{\ell}(x)=\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}x^{\alpha},$$

so that functions in the Gaussian RKHS, characterised by Theorem 2.1, are of the form $f(x)=\sum_{\alpha\in\mathbb{N}_{0}^{d}}f_{\alpha}\phi_{\alpha}^{\ell}(x)$ for coefficients $f_{\alpha}$ decaying sufficiently fast. Since the exponential function has no real roots, the determinant of the matrix

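$$P_{\phi,\ell}=\begin{bmatrix}\phi_{\alpha_{1}}^{\ell}(x_{1})&\cdots&\phi_{\alpha_{N}}^{\ell}(x_{1})\\ \vdots&\ddots&\vdots\\ \phi_{\alpha_{1}}^{\ell}(x_{N})&\cdots&\phi_{\alpha_{N}}^{\ell}(x_{N})\end{bmatrix}$$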

satisfies $\mathinner{\lvert P_{\phi,\ell}\rvert}=\mathinner{\lvert P_{\Pi}\rvert}\exp(-\sum_{n=1}^{N}\mathinner{\lVert x_{n}\rVert}_{2}^{2}/(2\ell^{2}))\neq 0$ and $P_{\phi,\ell}$ is hence non-singular. From non-singularity it follows that there are unique weights $w_{\phi,\ell}$ such that $Q_{X}(w_{\phi,\ell})[\phi_{\alpha}^{\ell}]=L[\phi_{\alpha}^{\ell}]$ for every $\alpha\in\mathbb{N}_{0}^{d}$ satisfying $\mathinner{\lvert\alpha\rvert}\leq m$. The weights solve ${P_{\phi,\ell}^{\mathsf{T}}w_{\phi,\ell}=L_{\phi,\ell}}$, where the $N$-vector $L_{\phi,\ell}$ has the elements $[L_{\phi,\ell}]_{n}=L[\phi_{\alpha_{n}}^{\ell}]$. (See [Fasshauer and McCourt, 2012] for an interpolation method based on a closely related basis derived from a Mercer eigendecomposition of the Gaussian kernel and [Karvonen and Särkkä, 2019] for an explicit construction of weights similar to $w_{\phi,\ell}$ in the case $L$ is the Gaussian integral.) This auxiliary cubature rule plays an important role in our argument. To summarise, the following three weights (or sequences of weights) appear in the proofs below:

1. The weights $w_{\ell}^{*}$, solved from (8), are the worst-case optimal weights for the Gaussian kernel (4). The results concern the behaviour of these weights as $\ell\to\infty$.

2. The weights $w_{\text{\tiny{pol}}}$ are constructed such that the cubature rule defined by them is exact for all polynomials up to degree $m$: $Q_{X}(w_{\text{\tiny{pol}}})[p]=L[p]$ whenever $p\in\Pi_{m}$.

3. The auxiliary weights $w_{\phi,\ell}$ satisfy $Q_{X}(w_{\phi,\ell})[\phi^{\ell}_{\alpha}]=L[\phi_{\alpha}^{\ell}]$ for every $\ell>0$ and $\mathinner{\lvert\alpha\rvert}\leq m$.
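The determinant identity used above to establish non-singularity of $P_{\phi,\ell}$ holds because each row of $P_{\phi,\ell}$ is the corresponding row of $P_{\Pi}$ multiplied by $\operatorname{e}^{-\lVert x_{n}\rVert_{2}^{2}/(2\ell^{2})}$. A small numerical check for $d=1$ and $N=3$ is sketched below; the points, the length-scale, and the helper `det3` are arbitrary illustrative choices.

```python
import math

def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

points = [0.2, 0.9, 1.7]   # hypothetical distinct points (d = 1, m = 2)
ell = 1.3                  # hypothetical length-scale

# Rows index points, columns index monomial degrees 0, 1, 2.
P_Pi = [[x ** j for j in range(3)] for x in points]
damping = [math.exp(-x ** 2 / (2.0 * ell ** 2)) for x in points]
P_phi = [[damping[n] * P_Pi[n][j] for j in range(3)] for n in range(3)]

lhs = det3(P_phi)
rhs = det3(P_Pi) * math.exp(-sum(x ** 2 for x in points) / (2.0 * ell ** 2))
```

Both sides agree up to rounding and are non-zero, since the Vandermonde determinant of distinct points does not vanish.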

Lemma 2.2

Suppose that $X$ is $\Pi_{m}$-unisolvent and $\lim_{\ell\to\infty}L[\phi_{\alpha}^{\ell}(x)]=L[x^{\alpha}]$ for every $\mathinner{\lvert\alpha\rvert}\leq m$. Then for any $\ell_{0}>0$ there is a constant $C_{\ell_{0}}\geq 0$ such that ${\sup_{\ell\geq\ell_{0}}\,\sum_{n=1}^{N}\mathinner{\lvert w_{\phi,\ell}(n)\rvert}\leq C_{\ell_{0}}}$.

Proof

The assumption $\lim_{\ell\to\infty}L[\phi_{\alpha}^{\ell}(x)]=L[x^{\alpha}]$ and unisolvency of $X$ imply that $\lim_{\ell\to\infty}w_{\phi,\ell}=w_{\text{\tiny{pol}}}$ . Because $L[\mathinner{\lvert p\rvert}]<\infty$ for any polynomial $p$ , both the weights $w_{\text{\tiny{pol}}}$ and $w_{\phi,\ell}$ are finite, which implies the claim. ∎

Lemma 2.3

Suppose that $X$ is $\Pi_{m}$ -unisolvent and $\lim_{\ell\to\infty}L[\phi_{\alpha}^{\ell}(x)]=L[x^{\alpha}]$ for every $\mathinner{\lvert\alpha\rvert}\leq m$ . If $(w_{\ell})_{\ell>0}$ is any sequence of weights such that

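$$\bigl\lvert L[\phi_{\alpha}^{\ell}]-Q_{X}(w_{\ell})[\phi_{\alpha}^{\ell}]\bigr\rvert\to 0\quad\text{ for every }\quad\lvert\alpha\rvert\leq m,$$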

then $\lim_{\ell\to\infty}w_{\ell}=w_{\text{\tiny{pol}}}$ .

Proof

We have $P_{\Pi}^{\mathsf{T}}w_{\text{\tiny{pol}}}=L_{\Pi}$ and

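$$\lVert L_{\Pi}-P_{\Pi}^{\mathsf{T}}w_{\ell}\rVert_{2}\leq\lVert L_{\Pi}-L_{\phi,\ell}\rVert_{2}+\lVert L_{\phi,\ell}-P_{\phi,\ell}^{\mathsf{T}}w_{\ell}\rVert_{2}+\lVert P_{\phi,\ell}^{\mathsf{T}}w_{\ell}-P_{\Pi}^{\mathsf{T}}w_{\ell}\rVert_{2},$$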

where each of the terms on the right-hand side vanishes as $\ell\to\infty$ . Because $\mathinner{\lVert L_{\Pi}-P_{\Pi}^{\mathsf{T}}w_{\ell}\rVert}_{2}=\mathinner{\lVert P_{\Pi}^{\mathsf{T}}(w_{\text{\tiny{pol}}}-w_{\ell})\rVert}_{2}$ and $P_{\Pi}$ is non-singular, we conclude that $\lim_{\ell\to\infty}w_{\ell}=w_{\text{\tiny{pol}}}$ . ∎

We are ready to prove the main result of the article for a fixed $\Pi_{m}$ -unisolvent point set $X\subset\Omega$ consisting of $N$ distinct points. First, by considering one of the basis functions (10) we show that $\mathinner{\lvert L[\phi_{\alpha}^{\ell}]-Q_{X}(w_{\ell}^{*})[\phi_{\alpha}^{\ell}]\rvert}\leq\sqrt{\alpha!}\ell^{\mathinner{\lvert\alpha\rvert}}e_{\ell}(Q_{X}(w_{\ell}^{*}))$ for every $\alpha\in\mathbb{N}_{0}^{d}$ . Second, the sub-optimal cubature rule $Q_{X}(w_{\phi,\ell})$ defined above can be used, in combination with (9), to establish the upper bound ${e_{\ell}(Q_{X}(w_{\ell}^{*}))\leq C\ell^{-(m+1)}}$ . These two bounds imply that $\mathinner{\lvert L[\phi_{\alpha}^{\ell}]-Q_{X}(w_{\ell}^{*})[\phi_{\alpha}^{\ell}]\rvert}\to 0$ for every $\mathinner{\lvert\alpha\rvert}\leq m$ . If $\lim_{\ell\to\infty}L[\phi_{\alpha}^{\ell}(x)]=L[x^{\alpha}]$ , Lemma 2.3 then implies that $w_{\ell}^{*}\to w_{\text{\tiny{pol}}}$ .

Theorem 2.4

Let $N=\dim\Pi_{m}$ for some $m\in\mathbb{N}_{0}$ and $X$ be $\Pi_{m}$ -unisolvent. Suppose that $\lim_{\ell\to\infty}L[\phi_{\alpha}^{\ell}(x)]=L[x^{\alpha}]$ for every $\alpha\in\mathbb{N}_{0}^{d}$ such that $\mathinner{\lvert\alpha\rvert}\leq m$ and that

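$$L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}\frac{\lvert a_{\alpha}\rvert}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\lvert x^{\alpha}\rvert\Bigg]\leq C_{L}<\infty\tag{14}$$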

for some $\ell_{0}>1$ and any sequence $(a_{\alpha})_{\alpha\in\mathbb{N}_{0}^{d}}$ such that $\sum_{\alpha\in\mathbb{N}_{0}^{d}}a_{\alpha}^{2}\leq 1$ . Then

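$$\lim_{\ell\to\infty}w_{\ell}^{*}=w_{\text{\tiny{pol}}}\quad\text{ and }\quad e_{\ell}\big(Q_{X}(w_{\ell}^{*})\big)=\mathcal{O}\big(\ell^{-(m+1)}\big),$$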

where $w_{\text{\tiny{pol}}}$ are the weights of the unique polynomial cubature rule such that $Q_{X}(w_{\text{\tiny{pol}}})[p]=L[p]$ for every $p\in\Pi_{m}$ .

Proof

For every $\alpha\in\mathbb{N}_{0}^{d}$ select the function

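$$g_{\alpha}(x)=\frac{1}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\operatorname{e}^{-\lVert x\rVert_{2}^{2}/(2\ell^{2})}x^{\alpha}=\frac{1}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\phi_{\alpha}^{\ell}(x).$$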

From Theorem 2.1 it follows that $\mathinner{\lVert g_{\alpha}\rVert}_{\mathcal{H}(K_{\ell})}^{2}=1$ since $g_{\alpha}$ is one of the basis functions (10). Thus, by definition of the worst-case error,

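$$\frac{1}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\bigl\lvert L[\phi_{\alpha}^{\ell}]-Q_{X}(w_{\ell}^{*})[\phi_{\alpha}^{\ell}]\bigr\rvert=\bigl\lvert L[g_{\alpha}]-Q_{X}(w_{\ell}^{*})[g_{\alpha}]\bigr\rvert\leq e_{\ell}\big(Q_{X}(w_{\ell}^{*})\big).$$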

Next we derive an appropriate upper bound on $e_{\ell}(Q_{X}(w_{\ell}^{*}))$ by considering the unique sub-optimal cubature rule $Q_{X}(w_{\phi,\ell})$ that is exact for every $\phi_{\alpha}^{\ell}$ with $\mathinner{\lvert\alpha\rvert}\leq m$ . In the expansion (9) of a function in $\mathcal{H}(K_{\ell})$ we have $L[\phi_{\alpha}^{\ell}]=Q_{X}(w_{\phi,\ell})[\phi_{\alpha}^{\ell}]$ for every term with $\mathinner{\lvert\alpha\rvert}\leq m$ . Consequently, the worst-case error admits the bound

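$$\begin{aligned}e_{\ell}\big(Q_{X}(w_{\phi,\ell})\big)&=\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\,\Biggl\lvert L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg]-Q_{X}(w_{\phi,\ell})\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg]\Biggr\rvert\\ &\leq\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}\lvert f_{\alpha}\rvert\lvert\phi_{\alpha}^{\ell}\rvert\Bigg]+\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\,\Biggl\lvert Q_{X}(w_{\phi,\ell})\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg]\Biggr\rvert,\end{aligned}$$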

where $f_{\alpha}$ are the coefficients that define $f\in\mathcal{H}(K_{\ell})$ in Theorem 2.1. A consequence of (9) is that $\mathinner{\lVert f\rVert}_{\mathcal{H}(K_{\ell})}\leq 1$ implies $\mathinner{\lvert f_{\alpha}\rvert}\leq\mathinner{\lvert a_{\alpha}\rvert}/(\ell^{\mathinner{\lvert\alpha\rvert}}\sqrt{\alpha!})$ for some real numbers $\mathinner{\lvert a_{\alpha}\rvert}\leq 1$ such that $\sum_{\alpha\in\mathbb{N}_{0}^{d}}a_{\alpha}^{2}\leq 1$. Therefore, for $\ell\geq\ell_{0}>1$,

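$$\begin{aligned}\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}\lvert f_{\alpha}\rvert\lvert\phi_{\alpha}^{\ell}\rvert\Bigg]&\leq L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}\frac{\lvert a_{\alpha}\rvert}{\ell^{\lvert\alpha\rvert}\sqrt{\alpha!}}\lvert\phi_{\alpha}^{\ell}\rvert\Bigg]\\ &\leq\ell^{-(m+1)}L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}\frac{\lvert a_{\alpha}\rvert}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\lvert\phi_{\alpha}^{\ell}\rvert\Bigg]\\ &\leq\ell^{-(m+1)}L\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}\frac{\lvert a_{\alpha}\rvert}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\lvert x^{\alpha}\rvert\Bigg]\\ &\leq C_{L}\ell^{-(m+1)}\end{aligned}$$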

by assumption (14). Moreover, because

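$$\max_{n=1,\ldots,N}\lvert\phi_{\alpha}^{\ell}(x_{n})\rvert\leq\max_{n=1,\ldots,N}\lvert x_{n}^{\alpha}\rvert\leq C_{X}$$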

for some $C_{X}>0$ and every $\ell$ , we have

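$$\begin{aligned}\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\,\Biggl\lvert Q_{X}(w_{\phi,\ell})\Bigg[\sum_{\lvert\alpha\rvert\geq m+1}f_{\alpha}\phi_{\alpha}^{\ell}\Bigg]\Biggr\rvert&\leq\sup_{\lVert f\rVert_{\mathcal{H}(K_{\ell})}\leq 1}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\sum_{\lvert\alpha\rvert\geq m+1}\lvert f_{\alpha}\rvert\lvert\phi_{\alpha}^{\ell}(x_{n})\rvert\\ &\leq\ell^{-(m+1)}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\sum_{\lvert\alpha\rvert\geq m+1}\frac{\lvert a_{\alpha}\rvert}{\ell^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\lvert\phi_{\alpha}^{\ell}(x_{n})\rvert\\ &\leq\ell^{-(m+1)}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\sum_{\lvert\alpha\rvert\geq m+1}\frac{C_{X}}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\\ &\leq\ell^{-(m+1)}\bigg(\sup_{\ell\geq\ell_{0}}\sum_{n=1}^{N}\lvert w_{\phi,\ell}(n)\rvert\bigg)\sum_{\lvert\alpha\rvert\geq m+1}\frac{C_{X}}{\ell_{0}^{\lvert\alpha\rvert-(m+1)}\sqrt{\alpha!}}\\ &=:C_{Q}\ell^{-(m+1)},\end{aligned}$$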

where $C_{Q}<\infty$ follows from convergence of the last term and Lemma 2.2. Thus

[TABLE]

when $\ell\geq\ell_{0}$ . Since $Q_{X}(w_{\ell}^{*})$ is worst-case optimal, we have thus established with (15) and (16) that, for sufficiently large $\ell$ ,

[TABLE]

for every $\alpha\in\mathbb{N}_{0}^{d}$ such that $\mathinner{\lvert\alpha\rvert}\leq m$ and a constant $C$ independent of $\ell$ . That is,

[TABLE]

The claim then follows by setting $w_{\ell}=w_{\ell}^{*}$ in Lemma 2.3. ∎

The assumptions of Theorem 2.4 hold, for instance, if the domain $\Omega$ is bounded.

Corollary 2.5

Let $N=\dim\Pi_{m}$ for some $m\in\mathbb{N}_{0}$ and $X$ be $\Pi_{m}$ -unisolvent. Suppose that $\Omega$ is bounded. Then

$$\lim_{\ell\to\infty}w_{\ell}^{*}=w_{\text{\tiny{pol}}},$$

where $w_{\text{\tiny{pol}}}$ are the weights of the unique polynomial cubature rule such that $Q_{X}(w_{\text{\tiny{pol}}})[p]=L[p]$ for every $p\in\Pi_{m}$ .

Proof

On a bounded domain the convergence $\phi_{\alpha}^{\ell}(x)\to x^{\alpha}$ as $\ell\to\infty$ is uniform. Thus

[TABLE]

as $\ell\to\infty$ for every $\alpha\in\mathbb{N}_{0}^{d}$ . Assumption (14) is also satisfied:

[TABLE]

where $\beta=(b,\ldots,b)\in\mathbb{R}^{d}$ for $b=\sup_{z\in\Omega}\mathinner{\lVert z\rVert}_{2}$ and finiteness follows from the assumption $L[1]<\infty$ . ∎
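The convergence in Corollary 2.5 can be observed numerically in a small case. The following is a minimal sketch, not taken from the paper: it assumes $d=1$, $\Omega=[0,1]$, $L[f]=\int_{0}^{1}f(x)\operatorname{d\!}x$ and $X=\{0,1/2,1\}$ (so that $m=2$ and the polynomial rule is Simpson's rule), and it assumes the standard formula $w_{\ell}^{*}=K^{-1}k_{L}$ for the worst-case optimal weights, where $K$ is the kernel matrix and $k_{L}$ collects the integrals of the kernel translates. All function names are illustrative.

```python
import math

def gaussian_kernel(x, y, ell):
    """Gaussian kernel with length-scale ell."""
    return math.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def kernel_mean(x, ell, a=0.0, b=1.0):
    """Closed form of the integral of K_ell(t, x) over t in [a, b]."""
    s = ell * math.sqrt(2.0)
    scale = ell * math.sqrt(math.pi / 2.0)
    return scale * (math.erf((b - x) / s) - math.erf((a - x) / s))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def optimal_weights(points, ell):
    """Assumed standard form of the optimal weights: solve K w = k_L."""
    K = [[gaussian_kernel(xi, xj, ell) for xj in points] for xi in points]
    k_L = [kernel_mean(xi, ell) for xi in points]
    return solve(K, k_L)

points = [0.0, 0.5, 1.0]
simpson = [1.0 / 6.0, 4.0 / 6.0, 1.0 / 6.0]  # polynomial rule exact on Pi_2

def max_deviation(ell):
    """Largest componentwise distance between w_ell^* and Simpson's weights."""
    w = optimal_weights(points, ell)
    return max(abs(wi - si) for wi, si in zip(w, simpson))

for ell in (2.0, 20.0):
    print(ell, optimal_weights(points, ell), max_deviation(ell))
```

The kernel matrix becomes ill-conditioned as $\ell$ grows, so a direct solve such as this one is only trustworthy for moderate $\ell$ and small $N$.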

However, boundedness of $\Omega$ is not necessary. Consider Gaussian integration:

[TABLE]

If $\alpha\in\mathbb{N}_{0}^{d}$ has an odd element, $L[\phi_{\alpha}^{\ell}]=L[x^{\alpha}]=0$ for every $\ell>0$ by symmetry. If $\alpha=2\beta$ for some $\beta\in\mathbb{N}_{0}^{d}$ the convergence $L[\phi_{\alpha}^{\ell}(x)]\to L[x^{\alpha}]$ as $\ell\to\infty$ follows from the monotone convergence theorem. To verify (14), recall that the absolute moments of the standard Gaussian distribution are

[TABLE]

where $\Gamma(\cdot)$ is the Gamma function. Because $(n-1)!!\leq\sqrt{n!}$ for any $n\in\mathbb{N}$ and

[TABLE]

if $n$ is odd, we have

[TABLE]

Thus

[TABLE]

if $\ell_{0}>1$ .
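The two moment facts used in this verification can be checked numerically. The sketch below assumes the standard closed form $\mathbb{E}\lvert X\rvert^{n}=2^{n/2}\Gamma((n+1)/2)/\sqrt{\pi}$ for $X\sim\mathrm{N}(0,1)$ and confirms, for $n\leq 20$, that this equals $(n-1)!!$ when $n$ is even, is bounded by $(n-1)!!$ when $n$ is odd, and that $(n-1)!!\leq\sqrt{n!}$ . The helper names are illustrative.

```python
import math

def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def abs_moment(n):
    """E|X|^n for X ~ N(0, 1) via the Gamma-function closed form."""
    return 2.0 ** (n / 2.0) * math.gamma((n + 1) / 2.0) / math.sqrt(math.pi)

def check(n, rel_tol=1e-9):
    """Return True if the moment bounds hold for this n."""
    m = abs_moment(n)
    df = double_factorial(n - 1)
    if n % 2 == 0 and not math.isclose(m, df, rel_tol=rel_tol):
        return False  # even moments should equal (n-1)!!
    bounded_by_df = m <= df * (1.0 + rel_tol)
    df_bounded = df <= math.sqrt(math.factorial(n)) * (1.0 + rel_tol)
    return bounded_by_df and df_bounded

print(all(check(n) for n in range(1, 21)))
```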

3 Optimal points in one dimension

Let $d=1$ and $\Omega=[a,b]$ for $a<b$ . In this section we consider quadrature rules whose points are also selected so as to minimise the worst-case error. A kernel quadrature rule is optimal if its points and weights satisfy

[TABLE]

In order to eliminate degrees of freedom in ordering the points we require that the points are in ascending order (i.e., $x_{n}\leq x_{n+1}$ ). Even though optimal kernel quadrature rules have been studied since the 1970s [Barrar et al., 1974, Bojanov, 1979, Larkin, 1970, Richter, 1970, Richter-Dyn, 1971] for the integration functional $L[f]=\int_{a}^{b}f(x)\omega(x)\operatorname{d\!}x$ , $\omega(x)>0$ , their theory is still not complete (the main results have been recently collated by Oettershagen [2017, Section 5.1]). Although uniqueness results have been proved only for totally positive isotropic kernels of the form (1) and integration when $\omega\equiv 1$ [Braess and Dyn, 1982], there exists numerical evidence suggesting that the optimal rule is unique in more general settings [Oettershagen, 2017, p. 97]. Note that the Gaussian kernel (4) we consider is totally positive.

In Theorem 3.4 we show that uniqueness of an optimal kernel quadrature rule for each $\ell>0$ implies that its increasingly flat limit is $Q_{\text{\tiny{G}}}=Q_{X_{\text{\tiny{G}}}}(w_{\text{\tiny{G}}})$ , the $N$ -point Gaussian quadrature rule for the linear functional $L$ . This is the unique quadrature rule that is exact for every polynomial of degree at most $2N-1$ : $Q_{\text{\tiny{G}}}[x^{n}]=L[x^{n}]$ whenever $n\leq 2N-1$ . This degree of exactness is maximal; there are no $N$ -point quadrature rules exact for all polynomials up to degree $2N$ . The most familiar methods of this type are of course the classical Gaussian quadrature rules for numerical integration [Gautschi, 2004, Section 1.4]. For example, the Gauss–Legendre quadrature rule satisfies

$$\int_{-1}^{1}p(x)\operatorname{d\!}x=\sum_{n=1}^{N}w_{n}p(x_{n})$$

for every polynomial $p$ of degree at most $2N-1$ and its points are the roots of the $N$ th degree Legendre polynomial. Theorem 3.4 was conjectured by O’Hagan [1991, Section 3.3] in the form that the optimal kernel quadrature rule has the classical Gauss–Hermite quadrature rule as its increasingly flat limit if the kernel is Gaussian and $L$ is the Gaussian integral. More discussion of this conjecture, but no rigorous proofs, can be found in [Minka, 2000, Section 4].
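The maximal degree of exactness $2N-1$ is easy to confirm for small $N$. The sketch below hard-codes the classical two- and three-point Gauss–Legendre rules on $[-1,1]$ and finds the largest degree up to which all monomials are integrated exactly; the helper names are illustrative.

```python
import math

# Classical Gauss-Legendre rules on [-1, 1]: (points, weights) indexed by N.
RULES = {
    2: ([-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)], [1.0, 1.0]),
    3: ([-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0)],
        [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]),
}

def exact_monomial_integral(n):
    """Integral of x^n over [-1, 1]."""
    return 0.0 if n % 2 else 2.0 / (n + 1)

def quadrature(points, weights, n):
    """Apply the rule to the monomial x^n."""
    return sum(w * x ** n for x, w in zip(points, weights))

def degree_of_exactness(points, weights, tol=1e-12, max_degree=20):
    """Largest n such that all monomials of degree <= n are integrated exactly."""
    n = 0
    while n <= max_degree:
        error = abs(quadrature(points, weights, n) - exact_monomial_integral(n))
        if error >= tol:
            break
        n += 1
    return n - 1

for N, (points, weights) in RULES.items():
    print(N, degree_of_exactness(points, weights))
```

For both rules the search stops at degree $2N$, where the rule is no longer exact.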

The proof of Theorem 3.4 is based on a general result by Barrow [1978] on existence and uniqueness of generalised Gaussian quadrature rules. This result replaces the polynomials in a Gaussian quadrature rule with generalised polynomials formed out of functions that constitute an extended Chebyshev system [Karlin and Studden, 1966, Chapter 1]. A collection $\{u_{n}\}_{n=0}^{m-1}\subset C^{m-1}([a,b])$ of functions is an extended Chebyshev system if any non-trivial linear combination of the functions has at most $m-1$ zeroes, counting multiplicities. That is, if $u\in\mathrm{span}(\{u_{n}\}_{n=0}^{m-1})$ and $u^{(q_{p})}(x_{p})=0$ for $x_{p}\in[a,b]$ , $p=1,\ldots,P$ , and $q_{p}=0,\ldots,Q_{p}-1$ , then $\sum_{p=1}^{P}Q_{p}\leq m-1$ . Any basis of the space of polynomials of degree at most $m-1$ is an extended Chebyshev system. Importantly, the functions $\{\phi_{n}^{\ell}\}_{n=0}^{m-1}$ in (12) are an extended Chebyshev system for any $m\in\mathbb{N}$ . To verify this, note that any $\phi\in\mathrm{span}(\{\phi_{n}^{\ell}\}_{n=0}^{m-1})$ can be written as $\phi(x)=\operatorname{e}^{-x^{2}/(2\ell^{2})}p(x)$ for some polynomial $p$ of degree at most $m-1$ and consequently

[TABLE]

for some polynomials $s_{r}$ . From this expression we see that $\phi^{(l)}(x)=0$ for every $l=0,\ldots,q$ if and only if $p^{(l)}(x)=0$ for every $l=0,\ldots,q$ . Since $p$ can have at most $m-1$ zeroes, counting multiplicities, it follows that the same is true of $\phi$ .

Theorem 3.1 (Barrow 1978)

Let $\{u_{n}\}_{n=0}^{2N-1}\subset C^{2N-1}([a,b])$ be an extended Chebyshev system and $L$ a positive linear functional on $\mathrm{span}(\{u_{n}\}_{n=0}^{2N-1})$ . Then there exist unique points ${a<x_{1}<\cdots<x_{N}<b}$ and positive weights $w\in\mathbb{R}_{+}^{N}$ such that

[TABLE]

Lemma 3.2

Let $\Omega\subset\mathbb{R}^{d}$ and suppose that a cubature rule $Q_{X}(w)$ with non-negative weights satisfies $Q_{X}(w)[u]=L[u]$ for some positive function ${u\colon\Omega\to(0,\infty)}$ such that $0<c_{l}\leq u(x)\leq c_{u}$ for all $x\in\Omega$ . Then

[TABLE]

Proof

The claim follows immediately from the inequalities

[TABLE]

∎

Lemma 3.3

Let $A$ be a metric space, $\ell_{0}>0$ a constant, and ${g\colon[\ell_{0},\infty)\times A\to[0,\infty)}$ a function. If there is a continuous function ${g_{\infty}\colon A\to[0,\infty)}$ such that ${g(\ell,\cdot)\to g_{\infty}}$ uniformly as $\ell\to\infty$ and a unique minimiser $x_{\infty}^{*}$ for which $g_{\infty}(x_{\infty}^{*})=0$ , then a function $z\colon[\ell_{0},\infty)\to A$ such that $\lim_{\ell\to\infty}g(\ell,z(\ell))=0$ has $\lim_{\ell\to\infty}z(\ell)=x_{\infty}^{*}$ .

Proof

The inequality $g_{\infty}(z(\ell))\leq g(\ell,z(\ell))+\mathinner{\lvert g_{\infty}(z(\ell))-g(\ell,z(\ell))\rvert}$ shows that $g_{\infty}(z(\ell))\to 0$ since $g(\ell,z(\ell))\to 0$ by assumption and $\mathinner{\lvert g_{\infty}(z(\ell))-g(\ell,z(\ell))\rvert}\to 0$ by uniformity of the convergence $g(\ell,\cdot)\to g_{\infty}$ . Because $g_{\infty}$ is continuous, non-negative, and has a unique minimiser $x_{\infty}^{*}$ , this implies that $z(\ell)\to x_{\infty}^{*}$ . ∎

Theorem 3.4

Suppose that $\Omega=[a,b]$ for $a<b$ . If for every $\ell>0$ there exists a unique optimal kernel quadrature rule $Q_{\ell}^{*}=Q_{X_{\ell}^{*}}(w_{\ell}^{*})$ , then its points and weights converge to those of the $N$ -point Gaussian quadrature rule for $L$ :

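$$\lim_{\ell\to\infty}X_{\ell}^{*}=X_{\text{\tiny{G}}}\quad\text{and}\quad\lim_{\ell\to\infty}w_{\ell}^{*}=w_{\text{\tiny{G}}},$$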

where $X_{\text{\tiny{G}}}$ and $w_{\text{\tiny{G}}}$ are the unique points and weights such that $Q_{X_{\text{\tiny{G}}}}(w_{\text{\tiny{G}}})[x^{n}]=L[x^{n}]$ for every $0\leq n\leq 2N-1$ . Moreover, $e_{\ell}(Q_{\ell}^{*})=\mathcal{O}(\ell^{-2N})$ .

Proof

In a manner identical to the proof of Theorem 2.4, we establish the lower bound

[TABLE]

that holds for every $n\geq 0$ . Because $\{\phi_{n}^{\ell}\}_{n=0}^{2N-1}$ are an extended Chebyshev system, Theorem 3.1 guarantees the existence of a unique $N$ -point quadrature rule ${Q_{\text{\tiny{G}}}^{\ell}=Q_{X_{\text{\tiny{G}}}^{\ell}}(w_{\text{\tiny{G}}}^{\ell})}$ such that $Q_{\text{\tiny{G}}}^{\ell}[\phi_{n}^{\ell}]=L[\phi_{n}^{\ell}]$ for every $n\leq 2N-1$ . The points $X_{\text{\tiny{G}}}^{\ell}=\{x_{1}^{\text{\tiny{G}},\ell},\ldots,x_{N}^{\text{\tiny{G}},\ell}\}$ of this rule are distinct and lie inside $\Omega$ and the weights $w_{\text{\tiny{G}}}^{\ell}$ are positive. We can then replicate the rest of the proof of Theorem 2.4 in one dimension but with $m=2N-1$ and Lemma 2.2 replaced with Lemma 3.2 (applied to the function $u=\phi_{0}^{\ell}$ ) to show that, for sufficiently large $\ell$ and a constant $C$ independent of $\ell$ ,

[TABLE]

for every $n\leq 2N-1$ . Consequently,

[TABLE]

for every $n\leq 2N-1$ . We then fix $\ell_{0}>0$ and invoke Lemma 3.3 with the function

[TABLE]

domain $A=(\Omega^{N}\times[0,\infty)^{N})$ , and $z(\ell)=(X_{\ell}^{*},w_{\ell}^{*})$ . Because the domain $\Omega=[a,b]$ is bounded, $\lim_{\ell\to\infty}L[\phi_{n}^{\ell}]=L[x^{n}]$ for every $n\in\mathbb{N}_{0}$ . Thus

[TABLE]

uniformly on $A$ . Since the unique minimiser of $g_{\infty}$ is $(X_{\text{\tiny{G}}},w_{\text{\tiny{G}}})$ , the claim follows from (18) and Lemma 3.3. ∎

4 Generalisations

This section discusses some straightforward generalisations of the results in Sections 2 and 3.

4.1 Damped power series kernels

Theorem 2.1 for the Gaussian kernel (4) is a consequence of the identity

[TABLE]

where $K^{\text{\tiny{pow}}}_{\ell}(x,x^{\prime})$ is a power series kernel [Zwicknagl, 2009]. Accordingly, the results in Sections 2 and 3 can be generalised for a class of kernels that we call damped power series kernels. Let $G\colon\mathbb{R}^{d}\to\mathbb{R}\setminus\{0\}$ be a non-zero function and define $G_{\ell}(x)=G(\mathinner{\lVert x\rVert}/\ell)$ . Then a damped power series kernel is

[TABLE]

for $q>0$ and weight parameters $\omega_{\alpha}>0$ such that the series converges for any $\ell>0$ and $x,x^{\prime}\in\Omega$ . Arguments identical to those used in [Minh, 2010, Zwicknagl, 2009] establish that $K_{\ell}$ is a positive-definite kernel and that its RKHS $\mathcal{H}(K_{\ell})$ consists of functions

[TABLE]

The Gaussian kernel is recovered by setting $G(x)=\operatorname{e}^{-\mathinner{\lVert x\rVert}_{2}^{2}/2}$ , $q=2$ , and $\omega_{\alpha}=\alpha!$ . Note that the Gaussian kernel is an exception; damped power series kernels are rarely stationary.
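The Gaussian special case can be checked numerically in one dimension. The sketch below assumes that the damped power series form reduces, for $d=1$, $q=2$ and $\omega_{n}=n!$ , to $\operatorname{e}^{-x^{2}/(2\ell^{2})}\operatorname{e}^{-x^{\prime 2}/(2\ell^{2})}\sum_{n\geq 0}(xx^{\prime})^{n}/(n!\,\ell^{2n})$ and compares a truncated series with the Gaussian kernel $\operatorname{e}^{-(x-x^{\prime})^{2}/(2\ell^{2})}$ . The function names and the truncation length are illustrative choices.

```python
import math

def gaussian_kernel(x, y, ell):
    """Gaussian kernel exp(-(x - y)^2 / (2 ell^2))."""
    return math.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def damped_power_series(x, y, ell, terms=40):
    """Truncated damped power series with G(x) = exp(-x^2/2), q = 2, omega_n = n!."""
    damping = math.exp(-x ** 2 / (2.0 * ell ** 2)) * math.exp(-y ** 2 / (2.0 * ell ** 2))
    series = sum((x * y) ** n / (math.factorial(n) * ell ** (2 * n))
                 for n in range(terms))
    return damping * series

for ell in (0.8, 1.5, 10.0):
    x, y = 0.7, -1.3
    print(ell, gaussian_kernel(x, y, ell), damped_power_series(x, y, ell))
```

The agreement is a consequence of $-(x-x^{\prime})^{2}=-x^{2}-x^{\prime 2}+2xx^{\prime}$ together with the exponential series for $\operatorname{e}^{xx^{\prime}/\ell^{2}}$ .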

Denote $\psi_{\alpha}^{\ell}(x)=G_{\ell}(x)x^{\alpha}$ . If we assume that (i) $G$ is bounded, (ii) ${\lim_{\ell\to\infty}L[\psi_{\alpha}^{\ell}(x)]=L[x^{\alpha}]}$ for every $\alpha\in\mathbb{N}_{0}^{d}$ , and (iii) a summability condition analogous to (14) holds, then a generalisation of Theorem 2.4 to damped power series kernels is readily obtained. To generalise Theorem 3.4 we also need to assume that $\{\psi_{n}\}_{n=0}^{2N-1}$ constitutes an extended Chebyshev system.

4.2 Taylor space kernels

Let $d=1$ . Taylor space kernels [Dick, 2006, Zwicknagl and Schaback, 2013] are obtained by selecting $G\equiv 1$ in (19). As $\ell\to\infty$ , the corresponding kernel quadrature rules then converge to polynomial rules. Perhaps the two most interesting special cases are the exponential kernel

[TABLE]

and the Szegő kernel

[TABLE]

The Szegő kernel induces a Hardy space on a disk of radius $\ell$ . Interestingly, it was pointed out already in the 1970s that approximation with the Szegő kernel yields polynomial methods as $\ell\to\infty$ [Larkin, 1970, Section 3]. See also [Minka, 2000, Section 4]. An extensive numerical investigation has been recently published by Oettershagen [2017, Section 6.2].

4.3 General information functionals

It would also be easy to replace the cubature rule (6) with a generalised version

[TABLE]

where $L_{n}$ are any bounded linear functionals. If $L_{n}$ are such that the matrices

[TABLE]

which are generalisations of (11) and (13), are non-singular, then Theorem 2.4 and Corollary 2.5 can be generalised.

4.4 Non-unisolvent point sets

If the kernel is Gaussian but the point set $X\subset\Omega$ is not unisolvent, Schaback [2005] has proved that the kernel interpolant (2) converges to the de Boor and Ron polynomial interpolant [de Boor, 1994, de Boor and Ron, 1992], which is the unique interpolant to $f$ at $X$ in a point-dependent polynomial space $\Pi_{X}$ having in a certain sense minimal degree. We expect that extensions for non-unisolvent points of the results in Section 2 are possible. The kernel cubature weights would presumably converge to the weights $w_{\text{\tiny{pol}}}^{\prime}$ such that $Q_{X}(w_{\text{\tiny{pol}}}^{\prime})[p]=L[p]$ for every $p\in\Pi_{X}$ .

Acknowledgements.

This work was supported by the Aalto ELEC Doctoral School and the Academy of Finland. We thank the reviewers for numerous comments that helped in improving the presentation.

Bibliography


Barrar et al. [1974] R. B. Barrar, H. L. Loeb, and H. Werner. On the existence of optimal integration formulas for analytic functions. Numerische Mathematik, 23(2):105–117, 1974.
Barrow [1978] D. L. Barrow. On multiple node Gaussian quadrature formulae. Mathematics of Computation, 32(142):431–439, 1978.
Bojanov [1979] B. D. Bojanov. On the existence of optimal quadrature formulae for smooth functions. Calcolo, 16(1):61–70, 1979.
Braess and Dyn [1982] D. Braess and N. Dyn. On the uniqueness of monosplines and perfect splines of least $L_{1}$- and $L_{2}$-norm. Journal d’Analyse Mathématique, 41(1):217–233, 1982.
Cavoretto et al. [2015] R. Cavoretto, G. E. Fasshauer, and M. McCourt. An introduction to the Hilbert-Schmidt SVD using iterated Brownian bridge kernels. Numerical Algorithms, 68(2):393–422, 2015.
de Boor [1994] C. de Boor. Polynomial interpolation in several variables. In J. Rice and R. A. DeMillo, editors, Studies in Computer Science, pages 87–109. 1994.
de Boor and Ron [1992] C. de Boor and A. Ron. The least solution for the polynomial interpolation problem. Mathematische Zeitschrift, 210(1):347–378, 1992.
De Marchi and Schaback [2009] S. De Marchi and R. Schaback. Nonstandard kernels and their applications. Dolomites Research Notes on Approximation, 2(1):16–43, 2009.