D-optimal design for multivariate polynomial regression via the Christoffel function and semidefinite relaxations

Yohann De Castro (LM-Orsay), F. Gamboa (IMT), D. Henrion (LAAS-MAC, CTU), R. Hess (LAAS-MAC), J.-B. Lasserre (LAAS-MAC, IMT)

arXiv:1703.01777·math.ST·March 7, 2017


TL;DR

This paper introduces a novel method for designing D-optimal experiments in multivariate polynomial regression using semidefinite programming and Christoffel functions, enabling efficient numerical solutions and geometric interpretation.

Contribution

It develops a new approach combining moment-sum-of-squares hierarchy and Christoffel polynomial for optimal experimental design in polynomial regression.

Findings

1. Effective numerical approximation of D-optimal designs.

2. Utilization of semidefinite programming duality for geometric insights.

3. Applicable to compact semi-algebraic design spaces.

Abstract

We present a new approach to the design of D-optimal experiments with multivariate polynomial regressions on compact semi-algebraic design spaces. We apply the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically and approximately the optimal design problem. The geometry of the design is recovered with semidefinite programming duality theory and the Christoffel polynomial.


Equations

$$z_{i}=\sum_{j=1}^{p}\theta_{j}\varphi_{j}(\xi_{i})+\varepsilon_{i},\qquad i=1,\ldots,N$$

$$\zeta:=\begin{pmatrix}x_{1}&\cdots&x_{\ell}\\ \frac{n_{1}}{N}&\cdots&\frac{n_{\ell}}{N}\end{pmatrix}\tag{1.1}$$

$$M(\zeta):=\sum_{i=1}^{\ell}w_{i}\,\Phi(x_{i})\,\Phi^{\top}(x_{i})\tag{1.2}$$

$$\phi_{q}:\ \mathbb{S}^{+}_{p}\to\mathbb{R},\qquad M\mapsto\phi_{q}(M)$$

$$\phi_{q}(M):=\begin{cases}\big(\frac{1}{p}\,\mathrm{trace}(M^{q})\big)^{1/q}&\text{if }q\neq-\infty,0\\ \det(M)^{1/p}&\text{if }q=0\\ \lambda_{\min}(M)&\text{if }q=-\infty\end{cases}$$

$$\phi_{q}(M):=\begin{cases}\big(\frac{1}{p}\,\mathrm{trace}(M^{q})\big)^{1/q}&\text{if }q\in(0,1]\\ 0&\text{if }q\in[-\infty,0]\end{cases}$$

$$\max\ \log\det M(\zeta)\tag{1.3}$$

$$\mathcal{X}:=\{x\in\mathbb{R}^{n}:g_{j}(x)\geq 0,\ j=1,\ldots,m\}\tag{2.1}$$

$$\mathbf{v}_{d}(x):=(\underbrace{1}_{\text{degree }0},\ \underbrace{x_{1},\ldots,x_{n}}_{\text{degree }1},\ \underbrace{x_{1}^{2},x_{1}x_{2},\ldots,x_{1}x_{n},x_{2}^{2},\ldots,x_{n}^{2}}_{\text{degree }2},\ \ldots,\ \underbrace{x_{1}^{d},\ldots,x_{n}^{d}}_{\text{degree }d})^{T}$$

$$y_{\alpha}=\int_{\mathcal{X}}x^{\alpha}\,d\mu$$

$$\mathcal{M}_{d}(\mathcal{X}):=\Big\{\mathbf{y}\in\mathbb{R}^{\binom{n+d}{n}}:\exists\,\mu\in\mathscr{M}_{+}(\mathcal{X})\ \text{s.t.}\ y_{\alpha}=\int_{\mathcal{X}}x^{\alpha}\,d\mu,\ \forall\alpha\in\mathbb{N}^{n},\ |\alpha|\leq d\Big\}\tag{2.2}$$

$$L_{\mathbf{y}}(f)=\sum_{\alpha\in\mathbb{N}^{n}}f_{\alpha}y_{\alpha}$$

$$M_{d}(\mathbf{y})(\alpha,\beta)=L_{\mathbf{y}}(x^{\alpha}x^{\beta})=y_{\alpha+\beta}$$

$$M_{d}(f\mathbf{y})(\alpha,\beta)=L_{\mathbf{y}}(f(x)\,x^{\alpha}x^{\beta})=\sum_{\gamma\in\mathbb{N}^{n}}f_{\gamma}\,y_{\gamma+\alpha+\beta}$$

$$\mathcal{M}_{2(d+\delta)}^{\mathsf{SDP}}(\mathcal{X}):=\Big\{\mathbf{y}\in\mathbb{R}^{\binom{n+2d}{n}}:\exists\,\mathbf{y}_{\delta}\in\mathbb{R}^{\binom{n+2(d+\delta)}{n}}\ \text{such that}\ (y_{\delta,\alpha})_{|\alpha|\leq 2d}=\mathbf{y}\ \text{and}\ M_{d+\delta}(\mathbf{y}_{\delta})\succcurlyeq 0,\ M_{d+\delta-v_{j}}(g_{j}\mathbf{y}_{\delta})\succcurlyeq 0,\ j=1,\ldots,m\Big\}$$

$$\mathcal{M}_{2d}(\mathcal{X})\subseteq\cdots\subseteq\mathcal{M}_{2(d+2)}^{\mathsf{SDP}}(\mathcal{X})\subseteq\mathcal{M}_{2(d+1)}^{\mathsf{SDP}}(\mathcal{X})\subseteq\mathcal{M}_{2d}^{\mathsf{SDP}}(\mathcal{X})$$

$$M(\mu)=\Big(\int_{\mathcal{X}}\varphi_{i}\varphi_{j}\,\mathrm{d}\mu\Big)_{1\leq i,j\leq p}=\Big(\sum_{|\alpha|,|\beta|\leq d}a_{i,\alpha}a_{j,\beta}\,y_{\alpha+\beta}\Big)_{1\leq i,j\leq p}=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}$$

$$\begin{array}{rl}\max&\log\det M\\ \text{s.t.}&M=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}\succcurlyeq 0,\quad y_{\gamma}=\sum_{i=1}^{\ell}\frac{n_{i}}{N}x_{i}^{\gamma},\quad\sum_{i=1}^{\ell}n_{i}=N,\\ &x_{i}\in\mathcal{X},\ n_{i}\in\mathbb{N},\ i=1,\ldots,\ell\end{array}\tag{3.1}$$

$$\zeta:=\begin{pmatrix}x_{1}&\cdots&x_{\ell}\\ w_{1}&\cdots&w_{\ell}\end{pmatrix}\tag{3.2}$$

$$\begin{array}{rl}\max&\log\det M\\ \text{s.t.}&M=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}\succcurlyeq 0,\quad y_{\gamma}=\sum_{i=1}^{\ell}w_{i}x_{i}^{\gamma},\\ &x_{i}\in\mathcal{X},\ w\in\mathcal{W}\end{array}\tag{3.3}$$

$$\mathcal{M}_{2d}(\mathcal{X})=\Big\{\mathbf{y}\in\mathbb{R}^{\binom{n+2d}{n}}:y_{\alpha}=\int_{\mathcal{X}}x^{\alpha}\,d\mu,\ \mu=\sum_{i=1}^{\ell}w_{i}\delta_{x_{i}},\ x_{i}\in\mathcal{X},\ w\in\mathcal{W}\Big\}$$

$$\begin{array}{rl}\max&\log\det M\\ \text{s.t.}&M=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}\succcurlyeq 0,\\ &\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X}),\ y_{0}=1\end{array}\tag{3.4}$$

$$L_{\mathbf{y}}(P_{\alpha}P_{\beta})=\delta_{\alpha=\beta}\quad\text{and}\quad L_{\mathbf{y}}(x^{\alpha}P_{\beta})=0\quad\forall\alpha\prec\beta$$

$$p_{d}:\ x\mapsto p_{d}(x):=\sum_{|\alpha|\leq d}P_{\alpha}(x)^{2},\qquad x\in\mathbb{R}^{n}$$

$$p_{d}(x)=\mathbf{v}_{d}(x)^{T}\,\mathbf{M}_{d}(\mathbf{y})^{-1}\,\mathbf{v}_{d}(x),\qquad\forall x\in\mathbb{R}^{n}$$

$$\frac{1}{p_{d}(\xi)}=\min_{P\in\mathbb{R}[x]_{d}}\Big\{\int P(x)^{2}\,d\mu(x):P(\xi)=1\Big\},\qquad\forall\xi\in\mathbb{R}^{n}$$

$$\begin{array}{rl}\rho=\displaystyle\max_{\mathbf{y}}&\log\det\mathbf{M}_{d}(\mathbf{y})\\ \text{s.t.}&\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X}),\ y_{0}=1\end{array}\tag{4.1}$$


Taxonomy

Topics: Optimal Experimental Design Methods · Probabilistic and Robust Engineering Design · Advanced Multi-Objective Optimization Algorithms

Full text

D-optimal design for multivariate polynomial regression via the Christoffel function and semidefinite relaxations

Y. De Castro and F. Gamboa and D. Henrion and R. Hess and J.-B. Lasserre

YDC is with the Laboratoire de Mathématiques d’Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 91405 Orsay, France. www.math.u-psud.fr/~decastro

FG is with the Institut de Mathématiques de Toulouse (CNRS UMR 5219), Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse, France. www.math.univ-toulouse.fr/~gamboa

DH is with LAAS-CNRS, Université de Toulouse, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France and with the Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, CZ-16626 Prague, Czech Republic. homepages.laas.fr/henrion

RH is with LAAS-CNRS, Université de Toulouse, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France.

JBL is with LAAS-CNRS, Université de Toulouse, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France. homepages.laas.fr/lasserre

(Date: Draft of March 2, 2024)

Abstract.

We present a new approach to the design of D-optimal experiments with multivariate polynomial regressions on compact semi-algebraic design spaces. We apply the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically and approximately the optimal design problem. The geometry of the design is recovered with semidefinite programming duality theory and the Christoffel polynomial.

Key words and phrases:

Experimental design; semidefinite programming

1. Introduction

1.1. Convex design theory

Optimum experimental designs are computational and theoretical objects that minimize the variance of the best linear unbiased estimators in regression problems. In this framework, the experimenter models the responses $z_{1},\ldots,z_{N}$ of a random experiment whose inputs are represented by vectors $\xi_{i}\in\mathbb{R}^{n}$ with respect to known regression functions $\varphi_{1},\ldots,\varphi_{p}$, namely

$$z_{i}=\sum_{j=1}^{p}\theta_{j}\varphi_{j}(\xi_{i})+\varepsilon_{i},\qquad i=1,\ldots,N,$$

where $\theta_{1},\ldots,\theta_{p}$ are unknown parameters that the experimenter wants to estimate, $\varepsilon_{i}$ is some noise and the inputs $\xi_{i}$ are chosen by the experimenter in a design space $\mathcal{X}\subseteq\mathbb{R}^{n}$. Assume that the inputs $\xi_{i}$, for $i=1,\ldots,N$, are chosen within a set of distinct points $x_{1},\ldots,x_{\ell}$ with $\ell\leq N$, and let $n_{k}$ denote the number of times the particular point $x_{k}$ occurs among $\xi_{1},\ldots,\xi_{N}$. This is summarized by the design

$$\zeta:=\begin{pmatrix}x_{1}&\cdots&x_{\ell}\\ \frac{n_{1}}{N}&\cdots&\frac{n_{\ell}}{N}\end{pmatrix}\tag{1.1}$$

whose first row gives the points in the design space $\mathcal{X}$ where the input parameters are to be taken, and whose second row tells the experimenter which proportion of the experiments (frequencies) is to be run at these points. The goal of the theory of design of experiments is then to assess which input parameters and frequencies the experimenter has to consider. For a given $\zeta$ the standard analysis of the Gaussian linear model shows that the minimal covariance matrix (with respect to Loewner ordering) of unbiased estimators can be expressed in terms of the Moore-Penrose pseudoinverse of the information matrix, which is defined by

$$M(\zeta):=\sum_{i=1}^{\ell}w_{i}\,\Phi(x_{i})\,\Phi^{\top}(x_{i})\tag{1.2}$$

where $\Phi:=(\varphi_{1},\ldots,\varphi_{p})$ is the column vector of regression functions and $w_{i}:=n_{i}/N$. One major aspect of the theory of design of experiments seeks to maximize the information matrix over the set of all possible $\zeta$. Notice that the Loewner ordering is partial and, in general, there is no greatest element among all possible information matrices $M(\zeta)$. The standard approach is then to consider some statistical criteria, namely Kiefer’s $\phi_{q}$-criteria [K74], in order to describe and construct the “optimal designs” with respect to those criteria. Observe that the information matrix belongs to $\mathbb{S}^{+}_{p}$, the space of symmetric nonnegative definite matrices of size $p$, and for all $q\in[-\infty,1]$ define the function

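$$\phi_{q}:\ \mathbb{S}^{+}_{p}\to\mathbb{R},\qquad M\mapsto\phi_{q}(M)$$

where for positive definite matrices $M$ it holds

$$\phi_{q}(M):=\begin{cases}\big(\frac{1}{p}\,\mathrm{trace}(M^{q})\big)^{1/q}&\text{if }q\neq-\infty,0\\ \det(M)^{1/p}&\text{if }q=0\\ \lambda_{\min}(M)&\text{if }q=-\infty\end{cases}$$

and for nonnegative definite matrices $M$ it holds

$$\phi_{q}(M):=\begin{cases}\big(\frac{1}{p}\,\mathrm{trace}(M^{q})\big)^{1/q}&\text{if }q\in(0,1]\\ 0&\text{if }q\in[-\infty,0].\end{cases}$$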

Those criteria are meant to be real valued, positively homogeneous, non constant, upper semi-continuous, isotonic (with respect to the Loewner ordering) and concave functions. Throughout this paper, we restrict ourselves to the $D$-optimality criterion, which corresponds to the choice $q=0$. Other criteria will be studied elsewhere.
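As an illustration of the definitions above, the following minimal sketch builds the information matrix of a design and evaluates Kiefer's $\phi_{q}$ on it. The function names and the quadratic regression example $\Phi(x)=(1,x,x^{2})$ are assumptions made for this illustration only, and the criterion is implemented for positive definite matrices.

```python
import numpy as np

def information_matrix(points, weights, regressors):
    """M(zeta) = sum_i w_i Phi(x_i) Phi(x_i)^T for a design (points, weights)."""
    p = len(regressors(points[0]))
    M = np.zeros((p, p))
    for x, w in zip(points, weights):
        phi = np.asarray(regressors(x), dtype=float)
        M += w * np.outer(phi, phi)
    return M

def kiefer_criterion(M, q):
    """Kiefer's phi_q for a symmetric positive definite M, with q in [-inf, 1]."""
    eig = np.linalg.eigvalsh(M)
    p = len(eig)
    if q == 0:
        return float(np.prod(eig) ** (1.0 / p))    # det(M)^(1/p)
    if q == -np.inf:
        return float(eig.min())                    # smallest eigenvalue
    return float(np.mean(eig ** q) ** (1.0 / q))   # (trace(M^q)/p)^(1/q)

# Hypothetical example: quadratic regression in one variable on [-1, 1].
quadratic = lambda x: [1.0, x, x * x]
M_three = information_matrix([-1.0, 0.0, 1.0], [1 / 3] * 3, quadratic)
M_five = information_matrix([-1.0, -0.5, 0.0, 0.5, 1.0], [0.2] * 5, quadratic)
print(kiefer_criterion(M_three, 0), kiefer_criterion(M_five, 0))
```

In this example the three-point design gives the larger value of $\phi_{0}$, so it is preferred under the $D$-optimality criterion.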

In particular, in this paper we search for solutions $\zeta^{\star}$ to the following optimization problem

$$\max\ \log\det M(\zeta)\tag{1.3}$$

where the maximum is taken over all $\zeta$ of the form (1.1). Note that the logarithm of the determinant is used instead of $\phi_{0}$ because of its standard use in semidefinite programming (SDP) as a barrier function for the cone of positive definite matrices.

1.2. State of the art

Optimal design is at the heart of statistical planning for inference in the linear model, see for example [BHH78]. While the case of discrete input factors is generally tackled by algebraic and combinatorial arguments (e.g., [B08]), the case of continuous input factors often leads to an optimization problem. In general, the continuous factors are generated by a vector $\Phi$ of linearly independent regular functions on the design space $\mathcal{X}$. One way to handle the problem is to focus only on $\mathcal{X}$, ignoring the function $\Phi$, and to try to draw design points that best fill the set $\mathcal{X}$. This is generally done by optimizing a cost function on $\mathcal{X}^{N}$ that reflects how the design points are positioned relative to each other and/or how they fill the space. Generic examples are the so-called maxmin or minmax criteria (see for example [PW88] or [WPN97]) and the minimum discrepancy designs (see for example [LQX05]). Another point of view, which is the one developed here, relies on the maximization of the information matrix. Of course, as explained before, the Loewner order is partial, so the optimization cannot be carried out on this matrix itself but only on one of its features. A pioneering paper adopting this point of view is that of Elfving [E52]. In the early 1960s, in a series of papers, Kiefer and Wolfowitz shed new light on this kind of method for experimental design by introducing the equivalence principle and proposing, in some cases, algorithms to solve the optimization problem, see [K74] and references therein. Following the early works of Karlin and Studden ([KS66b], [KS66a]), the case of polynomial regression on a compact interval of $\mathbb{R}$ has been widely studied. In this framework, the theory is almost complete and much can be said about the optimal solutions of the design problem (see [DS93]). Roughly speaking, the optimal design points are related to the zeros of orthogonal polynomials built on an equilibrium measure. We refer to the excellent book of Dette and Studden [DS97] and references therein for a complete overview of the subject. In the one-dimensional framework, other systems of functions $\Phi$ (trigonometric functions or some $T$-systems, see [KN77] for a definition) are studied in the same way in [DS97], [LS85] and [IS01]. In the multidimensional case, even for polynomial systems, very few cases of explicit solutions are known. Using tensoring arguments, the case of a rectangle is treated in [DS97]. Particular models of degree two are studied in [DG14] and [PW94]. Apart from these particular cases, the construction of the optimal design relies on numerical optimization procedures. The case of the determinant ($D$-optimality) is studied for example in [W70] and [VBW98]. Another criterion based on matrix conditioning is developed in [MYZ14]. General optimization algorithms are discussed in [F10] and [ADT07]. In the framework of fixed given support points, efficient SDP-based algorithms are proposed and studied in [S11] and [SH15]. Let us also mention the paper [VBW98], which is one of the original motivations for developing SDP solvers, especially for Max Det problems (corresponding to $D$-optimal design) and the so-called problem of analytical centering.

1.3. Contribution

For the first time, this paper introduces a general method to compute approximate $D$-optimal designs on a large variety of design spaces that we refer to as semi-algebraic sets, see [L10] for a definition. This family can be understood as the sets given by intersections and complements of the level sets of multivariate polynomials. We apply the moment-sum-of-squares hierarchy (a.k.a. the Lasserre hierarchy) of SDP problems to solve numerically and approximately the optimal design problem. The theoretical guarantees are given by Theorems 4.3 and 4.4. They show the convergence of our method towards the optimal information matrix as the order of the hierarchy increases. Furthermore, we show that our method recovers the optimal design when finite convergence of this hierarchy occurs. To recover the geometry of the design we use SDP duality theory and the Christoffel polynomial. We have run several numerical experiments for which finite convergence holds, leading to a surprisingly fast and reliable method to compute optimal designs. As illustrated by our examples, using Christoffel polynomials of degrees higher than two allows one to reconstruct designs with points in the interior of the domain, in contrast with the classical use of ellipsoids for linear regressions.

1.4. Outline of the paper

In Section 2, after introducing the necessary notation, we briefly explain some basics on moments and moment matrices, and present the approximation of the moment cone via the Lasserre hierarchy. Section 3 is dedicated to further describing optimum designs and their approximations. At the end of that section we propose a two-step procedure to solve the approximate design problem. Solving the first step is the subject of Section 4. There, we find a sequence of moments associated with the optimal design measure. Recovering this measure (step two of the procedure) is discussed in Section 5. We finish the paper with some illustrative examples and a conclusion.

2. Polynomial optimal design and moments

This section collects preliminary material on semialgebraic sets, moments and moment matrices, using the notations of [L10]. This material will be used to restrict our attention to polynomial optimal design problems with polynomial regression functions and semi-algebraic design spaces.

2.1. Polynomial optimal design

Denote by $\mathbb{R}[x]$ the vector space of real polynomials in the variables $x=(x_{1},\dotsc,x_{n})$ , and for $d\in\mathbb{N}$ , define $\mathbb{R}[x]_{d}:=\{p\in\mathbb{R}[x]:\deg{p}\leq d\}$ where $\deg p$ denotes the total degree of $p$ .

In this paper we assume that the regression functions are multivariate polynomials, i.e. $\Phi=(\varphi_{1},\ldots,\varphi_{p})\in(\mathbb{R}[x]_{d})^{p}$. Moreover, we consider that the design space $\mathcal{X}\subset\mathbb{R}^{n}$ is a given closed basic semi-algebraic set

$$\mathcal{X}:=\{x\in\mathbb{R}^{n}:g_{j}(x)\geq 0,\ j=1,\ldots,m\}\tag{2.1}$$

for given polynomials $g_{j}\in{\mathbb{R}}[x]$ , $j=1,\ldots,m$ , whose degrees are denoted by $d_{j}$ , $j=1,\ldots,m$ . Assume that ${\mathcal{X}}$ is compact with an algebraic certificate of compactness. For example, one of the polynomial inequalities $g_{j}(x)\geq 0$ should be of the form $R^{2}-\sum_{i=1}^{n}x_{i}^{2}\geq 0$ for a sufficiently large constant $R$ .

Notice that those assumptions cover a large class of problems in optimal design theory, see for instance [DS97, Chapter 5]. In particular, observe that the design space $\mathcal{X}$ defined by (2.1) is not necessarily convex and note that the polynomial regressors $\Phi$ can handle incomplete $m$ -way $d$ th degree polynomial regression.

The monomials $x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}$ , with $\alpha=(\alpha_{1},\dotsc,\alpha_{n})\in\mathbb{N}^{n}$ , form a basis of the vector space $\mathbb{R}[x]$ . We use the multi-index notation $x^{\alpha}:=x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}$ to denote these monomials. In the same way, for a given $d\in\mathbb{N}$ the vector space $\mathbb{R}[x]_{d}$ has dimension $\binom{n+d}{n}$ with basis $(x^{\alpha})_{|\alpha|\leq d}$ , where $|\alpha|:=\alpha_{1}+\cdots+\alpha_{n}$ . We write

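$$\mathbf{v}_{d}(x):=(\underbrace{1}_{\text{degree }0},\ \underbrace{x_{1},\ldots,x_{n}}_{\text{degree }1},\ \underbrace{x_{1}^{2},x_{1}x_{2},\ldots,x_{1}x_{n},x_{2}^{2},\ldots,x_{n}^{2}}_{\text{degree }2},\ \ldots,\ \underbrace{x_{1}^{d},\ldots,x_{n}^{d}}_{\text{degree }d})^{T}$$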

for the column vector of the monomials ordered according to their degree, and where monomials of the same degree are ordered with respect to the lexicographic ordering.
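The ordering of the monomial vector can be made concrete with a short sketch. The helper names `monomial_exponents` and `v` are hypothetical; the code only assumes the ordering described above (by total degree, then lexicographically within a degree, so that $x_{1}^{2}$ precedes $x_{1}x_{2}$, which precedes $x_{2}^{2}$).

```python
from itertools import product
from math import comb

def monomial_exponents(n, d):
    """Exponents alpha in N^n with |alpha| <= d, by degree, then lexicographically."""
    exps = []
    for k in range(d + 1):
        same_degree = [a for a in product(range(k + 1), repeat=n) if sum(a) == k]
        # Reverse tuple order puts (2, 0) before (1, 1) before (0, 2).
        exps.extend(sorted(same_degree, reverse=True))
    return exps

def v(x, d):
    """Evaluate the monomial vector v_d(x) at a point x in R^n."""
    out = []
    for alpha in monomial_exponents(len(x), d):
        term = 1.0
        for xi, ai in zip(x, alpha):
            term *= xi ** ai
        out.append(term)
    return out

print(monomial_exponents(2, 2))
print(len(monomial_exponents(3, 4)), comb(3 + 4, 3))
```

The length of the list matches the dimension $\binom{n+d}{n}$ of $\mathbb{R}[x]_{d}$ stated above.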

The cone $\mathscr{M}_{+}({\mathcal{X}})$ of nonnegative Borel measures supported on $\mathcal{X}$ is understood as the dual to the cone of nonnegative elements of the space $\mathscr{C}({\mathcal{X}})$ of continuous functions on $\mathcal{X}$ .

2.2. Moments, the moment cone and the moment matrix

Given $\mu\in\mathscr{M}_{+}(\mathcal{X})$ and $\alpha\in\mathbb{N}^{n}$ , we call

$$y_{\alpha}=\int_{\mathcal{X}}x^{\alpha}\,d\mu$$

the moment of order $\alpha$ of $\mu$ . Accordingly, we call the sequence $\mathbf{y}=(y_{\alpha})_{\alpha\in\mathbb{N}^{n}}$ the moment sequence of $\mu$ . Conversely, we say that a sequence $\mathbf{y}=(y_{\alpha})_{\alpha\in\mathbb{N}^{n}}\subseteq\mathbb{R}$ has a representing measure, if there exists a measure $\mu$ such that $\mathbf{y}$ is its moment sequence.

We denote by $\mathcal{M}_{d}(\mathcal{X})$ the convex cone of all truncated sequences $\mathbf{y}=(y_{\alpha})_{|\alpha|\leq d}$ which have a representing measure supported on $\mathcal{X}$ . We call it the moment cone of $\mathcal{X}$ . It can be expressed as

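$$\mathcal{M}_{d}(\mathcal{X}):=\Big\{\mathbf{y}\in\mathbb{R}^{\binom{n+d}{n}}:\exists\,\mu\in\mathscr{M}_{+}(\mathcal{X})\ \text{s.t.}\ y_{\alpha}=\int_{\mathcal{X}}x^{\alpha}\,d\mu,\ \forall\alpha\in\mathbb{N}^{n},\ |\alpha|\leq d\Big\}.\tag{2.2}$$

We also denote by $\mathcal{P}_{d}(\mathcal{X})$ the convex cone of polynomials of degree at most $d$ that are nonnegative on $\mathcal{X}$. When $\mathcal{X}$ is compact then $\mathcal{M}_{d}(\mathcal{X})=\mathcal{P}_{d}(\mathcal{X})^{\star}$ and $\mathcal{P}_{d}(\mathcal{X})=\mathcal{M}_{d}(\mathcal{X})^{\star}$ (see e.g. [L15, Lemma 2.5]).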

When the design space is given by the univariate interval $\mathcal{X}=[a,b]$ , i.e., $n=1$ , then this cone is representable using positive semidefinite Hankel matrices, which implies that convex optimization on this cone can be carried out with efficient interior point algorithms for semidefinite programming, see e.g. [VBW98]. Unfortunately, in the general case, there is no efficient representation of this cone. It has actually been shown in [S16] that the moment cone is not semidefinite representable, i.e. it cannot be expressed as the projection of a linear section of the cone of positive semidefinite matrices. However, we can use semidefinite approximations of this cone as discussed in Section 2.3.

Given a sequence $\mathbf{y}=(y_{\alpha})_{\alpha\in\mathbb{N}^{n}}\subseteq\mathbb{R}$ we define the linear functional $L_{\mathbf{y}}:\mathbb{R}[x]\to\mathbb{R}$ which maps a polynomial $f=\sum_{\alpha\in\mathbb{N}^{n}}f_{\alpha}x^{\alpha}$ to

$$L_{\mathbf{y}}(f)=\sum_{\alpha\in\mathbb{N}^{n}}f_{\alpha}y_{\alpha}.$$

A sequence $\mathbf{y}=(y_{\alpha})_{\alpha\in\mathbb{N}^{n}}$ has a representing measure $\mu$ supported on $\mathcal{X}$ if and only if $L_{\mathbf{y}}(f)\geq 0$ for all polynomials $f\in\mathbb{R}[x]$ nonnegative on $\mathcal{X}$ [L10, Theorem 3.1].

The moment matrix of a truncated sequence $\mathbf{y}=(y_{\alpha})_{|\alpha|\leq 2d}$ is the $\binom{n+d}{n}\times\binom{n+d}{n}$-matrix $M_{d}(\mathbf{y})$ with rows and columns respectively indexed by $\alpha,\beta\in\mathbb{N}^{n}$, $|\alpha|,|\beta|\leq d$, and whose entries are given by

$$M_{d}(\mathbf{y})(\alpha,\beta)=L_{\mathbf{y}}(x^{\alpha}x^{\beta})=y_{\alpha+\beta}.$$

It is symmetric and linear in $\mathbf{y}$ , and if $\mathbf{y}$ has a representing measure, then $M_{d}(\mathbf{y})$ is positive semidefinite, denoted by $M_{d}(\mathbf{y})\succcurlyeq 0$ .

Similarly, we define the localizing matrix of a polynomial $f=\sum_{|\alpha|\leq r}f_{\alpha}x^{\alpha}\in\mathbb{R}[x]_{r}$ of degree $r$ and a sequence $\mathbf{y}=(y_{\alpha})_{|\alpha|\leq 2d+r}$ as the $\binom{n+d}{n}\times\binom{n+d}{n}$-matrix $M_{d}(f\mathbf{y})$ with rows and columns respectively indexed by $\alpha,\beta\in\mathbb{N}^{n}$, $|\alpha|,|\beta|\leq d$, and whose entries are given by

$$M_{d}(f\mathbf{y})(\alpha,\beta)=L_{\mathbf{y}}(f(x)\,x^{\alpha}x^{\beta})=\sum_{\gamma\in\mathbb{N}^{n}}f_{\gamma}\,y_{\gamma+\alpha+\beta}.$$

If $\mathbf{y}$ has a representing measure $\mu$ , then $M_{d}(f\mathbf{y})\succcurlyeq 0$ for $f\in\mathbb{R}[x]_{d}$ whenever the support of $\mu$ is contained in the set $\{x\in\mathbb{R}^{n}:f(x)\geq 0\}$ .
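The two matrices can be sketched in the univariate case ($n=1$), where a truncated sequence is simply a list `y[0], ..., y[2d+r]`. The helper names and the test set $\mathcal{X}=[-1,1]$, described by $g(x)=1-x^{2}$, are assumptions made for illustration. The sketch compares the moments of a Dirac measure located inside $\mathcal{X}$ with those of one located outside.

```python
import numpy as np

def moment_matrix(y, d):
    """Univariate moment matrix: entry (a, b) is y[a + b], for 0 <= a, b <= d."""
    return np.array([[y[a + b] for b in range(d + 1)] for a in range(d + 1)])

def localizing_matrix(f, y, d):
    """Univariate localizing matrix of f = sum_c f[c] x^c: entry (a, b) is sum_c f[c] y[c + a + b]."""
    return np.array([[sum(fc * y[c + a + b] for c, fc in enumerate(f))
                      for b in range(d + 1)] for a in range(d + 1)])

def dirac_moments(x, order):
    """Moments y_k = x^k, k = 0..order, of the Dirac measure at x."""
    return [x ** k for k in range(order + 1)]

def is_psd(A, tol=1e-9):
    """Numerical positive semidefiniteness test via the smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

g = [1.0, 0.0, -1.0]                      # g(x) = 1 - x^2, so X = [-1, 1]
d = 1
inside = dirac_moments(0.5, 2 * d + 2)    # support inside X
outside = dirac_moments(2.0, 2 * d + 2)   # support outside X
print(is_psd(moment_matrix(inside, d)), is_psd(localizing_matrix(g, inside, d)))
print(is_psd(moment_matrix(outside, d)), is_psd(localizing_matrix(g, outside, d)))
```

Both moment matrices are positive semidefinite, since both sequences have a representing measure, but only the localizing matrix detects that the second measure is not supported on $\{x:g(x)\geq 0\}$.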

Since $\mathcal{X}$ is basic semialgebraic with a certificate of compactness, by Putinar’s theorem [L10, Theorem 3.8], we also know the converse statement in the infinite case, namely $\mathbf{y}=(y_{\alpha})_{\alpha\in\mathbb{N}^{n}}$ has a representing measure $\mu\in\mathscr{M}_{+}(\mathcal{X})$ if and only if for all $d\in\mathbb{N}$ the matrices $M_{d}(\mathbf{y})$ and $M_{d}(g_{j}\mathbf{y}),\ j=1,\dots,m$ , are positive semidefinite.

2.3. Approximations of the moment cone

Letting $v_{j}:=\lceil d_{j}/2\rceil$ , $j=1,\ldots,m$ , for half the degree of the $g_{j}$ , by Putinar’s theorem, we can approximate the moment cone $\mathcal{M}_{2d}(\mathcal{X})$ by the following semidefinite representable cones for $\delta\in\mathbb{N}$

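$$\mathcal{M}_{2(d+\delta)}^{\mathsf{SDP}}(\mathcal{X}):=\Big\{\mathbf{y}\in\mathbb{R}^{\binom{n+2d}{n}}:\exists\,\mathbf{y}_{\delta}\in\mathbb{R}^{\binom{n+2(d+\delta)}{n}}\ \text{such that}\ (y_{\delta,\alpha})_{|\alpha|\leq 2d}=\mathbf{y}\ \text{and}\ M_{d+\delta}(\mathbf{y}_{\delta})\succcurlyeq 0,\ M_{d+\delta-v_{j}}(g_{j}\mathbf{y}_{\delta})\succcurlyeq 0,\ j=1,\ldots,m\Big\}.$$

By semidefinite representable we mean that the cones are projections of linear sections of semidefinite cones. Since $\mathcal{M}_{2d}(\mathcal{X})$ is contained in every $\mathcal{M}_{2(d+\delta)}^{\mathsf{SDP}}(\mathcal{X})$, they are outer approximations of the moment cone. Moreover, they form a nested sequence, so we can build the hierarchy

$$\mathcal{M}_{2d}(\mathcal{X})\subseteq\cdots\subseteq\mathcal{M}_{2(d+2)}^{\mathsf{SDP}}(\mathcal{X})\subseteq\mathcal{M}_{2(d+1)}^{\mathsf{SDP}}(\mathcal{X})\subseteq\mathcal{M}_{2d}^{\mathsf{SDP}}(\mathcal{X}).$$

This hierarchy actually converges, meaning $\mathcal{M}_{2d}(\mathcal{X})=\overline{\bigcap_{\delta=0}^{\infty}\mathcal{M}_{2(d+\delta)}^{\mathsf{SDP}}(\mathcal{X})}$, where $\overline{A}$ denotes the topological closure of the set $A$.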
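As a small worked instance of these cones (an illustration under assumed data, not taken from the paper), consider $n=1$ and $\mathcal{X}=[-1,1]$ described by the single polynomial $g_{1}(x)=1-x^{2}$, so that $d_{1}=2$ and $v_{1}=1$. For $d=1$ and $\delta=0$, the localizing matrix $M_{d+\delta-v_{1}}(g_{1}\mathbf{y})=M_{0}(g_{1}\mathbf{y})$ reduces to the scalar $L_{\mathbf{y}}(1-x^{2})=y_{0}-y_{2}$, and the first cone of the hierarchy reads

$$\mathcal{M}_{2}^{\mathsf{SDP}}([-1,1])=\Big\{(y_{0},y_{1},y_{2})\in\mathbb{R}^{3}:\begin{pmatrix}y_{0}&y_{1}\\ y_{1}&y_{2}\end{pmatrix}\succcurlyeq 0,\ y_{0}-y_{2}\geq 0\Big\}.$$

Increasing $\delta$ adds higher-order moments as auxiliary variables and enlarges both matrices, which can only shrink the set of admissible $(y_{0},y_{1},y_{2})$.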

3. Approximate Optimal Design

3.1. Problem reformulation in the multivariate polynomial case

For all $i=1,\ldots,p$ and $x\in{\mathcal{X}}$ , let $\varphi_{i}(x):=\sum_{|\alpha|\leq d}a_{i,\alpha}x^{\alpha}$ with appropriate $a_{i,\alpha}\in\mathbb{R}$ . Define for $\mu\in\mathscr{M}_{+}(\mathcal{X})$ with moment sequence $\mathbf{y}$ the information matrix

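$$M(\mu)=\Big(\int_{\mathcal{X}}\varphi_{i}\varphi_{j}\,\mathrm{d}\mu\Big)_{1\leq i,j\leq p}=\Big(\sum_{|\alpha|,|\beta|\leq d}a_{i,\alpha}a_{j,\beta}\,y_{\alpha+\beta}\Big)_{1\leq i,j\leq p}=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}$$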

where we have set $A_{\gamma}:=\Big{(}\sum_{\alpha+\beta=\gamma}a_{i,\alpha}a_{j,\beta}\Big{)}_{1\leq i,j\leq p}$ for $|\gamma|\leq 2d$ .

Further, let $\mu=\sum_{i=1}^{\ell}w_{i}\delta_{x_{i}}$ where $\delta_{x}$ denotes the Dirac measure at the point $x\in\mathcal{X}$ and observe that $M(\mu)=\sum_{i=1}^{\ell}w_{i}\Phi(x_{i})\Phi^{\top}(x_{i})$ as in (1.2).
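The identity $M(\mu)=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}$ can be checked numerically. The sketch below is restricted to the univariate case, where the entry $(i,j)$ of $A_{\gamma}$ is the coefficient of $x^{\gamma}$ in the product $\varphi_{i}\varphi_{j}$ and can therefore be obtained by convolving coefficient vectors. The regressors, points and weights are hypothetical.

```python
import numpy as np

def A_matrices(coeffs):
    """coeffs[i] holds a_{i,0..d} for phi_i; returns A with A[gamma] of size p x p, gamma = 0..2d."""
    p, d = len(coeffs), len(coeffs[0]) - 1
    A = np.zeros((2 * d + 1, p, p))
    for i in range(p):
        for j in range(p):
            # Coefficients of phi_i * phi_j, i.e. sum over alpha + beta = gamma of a_i,alpha a_j,beta.
            A[:, i, j] = np.convolve(coeffs[i], coeffs[j])
    return A

# Hypothetical regressors: phi_1 = 1, phi_2 = x - 0.5, phi_3 = 1 + x^2 (so d = 2).
coeffs = [[1.0, 0.0, 0.0], [-0.5, 1.0, 0.0], [1.0, 0.0, 1.0]]
points, weights = np.array([-1.0, 0.2, 1.0]), np.array([0.3, 0.5, 0.2])

A = A_matrices(coeffs)
# Moments y_gamma of the atomic measure mu = sum_i w_i delta_{x_i}.
y = np.array([np.sum(weights * points ** g) for g in range(A.shape[0])])
M_from_moments = np.tensordot(y, A, axes=1)        # sum_gamma A_gamma y_gamma

# Direct evaluation: sum_i w_i Phi(x_i) Phi(x_i)^T.
Phi = np.array([np.polyval(c[::-1], points) for c in coeffs])
M_direct = (Phi * weights) @ Phi.T
print(np.allclose(M_from_moments, M_direct))
```

The point of the reformulation is visible here: the information matrix depends on the design only through the moments $y_{\gamma}$, and it does so linearly.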

The optimization problem

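$$\begin{array}{rl}\max&\log\det M\\ \text{s.t.}&M=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}\succcurlyeq 0,\quad y_{\gamma}=\sum_{i=1}^{\ell}\frac{n_{i}}{N}x_{i}^{\gamma},\quad\sum_{i=1}^{\ell}n_{i}=N,\\ &x_{i}\in\mathcal{X},\ n_{i}\in\mathbb{N},\ i=1,\ldots,\ell\end{array}\tag{3.1}$$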

where the maximization is with respect to $x_{i}$ and $n_{i}$ , $i=1,\ldots,\ell$ , subject to the constraint that the information matrix $M$ is positive semidefinite, is by construction equivalent to the original design problem (1.3). In this form, problem (3.1) is difficult because of the integrality constraints on the $n_{i}$ and the nonlinear relation between $\mathbf{y}$ , $x_{i}$ and $n_{i}$ . We will address these difficulties in the sequel by first relaxing the integrality constraints.

3.2. Relaxing the integrality constraints

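In problem (3.1), the set of admissible frequencies $w_{i}=n_{i}/N$ is discrete, which makes it a potentially difficult combinatorial optimization problem. A popular solution is then to consider “approximate” designs defined by

$$\zeta:=\begin{pmatrix}x_{1}&\cdots&x_{\ell}\\ w_{1}&\cdots&w_{\ell}\end{pmatrix},\tag{3.2}$$

where the frequencies $w_{i}$ belong to the unit simplex $\mathcal{W}:=\{w\in\mathbb{R}^{\ell}:0\leq w_{i}\leq 1,\ \sum_{i=1}^{\ell}w_{i}=1\}$. Accordingly, any solution to (1.3) where the maximum is taken over all matrices of type (3.2) is called an “approximate optimal design”, yielding the following relaxation of problem (3.1)

$$\begin{array}{rl}\max&\log\det M\\ \text{s.t.}&M=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}\succcurlyeq 0,\quad y_{\gamma}=\sum_{i=1}^{\ell}w_{i}x_{i}^{\gamma},\\ &x_{i}\in\mathcal{X},\ w\in\mathcal{W}\end{array}\tag{3.3}$$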

where the maximization is with respect to $x_{i}$ and $w_{i}$ , $i=1,\ldots,\ell$ , subject to the constraint that the information matrix $M$ is positive semidefinite. In this problem, the nonlinear relation between $\mathbf{y}$ , $x_{i}$ and $w_{i}$ is still an issue.
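The effect of dropping the integrality constraints can be seen on a toy instance. The sketch below is only an illustration under strong assumptions: the support is fixed to the three points $-1,0,1$ (whereas the problems above also optimize over the $x_{i}$), the model is quadratic regression in one variable, and $N=4$ experiments are available. It enumerates all integer allocations and compares the best one with equal real-valued weights.

```python
from itertools import product
import numpy as np

def det_information(points, weights):
    """det of sum_i w_i Phi(x_i) Phi(x_i)^T with Phi(x) = (1, x, x^2)."""
    Phi = np.array([[1.0, x, x * x] for x in points])   # one row per design point
    return float(np.linalg.det(Phi.T @ np.diag(weights) @ Phi))

support = [-1.0, 0.0, 1.0]
N = 4

# Exact designs: integer counts n_i summing to N, frequencies n_i / N.
exact = {n: det_information(support, np.array(n) / N)
         for n in product(range(N + 1), repeat=3) if sum(n) == N}
best_counts = max(exact, key=exact.get)
best_exact = exact[best_counts]

# Approximate design: weights anywhere in the simplex, here equal weights.
relaxed = det_information(support, np.full(3, 1 / 3))
print(best_counts, best_exact, relaxed)
```

Since $N=4$ cannot be split evenly over three points, every exact design on this support has a smaller determinant than the equal-weight approximate design, which is the sense in which (3.3) relaxes (3.1).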

3.3. Moment formulation

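Let us introduce a two-step procedure to solve the approximate optimal design problem (3.3). For this we first reformulate the problem once more.

Observe that by Carathéodory’s theorem, the truncated moment cone $\mathcal{M}_{2d}(\mathcal{X})$ defined in (2.2) is exactly

$$\mathcal{M}_{2d}(\mathcal{X})=\Big\{\mathbf{y}\in\mathbb{R}^{\binom{n+2d}{n}}:y_{\alpha}=\int_{\mathcal{X}}x^{\alpha}\,d\mu,\ \mu=\sum_{i=1}^{\ell}w_{i}\delta_{x_{i}},\ x_{i}\in\mathcal{X},\ w\in\mathcal{W}\Big\}$$

so that problem (3.3) is equivalent to

$$\begin{array}{rl}\max&\log\det M\\ \text{s.t.}&M=\sum_{|\gamma|\leq 2d}A_{\gamma}y_{\gamma}\succcurlyeq 0,\\ &\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X}),\ y_{0}=1\end{array}\tag{3.4}$$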

where the maximization is now with respect to the sequence $\mathbf{y}$. Moment problem (3.4) is finite-dimensional and convex, yet the constraint $\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X})$ is difficult to handle. We will show that by approximating the truncated moment cone $\mathcal{M}_{2d}(\mathcal{X})$ by a nested sequence of semidefinite representable cones as indicated in (2.3), we obtain a hierarchy of finite-dimensional semidefinite programming problems converging to the optimal solution of (3.4). Since semidefinite programming problems can be solved efficiently, we can compute a numerical solution to problem (3.3).

This describes step one of our procedure. The result of it is a sequence $\mathbf{y}^{\star}$ of moments. Consequently, in a second step, we need to find a representing atomic measure $\mu^{\star}$ of $\mathbf{y}^{\star}$ in order to identify the approximate optimum design $\zeta^{\star}$ .

4. The ideal problem on moments and its approximation

For notational simplicity, let us use the standard monomial basis of $\mathbb{R}[x]_{d}$ for the regression functions, meaning $\Phi=(\varphi_{1},\dotsc,\varphi_{p}):=(x^{\alpha})_{|\alpha|\leq d}$ with $p=\binom{n+d}{n}$ . Note that this is not a restriction, since one can get the results for other choices of $\Phi$ by simply performing a change of basis. Different polynomial bases can be considered and one may consult the standard framework described by [DS97, Chapter 5.8]. For the sake of conciseness, we do not expose the notion of incomplete $q$ -way $m$ -th degree polynomial regression here but the reader may remark that the strategy developed in this paper can handle such a framework.

4.1. Christoffel polynomials

It turns out that the (unique) optimal solution $\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X})$ of (3.4) can be characterized in terms of the Christoffel polynomial of degree $2d$ associated with an optimal measure $\mu$ whose moments up to order $2d$ coincide with $\mathbf{y}$ .

Definition 4.1.

Let $\mathbf{y}\in\mathbb{R}^{\binom{n+2d}{n}}$ be such that $\mathbf{M}_{d}(\mathbf{y})\succ 0$ . Then there exists a family of orthonormal polynomials $(P_{\alpha})_{|\alpha|\leq d}\subseteq\mathbb{R}[x]_{d}$ satisfying

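$$L_{\mathbf{y}}(P_{\alpha}P_{\beta})=\delta_{\alpha=\beta}\quad\text{and}\quad L_{\mathbf{y}}(x^{\alpha}P_{\beta})=0\quad\forall\alpha\prec\beta,$$

where monomials are ordered with the lexicographical ordering on $\mathbb{N}^{n}$. We call the polynomial

$$p_{d}:\ x\mapsto p_{d}(x):=\sum_{|\alpha|\leq d}P_{\alpha}(x)^{2},\qquad x\in\mathbb{R}^{n},$$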

the Christoffel polynomial (of degree $d$ ) associated with $\mathbf{y}$ .

The Christoffel polynomial can be expressed in different ways. (In fact, what is usually referred to in the literature is its reciprocal $x\mapsto 1/p_{d}(x)$, called the Christoffel function.) For instance, it can be expressed via the inverse of the moment matrix by

$$p_{d}(x)=\mathbf{v}_{d}(x)^{T}\,\mathbf{M}_{d}(\mathbf{y})^{-1}\,\mathbf{v}_{d}(x),\qquad\forall x\in\mathbb{R}^{n},$$

or via its extremal property

$$\frac{1}{p_{d}(\xi)}=\min_{P\in\mathbb{R}[x]_{d}}\Big\{\int P(x)^{2}\,d\mu(x):P(\xi)=1\Big\},\qquad\forall\xi\in\mathbb{R}^{n},$$

when $\mathbf{y}$ has a representing measure $\mu$. (When $\mathbf{y}$ does not have a representing measure $\mu$, just replace $\int P(x)^{2}d\mu(x)$ with $L_{\mathbf{y}}(P^{2})\,(=P^{T}\mathbf{M}_{d}(\mathbf{y})\,P)$.) For more details the interested reader is referred to [LP16] and the references therein.
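The three descriptions of the Christoffel polynomial can be compared numerically. The following sketch is a univariate illustration under assumptions: it uses the moments of the uniform probability measure on $[-1,1]$ and $d=2$, obtains orthonormal polynomials from a Cholesky factor of the moment matrix (one convenient construction among several), and evaluates the extremal problem at its closed-form minimizer.

```python
import numpy as np

d = 2
# Moments of the uniform probability measure on [-1, 1]: 1/(k+1) for even k, 0 for odd k.
y = [1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(2 * d + 1)]
M = np.array([[y[a + b] for b in range(d + 1)] for a in range(d + 1)])

def v(x):
    """Monomial vector (1, x, ..., x^d)."""
    return np.array([x ** k for k in range(d + 1)])

def christoffel_inverse(x):
    """p_d(x) = v_d(x)^T M_d(y)^{-1} v_d(x)."""
    return float(v(x) @ np.linalg.solve(M, v(x)))

L = np.linalg.cholesky(M)   # M = L L^T with L lower triangular

def christoffel_orthonormal(x):
    """p_d(x) as a sum of squares of orthonormal polynomials, P(x) = L^{-1} v_d(x)."""
    P = np.linalg.solve(L, v(x))
    return float(P @ P)

def extremal_value(xi):
    """Value of min { P^T M P : P(xi) = 1 }, attained at P = M^{-1} v(xi) / p_d(xi)."""
    coeffs = np.linalg.solve(M, v(xi)) / christoffel_inverse(xi)
    return float(coeffs @ M @ coeffs)

xi = 0.3
print(christoffel_inverse(xi), christoffel_orthonormal(xi), 1.0 / extremal_value(xi))
```

Because $L$ is lower triangular, the $k$-th entry of $L^{-1}\mathbf{v}_{d}(x)$ is a polynomial of degree $k$, which matches the triangular orthogonality conditions of Definition 4.1 in the univariate case.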

4.2. The ideal problem on moments

The ideal formulation (3.4) of our approximate optimal design problem reads

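$$\begin{array}{rl}\rho=\displaystyle\max_{\mathbf{y}}&\log\det\mathbf{M}_{d}(\mathbf{y})\\ \text{s.t.}&\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X}),\ y_{0}=1.\end{array}\tag{4.1}$$

For this we have the following result.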

Theorem 4.2.

Let $\mathcal{X}\subseteq\mathbb{R}^{n}$ be compact with nonempty interior. Problem (4.1) is a convex optimization problem with a unique optimal solution $\mathbf{y}^{\star}\in\mathcal{M}_{2d}(\mathcal{X})$ . It is the vector of moments (up to order $2d$ ) of a measure $\mu^{\star}$ supported on at least $\binom{n+d}{n}$ and at most $\binom{n+2d}{n}$ points in the set $\Omega:=\{x\in\mathcal{X}:\binom{n+d}{n}-p_{d}^{\star}(x)=0\}$ where $\binom{n+d}{n}-p_{d}^{\star}\in\mathcal{P}_{2d}(\mathcal{X})$ and $p^{\star}_{d}$ is the Christoffel polynomial

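$$p^{\star}_{d}(x):=\mathbf{v}_{d}(x)^{T}\,\mathbf{M}_{d}(\mathbf{y}^{\star})^{-1}\,\mathbf{v}_{d}(x),\qquad\forall x\in\mathbb{R}^{n}.$$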
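The statement of Theorem 4.2 can be checked on a classical case that is not taken from this paper: for quadratic regression on $\mathcal{X}=[-1,1]$ ($n=1$, $d=2$), the $D$-optimal approximate design is known to put equal weights on $-1$, $0$ and $1$. The sketch below computes the Christoffel polynomial of this design through the inverse moment matrix formula of Section 4.1 and checks on a grid that it stays below $\binom{n+d}{n}=3$ on $\mathcal{X}$, with equality at the support points.

```python
import numpy as np
from math import comb

n, d = 1, 2
points, weights = np.array([-1.0, 0.0, 1.0]), np.full(3, 1 / 3)

# Moments up to order 2d of the design measure, and its moment matrix M_d(y).
y = [float(np.sum(weights * points ** k)) for k in range(2 * d + 1)]
M = np.array([[y[a + b] for b in range(d + 1)] for a in range(d + 1)])

def p_star(x):
    """Christoffel polynomial v_d(x)^T M_d(y)^{-1} v_d(x) of the design."""
    vx = np.array([x ** k for k in range(d + 1)])
    return float(vx @ np.linalg.solve(M, vx))

bound = comb(n + d, n)                       # equals 3 here
grid = np.linspace(-1.0, 1.0, 2001)
values = np.array([p_star(x) for x in grid])
print(values.max(), [p_star(x) for x in points])
```

Here the support has exactly $\binom{n+d}{n}=3$ points, the smallest number allowed by the theorem, and these are the points of $\mathcal{X}$ where $3-p^{\star}_{d}$ vanishes.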

Proof.

First, let us prove that (4.1) has an optimal solution. The feasible set is nonempty (take as feasible point the vector $\mathbf{y}\in\mathcal{M}_{2d}(\mathcal{X})$ associated with the Lebesgue measure on $\mathcal{X}$, scaled to be a probability measure) with finite associated objective value, because $\det\mathbf{M}_{d}(\mathbf{y})>0$. Hence, $\rho>-\infty$ and in addition Slater’s condition holds because $\mathbf{y}\in{\rm int}\,\mathcal{M}_{2d}(\mathcal{X})$ (that is, $\mathbf{y}$ is a strictly feasible solution to (4.1)).

Next, as $\mathcal{X}$ is compact there exists $M>1$ such that $\displaystyle\int_{\mathcal{X}}x_{i}^{2d}\,d\mu<M$ for every probability measure $\mu$ on $\mathcal{X}$ and every $i=1,\ldots,n$ . Hence, $\max\{y_{0},\ \max_{i}\{L_{y}(x_{i}^{2d})\}\}<M$ which by [LN07] implies that $|y_{\alpha}|\leq M$ for every $|\alpha|\leq 2d$ , which in turn implies that the feasible set of (4.1) is compact. As the objective function is continuous and $\rho>-\infty$ , problem (4.1) has an optimal solution $\mathbf{y}^{\star}\in\mathcal{M}_{2d}(\mathcal{X})$ .

Furthermore, the optimal solution $\mathbf{y}^{\star}\in\mathcal{M}_{2d}(\mathcal{X})$ is unique because the objective function $\mathbf{y}\mapsto\log\det\mathbf{M}_{d}(\mathbf{y})$ is strictly concave and the feasible set is convex. In addition, since $\rho>-\infty$, $\det\mathbf{M}_{d}(\mathbf{y}^{\star})\neq 0$ and so $\mathbf{M}_{d}(\mathbf{y}^{\star})$ is nonsingular.
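The feasibility argument of the proof can be checked numerically in a small case. The sketch below, for $\mathcal{X}=[-1,1]$ and $d=2$, compares $\det\mathbf{M}_{d}(\mathbf{y})$ for the normalized Lebesgue measure with the value obtained for the measure putting weight $1/3$ at $-1,0,1$, which is the classical D-optimal design for quadratic regression on the interval. The helper names `det` and `hankel` and the choice of this example are assumptions for illustration, not part of the paper.

```python
from fractions import Fraction

def det(A):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [list(row) for row in A]
    n, d = len(A), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if A[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d

def hankel(y, d):
    """Moment matrix M_d(y) of a univariate moment sequence y_0, ..., y_{2d}."""
    return [[y[i + j] for j in range(d + 1)] for i in range(d + 1)]

d = 2
# Normalized Lebesgue measure on [-1, 1]: y_k = 1/(k+1) for even k, 0 for odd k.
lebesgue = [Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)
            for k in range(2 * d + 1)]
# Measure with weight 1/3 at each of -1, 0, 1.
three_point = [sum(Fraction(x) ** k for x in (-1, 0, 1)) / 3
               for k in range(2 * d + 1)]

det_leb = det(hankel(lebesgue, d))
det_opt = det(hankel(three_point, d))
print(det_leb, det_opt)  # 4/135 4/27
```

Both determinants are positive, so both moment vectors give a finite objective value, and the three-point design attains the larger one.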

Next, recall the Karush-Kuhn-Tucker (KKT) optimality conditions. For the optimization problem $\min\,\{f(x):Ax=b;\ x\in C\}$, where $f$ is differentiable, $A\in\mathbb{R}^{m\times n}$ and $C\subset\mathbb{R}^{n}$ is a nonempty closed convex cone, the KKT-optimality conditions at a feasible point $x$ state that there exist $\lambda^{\star}\in\mathbb{R}^{m}$ and $u^{\star}\in C^{\star}$ such that $\nabla f(x)-A^{T}\lambda^{\star}=u^{\star}$ and $\langle x,u^{\star}\rangle=0$. Slater’s condition holds if there exists a feasible solution $x\in{\rm int}(C)$, in which case the KKT-optimality conditions are necessary and sufficient if $f$ is convex.

Writing $\mathbf{B}_{\alpha}$, $\alpha\in\mathbb{N}^{n}_{2d}$, for the real matrices satisfying $\sum_{|\alpha|\leq 2d}\mathbf{B}_{\alpha}x^{\alpha}=\mathbf{v}_{d}(x)\mathbf{v}_{d}(x)^{T}$ and $\langle\mathbf{A},\mathbf{B}\rangle={\rm trace}(\mathbf{A}\mathbf{B})$ for two matrices $\mathbf{A}$ and $\mathbf{B}$, the necessary KKT-optimality conditions state that an optimal solution $\mathbf{y}^{\star}\in\mathcal{M}_{2d}(\mathcal{X})$ should satisfy

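$$\lambda^{\star}e_{0}-\nabla\log\det\mathbf{M}_{d}(\mathbf{y}^{\star})=p^{\star}\in\mathcal{P}_{2d}(\mathcal{X})$$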

(where $e_{0}=(1,0,\ldots,0)$ and $\lambda^{\star}$ is the dual variable associated with the constraint $y^{\star}_{0}=1$), with the complementarity condition $\langle\mathbf{y}^{\star},p^{\star}\rangle=0$. That is:

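$$1_{\alpha=0}\,\lambda^{\star}-\langle\mathbf{M}_{d}(\mathbf{y}^{\star})^{-1},\mathbf{B}_{\alpha}\rangle=p^{\star}_{\alpha},\quad\forall\alpha\in\mathbb{N}^{n}_{2d}.\tag{4.3}$$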

Multiplying (4.3) term-wise by $y^{\star}_{\alpha}$ , summing up and invoking the complementarity condition, yields

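$$\lambda^{\star}-\binom{n+d}{n}=\lambda^{\star}-\langle\mathbf{M}_{d}(\mathbf{y}^{\star})^{-1},\mathbf{M}_{d}(\mathbf{y}^{\star})\rangle=\langle\mathbf{y}^{\star},p^{\star}\rangle=0.$$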

Similarly, multiplying (4.3) term-wise by $x^{\alpha}$ and summing up yields

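$$x\mapsto p^{\star}(x)=\lambda^{\star}-\mathbf{v}_{d}(x)^{T}\mathbf{M}_{d}(\mathbf{y}^{\star})^{-1}\mathbf{v}_{d}(x)=\binom{n+d}{n}-\sum_{\alpha\in\mathbb{N}^{n}_{d}}P_{\alpha}(x)^{2},\tag{4.4}$$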

where the $(P_{\alpha})$ , $\alpha\in\mathbb{N}^{n}_{d}$ , are the orthonormal polynomials (up to degree $d$ ) w.r.t. $\mu^{\star}$ , where $\mu^{\star}$ is a representing measure of $\mathbf{y}^{\star}$ (recall that $\mathbf{y}^{\star}\in\mathcal{M}_{2d}(\mathcal{X})$ ). Therefore $p^{\star}=\binom{n+d}{n}-p^{\star}_{d}\in\mathcal{P}_{2d}(\mathcal{X})$ where $p^{\star}_{d}$ is the Christoffel polynomial of degree $2d$ associated with $\mu^{\star}$ . Next, the complementarity condition $\langle\mathbf{y}^{\star},p^{\star}\rangle=0$ reads

$$\int_{\mathcal{X}}p^{\star}(x)\,d\mu^{\star}(x)=0,$$

which implies that the support of $\mu^{\star}$ is included in the algebraic set $\{x\in\mathcal{X}:p^{\star}(x)=0\}$ . Finally, that $\mu^{\star}$ is an atomic measure supported on at most $\binom{n+2d}{n}$ points follows from Tchakaloff’s theorem [L10, Theorem B.12] which states that for every finite Borel probability measure on $\mathcal{X}$ and every $t\in\mathbb{N}$ , there exists an atomic measure $\mu_{t}$ supported on $\ell\leq\binom{n+t}{n}$ points such that all moments of $\mu_{t}$ and $\mu^{\star}$ agree up to order $t$ . So let $t=2d$ . Then $\ell\leq\binom{n+2d}{n}$ . If $\ell<\binom{n+d}{n}$ , then ${\rm rank}\mathbf{M}_{d}(\mathbf{y}^{\star})<\binom{n+d}{n}$ in contradiction to $\mathbf{M}_{d}(\mathbf{y}^{\star})$ being non-singular. Therefore, $\binom{n+d}{n}\leq\ell\leq\binom{n+2d}{n}$ . ∎

So we obtain a nice characterization of the unique optimal solution $\mathbf{y}^{\star}$ of (4.1). It is the vector of moments up to order $2d$ of a measure $\mu^{\star}$ supported on finitely many (at least ${n+d\choose n}$ and at most ${n+2d\choose n}$ ) points of $\mathcal{X}$ . This support of $\mu^{\star}$ consists of zeros of the equation $\binom{n+d}{n}-p^{\star}_{d}(x)=0$ , where $p^{\star}_{d}$ is the Christoffel polynomial associated with $\mu^{\star}$ . Moreover the level set $\{x:p^{\star}_{d}(x)\leq{n+d\choose n}\}$ contains $\mathcal{X}$ and intersects $\mathcal{X}$ precisely at the support of $\mu^{\star}$ .
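As a sanity check of this characterization, consider the simplest case $n=1$, $d=1$, $\mathcal{X}=[-1,1]$. The worked example below assumes the classical fact that the D-optimal design for linear regression on $[-1,1]$ puts mass $1/2$ at each endpoint; it is an illustration and is not taken from the paper. With $\mathbf{v}_{1}(x)=(1,x)^{T}$ one gets

$$\mathbf{M}_{1}(\mathbf{y}^{\star})=\begin{pmatrix}1&0\\0&1\end{pmatrix},\qquad p^{\star}_{1}(x)=\mathbf{v}_{1}(x)^{T}\mathbf{M}_{1}(\mathbf{y}^{\star})^{-1}\mathbf{v}_{1}(x)=1+x^{2},\qquad\binom{1+1}{1}-p^{\star}_{1}(x)=1-x^{2}.$$

The polynomial $1-x^{2}$ is nonnegative on $\mathcal{X}$ and vanishes exactly at the two support points $\pm 1$. The number of atoms, $2$, lies between $\binom{1+1}{1}=2$ and $\binom{1+2}{1}=3$, and the level set $\{x:p^{\star}_{1}(x)\leq 2\}$ is $[-1,1]$ itself.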

4.3. The SDP approximation scheme

Let $\mathcal{X}\subseteq\mathbb{R}^{n}$ be as defined in (2.1), assumed to be compact. So with no loss of generality (and possibly after scaling), assume that $x\mapsto g_{1}(x)=1-\|x\|^{2}\geq 0$ is one of the constraints defining $\mathcal{X}$ .

Since the ideal moment problem (4.1) involves the moment cone $\mathcal{M}_{2d}(\mathcal{X})$ which is not SDP representable, we use the hierarchy (2.3) of outer approximations of the moment cone to relax problem (4.1) to an SDP problem. So for a fixed integer $\delta\geq 1$ we consider the problem

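$$\rho_{\delta}=\max_{\mathbf{y}}\ \log\det\mathbf{M}_{d}(\mathbf{y})\quad\text{s.t.}\quad\mathbf{y}\in\mathcal{M}^{\mathsf{SDP}}_{2(d+\delta)}(\mathcal{X}),\ y_{0}=1.\tag{4.5}$$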

Since (4.5) is a relaxation of the ideal problem (4.1), necessarily $\rho_{\delta}\geq\rho$ for all $\delta$. In analogy with Theorem 4.2 we have the following result.

Theorem 4.3.

Let $\mathcal{X}\subseteq\mathbb{R}^{n}$ as in (2.1) be compact and with nonempty interior. Then

a)

SDP problem (4.5) has a unique optimal solution $\mathbf{y}^{\star}_{d}\in\mathbb{R}^{\binom{n+2d}{n}}$.

b)

The moment matrix $\mathbf{M}_{d}(\mathbf{y}^{\star}_{d})$ is positive definite. Let $p^{\star}_{d}$ be the Christoffel polynomial associated with $\mathbf{y}^{\star}_{d}$. Then $\binom{n+d}{n}-p^{\star}_{d}(x)\geq 0$ for all $x\in\mathcal{X}$ and $L_{\mathbf{y}^{\star}_{d}}(\binom{n+d}{n}-p^{\star}_{d})=0$.

Proof.

a)

Let us prove that (4.5) has an optimal solution. The feasible set is nonempty, since we can take as feasible point the vector $\tilde{\mathbf{y}}$ associated with the Lebesgue measure on $\mathcal{X}$ , scaled to be a probability measure. Because $\det\mathbf{M}_{d}(\tilde{\mathbf{y}})>0$ , the associated objective value is finite. Hence, Slater’s condition holds for (4.5) and $\rho_{\delta}>-\infty$ .

Next, let $\mathbf{y}$ be an arbitrary feasible solution and $\mathbf{y}_{\delta}\in\mathbb{R}^{\binom{2(d+\delta)+n}{n}}$ an arbitrary lifting of $\mathbf{y}$ (recall the definition of $\mathcal{M}_{2(d+\delta)}^{\mathsf{SDP}}(\mathcal{X})$ ). As $g_{1}(x)=1-\|x\|^{2}$ and $\mathbf{M}_{d+\delta-1}(g_{1}\,\mathbf{y}_{\delta})\succcurlyeq 0$ , we have

[TABLE]

and so by [LN07]

[TABLE]

This implies that the set of feasible liftings $\mathbf{y}_{\delta}$ is compact, which implies that there is an optimal $\mathbf{y}^{\star}_{\delta}\in\mathbb{R}^{\binom{2(d+\delta)+n}{n}}$. As a consequence, the subvector $\mathbf{y}^{\star}_{d}=(\mathbf{y}^{\star}_{\delta,\alpha})_{|\alpha|\leq 2d}\in\mathbb{R}^{\binom{2d+n}{n}}$ is an optimal solution to (4.5). It is unique due to strict concavity of the objective function.

b)

As $\rho_{\delta}>-\infty$ , we have $\det\mathbf{M}_{d}(\mathbf{y}^{\star}_{d})>0$ . Now, write $\langle\mathbf{A},\mathbf{B}\rangle={\rm trace}(\mathbf{A}\mathbf{B})$ for two matrices $\mathbf{A}$ and $\mathbf{B}$ and let $\mathbf{B}_{\alpha},\tilde{\mathbf{B}}_{\alpha}$ and $\mathbf{C}_{j\alpha}$ be real symmetric matrices such that

[TABLE]

Problem (4.5) is a convex optimization problem which can be rewritten as

[TABLE]

for which Slater’s condition holds. For all its optimal solutions $\mathbf{y}^{\star}_{\delta}=(y^{\star}_{\delta,\alpha})\in\mathbb{R}^{\binom{2(d+\delta)+n}{n}}$ , the restriction $\mathbf{y}^{\star}_{d}=(y^{\star}_{d,\alpha})=(y^{\star}_{\delta,\alpha})$ , $\alpha\in\mathbb{N}^{n}_{2d}$ , is the unique optimal solution of (4.5). Hence at an optimal solution $\mathbf{y}^{\star}$ , the necessary KKT-optimality conditions state that

[TABLE]

for some “dual variables” $\lambda^{\star}\in\mathbb{R}$ , $\Lambda_{j}\succcurlyeq 0$ , $j=0,\ldots,m$ . We also have the complementarity conditions

[TABLE]

Multiplying by $y^{\star}_{\delta,\alpha}$ , summing up and using the complementarity conditions (4.8) yields

[TABLE]

and so $\lambda^{\star}=\binom{n+d}{n}$ .

On the other hand, multiplying by $x^{\alpha}$ and summing up yields

[TABLE]

for some SOS polynomials $(\sigma_{j})\subset\mathbb{R}[x]$ , $j=0,\ldots,m$ . Let $p^{\star}_{d}\in\mathbb{R}[x]_{2d}$ be the Christoffel polynomial associated with $\mathbf{y}^{\star}_{d}$ . Since $\lambda^{\star}=\binom{n+d}{n}$ , (4.10) reads

[TABLE]

and (4.9) implies $L_{\mathbf{y}^{\star}_{d}}(\binom{n+d}{n}-p^{\star}_{d})=0$.

∎

Hence, if the optimal solution $\mathbf{y}^{\star}_{d}$ of (4.5) is coming from a measure $\mu$ on $\mathcal{X}$ , that is $\mathbf{y}^{\star}_{d}\in\mathcal{M}_{2d}(\mathcal{X})$ , then $\rho_{\delta}=\rho$ and $\mathbf{y}^{\star}_{d}$ is the unique optimal solution of (4.1). In addition, by Theorem 4.2, $\mu$ can be chosen to be atomic and supported on at least $\binom{n+d}{n}$ and at most $\binom{n+2d}{n}$ “contact points” on the set $\{x\in\mathcal{X}:\binom{n+d}{n}-p^{\star}_{d}(x)=0\}$ .
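To make the sum-of-squares certificate of Theorem 4.3 concrete, here is a worked univariate instance. It assumes $\mathcal{X}=[-1,1]$ described by $g_{1}(x)=1-x^{2}$, $d=2$, $\delta=0$, and the classical three-point design with weight $1/3$ at $-1,0,1$; these choices are illustrative and not taken from the paper. The moments are $y_{0}=1$, $y_{2}=y_{4}=2/3$, $y_{1}=y_{3}=0$, which gives $p^{\star}_{2}(x)=3-\tfrac{9}{2}x^{2}+\tfrac{9}{2}x^{4}$ and therefore

$$\binom{1+2}{1}-p^{\star}_{2}(x)=\tfrac{9}{2}\,x^{2}\,(1-x^{2})=\sigma_{0}(x)+\sigma_{1}(x)\,g_{1}(x),\qquad\sigma_{0}=0,\quad\sigma_{1}(x)=\tfrac{9}{2}x^{2}.$$

Both $\sigma_{0}$ and $\sigma_{1}$ are sums of squares, so the right-hand side is nonnegative on $\mathcal{X}$. It vanishes exactly at $-1,0,1$, hence $L_{\mathbf{y}}(3-p^{\star}_{2})=\tfrac{1}{3}\sum_{i}(3-p^{\star}_{2}(x_{i}))=0$, in agreement with part b) of the theorem.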

4.4. Asymptotics

To analyze what happens when $\delta$ tends to infinity, we denote the optimal solution $\mathbf{y}_{d}^{\star}\in\mathcal{M}^{\mathsf{SDP}}_{2(d+\delta)}(\mathcal{X})\subseteq\mathbb{R}^{\binom{n+2d}{n}}$ of (4.5) by $\mathbf{y}_{d,\delta}^{\star}$ to indicate that it is the subvector $\mathbf{y}_{d,\delta}^{\star}=(y_{\delta,\alpha}^{\star})_{|\alpha|\leq 2d}$ of a lifting $\mathbf{y}_{\delta}^{\star}\in\mathbb{R}^{\binom{n+2(d+\delta)}{n}}$. Now, we examine the behavior of $(\mathbf{y}^{\star}_{d,\delta})_{\delta\in\mathbb{N}}$ as $\delta\to\infty$.

Theorem 4.4.

For every $\delta=0,1,2,\ldots,$ let $\mathbf{y}^{\star}_{d,\delta}$ be an optimal solution to (4.5) and $p^{\star}_{d,\delta}\in\mathbb{R}[x]_{2d}$ the Christoffel polynomial associated with $\mathbf{y}^{\star}_{d,\delta}$ as in Theorem 4.3. Then

a)

$\rho_{\delta}\to\rho$ as $\delta\to\infty$, where $\rho$ is the supremum in (4.1).

b)

For every $\alpha\in\mathbb{N}^{n}$ with $|\alpha|\leq 2d$

$$\lim_{\delta\to\infty}y^{\star}_{d,\delta,\alpha}=y^{\star}_{\alpha},$$

where $\mathbf{y}^{\star}=(y^{\star}_{\alpha})_{|\alpha|\leq 2d}\in\mathcal{M}_{2d}(\mathcal{X})$ is the unique optimal solution to (4.1).

c)

$p^{\star}_{d,\delta}\to p^{\star}_{d}$ as $\delta\to\infty$, where $p^{\star}_{d}$ is the Christoffel polynomial associated with $\mathbf{y}^{\star}$ defined in (4.2).

d)

If the dual polynomial $p^{\star}$ (given by (4.4)) can be represented as a sum of squares (namely, it satisfies (4.10)), then $\mathbf{y}^{\star}_{d,\delta}$ is the unique optimal solution to (4.1) and $\mathbf{y}^{\star}_{d,\delta}$ has a representing measure (namely the target measure $\zeta^{\star}$).

Proof.

a)

For every $\delta$ complete the lifted finite sequence $\mathbf{y}^{\star}_{\delta}\in\mathbb{R}^{\binom{n+2(d+\delta)}{n}}$ with zeros to make it an infinite sequence $\mathbf{y}^{\star}_{\delta}=(y^{\star}_{\delta,\alpha})_{\alpha\in\mathbb{N}^{n}}$ . Therefore, every such $\mathbf{y}^{\star}_{\delta}$ can be identified with an element of $\ell_{\infty}$ , the Banach space of finite bounded sequences equipped with the supremum norm. Moreover, (4.6) holds for every $\mathbf{y}^{\star}_{\delta}$ . Thus, denoting by $\mathcal{B}$ the unit ball of $\ell_{\infty}$ which is compact in the $\sigma(\ell_{\infty},\ell_{1})$ weak- $\star$ topology on $\ell_{\infty}$ , we have $\mathbf{y}^{\star}_{\delta}\in\mathcal{B}$ . By Banach-Alaoglu’s theorem, there is an element $\hat{\mathbf{y}}\in\mathcal{B}$ and a converging subsequence $(\delta_{k})_{k\in\mathbb{N}}$ such that

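$$\lim_{k\to\infty}y^{\star}_{\delta_{k},\alpha}=\hat{y}_{\alpha},\quad\forall\alpha\in\mathbb{N}^{n}.\tag{4.12}$$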

Let $s\in\mathbb{N}$ be arbitrary, but fixed. By the convergence (4.12) we also have

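$$\mathbf{0}\preccurlyeq\lim_{k\to\infty}\mathbf{M}_{s}(\mathbf{y}^{\star}_{\delta_{k}})=\mathbf{M}_{s}(\hat{\mathbf{y}})\quad\text{and}\quad\mathbf{0}\preccurlyeq\lim_{k\to\infty}\mathbf{M}_{s}(g_{j}\,\mathbf{y}^{\star}_{\delta_{k}})=\mathbf{M}_{s}(g_{j}\,\hat{\mathbf{y}}),\quad j=1,\ldots,m.$$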

Notice that the subvectors $\mathbf{y}^{\star}_{d,\delta}=(y_{d,\delta,\alpha}^{\star})_{|\alpha|\leq 2d}$ with $\delta=0,1,2,\ldots$ belong to a compact set. Therefore, since $\det(\mathbf{M}_{d}(\mathbf{y}^{\star}_{d,\delta}))>0$ for every $\delta$ , we also have $\det(\mathbf{M}_{d}(\hat{\mathbf{y}}))>0$ .

Next, by Putinar’s theorem [L10, Theorem 3.8], $\hat{\mathbf{y}}$ is the sequence of moments of some measure $\hat{\mu}\in\mathscr{M}_{+}(\mathcal{X})$ , and so $\hat{\mathbf{y}}_{d}=(\hat{y}_{\alpha})_{|\alpha|\leq 2d}$ is a feasible solution to (4.1), meaning $\rho\geq\log\det(\mathbf{M}_{d}(\hat{\mathbf{y}}_{d}))$ . On the other hand, as (4.5) is a relaxation of (4.1), we have $\rho\leq\rho_{\delta_{k}}$ for all $\delta_{k}$ . So the convergence (4.12) yields

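$$\rho\leq\lim_{k\to\infty}\rho_{\delta_{k}}=\lim_{k\to\infty}\log\det\mathbf{M}_{d}(\mathbf{y}^{\star}_{d,\delta_{k}})=\log\det\mathbf{M}_{d}(\hat{\mathbf{y}}_{d})\leq\rho,$$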

which proves that $\hat{\mathbf{y}}$ is an optimal solution to (4.1), and $\lim_{\delta\to\infty}\rho_{\delta}=\rho$.

b)

As the optimal solution to (4.1) is unique, we have $\mathbf{y}^{\star}=\hat{\mathbf{y}}_{d}$ , and the whole sequence $(\mathbf{y}^{\star}_{d,\delta})_{\delta\in\mathbb{N}}$ converges to $\mathbf{y}^{\star}$ , that is

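$$\lim_{\delta\to\infty}y^{\star}_{d,\delta,\alpha}=y^{\star}_{\alpha},\quad\forall\,|\alpha|\leq 2d.\tag{4.13}$$

c)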

Finally, to show (c) it suffices to observe that the coefficients of the orthonormal polynomials $(P_{\alpha})_{|\alpha|\leq d}$ with respect to $\mathbf{y}^{\star}_{d}$ are continuous functions of the moments $(y^{\star}_{\delta,\alpha})_{|\alpha|\leq 2d}$. Therefore, by the convergence (4.13) one has $p^{\star}_{d,\delta}\to p^{\star}_{d}$ where $p^{\star}_{d}\in\mathbb{R}[x]_{2d}$ is as in Theorem 4.2.

d)

The last point follows directly by observing that, in this case, the two programs satisfy the same KKT conditions.

∎

5. Recovering the measure

By solving step one as explained in Section 4, we obtain a solution $\mathbf{y}^{\star}_{d}$ of SDP problem (4.5). However we do not know if $\mathbf{y}^{\star}_{d}$ comes from a measure. This would be the case if we can find an atomic measure having these moments and yielding the same value in problem (4.5). For this, we propose two approaches: A first one which follows a procedure by Nie [N14], and a second one which uses properties of the Christoffel polynomial associated with $\mathbf{y}^{\star}_{d}$ .

5.1. Via the method by Nie

This approach to recovering a measure from its moments is based on a formulation proposed by Nie in [N14].

Let $\mathbf{y}_{d}^{\star}=(y^{\star}_{\delta,\alpha})_{|\alpha|\leq 2d}$ be a solution to (4.5). For $r\in\mathbb{N}$ consider the SDP problem

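$$\min_{\mathbf{y}}\ L_{\mathbf{y}}(f_{r})\quad\text{s.t.}\quad\mathbf{M}_{d+r}(\mathbf{y})\succcurlyeq 0,\quad\mathbf{M}_{d+r-v_{j}}(g_{j}\,\mathbf{y})\succcurlyeq 0,\ j=1,\ldots,m,\quad y_{\alpha}=y^{\star}_{\delta,\alpha},\ |\alpha|\leq 2d,\tag{5.1}$$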

where $f_{r}\in\mathbb{R}[x]_{2(d+r)}$ is a randomly generated polynomial, strictly positive on $\mathcal{X}$ , and again $v_{j}=\lceil d_{j}/2\rceil$ , $j=1,\ldots,m$ . Then, we check whether the optimal solution $\mathbf{y}^{\star}_{r}$ of (5.1) satisfies the rank condition

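$${\rm rank}\,\mathbf{M}_{d+r}(\mathbf{y}^{\star}_{r})={\rm rank}\,\mathbf{M}_{d+r-v}(\mathbf{y}^{\star}_{r}),\tag{5.2}$$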

where $v:=\max_{j}v_{j}$ . If the test is passed, then we stop, otherwise we repeat with $r:=r+1$ . Using linear algebra, we can also extract the points $x_{1}^{\star},\dotsc,x_{r}^{\star}\in\mathcal{X}$ which are the support of the representing atomic measure of $\mathbf{y}_{d}^{\star}$ . If $\mathbf{y}^{\star}_{d}\in\mathcal{M}_{2d}(\mathcal{X})$ , then with probability one, the rank condition (5.2) will be satisfied for a sufficiently large value of $r$ .
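The idea behind the rank test can be seen on a univariate toy case: for an atomic measure, the ranks of the moment matrices stop growing once they reach the number of atoms, and the polynomial vanishing on the atoms lies in the kernel of the moment matrix. The sketch below is only an illustration of this flatness phenomenon. The three-atom measure and the helper names `rank` and `hankel` are hypothetical, and the code is not the extraction algorithm used by GloptiPoly.

```python
from fractions import Fraction

def rank(A):
    """Rank by exact Gaussian elimination over the rationals."""
    A = [list(row) for row in A]
    rk = 0
    for c in range(len(A[0])):
        piv = next((r for r in range(rk, len(A)) if A[r][c] != 0), None)
        if piv is None:
            continue
        A[rk], A[piv] = A[piv], A[rk]
        for r in range(rk + 1, len(A)):
            f = A[r][c] / A[rk][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[rk])]
        rk += 1
    return rk

def hankel(y, s):
    """Univariate moment matrix M_s(y) built from y_0, ..., y_{2s}."""
    return [[y[i + j] for j in range(s + 1)] for i in range(s + 1)]

# Hypothetical atomic measure: weights 1/4, 1/2, 1/4 at -1, 1/2, 1.
atoms = [Fraction(-1), Fraction(1, 2), Fraction(1)]
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
y = [sum(w * x**k for x, w in zip(atoms, weights)) for k in range(7)]

# rank M_3 = rank M_2 = 3: the rank is flat and equals the number of atoms.
ranks = [rank(hankel(y, s)) for s in (1, 2, 3)]
print(ranks)  # [2, 3, 3]

# Coefficients (constant term first) of q(x) = (x + 1)(x - 1/2)(x - 1).
q = [Fraction(1, 2), Fraction(-1), Fraction(-1, 2), Fraction(1)]
kernel_check = [sum(m * c for m, c in zip(row, q)) for row in hankel(y, 3)]
print(all(v == 0 for v in kernel_check))  # True
```

In the multivariate setting the same principle applies, with the atoms obtained from the kernel of the flat moment matrix by linear algebra.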

Experience reveals that in most of the cases it is enough to use the following polynomial

$$x\mapsto f_{r}(x)=\sum_{|\alpha|\leq d+r}x^{2\alpha}$$

instead of using a random positive polynomial on $\mathcal{X}$ . In problem (5.1) this corresponds to minimizing the trace of $\mathbf{M}_{d+r}(\mathbf{y})$ (and so induces an optimal solution $\mathbf{y}$ with low rank matrix $\mathbf{M}_{d+r}(\mathbf{y})$ ).

5.2. Via Christoffel polynomials

Another possibility to recover the atomic representing measure of $\mathbf{y}_{d}^{\star}$ is to find the zeros of the polynomial $p^{\star}(x)=\binom{n+d}{n}-p^{\star}_{d}(x)$ on $\mathcal{X}$, where $p^{\star}_{d}$ is the Christoffel polynomial associated with $\mathbf{y}_{d}^{\star}$, that is, the set $\{x\in\mathcal{X}:\binom{n+d}{n}-p_{d}^{\star}(x)=0\}$. By Theorem 4.3, this set contains the support of the atomic representing measure.

So we minimize the polynomial $p^{\star}$ on $\mathcal{X}$ and check whether it vanishes on at least $\binom{n+d}{n}$ (and at most $\binom{n+2d}{n}$ ) points of the boundary of $\mathcal{X}$ . That is, let $p^{\star}_{d}$ be as in Theorem 4.3 for some fixed $\delta\in\mathbb{N}$ and solve the SDP problem

[TABLE]

Since $p^{\star}_{d}$ is associated with the optimal solution to (4.5) for some given $\delta\in\mathbb{N}$, by Theorem 4.3, it satisfies the Putinar certificate (4.11) of positivity on $\mathcal{X}$. Thus, the value of problem (5.3) is zero for all $r\geq\delta$. Therefore, for every feasible solution $\mathbf{y}_{r}$ of (5.3) one has $L_{\mathbf{y}_{r}}(p^{\star})\geq 0$ (and $L_{\mathbf{y}^{\star}_{d}}(p^{\star})=0$ for $\mathbf{y}^{\star}_{d}$ an optimal solution of (4.5)).

Alternatively, we can solve the SDP

[TABLE]

We know that the value of problem (5.4) is not greater than ${\rm trace}(\mathbf{M}_{d}(\hat{\mathbf{y}}^{\star}))$ where $\hat{\mathbf{y}}^{\star}$ is an optimal solution to (5.1) for $r=\delta$, because $\hat{\mathbf{y}}^{\star}$ is feasible for (5.4).

5.3. Calculating the corresponding weights

After recovering the support $\{x_{1},\dotsc,x_{\ell}\}$ of the atomic representing measure by one of the previously presented methods, we might be interested in also computing the corresponding weights $\omega_{1},\dotsc,\omega_{\ell}$ . These can be calculated easily by solving the following linear system of equations: $\sum_{i=1}^{\ell}\omega_{i}x_{i}^{\alpha}=y_{d,\alpha}^{\star}$ for all $|\alpha|\leq d$ , i.e. $\int_{\mathcal{X}}x^{\alpha}\mu^{\star}(dx)=y^{\star}_{d,\alpha}$ .
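The linear system for the weights is a Vandermonde-type system in the support points. A minimal univariate sketch in exact arithmetic follows. The support $\{-1,0,1\}$, the moments $(1,0,2/3)$ and the helper `solve` are illustrative assumptions, chosen so that the system is square; with more equations than atoms one would solve it in the least-squares sense instead.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A w = b by Gauss-Jordan elimination."""
    n = len(A)
    T = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if T[r][c] != 0)
        T[c], T[piv] = T[piv], T[c]
        T[c] = [v / T[c][c] for v in T[c]]
        for r in range(n):
            if r != c:
                T[r] = [vr - T[r][c] * vc for vr, vc in zip(T[r], T[c])]
    return [T[r][n] for r in range(n)]

# Hypothetical recovered support and moments y_0, y_1, y_2.
support = [Fraction(-1), Fraction(0), Fraction(1)]
moments = [Fraction(1), Fraction(0), Fraction(2, 3)]

# Row k encodes the equation sum_i w_i * x_i**k = y_k.
A = [[x**k for x in support] for k in range(len(moments))]
weights = solve(A, moments)
print([str(w) for w in weights])  # ['1/3', '1/3', '1/3']
```

The first equation ($k=0$) forces the weights to sum to $y_{0}=1$, so the result is a probability measure whenever the weights are nonnegative.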

6. Examples

We illustrate the procedure on three examples: a univariate one, a polygon in the plane and one example on the three-dimensional sphere.

All examples are modeled with GloptiPoly 3 [HLL09] and YALMIP [L04] and solved with MOSEK 7 [MOS] or SeDuMi under the MATLAB R2014a environment. We ran the experiments on an HP EliteBook with 16 GB of RAM and an Intel Core i5-4300U processor. We do not report computation times, since they are negligible for our small examples.

6.1. Univariate unit interval

We consider as design space the interval $\mathcal{X}=[-1,1]$ and on it the polynomial measurements $\sum_{j=0}^{d}\theta_{j}x^{j}$ with unknown parameters $\theta\in\mathbb{R}^{d+1}$ . To find the optimal design we first solve problem (4.5), in other words

[TABLE]

for given $d$ and $\delta$ . For instance, for $d=5$ and $\delta=0$ we obtain the sequence $\mathbf{y}_{d}^{\star}\approx\,$ (1, 0, 0.56, 0, 0.45, 0, 0.40, 0, 0.37, 0, 0.36).

Then, to recover the corresponding atomic measure from the sequence $\mathbf{y}^{\star}_{d}$ we solve the problem

[TABLE]

and find the points -1, -0.765, -0.285, 0.285, 0.765 and 1 (for $d=5$, $\delta=0$, $r=1$). As a result, our optimal design is the atomic measure supported on these points. They match the known analytic solution to the problem, namely the critical points of the Legendre polynomial; see e.g. [DS97, Theorem 5.5.3, p.162]. Calculating the corresponding weights as described in Section 5.3, we find $\omega_{1}=\dotsb=\omega_{6}\approx 0.166$.
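The reported numbers can be cross-checked against the analytic solution. The sketch below assumes that the interior support points are the roots of $P_{5}'$, where $P_{5}(x)=(63x^{5}-70x^{3}+15x)/8$ is the Legendre polynomial of degree 5, so that they solve $21x^{4}-14x^{2}+1=0$. It then computes the moments of the measure with weight $1/6$ at each point. It is an independent check written for illustration, not the authors' code.

```python
import math

# Interior support points: roots of P_5'(x), i.e. of 21 x^4 - 14 x^2 + 1.
s7 = math.sqrt(7.0)
a = math.sqrt((7 + 2 * s7) / 21)
b = math.sqrt((7 - 2 * s7) / 21)
support = [-1.0, -a, -b, b, a, 1.0]

# Moments y_0, ..., y_10 of the measure with weight 1/6 at each support point.
moments = [sum(x**k for x in support) / 6 for k in range(11)]

print([round(x, 3) for x in support])
print([abs(round(m, 2)) for m in moments])
```

Rounded to the precision used above, the support points and the moment sequence agree with the values obtained from the SDP.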

Alternatively, we compute the roots of the polynomial $x\mapsto p^{\star}(x)=6-p^{\star}_{5}(x)$ , where $p^{\star}_{5}$ is the Christoffel polynomial of degree $2d=10$ on $\mathcal{X}$ and find the same points as in the previous approach by solving problem (5.4). See Figure 1 for the graph of the Christoffel polynomial of degree 10.

We observe that we get fewer points when using problem (5.3) to recover the support for this example. This may be due to numerical issues.

6.2. Wynn’s polygon

As a two-dimensional example we take the polygon given by the vertices $(-1,-1),\ (-1,1),\ (1,-1)$ and $(2,2)$ , scaled to fit the unit circle, i.e. we consider the design space

[TABLE]

Note that we need the redundant constraint $x_{1}^{2}+x_{2}^{2}\leq 1$ in order to have an algebraic certificate of compactness.

As before, to find the optimal measure for the regression we solve the problems (4.5) and (5.1). Let us start by analyzing the results for $d=1$ and $\delta=3$ . Solving (4.5) we obtain $\mathbf{y}^{\star}\in\mathbb{R}^{45}$ which leads to 4 atoms when solving (5.1) with $r=3$ . For the latter the moment matrices of order 2 and 3 both have rank 4, so condition (5.2) is fulfilled. As expected, the 4 atoms are exactly the vertices of the polygon.

Again, we could also solve problem (5.4) instead of (5.1) to obtain the same atoms. As in the univariate example, we get fewer points when using problem (5.3). To be precise, GloptiPoly is not able to extract any solutions for this example.

When increasing $d$, we get an optimal measure with a larger support. For $d=2$ we recover 7 points, and 13 for $d=3$. See Figure 2 for the polygon, the supporting points of the optimal measure and the $\binom{2+d}{2}$-level set of the Christoffel polynomial $p^{\star}_{d}$ for different $d$. The latter demonstrates graphically that the set of zeros of $\binom{2+d}{2}-p^{\star}_{d}$ intersected with $\mathcal{X}$ is indeed the set of atoms of our representing measure. Figure 3 visualizes the weights corresponding to each point of the support for the different $d$.
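The support sizes reported for the interval and the polygon can be compared with the bounds $\binom{n+d}{n}\leq\ell\leq\binom{n+2d}{n}$ of Theorem 4.2. The short check below uses only the atom counts stated in the text; the function name `support_bounds` is an illustrative choice.

```python
from math import comb

def support_bounds(n, d):
    """Lower and upper bounds of Theorem 4.2 on the number of support points."""
    return comb(n + d, n), comb(n + 2 * d, n)

# (n, d, number of atoms reported in Sections 6.1 and 6.2)
reported = [(1, 5, 6), (2, 1, 4), (2, 2, 7), (2, 3, 13)]
for n, d, atoms in reported:
    lo, hi = support_bounds(n, d)
    print(n, d, lo, atoms, hi, lo <= atoms <= hi)
```

In the univariate case the lower bound is attained, while for the polygon the number of atoms lies strictly between the two bounds for $d=1,2,3$.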

6.3. The 3-dimensional unit sphere

Last, let us consider the regression for the degree $d$ polynomial measurements $\sum_{|\alpha|\leq d}\theta_{\alpha}x^{\alpha}$ on the unit sphere $\mathcal{X}=\{x\in\mathbb{R}^{3}:x_{1}^{2}+x_{2}^{2}+x_{3}^{2}=1\}$ . As before, we first solve problem (4.5). For $d=1$ and $\delta\geq 0$ we obtain the sequence $\mathbf{y}^{\star}_{d}\in\mathbb{R}^{10}$ with $y_{000}^{\star}=1,\ y_{200}^{\star}=y_{020}^{\star}=y_{002}^{\star}=0.333$ and all other entries zero.

In the second step we solve problem (5.1) to recover the measure. For $r=2$ the moment matrices of order 2 and 3 both have rank 6, meaning the rank condition (5.2) is fulfilled, and we obtain the six atoms $\{(\pm 1,0,0),(0,\pm 1,0),(0,0,\pm 1)\}\subseteq\mathcal{X}$ on which the optimal measure $\mu\in\mathscr{M}_{+}(\mathcal{X})$ is uniformly supported.
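The six-atom measure can be checked directly against the moment sequence reported for $d=1$. The sketch below computes all moments of order at most 2 of the measure with weight $1/6$ at each of the six atoms; the equal weights follow from the statement that the measure is uniformly supported, and the helper `moment` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

atoms = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
w = Fraction(1, 6)

def moment(alpha):
    """y_alpha = sum_i w * x_i^alpha for the six-point measure."""
    total = Fraction(0)
    for x in atoms:
        term = w
        for xi, ai in zip(x, alpha):
            term *= xi**ai
        total += term
    return total

# All multi-indices alpha in N^3 with |alpha| <= 2.
alphas = [al for al in product(range(3), repeat=3) if sum(al) <= 2]
y = {al: moment(al) for al in alphas}
nonzero = {al: str(v) for al, v in y.items() if v != 0}
print(len(alphas), nonzero)
```

There are $\binom{3+2}{3}=10$ such moments, and the only nonzero ones are $y_{000}=1$ and $y_{200}=y_{020}=y_{002}=1/3$, matching the sequence obtained from (4.5).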

For quadratic regressions, i.e. $d=2$, we obtain an optimal measure supported on 14 atoms evenly distributed on the sphere. Choosing $d=3$, meaning cubic regressions, we find an atomic measure supported on 26 points which again are evenly distributed on the sphere. See Figure 4 for an illustration of the supporting points of the optimal measures for $d=1$, $d=2$, $d=3$ and $\delta=0$.

Using the method via Christoffel polynomials again gives fewer points. No solution is extracted when solving problem (5.4) and we find only two supporting points for problem (5.3).

Acknowledgments

We thank Henry Wynn for communicating the polygon of Example 6.2 to us. Feedback from Luc Pronzato, Lieven Vandenberghe and Weng Kee Wong was also appreciated.

Bibliography

[ADT07] A. C. Atkinson, A. N. Donev, R. D. Tobias, Optimum experimental designs, with SAS, Oxford Statistical Science Series, 34, Oxford University Press, Oxford, 2007.
[B08] R. A. Bailey, Design of comparative experiments, Cambridge Series in Statistical and Probabilistic Mathematics, 25, Cambridge University Press, Cambridge, 2008.
[BHH78] G. E. P. Box, W. G. Hunter, J. S. Hunter, Statistics for experimenters: an introduction to design, data analysis, and model building, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York-Chichester-Brisbane, 1978.
[DG14] H. Dette, Y. Grigoriev, E-optimal designs for second-order response surface models, The Annals of Statistics, 42(4):1635-1656, 2014.
[DS93] H. Dette, W. J. Studden, Geometry of E-optimality, Ann. Statist. 21(1):416-433, 1993.
[DS97] H. Dette, W. J. Studden, The theory of canonical moments with applications in statistics, probability, and analysis, Wiley Series in Probability and Statistics: Applied Probability and Statistics, A Wiley-Interscience Publication, John Wiley & Sons, Inc., New York, 1997.
[E52] G. Elfving, Optimum allocation in linear regression theory, Ann. Math. Statistics 23:255–262, 1952.
[F10] V. Fedorov, Optimal experimental design, Wiley Interdisciplinary Reviews: Computational Statistics, 2(5):581-589, 2010.
