Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis

Baptiste Broto (LADIS); François Bachoc (IMT); Laura Clouvel; Jean-Marc Martinez (DM2S)

arXiv:1907.12780·math.ST·February 14, 2020·SIAM/ASA J. Uncertain. Quantification


TL;DR

This paper develops consistent estimators for block-diagonal covariance matrices in high-dimensional Gaussian data and applies them to efficiently estimate Shapley effects in sensitivity analysis, even with thousands of variables.

Contribution

It introduces new estimators for block-diagonal covariance matrices that are both consistent and efficient, enabling scalable sensitivity analysis in high dimensions.

Findings

01

Estimator converges at the same rate as if the true structure was known.

02

Estimator is asymptotically efficient in fixed dimension.

03

Allows estimation of Shapley effects for thousands of variables.

Abstract

In this paper, we aim to estimate block-diagonal covariance matrices for Gaussian data in high dimension and in fixed dimension. We first estimate the block-diagonal structure of the covariance matrix by theoretical and practical estimators which are consistent. We deduce that the suggested estimator of the covariance matrix in high dimension converges at the same rate as if the true decomposition were known. In fixed dimension, we prove that the suggested estimator is asymptotically efficient. Then, we focus on the estimation of sensitivity indices called "Shapley effects", in the high-dimensional Gaussian linear framework. From the estimated covariance matrix, we obtain an estimator of the Shapley effects with a relative error which goes to zero at the parametric rate up to a logarithm factor. Using the block-diagonal structure of the estimated covariance matrix, this estimator is…

Equations

(\Gamma_{B})_{i,j}=\begin{cases}\gamma_{ij}&\text{if }(i,j)\in B\\ 0&\text{otherwise.}\end{cases}

S_{p}^{++}(\mathbb{R},B):=\left\{\Gamma\in S_{p}^{++}(\mathbb{R})\;|\;\Gamma=\Gamma_{B}\text{ and }\forall B^{\prime}<B,\;\Gamma\neq\Gamma_{B^{\prime}}\right\},

\overline{X}_{n}:=\frac{1}{n}\sum_{l=1}^{n}X^{(l)},

S_{n}:=\frac{1}{n}\sum_{l=1}^{n}(X^{(l)}-\overline{X}_{n})(X^{(l)}-\overline{X}_{n})^{T},

L_{\Gamma,m}(X^{(1)},...,X^{(n)}):=\frac{1}{(2\pi)^{\frac{n}{2}}|\Gamma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\sum_{l=1}^{n}(X^{(l)}-m)^{T}\Gamma^{-1}(X^{(l)}-m)\right),

l_{\Gamma}:=-\frac{2}{p}\log\left(L_{\Gamma,\overline{X}}(X^{(1)},...,X^{(n)})\right)-\frac{n}{p}\log(2\pi)=\frac{1}{p}\left(\log|\Gamma|+\operatorname{Tr}(\Gamma^{-1}S)\right).

\operatorname{pen}(\Gamma):=\operatorname{pen}(B(\Gamma)):=\sum_{k=1}^{K}p_{k}^{2},

\Phi:\begin{array}{ccc}S_{p}^{++}(\mathbb{R})&\longrightarrow&\mathbb{R}\\ \Gamma&\longmapsto&l_{\Gamma}+\kappa\operatorname{pen}(\Gamma),\end{array}

\widehat{B}_{tot}:=\underset{B\in\mathcal{P}_{p}}{\operatorname{arg\,min}}\;\Psi(B),

\mathbb{P}\left(\widehat{B}_{tot}=B^{*}\right)\longrightarrow 1.

\mathbb{P}\left(B(\alpha_{1})\not>\widehat{B}_{tot}\leq B(\alpha_{2})\right)\longrightarrow 1.

\mathbb{P}\left(\widehat{B}=B^{*}\right)\longrightarrow 1.

\mathbb{P}\left(B(\alpha_{1})\leq B_{n^{-\delta/2}}\leq B(\alpha_{2})\right)\longrightarrow 1.

\frac{1}{p}\|S_{B^{*}}-\Sigma\|_{F}^{2}=O_{p}(1/n)

\frac{1}{p}\|S_{\widehat{B}}-\Sigma\|_{F}^{2}=O_{p}(1/n).

\frac{1}{p}\|S_{B^{*}}-\Sigma\|_{F}^{2}\neq o_{p}(1/n).

\frac{1}{p}\|S-\Sigma\|_{F}^{2}=O_{p}(p/n).

\mathbb{E}\left(\frac{1}{p}\|S-\Sigma\|_{F}^{2}\right)\geq\frac{\lambda_{\inf}^{2}\,p}{2n}.

\frac{1}{p}\|S_{B_{n^{-\delta/2}}}-\Sigma\|_{F}^{2}=o_{p}\left(\frac{1}{n^{\delta-\varepsilon}}\right).

\mathbb{P}\left(\exists B<B^{*},\;\|\Sigma_{B}-\Sigma\|_{\max}<an^{-\frac{1}{4}}\right)\longrightarrow 0.

\mathbb{P}\left(\widehat{B}_{tot}=B^{*}\right)\longrightarrow 1.

\mathbb{P}\left(\widehat{B}_{\widehat{C}}=B^{*}\right)\longrightarrow 1.

\mathbb{E}\left[(\widehat{\theta}-\theta)(\widehat{\theta}-\theta)^{T}\right]\leq U(U^{T}JU)^{-1}U^{T},

\sqrt{n}\left(\operatorname{vec}(S_{\widehat{B}})-\operatorname{vec}(\Sigma)\right)\overset{\mathcal{L}}{\underset{n\to+\infty}{\longrightarrow}}\mathcal{N}(0,CR(\Sigma,B^{*})),

\eta_{i}:=\frac{1}{p\operatorname{Var}(Y)}\sum_{u\subset-i}\binom{p-1}{|u|}^{-1}\left(\operatorname{Var}(\mathbb{E}(Y|X_{u\cup\{i\}}))-\operatorname{Var}(\mathbb{E}(Y|X_{u}))\right)

\eta_{i}:=\frac{1}{p\operatorname{Var}(Y)}\sum_{u\subset-i}\binom{p-1}{|u|}^{-1}\left(\operatorname{Var}(Y|X_{u})-\operatorname{Var}(Y|X_{u\cup\{i\}})\right).

\operatorname{Var}(Y|X_{u})=\operatorname{Var}(\beta_{-u}^{T}X_{-u}|X_{u})=\beta_{-u}^{T}\left(\Sigma_{-u,-u}-\Sigma_{-u,u}\Sigma_{u,u}^{-1}\Sigma_{u,-u}\right)\beta_{-u}

\eta_{i}=\frac{1}{\beta^{T}\Sigma\beta}\,\frac{1}{|B_{[i]}^{*}|}\sum_{u\subset B_{[i]}^{*}-i}\binom{|B_{[i]}^{*}|-1}{|u|}^{-1}\left(V_{u}^{B_{[i]}^{*}}-V_{u\cup\{i\}}^{B_{[i]}^{*}}\right),

V_{v}^{B_{[i]}^{*}}:=\operatorname{Var}(\beta_{B_{[i]}^{*}}^{T}X_{B_{[i]}^{*}}|X_{v})=\beta_{B_{[i]}^{*}-v}^{T}\left(\Sigma_{B_{[i]}^{*}-v,B_{[i]}^{*}-v}-\Sigma_{B_{[i]}^{*}-v,v}\Sigma_{v,v}^{-1}\Sigma_{v,B_{[i]}^{*}-v}\right)\beta_{B_{[i]}^{*}-v}

\tilde{Y}^{(l)}=\beta_{0}+\beta^{T}X^{(l)}+\varepsilon^{(l)},


Full text

Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis

Baptiste Broto

CEA, LIST, Université Paris-Saclay, F-91120, Palaiseau, France

François Bachoc

Institut de Mathématiques de Toulouse, Université Paul Sabatier, F-31062 Toulouse, France

Laura Clouvel

CEA, SERMA, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France

Jean-Marc Martinez

CEA, DEN-STMF, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France

(March 1, 2024)

Abstract

In this paper, we estimate a block-diagonal covariance matrix from Gaussian variables in high dimension. We prove that, under some mild assumptions, we find the block-diagonal structure of the matrix with probability that goes to one. We deduce estimators of the covariance matrix that are as accurate as if the block-diagonal structure were known, with numerical applications. We also prove the asymptotic efficiency of one of these estimators in fixed dimension. Then, we apply these estimators to the computation of sensitivity indices, namely the Shapley effects, in the Gaussian linear framework. We derive estimators of the Shapley effects in high dimension with a relative error that converges to 0 at the parametric rate, up to a logarithm factor. Finally, we apply the Shapley effects estimators to nuclear data.

1 Introduction

Sensitivity analysis, and particularly sensitivity indices, have become important tools in applied sciences. The aim of sensitivity indices is to quantify the impact of the input variables $X_{1},...,X_{p}$ on the output $Y$ of a model. This information improves the interpretability of the model. In global sensitivity analysis, the input variables are assumed to be random variables. In this framework, the Sobol indices [33] were the first suggested indices to be applicable to general classes of models. Nevertheless, one of the most important limitations of these indices is the assumption of independence between the input variables. Hence, many variants of the Sobol indices have been suggested for dependent input variables [22, 6, 23, 7].

Recently, Owen defined new sensitivity indices in [26] called "Shapley effects". These sensitivity indices have many advantages over the Sobol indices for dependent inputs [16]. For general models, [34] suggested an estimator of the Shapley effects. However, this estimation requires the ability to generate samples from the conditional distributions of the input variables. A consistent estimator has been suggested in [3], requiring only an input-output sample. This estimator uses nearest-neighbour methods to mimic the generation of samples from these conditional distributions.

In this paper, we focus on Gaussian linear models in large dimension. Gaussian linear models are widely used as numerical models of physical phenomena (see for example [19, 13, 29]). Indeed, uncertainties are often modelled as Gaussian variables and an unknown function $Y=f(X_{1},...,X_{p})$ is commonly approximated by its linear approximation around $\mathbb{E}(X)$ . Furthermore, high-dimensional Gaussian linear models are widely studied in statistics [5, 11]. In this particular case of Gaussian linear models, the theoretical values of the Shapley effects can be computed explicitly [27, 16, 4]. These values depend on the covariance matrix $\Sigma$ of the inputs and on the coefficients $\beta$ of the linear model.

In this paper, we assume that we observe an i.i.d. sample of the input Gaussian variables in high dimension and that the true covariance matrix $\Sigma$ and the vector $\beta$ are unknown. In this setting, the Shapley effects need to be estimated, replacing the true vector $\beta$ by its estimation and the theoretical covariance matrix $\Sigma$ by an estimated covariance matrix.

There exists a fair amount of work on high-dimensional covariance matrix estimation. Many researchers have studied the empirical covariance matrix in high dimension [24, 37, 32, 1]. For particular covariance matrices, estimators other than the empirical covariance may be preferable. For some well-conditioned families of covariance matrices, [2] suggests a banded version of the empirical covariance matrix, and several works address the problem of estimating a sparse covariance matrix [14, 20, 10].

However, in general, given a high-dimensional covariance matrix, the computational cost of the corresponding Shapley effects grows exponentially with the dimension. The only setting in which a procedure is available to compute the Shapley effects at a non-exponential cost is that of block-diagonal matrices [4]. Hence, in high dimension, block-diagonal covariance matrices are a very favorable setting for the estimation of the Shapley effects. Thus, we address the estimation of high-dimensional block-diagonal covariance matrices in this paper. In contrast, we remark that the above methods are not relevant for the estimation of the Shapley effects, since they do not provide block-diagonal matrices.

In our framework, we assume that the true covariance matrix is block-diagonal and we want to estimate this matrix with a similar structure to compute the deduced Shapley effects. Some works address the block-diagonal estimation of covariance matrices. [28] gives a numerical procedure to estimate such covariance matrices and [15] suggests a test to verify the independence of the blocks. A block-diagonal estimator of the covariance matrix is proposed in [9]. The authors of [9] choose a more general framework, without assuming that the true covariance matrix is block-diagonal. They obtain the estimated block-diagonal structure by thresholding the empirical correlation matrix. They also give theoretical guarantees by bounding the average of the squared Hellinger distance between the estimated probability density function and the true one. This bound depends on the dimension $p$ and the sample size $n$. When $p/n$ converges to some constant $y\in]0,1[$, this bound is larger than $1$ and is no longer relevant, as the Hellinger distance is always smaller than $1$.

Here, we focus on the high-dimensional setting, when $p/n$ converges to some constant $y\in]0,1[$, and when the true covariance matrix is assumed to be block-diagonal. We give different estimators of the block-diagonal structure and we show that their complexity is small. Then, we provide new asymptotic results for these estimators. Under mild conditions, we show that the estimators of the block structure are equal to the true block structure, with probability converging to one. Furthermore, the squared Frobenius distance between the estimated covariance matrices and the true one, normalized by $p$, converges to zero at rate $1/n$. Thus, our work complements that of [9]. We also study the fixed-dimensional setting, where we show that one of our suggested estimators is asymptotically efficient.

From the estimated block-diagonal covariance matrices, we deduce estimators of the Shapley effects in the high-dimensional linear Gaussian framework, with reduced computational cost. We recall that in high dimension, the computation of the Shapley effects requires that the corresponding covariance matrix be block-diagonal. We show that the relative estimation error of these estimators goes to zero at the parametric rate $1/n^{1/2}$ , up to a logarithm factor, even if the linear model is estimated from noisy observations.

Our convergence results are confirmed by numerical experiments. We also apply our algorithm to semi-generated data from nuclear applications.

The rest of the paper is organized as follows. In Section 2, we focus on the block-diagonal estimation of the block-diagonal covariance matrix. In Section 3, we apply this block-diagonal estimation of the covariance matrix to deduce Shapley effects estimators. Section 4 is devoted to the numerical application on nuclear data, and the conclusion is given in Section 5. All the proofs are postponed to the appendix.

2 Estimation of block-diagonal covariance matrices

2.1 Problem and notation

We assume that we observe $(X^{(l)})_{l\in[1:n]}$ , an i.i.d. sample with distribution $\mathcal{N}(\mu,\Sigma)$ , where $\mu\in\mathbb{R}^{p}$ and $\Sigma$ are not known. Here, $[1:n]$ denotes the set of the integers from 1 to $n$ . We assume that $\Sigma=(\sigma_{ij})_{i,j\in[1:p]}\in S_{p}^{++}(\mathbb{R})$ (the set of the symmetric positive definite matrices) and has a block-diagonal decomposition. To be more precise on this block-diagonal decomposition, we need to introduce some notation.

Let $\mathcal{P}_{p}$ denote the set of all the partitions of $[1:p]$. We endow the set $\mathcal{P}_{p}$ with the following partial order. If $B,B^{\prime}\in\mathcal{P}_{p}$, we say that $B$ is finer than $B^{\prime}$, and we write $B\leq B^{\prime}$, if for all $A\in B^{\prime}$, there exist $A_{1},...,A_{i}\in B$ such that $A=\bigsqcup_{j=1}^{i}A_{j}$. We also order the elements of a partition $B\in\mathcal{P}_{p}$ by their smallest element, which enables us to talk about "the $k$-th element" of $B$. If $B\in\mathcal{P}_{p}$ and $a_{1},...,a_{i}\in[1:p]$, we write $(a_{1},...,a_{i})\in B$ if there exists $A\in B$ such that $\{a_{1},...,a_{i}\}\subset A$ (in other words, if $a_{1},...,a_{i}$ are in the same group of $B$). If $\Gamma\in S_{p}^{++}(\mathbb{R})$ with $\Gamma=(\gamma_{ij})_{i,j\in[1:p]}$ and if $B\in\mathcal{P}_{p}$, we define $\Gamma_{B}$ by

$$(\Gamma_{B})_{i,j}=\begin{cases}\gamma_{ij}&\text{if }(i,j)\in B\\ 0&\text{otherwise.}\end{cases}$$

Let us define

$$S_{p}^{++}(\mathbb{R},B):=\left\{\Gamma\in S_{p}^{++}(\mathbb{R})\;|\;\Gamma=\Gamma_{B}\text{ and }\forall B^{\prime}<B,\;\Gamma\neq\Gamma_{B^{\prime}}\right\},$$

where we define $B^{\prime}<B$ if $B^{\prime}\leq B$ and if $B^{\prime}\neq B$. Thus $S_{p}^{++}(\mathbb{R})=\bigsqcup_{B\in\mathcal{P}_{p}}S_{p}^{++}(\mathbb{R},B)$ and for all $\Gamma\in S_{p}^{++}(\mathbb{R})$, we can define a unique $B(\Gamma)\in\mathcal{P}_{p}$ such that $\Gamma\in S_{p}^{++}(\mathbb{R},B(\Gamma))$. Here, we assume that $\Sigma\in S_{p}^{++}(\mathbb{R},B^{*})$, i.e. $B^{*}$ is the finest decomposition of $\Sigma$, i.e. $B(\Sigma)=B^{*}$. We say that $\Sigma$ has a block-diagonal decomposition $B^{*}$.
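The two operations just introduced, the projection $\Gamma\mapsto\Gamma_{B}$ and the map $\Gamma\mapsto B(\Gamma)$, can be illustrated by a minimal NumPy sketch. The function names `project` and `finest_partition` are hypothetical, and indices start at 0 rather than 1. The sketch relies on the observation that $B(\Gamma)$ is the set of connected components of the graph linking $i$ and $j$ whenever $\gamma_{ij}\neq 0$.

```python
import numpy as np

def project(gamma, blocks):
    """Return Gamma_B: keep entries whose two indices share a block, zero the rest."""
    out = np.zeros_like(gamma)
    for block in blocks:
        idx = np.asarray(block)
        out[np.ix_(idx, idx)] = gamma[np.ix_(idx, idx)]
    return out

def finest_partition(gamma):
    """Return B(Gamma): connected components of the graph with an edge (i, j)
    whenever gamma[i, j] != 0, listed by increasing smallest element."""
    p = gamma.shape[0]
    seen, blocks = set(), []
    for start in range(p):
        if start in seen:
            continue
        seen.add(start)
        stack, block = [start], []
        while stack:
            i = stack.pop()
            block.append(i)
            for j in np.flatnonzero(gamma[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks

gamma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 3.0]])
print(finest_partition(gamma))            # [[0, 1], [2]]
print(project(gamma, [[0], [1], [2]]))    # the diagonal part of gamma
```

Projecting on the partition into singletons keeps only the diagonal, while projecting on `finest_partition(gamma)` returns `gamma` unchanged.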

We also write

$$\overline{X}_{n}:=\frac{1}{n}\sum_{l=1}^{n}X^{(l)},$$

and

$$S_{n}:=\frac{1}{n}\sum_{l=1}^{n}(X^{(l)}-\overline{X}_{n})(X^{(l)}-\overline{X}_{n})^{T},$$

which are the empirical estimators of $\mu$ and $\Sigma$ . To simplify notation, we write $\overline{X}$ for $\overline{X}_{n}$ and $S$ for $S_{n}$ (the dependency on $n$ is implicit). We know that, for all $\Gamma\in S_{p}^{++}(\mathbb{R})$ , $\overline{X}$ maximizes the likelihood $L_{\Gamma,m}(X^{(1)},...,X^{(n)})$ over the mean parameter $m$ , where

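$$L_{\Gamma,m}(X^{(1)},...,X^{(n)}):=\frac{1}{(2\pi)^{\frac{n}{2}}|\Gamma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\sum_{l=1}^{n}(X^{(l)}-m)^{T}\Gamma^{-1}(X^{(l)}-m)\right),$$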

and $|\Gamma|$ is the determinant of $\Gamma$ . Thus, for all $\Gamma\in S_{p}^{++}(\mathbb{R})$ , we define

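$$l_{\Gamma}:=-\frac{2}{p}\log\left(L_{\Gamma,\overline{X}}(X^{(1)},...,X^{(n)})\right)-\frac{n}{p}\log(2\pi)=\frac{1}{p}\left(\log|\Gamma|+\operatorname{Tr}(\Gamma^{-1}S)\right).$$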

As we assume that the true covariance matrix is block-diagonal, we consider a block-diagonal promoting penalization of the form

$$\operatorname{pen}(\Gamma):=\operatorname{pen}(B(\Gamma)):=\sum_{k=1}^{K}p_{k}^{2},$$

if $B(\Gamma)=\{B_{1},...,B_{K}\}$ and $|B_{k}|=p_{k}$ for all $k\in[1:K]$ . We consider the penalized likelihood criterion

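$$\Phi:\begin{array}{ccc}S_{p}^{++}(\mathbb{R})&\longrightarrow&\mathbb{R}\\ \Gamma&\longmapsto&l_{\Gamma}+\kappa\operatorname{pen}(\Gamma),\end{array}$$

where $\kappa\geq 0$. In this work, we suggest estimating $\Sigma$ by the minimizer of $\Phi$, for some choice of the penalisation coefficient $\kappa$. First, we show in Proposition 1 that a minimizer of $\Phi$ can only be a block-diagonal decomposition of $S$.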

Proposition 1.

If $\Gamma$ is a minimizer of $\Phi$ , then, there exists $B\in\mathcal{P}_{p}$ such that $\Gamma=S_{B}$ .

Hence, the minimization problem on $S_{p}^{++}(\mathbb{R})$ becomes a minimization problem on the finite set $\{S_{B},\;B\in\mathcal{P}_{p}\}$. So, we define $\Psi(B):=\Phi(S_{B})$ and we suggest estimating $B^{*}$ by

$$\widehat{B}_{tot}:=\underset{B\in\mathcal{P}_{p}}{\operatorname{arg\,min}}\;\Psi(B),\tag{1}$$

i.e. by the structure that minimizes the penalized log-likelihood criterion. In this paper, we study this estimator of $B^{*}$ theoretically. However, it cannot be computed in high dimension since the number of partitions $B\in\mathcal{P}_{p}$ is too large. Hence, we will also define other, less costly estimators, and study them theoretically.
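For a very small $p$, the exhaustive minimization of $\Psi$ over $\mathcal{P}_{p}$ can still be carried out, which makes the definition concrete. The sketch below is only an illustration under assumptions of ours: the helper names `partitions` and `psi`, the toy covariance matrix, the sample size and the value $\delta=0.6$ in $\kappa=1/(pn^{\delta})$ are all arbitrary choices, and indices start at 0.

```python
import numpy as np

def partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # `first` either forms its own block or joins one existing block
        yield [[first]] + part
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]

def psi(S, blocks, kappa):
    """Psi(B) = Phi(S_B) = (log|S_B| + Tr(S_B^{-1} S)) / p + kappa * sum_k p_k^2."""
    p = S.shape[0]
    S_B = np.zeros_like(S)
    for b in blocks:
        S_B[np.ix_(b, b)] = S[np.ix_(b, b)]
    _, logdet = np.linalg.slogdet(S_B)
    fit = (logdet + np.trace(np.linalg.solve(S_B, S))) / p
    return fit + kappa * sum(len(b) ** 2 for b in blocks)

rng = np.random.default_rng(0)
n, p, delta = 2000, 5, 0.6               # illustrative values only
sigma = np.eye(p)
sigma[0, 1] = sigma[1, 0] = 0.6          # true blocks: {0, 1}, {2}, {3}, {4}
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(X, rowvar=False, bias=True)   # bias=True divides by n, as S_n does
kappa = 1.0 / (p * n ** delta)

all_parts = list(partitions(list(range(p))))
b_tot = min(all_parts, key=lambda B: psi(S, B, kappa))
print(len(all_parts))                    # 52, the Bell number of 5
print(sorted(sorted(b) for b in b_tot))
```

Already for $p=5$ there are 52 candidate partitions, and this count grows faster than exponentially, which is why the exhaustive search is limited to toy dimensions.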

2.2 Convergence in high dimension

2.2.1 Assumptions

In Section 2.2, we assume that $p$ and $n$ go to infinity. The true covariance matrix $\Sigma$ is not constant and depends on $n$ (or $p$). Nevertheless, to simplify notation, we do not write the dependency on $n$. Throughout Section 2.2, we choose a penalisation coefficient $\kappa=\frac{1}{pn^{\delta}}$ for a fixed $\delta\in]1/2,1[$.

We also make the following assumptions throughout Section 2.2.

Condition 1.

$p/n\longrightarrow y\in]0,1[$ .

Condition 2.

There exist $\lambda_{\inf}>0$ and $\lambda_{\sup}<+\infty$ such that, for all $n$ , the eigenvalues of $\Sigma$ are in $[\lambda_{\inf},\lambda_{\sup}]$ .

Condition 3.

There exists $m\in\mathbb{N}^{*}$ such that for all $n$ , all the blocks of $\Sigma$ are smaller than $m$ , i.e. $\forall A\in B^{*}$ , we have $|A|\leq m$ .

For a $q\times q$ matrix $M=(m_{ij})_{(i,j)\in[1:q]^{2}}$ , we let $\|M\|_{\max}=\max_{(i,j)\in[1:q]^{2}}|m_{ij}|$ .

Condition 4.

There exists $a>0$ such that for all $n$ and for all $B<B^{*}$ , we have $\|\Sigma_{B}-\Sigma\|_{\max}\geq an^{-1/4}$ .

These four mild assumptions are discussed in Section 2.2.4. However, we also focus on the case when Condition 4 does not hold. We will provide similar results, both when assuming Conditions 1 to 4, and when only Conditions 1, 2 and 3 hold.

2.2.2 Convergence of $\widehat{B}$ and reduction of the cost

Now that we have defined our estimator $\widehat{B}_{tot}$ of the true decomposition $B^{*}$ in Equation (1) and stated our assumptions in Section 2.2.1, we give the convergence of $\widehat{B}_{tot}$ in Proposition 2. Although $\widehat{B}_{tot}$ is not computable in practice, its convergence remains interesting because it supports the choice of the penalized likelihood criterion and will be useful to prove the convergence of more practical estimators. In Section 2.2, all limit statements are given as $n,p\to+\infty$.

Proposition 2.

Under Conditions 1 to 4 and for a fixed $\delta\in]1/2,1[$ , we have

$$\mathbb{P}\left(\widehat{B}_{tot}=B^{*}\right)\longrightarrow 1.$$

Hence, under Conditions 1 to 4, the estimator $\widehat{B}_{tot}$ is equal to the true decomposition $B^{*}$ with probability going to one. When Condition 4 does not hold, we cannot state such a convergence result but we get a weaker result in Proposition 3. In this case, we need to define $B({\alpha})$ as the partition given by thresholding $\Sigma$ by $n^{-\alpha}$. In other words, $B({\alpha})$ is the smallest (or finest) partition $B$ such that $\|\Sigma_{B}-\Sigma\|_{\max}\leq n^{-\alpha}$.

Proposition 3.

Under Conditions 1, 2 and 3, for all $\alpha_{1}<\delta/2$ and $\alpha_{2}>\delta/2$ , we have

$$\mathbb{P}\left(B(\alpha_{1})\not>\widehat{B}_{tot}\leq B(\alpha_{2})\right)\longrightarrow 1.$$

Thus, we have defined a consistent estimator of $B^{*}$ that theoretically solves our problem of the lack of knowledge of the true decomposition $B^{*}$. However, computing $\widehat{B}_{tot}$ is very costly in practice. Indeed, the number of partitions of $[1:p]$ (the Bell number) is exponential in $p$. As in [9], we suggest restricting our estimates of $B^{*}$ to the partitions given by thresholding the empirical correlation matrix $\widehat{C}:=(\widehat{C}_{ij})_{i,j\in[1:p]}$ where $\widehat{C}_{ij}:=s_{ij}/\sqrt{s_{ii}s_{jj}}$, with $S=(s_{ij})_{(i,j)\in[1:p]^{2}}$. If $\lambda\in[0,1]$, let $B_{\lambda}$ be the finest partition of the thresholded empirical correlation matrix $\widehat{C}_{\lambda}:=(\widehat{C}_{ij}\mathds{1}_{|\widehat{C}_{ij}|>\lambda})_{i,j\leq p}$. In other words, $B_{\lambda}:=B(\widehat{C}_{\lambda})$. For a given value $\lambda\in[0,1]$, $B_{\lambda}$ can be found by Breadth-First Search (BFS) [21]. Furthermore, we do not need to compute $B_{\lambda}$ for all $\lambda\in[0,1]$ and we suggest in the following three different choices of grids for $\lambda$.
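The computation of $B_{\lambda}$ for one threshold can be sketched as follows: the partition is the set of connected components of the graph linking $i$ and $j$ when $|\widehat{C}_{ij}|>\lambda$, explored here by BFS. The function name `threshold_partition` and the small matrix `C_hat` are hypothetical, and indices start at 0.

```python
from collections import deque
import numpy as np

def threshold_partition(C, lam):
    """Return B_lambda: connected components (found by BFS) of the graph
    linking i and j when |C[i, j]| > lam, listed by increasing smallest element."""
    p = C.shape[0]
    adjacency = np.abs(C) > lam
    label = -np.ones(p, dtype=int)        # -1 marks a variable not visited yet
    blocks = []
    for start in range(p):
        if label[start] >= 0:
            continue
        label[start] = len(blocks)
        queue, block = deque([start]), []
        while queue:
            i = queue.popleft()
            block.append(i)
            # unvisited neighbours of i join the current block
            for j in np.flatnonzero(adjacency[i] & (label < 0)):
                label[j] = len(blocks)
                queue.append(int(j))
        blocks.append(sorted(block))
    return blocks

C_hat = np.array([[1.0, 0.8, 0.05, 0.0],
                  [0.8, 1.0, 0.1, 0.02],
                  [0.05, 0.1, 1.0, 0.6],
                  [0.0, 0.02, 0.6, 1.0]])
print(threshold_partition(C_hat, 0.3))    # [[0, 1], [2, 3]]
print(threshold_partition(C_hat, 0.07))   # [[0, 1, 2, 3]]
```

Each variable is dequeued once and one row of the matrix is scanned per variable, so a single threshold costs on the order of $p^{2}$ operations in this sketch.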

First, we suggest the grid $A_{\widehat{C}}:=\{|\widehat{C}_{ij}|\;|\;1\leq i<j\leq p\}$ and we define the estimator $\widehat{B}_{\widehat{C}}:=\underset{B_{\lambda}\;|\;\lambda\in A_{\widehat{C}}}{\operatorname{arg\,min}}\Psi(B)$. This grid is the finest one, since it yields all the partitions $\{B_{\lambda}|\;\lambda\in]0,1[\}$. Almost surely, the coefficients $(\widehat{C}_{ij})_{i<j}$ are all different. Thus, when we increase the threshold to the next value of $A_{\widehat{C}}$, we only remove two symmetric coefficients from the empirical correlation matrix.

Proposition 4.

The computational complexity of $\widehat{B}_{\widehat{C}}$ is $O(p^{4})$ .
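One observation that keeps each evaluation of $\Psi$ cheap follows directly from the definitions of Section 2.1 (it is a sketch of ours and not necessarily the argument used in the proof of Proposition 4). Write $S_{B_{k}}$ for the sub-matrix of $S$ indexed by the $k$-th block of $B=\{B_{1},...,B_{K}\}$, a notation assumed here by analogy with sub-matrices of $\Sigma$, and assume that $S$ is invertible. Since $S_{B}$ is block-diagonal, its determinant factorizes over the blocks and the diagonal blocks of $S_{B}^{-1}S$ are identity matrices, so that

$$\begin{aligned}\Psi(B)&=\frac{1}{p}\left(\log|S_{B}|+\operatorname{Tr}(S_{B}^{-1}S)\right)+\kappa\sum_{k=1}^{K}p_{k}^{2}\\&=\frac{1}{p}\sum_{k=1}^{K}\log|S_{B_{k}}|+\frac{1}{p}\sum_{k=1}^{K}\operatorname{Tr}\left(S_{B_{k}}^{-1}S_{B_{k}}\right)+\kappa\sum_{k=1}^{K}p_{k}^{2}\\&=\frac{1}{p}\sum_{k=1}^{K}\log|S_{B_{k}}|+1+\kappa\sum_{k=1}^{K}p_{k}^{2}.\end{aligned}$$

Only the determinants of the diagonal blocks are needed, and merging two blocks modifies a single term in each sum.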

Using the rate of convergence of the estimated covariances and Condition 4, we then suggest the estimator $\widehat{B}_{\lambda}:=B_{n^{-1/3}}$, the partition of the empirical correlation matrix thresholded by $n^{-1/3}$. With this choice, we do not explore all the partitions obtained by thresholding the correlation matrix, but we only have to threshold at a single value.

Proposition 5.

The complexity of $\widehat{B}_{\lambda}$ is $O(p^{2})$ .

One can see that reducing the grid of thresholds to one value reduces the complexity of the estimator of $B^{*}$. Finally, we suggest a third grid, in the case when the maximal size of the groups $m$ is known.

Let $A_{s}:=\{s/p,(s+1)/p,...,(p-1)/p,1\}$ , where $s$ is the smallest integer such that all the groups of $B_{s/p}$ have a cardinal smaller than $m$ . The deduced estimator is $\widehat{B}_{s}:=\underset{B_{\lambda}\;|\;\lambda\in A_{s}}{\operatorname{arg\,min}}\Psi(B)$ . So, this grid is the set $\{l/p\;|\;l\in[1:p]\}$ restricted to the thresholds that give fine enough partition (with groups of size smaller than $m$ ).

Proposition 6.

The complexity of $\widehat{B}_{s}$ is $O(p^{2})$ .

One can see that the complexity of this estimator is as small as the complexity of the previous estimator $\widehat{B}_{\lambda}$ . Furthermore, it ensures that the estimated blocks are not too large, which was not the case with the previous estimator. However, the computation of $\widehat{B}_{s}$ requires the knowledge of $m$ while the other estimators do not.
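The three practical estimators can be compared on simulated data with the following naive sketch. It recomputes every partition from scratch, so it does not try to match the complexities stated in Propositions 4 to 6. The helper names, the toy covariance matrix (four independent blocks of three equicorrelated variables), the sample size and $\delta=0.75$ are assumptions made for illustration, and indices start at 0.

```python
import numpy as np

def components(adj):
    """Connected components of a boolean adjacency matrix, as sorted index lists."""
    p = adj.shape[0]
    unseen = np.ones(p, dtype=bool)
    blocks = []
    for start in range(p):
        if not unseen[start]:
            continue
        unseen[start] = False
        stack, block = [start], []
        while stack:
            i = stack.pop()
            block.append(i)
            new = np.flatnonzero(adj[i] & unseen)
            unseen[new] = False
            stack.extend(new.tolist())
        blocks.append(sorted(block))
    return blocks

def psi(S, blocks, kappa):
    """Psi(B) evaluated block by block, using Tr(S_B^{-1} S) = p."""
    p = S.shape[0]
    logdet = sum(np.linalg.slogdet(S[np.ix_(b, b)])[1] for b in blocks)
    return logdet / p + 1.0 + kappa * sum(len(b) ** 2 for b in blocks)

def best_on_grid(S, C, grid, kappa):
    """Minimise Psi over the partitions B_lambda, for lambda in `grid`."""
    candidates = [components(C > lam) for lam in grid]
    return min(candidates, key=lambda B: psi(S, B, kappa))

rng = np.random.default_rng(1)
p, n, m, delta = 12, 20000, 3, 0.75      # illustrative values only
truth = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
sigma = np.zeros((p, p))
for b in truth:
    sigma[np.ix_(b, b)] = 0.7
np.fill_diagonal(sigma, 1.0)
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(X, rowvar=False, bias=True)
d = np.sqrt(np.diag(S))
C = np.abs(S / np.outer(d, d))           # absolute empirical correlations
kappa = 1.0 / (p * n ** delta)

# Grid A_C: every off-diagonal |C_ij| is used as a threshold.
b_C = best_on_grid(S, C, C[np.triu_indices(p, 1)], kappa)
# Single threshold n^{-1/3}.
b_lam = components(C > n ** (-1.0 / 3.0))
# Grid A_s: multiples of 1/p, starting at the first one giving blocks of size <= m.
s = next(l for l in range(1, p + 1)
         if max(len(b) for b in components(C > l / p)) <= m)
b_s = best_on_grid(S, C, [l / p for l in range(s, p + 1)], kappa)
print(b_C == truth, b_lam == truth, b_s == truth)
```

The sample size is deliberately large compared with $p$ so that the three estimators agree on this toy example; in the regime $p/n\to y$ of Condition 1 the guarantees are only asymptotic.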

Now that we have defined new estimators of $B^{*}$ , we give their convergence in the following proposition.

Proposition 7.

Let $\widehat{B}$ be either $\widehat{B}_{tot},\;\widehat{B}_{\widehat{C}},\;\widehat{B}_{\lambda}$ or $\widehat{B}_{s}$ indifferently. Under Conditions 1 to 4 and for a fixed $\delta\in]1/2,1[$ , we have

$$\mathbb{P}\left(\widehat{B}=B^{*}\right)\longrightarrow 1.$$

When Condition 4 is not satisfied, we do not study the convergence of the previous estimators. In this case, we suggest to estimate $B^{*}$ by $B_{n^{-\delta/2}}$ , which is the partition given by the empirical correlation matrix thresholded by $n^{-\delta/2}$ . The complexity of this estimator is $O(p^{2})$ , as for the previous estimator $\widehat{B}_{\lambda}=B_{n^{-1/3}}$ . We show the convergence of this estimator in Proposition 8.

Proposition 8.

Under Conditions 1, 2 and 3, if $\alpha_{1}<\delta/2$ and $\alpha_{2}>\delta/2$ ,

$$\mathbb{P}\left(B(\alpha_{1})\leq B_{n^{-\delta/2}}\leq B(\alpha_{2})\right)\longrightarrow 1.$$

As Condition 4 is not satisfied, the true partition $B^{*}$ is again not reached by this estimator. Nevertheless, we get stronger results for the practical estimator $B_{n^{-\delta/2}}$ than for the theoretical estimator $\widehat{B}_{tot}$ when Condition 4 is not verified. Indeed, the condition "to be larger than or equal to" is stronger than "not to be smaller than".

2.2.3 Convergence of the estimator of the covariance matrix

We have seen in Propositions 7 and 8 how to estimate the decomposition $B^{*}$ by $\widehat{B}$. Now, to estimate the covariance matrix $\Sigma$, it suffices to impose the block-diagonal decomposition $\widehat{B}$ on the empirical covariance matrix $S$, which gives $S_{\widehat{B}}$. We show in Proposition 9 that the resulting block-diagonal matrix estimator $S_{\widehat{B}}$ reaches the optimal rate of convergence under Conditions 1 to 4.

Proposition 9.

Let $\|.\|_{F}$ be the Frobenius norm defined by $\|\Gamma\|_{F}^{2}:=\sum_{i,j=1}^{p}\gamma_{ij}^{2}$ . Let $\widehat{B}$ be either $\widehat{B}_{tot},\;\widehat{B}_{\widehat{C}},\widehat{B}_{\lambda}$ or $\widehat{B}_{s}$ . Under Conditions 1 to 4 and for a fixed $\delta\in]1/2,1[$ , we have

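$$\frac{1}{p}\|S_{B^{*}}-\Sigma\|_{F}^{2}=O_{p}(1/n)$$

and

$$\frac{1}{p}\|S_{\widehat{B}}-\Sigma\|_{F}^{2}=O_{p}(1/n).$$

Moreover, it is the best rate that we can have because

$$\frac{1}{p}\|S_{B^{*}}-\Sigma\|_{F}^{2}\neq o_{p}(1/n).$$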

Thus, we see that the quantity $\frac{1}{p}\|S_{\widehat{B}}-\Sigma\|_{F}^{2}$ decreases to 0 in probability with rate $1/n$ , which is the same rate as $S_{B^{*}}$ if we know the true decomposition $B^{*}$ . Thus, the lack of knowledge of $B^{*}$ does not deteriorate the convergence of our estimator.

Now that we gave the rate of convergence of our estimator $S_{\widehat{B}}$ , we compare it with that of the empirical estimator $S$ in the next proposition.

Proposition 10.

Under Conditions 1 and 2, the rate of the empirical covariance is

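$$\frac{1}{p}\|S-\Sigma\|_{F}^{2}=O_{p}(p/n),$$

and we have

$$\mathbb{E}\left(\frac{1}{p}\|S-\Sigma\|_{F}^{2}\right)\geq\frac{\lambda_{\inf}^{2}\,p}{2n}.$$

So, we know that $\frac{1}{p}\|S-\Sigma\|_{F}^{2}$ is lower-bounded on average and is bounded in probability. Thus, the rate of convergence of our suggested estimator $S_{\widehat{B}}$ is better than that of the empirical covariance matrix $S$.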
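The gap between the two rates can be observed on a small simulation. The sketch below compares $\frac{1}{p}\|S-\Sigma\|_{F}^{2}$ with $\frac{1}{p}\|S_{B^{*}}-\Sigma\|_{F}^{2}$ when the true decomposition is used. The equicorrelated blocks, the values of $p$, $n$ and the block size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, size, rho = 200, 400, 5, 0.5       # p / n = 0.5, blocks of 5 variables
blocks = [list(range(k, k + size)) for k in range(0, p, size)]
sigma = np.zeros((p, p))
for b in blocks:
    sigma[np.ix_(b, b)] = rho            # within-block covariance
np.fill_diagonal(sigma, 1.0)

X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(X, rowvar=False, bias=True)   # empirical covariance, divided by n

# Impose the true block-diagonal decomposition B* on S.
S_B = np.zeros_like(S)
for b in blocks:
    S_B[np.ix_(b, b)] = S[np.ix_(b, b)]

err_full = np.sum((S - sigma) ** 2) / p      # (1/p) ||S - Sigma||_F^2
err_block = np.sum((S_B - sigma) ** 2) / p   # (1/p) ||S_{B*} - Sigma||_F^2
print(err_full, err_block)
```

In this setting the error of $S$ accumulates over all $p^{2}$ entries, whereas the error of $S_{B^{*}}$ only accumulates over the entries inside the blocks, which is the mechanism behind the $p/n$ and $1/n$ rates.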

If Condition 4 does not hold, the rate of convergence is given in the following proposition.

Proposition 11.

Under Conditions 1, 2 and 3, for all $\delta\in]0,1[$ and for all $\varepsilon>0$ , we have

$$\frac{1}{p}\|S_{B_{n^{-\delta/2}}}-\Sigma\|_{F}^{2}=o_{p}\left(\frac{1}{n^{\delta-\varepsilon}}\right).$$

We remark that, for $\delta$ close to $1$, this rate of convergence almost reaches the optimal rate of $S_{B^{*}}$, whereas the partition estimator $B_{n^{-\delta/2}}$ does not reach the true decomposition $B^{*}$. This comes from the fact that the elements $\sigma_{ij}$ of $\Sigma$ such that the indices $(i,j)$ are not in the estimated partition $B_{n^{-\delta/2}}$ are small (with high probability). Hence, estimating these values by $0$ does not increase the error $\frac{1}{p}\|S_{B_{n^{-\delta/2}}}-\Sigma\|_{F}^{2}$ by much.

Theoretical guarantees for a block-diagonal estimator of the covariance matrix are also provided in [9]. Their framework is more general, with a true covariance matrix which is not necessarily block-diagonal. They bound the average of the squared Hellinger distance between the true normal density and the density with the block-diagonal estimated covariance matrix. However, when $p/n$ does not go to $0$, their theoretical results become uninformative. Indeed, they give an upper bound which is larger than one, while the squared Hellinger distance always remains smaller than $1$.

2.2.4 Discussion about the assumptions

For the previous results, we needed to make four assumptions on $\Sigma$ (Conditions 1 to 4, given in Section 2.2.1).

Condition 1 provides a standard setting for high-dimensional problems, in particular for estimation of covariance matrices [24, 32]. Studying a higher-dimensional setting where $p/n\longrightarrow+\infty$ would be interesting in future work.

Condition 2 is needed to bound the operator norm of $\Sigma$ and $\Sigma^{-1}$ and the eigenvalues of the empirical covariance matrix (with high probability). It also enables us to bound the diagonal terms of $\Sigma$, which allows us to derive the rate of convergence of each component of the empirical covariance (using in particular Bernstein's inequality, see the proofs for more details).

Condition 3 states that the blocks of the true decomposition have a maximal size. It implies that the number of non-zero terms of $\Sigma$ is $O(p)$ .
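The $O(p)$ count can be checked in one line (a short argument of ours, written from the definitions). With $B^{*}=\{B_{1}^{*},...,B_{K}^{*}\}$ and $p_{k}=|B_{k}^{*}|\leq m$, the only entries of $\Sigma$ that can be non-zero are those with $(i,j)\in B^{*}$, so

$$\#\{(i,j)\in[1:p]^{2}\;|\;\sigma_{ij}\neq 0\}\leq\sum_{k=1}^{K}p_{k}^{2}\leq m\sum_{k=1}^{K}p_{k}=mp.$$

The same bound shows that $\operatorname{pen}(B^{*})\leq mp$.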

Condition 4 requires that a finer block decomposition $\Sigma_{B}$ is not too close to the true $\Sigma$ . This condition is needed to not confuse $B^{*}$ with a finer decomposition. However, Condition 4 seems to be less mild than the others. That is why we also focus on the case when Condition 4 is not satisfied.

Nevertheless, even Condition 4 is not so restrictive. Indeed, we suggest in Proposition 12 a reasonable example where $\Sigma$ is randomly generated and where a condition similar to Condition 4 holds.

Proposition 12.

Let $L\in\mathbb{N}$ and $\varepsilon>0$ . Assume that for all $p$ , $\Sigma$ is generated in the following way:

• Let $B^{*}$ be a partition of $[1:p]$ such that all its elements have a cardinal between $10$ and $m\geq 10$. Let $K$ be the number of groups (the cardinal of $B^{*}$). For all $k\in[1:K]$, let $p_{k}$ be the cardinal of the "$k$-th element" of $B^{*}$.

• For all $k\in[1:K]$, let $(U_{i}^{(l)})_{i\in[1:p_{k}],\;l\in[1:L]}$ be i.i.d. with distribution $\mathcal{U}([-1,1])$. Let $U\in\mathcal{M}_{L,p_{k}}(\mathbb{R})$ such that the coefficient $(l,i)$ is $U_{i}^{(l)}$. Let $\Sigma_{B_{k}^{*}}=U^{T}U+\varepsilon I_{p_{k}}$, where $\Sigma_{B_{k}^{*}}$ is the sub-matrix of $\Sigma$ indexed by the elements of $B_{k}^{*}$.

• Let $\sigma_{ij}=0$ for all $(i,j)\notin B^{*}$.

Then, Conditions 2 and 3 are verified and the following slightly modified version of Condition 4 is satisfied for all $a>0$ :

$$\mathbb{P}\left(\exists B<B^{*},\;\|\Sigma_{B}-\Sigma\|_{\max}<an^{-\frac{1}{4}}\right)\longrightarrow 0.$$

Thus, if $p/n\longrightarrow y\in]0,1[$, the conclusions of Propositions 2, 7 and 9 remain true when the probabilities are defined with respect to $\Sigma$ and $X$, whose distribution conditional on $\Sigma$ is $\mathcal{N}(\mu,\Sigma)$.

2.2.5 Numerical applications

We present here numerical applications of the previous results with simulated data. We generate a covariance matrix $\Sigma$ as in Proposition 12 with blocks of random size distributed uniformly on $[10:15]$ , with $L=5$ and $\varepsilon=0.2$ . We assume here that we know that the maximal size of the block is $m=15$ , so we can use the estimator $\widehat{B}=\widehat{B}_{s}$ given in Proposition 7 to reduce the complexity to $O(p^{2})$ and to prevent the blocks from being too large.

We plot in Figure 1 the Frobenius norm of the error of the empirical covariance matrix $S$ and the Frobenius norm of the error of the suggested estimator $S_{\widehat{B}}$ , with $n=N\,p$ for different values of $N$ . We can remark that the error of $S$ is of order $\sqrt{K}$ (where $K$ is the number of groups) whereas the error of $S_{\widehat{B}}$ stays bounded as in Proposition 9. For $K=100$ , the Frobenius error of $S_{\widehat{B}}$ in Figure 1 is about 10 times smaller than that of $S$ .
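The comparison of the two Frobenius errors can be reproduced on a small scale with the sketch below. It is only illustrative: the true partition $B^{*}$ is used in place of $\widehat{B}$ (the structure estimator is not re-implemented here), and the helper name `block_restrict`, the block layout and the sample size are assumptions.

```python
import numpy as np

def block_restrict(M, blocks):
    """Return M_B: entries inside the blocks of B are kept, the others are set to 0."""
    out = np.zeros_like(M)
    for b in blocks:
        idx = np.ix_(b, b)
        out[idx] = M[idx]
    return out

rng = np.random.default_rng(1)
# Hypothetical partition with three blocks of sizes 10, 12 and 15.
blocks = [list(range(0, 10)), list(range(10, 22)), list(range(22, 37))]
p = 37
Sigma = np.zeros((p, p))
for b in blocks:
    U = rng.uniform(-1.0, 1.0, size=(5, len(b)))
    Sigma[np.ix_(b, b)] = U.T @ U + 0.2 * np.eye(len(b))

n = 2 * p
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False, bias=True)  # empirical covariance, 1/n normalisation
S_B = block_restrict(S, blocks)         # block-diagonal estimator with B known

err_S = np.linalg.norm(S - Sigma, "fro")
err_S_B = np.linalg.norm(S_B - Sigma, "fro")
print(err_S, err_S_B)
```

When the partition is correct, $S_{B}-\Sigma$ is $S-\Sigma$ with its off-block entries set to zero, so the error of $S_{B}$ can never exceed that of $S$; the gap grows with the number of off-block entries, that is, with $K$.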

2.3 Convergence and efficiency in fixed dimension

In this section, $p$ and $\Sigma$ are fixed and $n$ goes to $+\infty$ . We choose a different penalisation $\kappa=\frac{1}{pn^{\delta}}$ with $\delta\in]0,1/2[$ (instead of $\delta\in]1/2,1[$ in the previous setting). This framework makes it possible to study the efficiency of estimators of $\Sigma$ . Contrary to the high-dimensional setting of Section 2.2, we do not assume any particular condition in addition to the ones given in Section 2.1.

We first give the convergence of $\widehat{B}_{tot}$ defined in Equation (1) in the next proposition.

Proposition 13.

We have

[TABLE]

Corollary 1.

Let $\widehat{B}_{\widehat{C}}:=\underset{B_{\lambda}\;|\;\lambda\in A_{\widehat{C}}}{\operatorname{arg\,min}}\Psi(B)$ , where $A_{\widehat{C}}:=\{|\widehat{C}_{ij}|\;|\;1\leq i<j\leq p\}$ as in Proposition 7. Then

[TABLE]

In the rest of Section 2.3, we write $\widehat{B}$ for $\widehat{B}_{tot}$ or $\widehat{B}_{\widehat{C}}$ . The aim of this framework is to show that the suggested estimator $S_{\widehat{B}}$ is asymptotically efficient as if the true decomposition $B^{*}$ were known.

As the parameter $\Sigma$ is in the set $S_{p}^{++}(\mathbb{R})$ or even $S_{p}^{++}(\mathbb{R},B^{*})$ , which are not open subsets of $\mathbb{R}^{p^{2}}$ , the classical Cramér-Rao bound is no longer a lower-bound for the estimation error. Furthermore, as $B^{*}$ is not known, the number of parameters of $S_{\widehat{B}}$ is not constant. That is why the classical Cramér-Rao bound is not relevant in our setting. We remark that applying this classical Cramér-Rao bound to a subset of the matrix estimator does not solve this problem.

A specific Cramér-Rao bound is suggested in [35] for parameters and estimators which satisfy continuously differentiable constraints. We shall consider linear constraints here. We let $\theta\in\mathbb{R}^{d}$ be the parameter, that is assumed to be restricted to a linear subspace $V$ of dimension $q$ in $\mathbb{R}^{d}$ . In this case, if $U\in\mathcal{M}_{d,q}(\mathbb{R})$ is a matrix whose columns are the elements of an orthonormal basis of $V$ and if $J$ is the Fisher Information Matrix (FIM) of $\theta$ in the unconstrained case, [35] states that for any unbiased estimator $\widehat{\theta}\in V$ , we have

[TABLE]

where $\leq$ is the partial order on the symmetric positive semi-definite matrices.

In our setting, remark that $S_{p}^{++}(\mathbb{R})$ is an open subset of the linear subspace $S_{p}(\mathbb{R})$ of symmetric matrices and $S_{p}^{++}(\mathbb{R},B^{*})$ is an open subset of the linear subspace $\overline{S_{p}(\mathbb{R},B^{*})}:=\{\Gamma\in S_{p}(\mathbb{R}),\;\Gamma_{B^{*}}=\Gamma\}$ . We let $\operatorname{vec}(\Sigma)$ be the column vectorization of $\Sigma$ . Hence, the parameter is $\operatorname{vec}(\Sigma)$ and there are $p(p-1)/2$ linear constraints arising from the symmetry and $p(p-1)/2-\sum_{k=1}^{K}p_{k}(p_{k}-1)/2$ linear constraints arising from the block structure $B^{*}$ .
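The constraint count above can be checked on a small example. The following sketch (the function name `count_constraints` is ours) counts the symmetry and block constraints on $\operatorname{vec}(\Sigma)\in\mathbb{R}^{p^{2}}$ and the resulting number of free parameters, which should equal $\sum_{k=1}^{K}p_{k}(p_{k}+1)/2$.

```python
def count_constraints(block_sizes):
    """Count linear constraints on vec(Sigma) for a partition with the given block sizes."""
    p = sum(block_sizes)
    # sigma_ij = sigma_ji for i < j
    symmetry = p * (p - 1) // 2
    # sigma_ij = 0 for i < j lying in different groups
    within = sum(pk * (pk - 1) // 2 for pk in block_sizes)
    block = p * (p - 1) // 2 - within
    # dimension of the constrained linear subspace
    free = p * p - symmetry - block
    return symmetry, block, free

# Example: p = 5 with two groups of sizes 2 and 3.
symmetry, block, free = count_constraints([2, 3])
print(symmetry, block, free)  # 10 6 9
```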

So, the Cramér-Rao bound of Equation (2) is adapted to our framework, by considering the parameter $\operatorname{vec}(\Sigma)\in\mathbb{R}^{p^{2}}$ , and we say that an estimator is efficient if it reaches the Cramér-Rao bound (2) (meaning that there is an equality in this equation), where the constraints (symmetry only or symmetry and block structure) will be stated explicitly.

Proposition 14 states that, in general, the empirical covariance matrix is efficient with this Cramér-Rao bound. This supports this choice of Cramér-Rao Bound, since in fixed dimension, one would expect that the empirical matrix is the most appropriate estimator.

If the empirical covariance matrix did not reach the Cramér-Rao Bound, we could not hope that $S_{\widehat{B}}$ would be efficient in the model where $B^{*}$ was known, and this Cramér-Rao bound would not be well tuned to our problem.

Proposition 14.

If $\mu$ is known, the empirical estimator $S$ is an efficient estimator of $\Sigma$ in the model $\{\mathcal{N}(\mu,\Sigma),\;\Sigma\in S_{p}^{++}(\mathbb{R})\}$ .

Remark 1.

In Proposition 14, we assume that $\mu$ is known to reach the Cramér-Rao bound for fixed $n$ (and not only asymptotically). This will be the same in Proposition 15.

Now, we deduce the efficiency of $S_{B^{*}}$ when $B^{*}$ is known.

Proposition 15.

If $\mu$ and $B^{*}$ are known, $S_{B^{*}}$ is an efficient estimator of $\Sigma$ in the model $\{\mathcal{N}(0,\Sigma),\;\Sigma\in S_{p}^{++}(\mathbb{R},B^{*})\}$ .

Finally, Proposition 16 states the asymptotic efficiency of our estimator $S_{\widehat{B}}$ (even for unknown $\mu$ ).

Proposition 16.

[TABLE]

where $\mathrm{CR}(\Sigma,B^{*})$ is the Cramér-Rao bound of $\operatorname{vec}(\Sigma)$ in the model $\{\mathcal{N}(0,\Sigma),\;\Sigma\in S_{p}^{++}(\mathbb{R},B^{*})\}$ .

The explicit expression of the $p^{2}\times p^{2}$ matrix $\mathrm{CR}(\Sigma,B^{*})$ can be found in the appendix where Propositions 14, 15 and 16 are proved.

3 Application to the estimation of the Shapley effects

In this section, we apply the block-diagonal estimation of the covariance matrix $\Sigma$ to estimate the Shapley effects in high dimension and for Gaussian linear models. In Section 3.1, we recall the definition of the Shapley effects with their particular expression in the Gaussian linear framework with a block-diagonal covariance matrix. In Section 3.2, we address the problem of estimating the Shapley effects when the covariance matrix $\Sigma$ is estimated. We derive the convergence of the estimators of the Shapley effects from the results of Section 2.

3.1 The Shapley effects

Let $(X_{i})_{i\in[1:p]}$ be random input variables on $\mathbb{R}^{p}$ and let $Y=f(X)$ be the real random output variable in $L^{2}$ . We assume that $\operatorname{Var}(Y)\neq 0$ . Here, $f$ can be a numerical simulation model [31].

If $u\subset[1:p]$ and $x=(x_{i})_{i\in[1:p]}\in\mathbb{R}^{p}$ , we write $x_{u}:=(x_{i})_{i\in u}$ . We can define the Shapley effects as in [26] for the input variable $X_{i}$ as:

[TABLE]

where $-i$ is the set $[1:p]\setminus\{i\}$ . One can see in Equation (3) that adding $X_{i}$ to $X_{u}$ changes the conditional expectation of $Y$ , and increases the variability of this conditional expectation. The Shapley effect $\eta_{i}$ is large when, on average, the variance of this conditional expectation increases significantly when $X_{i}$ is observed. Thus, a large Shapley effect $\eta_{i}$ corresponds to an important input variable $X_{i}$ .

The Shapley effects have interesting properties for global sensitivity analysis. Indeed, there is only one Shapley effect for each variable (contrary to the Sobol indices). Moreover, the sum of all the Shapley effects is equal to $1$ (see [26]) and all these values lie in $[0,1]$ even with dependent inputs. This is very convenient for the interpretation of these sensitivity indices.

Here, we assume that $X\sim\mathcal{N}(\mu,\Sigma)$ , that $\Sigma\in S_{p}^{++}(\mathbb{R})$ and that the model is linear, that is $f:x\longmapsto\beta_{0}+\beta^{T}x$ , for a fixed $\beta_{0}\in\mathbb{R}$ and a fixed vector $\beta$ . This framework is widely used to model physical phenomena (see for example [19, 13, 29]). Indeed, uncertainties are often modelled as Gaussian variables and an unknown function is commonly estimated by its linear approximation. Furthermore, the main focus of this paper is on the high-dimensional case, where $p$ is large. In high dimension, linear models are often considered, as more complex models are not necessarily more relevant. In this framework, the sensitivity indices can be calculated explicitly [27]:

[TABLE]

with

[TABLE]

where $\beta_{u}:=(\beta_{i})_{i\in u}$ and $\Gamma_{u,v}:=(\Gamma_{i,j})_{i\in u,j\in v}$ . Thus, in the Gaussian linear framework, the Shapley effects are functions of the parameters $\beta$ and $\Sigma$ .
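To make the dependence on $\beta$ and $\Sigma$ concrete, the sketch below evaluates $\operatorname{Var}(Y|X_{u})$ in the Gaussian linear framework. It relies on the standard Gaussian conditioning identity $\operatorname{Var}(Y|X_{u})=\beta_{-u}^{T}(\Sigma_{-u,-u}-\Sigma_{-u,u}\Sigma_{u,u}^{-1}\Sigma_{u,-u})\beta_{-u}$, which we assume is the content of the displayed formula; the function name and the toy values are hypothetical.

```python
import numpy as np

def conditional_variance(beta, Sigma, u):
    """Var(Y | X_u) for Y = beta_0 + beta^T X with X ~ N(mu, Sigma).

    Uses the Schur complement of Sigma_{u,u} (standard Gaussian conditioning).
    """
    beta = np.asarray(beta, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    u = sorted(u)
    rest = [i for i in range(len(beta)) if i not in u]  # the complement -u
    if not rest:
        return 0.0  # Y is a deterministic function of X once all inputs are observed
    C = Sigma[np.ix_(rest, rest)]
    if u:
        A = Sigma[np.ix_(rest, u)]
        C = C - A @ np.linalg.solve(Sigma[np.ix_(u, u)], A.T)
    return float(beta[rest] @ C @ beta[rest])

beta = [1.0, 1.0]
Sigma = [[1.0, 0.5], [0.5, 2.0]]
print(conditional_variance(beta, Sigma, []))      # Var(Y) = beta^T Sigma beta = 4.0
print(conditional_variance(beta, Sigma, [0]))     # 2 - 0.5**2 / 1 = 1.75
print(conditional_variance(beta, Sigma, [0, 1]))  # 0.0
```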

Despite the analytical formula (5), even in the case where $\Sigma$ and $\beta$ are known, the computational cost of the Shapley effects remains an issue when the number of input variables $p$ is too large ( $p\geq 30$ ), as it is highlighted in [4]. Indeed, the Shapley effects depend on $2^{p}$ values, namely the $(\operatorname{Var}(Y|X_{u}))_{u\subset[1:p]}$ . However, when the covariance matrix is block-diagonal, [4] showed that this high-dimensional computational problem boils down to a collection of lower dimensional problems.

Indeed, assume that $\Sigma\in S_{p}^{++}(\mathbb{R},B^{*})$ with $B^{*}=\{B_{1}^{*},B_{2}^{*},...,B_{K}^{*}\}$ . If $i\in[1:p]$ , let $[i]$ denote the group of $i$ , that is $i\in B_{[i]}^{*}$ . Using Corollary 2 of [4], we have for all $i\in[1:p]$ ,

[TABLE]

where for all $v\subset B_{[i]}^{*}$ ,

[TABLE]

and where $w-v=w\setminus v$ . Thus, when $\Sigma$ and $\beta$ are known, to compute all the Shapley effects $(\eta_{i})_{i\in[1:p]}$ , we only have to compute the $\sum_{k=1}^{K}2^{|B_{k}^{*}|}$ values $\{\operatorname{Var}(Y|X_{u}),\;u\subset B_{k}^{*},\;k\in[1:K]\}$ instead of all the $2^{p}$ values $\{\operatorname{Var}(Y|X_{u}),\;u\subset[1:p]\}$ . Some numerical experiments highlighting this gain are given in [4]. The complexity of the computation of the Shapley effects is $O(K2^{m})$ , where $m$ denotes the size of the maximal group in $B^{*}$ .
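The reduction to within-block computations can be illustrated by the following sketch. It is an assumption-laden illustration rather than the implementation of [4]: it uses the usual Shapley weights $\binom{q-1}{|u|}^{-1}/q$ inside a block of size $q$, the value function $u\mapsto\operatorname{Var}(Y)-\operatorname{Var}(Y|X_{u})$ and Gaussian conditioning; all function names and the toy matrix are ours. Calling the same function with the trivial partition $\{[1:p]\}$ gives the brute-force computation over all $2^{p}$ subsets, which allows a direct comparison.

```python
from itertools import combinations
from math import comb
import numpy as np

def cond_var(beta, Sigma, u):
    """Var(beta^T X | X_u) for Gaussian X, via the Schur complement."""
    rest = [i for i in range(len(beta)) if i not in u]
    if not rest:
        return 0.0
    C = Sigma[np.ix_(rest, rest)]
    if u:
        cols = list(u)
        A = Sigma[np.ix_(rest, cols)]
        C = C - A @ np.linalg.solve(Sigma[np.ix_(cols, cols)], A.T)
    return float(beta[rest] @ C @ beta[rest])

def shapley_effects_blockwise(beta, Sigma, blocks):
    """Shapley effects when Sigma is block-diagonal with respect to `blocks`."""
    beta = np.asarray(beta, dtype=float)
    total_var = float(beta @ Sigma @ beta)
    eta = np.zeros(len(beta))
    for block in blocks:
        b, S, q = beta[block], Sigma[np.ix_(block, block)], len(block)
        # Cache the 2^q conditional variances of the block (local indices).
        cv = {u: cond_var(b, S, u)
              for r in range(q + 1) for u in combinations(range(q), r)}
        for i in range(q):
            others = [j for j in range(q) if j != i]
            acc = 0.0
            for r in range(q):
                for u in combinations(others, r):
                    with_i = tuple(sorted(u + (i,)))
                    # Increase of Var(E(Y | X_u)) when X_i is added to X_u.
                    acc += (cv[u] - cv[with_i]) / comb(q - 1, r)
            eta[block[i]] = acc / (q * total_var)
    return eta

Sigma = np.zeros((5, 5))
Sigma[:2, :2] = [[1.0, 0.5], [0.5, 2.0]]
Sigma[2:, 2:] = [[1.0, 0.3, 0.2], [0.3, 1.5, 0.4], [0.2, 0.4, 2.0]]
beta = np.array([1.0, -1.0, 0.5, 2.0, 1.0])

eta_block = shapley_effects_blockwise(beta, Sigma, [[0, 1], [2, 3, 4]])  # 2^2 + 2^3 subsets
eta_full = shapley_effects_blockwise(beta, Sigma, [[0, 1, 2, 3, 4]])     # 2^5 subsets
print(eta_block, eta_block.sum())
```

In this toy example the block-wise call visits $2^{2}+2^{3}=12$ subsets instead of $2^{5}=32$; the two results coincide and sum to $1$.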

If $\Sigma$ is known, but the decomposition $B^{*}$ is unknown, we can compute $B^{*}$ from $\Sigma$ . We can for example use "Breadth-First-Search" (BFS). The complexity of this algorithm is in $O(pm^{2})$ .
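A possible BFS implementation is sketched below: the groups of $B^{*}$ are the connected components of the graph on $[1:p]$ in which $i$ and $j$ are linked when $\sigma_{ij}\neq 0$. The function name and the tolerance `tol` are assumptions. Note that this dense-matrix version scans a full row for each visited index and thus costs $O(p^{2})$; reaching the $O(pm^{2})$ quoted above would presumably require a sparse representation of $\Sigma$.

```python
from collections import deque

def blocks_from_covariance(Sigma, tol=0.0):
    """Connected components of the graph with an edge i~j iff |Sigma[i][j]| > tol."""
    p = len(Sigma)
    seen = [False] * p
    blocks = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        queue, block = deque([start]), []
        while queue:
            i = queue.popleft()
            block.append(i)
            for j in range(p):
                if not seen[j] and abs(Sigma[i][j]) > tol:
                    seen[j] = True
                    queue.append(j)
        blocks.append(sorted(block))
    return blocks

# Indices 0 and 2 are correlated; index 1 is independent of both.
Sigma = [[2.0, 0.0, 1.0],
         [0.0, 3.0, 0.0],
         [1.0, 0.0, 2.0]]
print(blocks_from_covariance(Sigma))  # [[0, 2], [1]]
```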

To conclude, when the parameters $\beta$ and $\Sigma$ are known with $\Sigma\in S_{p}^{++}(\mathbb{R},B^{*})$ , the computation of all the Shapley effects has a complexity $O(K2^{m})$ .

3.2 Estimation of the Shapley effects in high dimension

We now address the problem when the parameters $\mu$ , $\Sigma$ and thus $B^{*}$ are unknown.

We assume that we just observe a sample $(X^{(l)},\tilde{Y}^{(l)})_{l\in[1:n]}$ where $\tilde{Y}=(\tilde{Y}^{(l)})_{l\in[1:n]}$ are noisy observations:

[TABLE]

for $l\in[1:n]$ where $(\varepsilon^{(l)})_{l\in[1:n]}$ are i.i.d. with distribution $\mathcal{N}(0,\sigma^{2}_{n})$ and where $\sigma_{n}\leq C_{\sup}$ is unknown, where $C_{\sup}$ is a fixed finite constant.

Remark that the computation of the Shapley effects requires the parameters $\beta$ and $\Sigma$ (see Equations (4) and (5) or (6) and (7)). Here, as we do not know the parameters $\beta$ and $\Sigma$ , we will estimate them and replace the true parameters by their estimation in Equations (4) and (5) or (6) and (7).

First, we estimate $(\beta_{0}\;\beta^{T})^{T}$ as usual by

[TABLE]

where $A\in\mathcal{M}_{n,p+1}(\mathbb{R})$ is defined by $A_{l,i+1}:=X_{i}^{(l)}$ and $A_{l,1}=1$ , and where $n>p$ .
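Assuming that "as usual" refers to the ordinary least squares estimator $(A^{T}A)^{-1}A^{T}\tilde{Y}$, the estimation of $(\beta_{0}\;\beta^{T})^{T}$ can be sketched as follows. The function name and the toy data are hypothetical; `numpy.linalg.lstsq` returns the least squares solution, which coincides with this expression when $A$ has full column rank.

```python
import numpy as np

def estimate_beta(X, Y_noisy):
    """Least squares estimate of (beta_0, beta) from an n x p sample X and n outputs."""
    n, p = X.shape
    # Design matrix A: a first column of ones, then the p input columns.
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, Y_noisy, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
Y = 1.0 + X @ beta_true  # noise-free outputs, so the fit should be exact
beta0_hat, beta_hat = estimate_beta(X, Y)
print(beta0_hat, beta_hat)
```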

At first glance, we could estimate $\Sigma$ by the empirical covariance matrix $S$ and replace it in the computation of the Shapley effects given by Equations (4) and (5) or (6) and (7). However, $B^{*}$ is not known and we cannot find it using BFS with the empirical covariance matrix $S$ (which usually has the simple structure $\{[1:p]\}$ with probability one). Thus, we cannot use the formula (6) of the Shapley effects with independent groups. So, the only way to estimate the Shapley effects is using Equations (4) and (5), replacing $\Sigma$ by the empirical covariance matrix $S$ . However, as we have seen, the complexity of this computation would be exponential in $p$ and it would no longer be tractable for $p\geq 30$ . Furthermore, in high dimension, the Frobenius error between $S$ and $\Sigma$ does not go to $0$ (see Proposition 9). Thus, using the empirical covariance matrix could yield estimators of the Shapley effects that do not converge.

For that reason, to estimate $\eta=(\eta_{i})_{i\in[1:p]}$ , we suggest estimating $B^{*}$ by $\widehat{B}$ (defined in Section 2.2.2) and $\Sigma$ by $S_{\widehat{B}}$ and replacing them in the analytical formula (6). We write $\widehat{\eta}=(\widehat{\eta}_{i})_{i\in[1:p]}$ for the estimator of the Shapley effects obtained by replacing $B^{*}$ by $\widehat{B}$ , $\Sigma$ by $S_{\widehat{B}}$ and $\beta$ by $\widehat{\beta}$ in Equations (6) and (7). We use our previous results on the estimation of the covariance matrix to obtain the convergence rate of $\widehat{\eta}$ .

We focus on the high-dimensional case, when $p$ and $n$ go to $+\infty$ . In this case, $\beta$ and $\Sigma$ are not fixed but depend on $n$ (or $p$ ). As in Section 2.2, we choose $\kappa=\frac{1}{pn^{\delta}}$ with $\delta\in]1/2,1[$ to compute $\widehat{B}$ . To prevent problematic cases, we also add an assumption on the vector $\beta$ .

Condition 5.

There exist $\beta_{\inf}>0$ and $\beta_{\sup}<+\infty$ such that for all $n$ and for all $j\leq p$ , we have $\beta_{\inf}\leq|\beta_{j}|\leq\beta_{\sup}$ .

Proposition 17.

Under Conditions 1 to 5 and if $\delta\in]1/2,1[$ , then for all $\gamma>1/2$ , we have

[TABLE]

Recall that $\sum_{i=1}^{p}\eta_{i}=1$ . Thus, to quantify the estimation error, the value of $\sum_{i=1}^{p}\left|\widehat{\eta}_{i}-\eta_{i}\right|$ is a relative error. Proposition 17 states that this relative error goes to zero at the parametric rate $1/n^{1/2}$ , up to a logarithm factor.

We have seen in Section 3.1 that, once we have the block-diagonal covariance matrix, the computation of the Shapley effects has the complexity $O(K2^{m})$ which is equal to $O(n)$ under Condition 3. In Section 2.2, we gave four different choices of $\widehat{B}$ , with four different complexities, all larger than $O(n)$ . Thus, the complexity of the whole estimation of the Shapley effects (including the estimation of $\Sigma$ ) is the same as the complexity of $\widehat{B}$ (see Section 2.2.2).

When Condition 4 is not satisfied, we still have the convergence of the relative error, with almost the same rate.

Proposition 18.

Under Conditions 1, 2, 3 and 5, for all $\delta\in]0,1[$ , choosing the partition $B_{n^{-\delta/2}}$ and for all $\varepsilon>0$ , we have

[TABLE]

Remark 2.

When the dimension $p$ is fixed, the rate of convergence is $O_{p}(1/\sqrt{n})$ , as if we estimated $\Sigma$ by the empirical covariance matrix. Moreover, we have seen in Proposition 16 that the computation of $S_{\widehat{B}}$ makes it possible to reach asymptotically the Cramér-Rao bound of [35] as if $B^{*}$ were known. We then deduce the asymptotic efficiency of $\widehat{\eta}$ . If we define $g:\Sigma\mapsto\eta$ , let $\mathrm{CR}(\eta,B^{*}):=Dg(\Sigma)\mathrm{CR}(\Sigma,B^{*})Dg(\Sigma)^{T}$ be the Cramér-Rao bound of $\eta$ in the model $\{\mathcal{N}(\mu,\Sigma),\;\Sigma\in S_{p}^{++}(\mathbb{R},B^{*})\}$ . Thus,

[TABLE]

3.3 Numerical application

We have seen in Proposition 12 a way to generate $\Sigma$ which verifies Conditions 1 to 3 and some slightly modified version of Condition 4. So, with this choice of $\Sigma$ , we derive in Proposition 19 the convergence of the Shapley effects estimation.

Proposition 19.

Under Condition 5, if $\Sigma$ is generated as in Proposition 12, then, for all $\gamma>1/2$ ,

[TABLE]

where the probabilities are defined with respect to $\Sigma$ and $X$ , whose distribution conditionally on $\Sigma$ is $\mathcal{N}(\mu,\Sigma)$ .

We now present a numerical application of Proposition 19. The matrix $\Sigma$ is generated as in Proposition 12 and Section 2.2.5, with blocks of random size distributed uniformly on $[10:15]$ , $L=5$ and $\varepsilon=0.2$ . For all $p$ , the vector $\beta$ is generated with distribution $\mathcal{U}([1,2]^{p})$ , so that Condition 5 is satisfied. As in Section 2.2.5, we assume that we know that the maximal size of the block is $m=15$ , so we can use the estimator $\widehat{B}=\widehat{B}_{s}$ given in Proposition 7. As the computation of the Shapley effects is exponential in the maximal block size, the estimator $\widehat{B}_{s}$ is preferred. The complexity of the estimation of the Shapley effects is then in $O(p^{2})$ .

We plot in Figure 2 the sum of the Shapley effects estimation error $\sum_{i=1}^{p}\left|\widehat{\eta}_{i}-\eta_{i}\right|$ , with $n=N\,p$ for different values of $N$ . We can remark that the sum of the errors seems to be of order $1/\sqrt{K}$ , which is consistent with Proposition 19.

4 Application on real data

In this section, we consider a real application of our suggested estimators of block-diagonal covariance matrices and of the Shapley effects, to nuclear data.

4.1 The Shapley effects with nuclear data

Uncertainty propagation methods are increasingly being used in nuclear calculation (neutron dosimetry, reactor design, criticality assessments, etc.) to deduce the accuracy of safety parameters (fast fluence, reactivity coefficients, criticality, etc.) and to establish safety margins. In fact, the output of a nuclear computer model can be regarded as partly random insofar as the inputs are uncertain. In this context, sensitivity analysis evaluates the impact of input uncertainties in terms of their relative contributions to the uncertainty in the output. Therefore, it helps to prioritize efforts for uncertainty reduction, improving the quality of the data.

Of particular interest for us, the uncertain inputs, in nuclear applications, tend to be correlated because of the measurement processes and the different calculations made to obtain the variables of interest from the observable quantities. This is why the Shapley effects are particularly convenient as sensitivity indices in nuclear uncertainty quantification. Moreover, these uncertain inputs are easily modeled as a Gaussian vector and the output is often modeled as a linear function of the inputs [19]. Thus, the Shapley effects can be computed or estimated, as it is described in Section 3.

4.2 Details of the nuclear data

In this application, the output is the neutron flux $Y$ which is a quantity of interest in nuclear studies. For example, it can be calculated to evaluate the vessel neutron irradiation which is in fact one of the limiting factors for pressurized water reactor (PWR) lifetime. The quality of radiation damage prediction depends in part on the calculation of the fast neutron flux for energy larger than $1MeV=10^{6}eV$ ( $eV$ means electron-volt). In that sense, a lack of knowledge on the fast neutron flux will require larger safety margins on the plant lifetime affecting operating conditions and the cost of nuclear installations. To make correct decisions when designing the plant lifetime and on safety margins for PWRs, it is therefore essential to assess the uncertainty in vessel flux calculations.

One of the major sources of uncertainty in fast flux calculations is the cross sections, which are used to characterise the probability that a neutron will interact with a nucleus and are the inputs $X$ of our model. They are expressed in barn, where 1 $barn=10^{-24}cm^{2}$ . The values of the cross sections and their uncertainties are provided by international libraries such as the American Library ENDF/B-VII [25], the European library JEFF-3 [17], and the Japan Library JENDL-4 [18]. Using the standardized format, each cross section is defined for an isotope $iso$ of the target nuclei, an energy level $E$ of the target nuclei and a reaction number $mt$ (see [25] for more information on $mt$ numbers).

We assume that if $(iso,mt)\neq(iso^{\prime},mt^{\prime})$ , then, $X_{(iso,mt,E)}\operatorname{\perp\!\!\!\perp}X_{(iso^{\prime},mt^{\prime},E^{\prime})}$ for any $E,E^{\prime}$ . Thus, the covariance $\Sigma$ is block-diagonal, where each block corresponds to a value of $(iso,mt)$ . Here, we have 292 input variables divided into 50 groups of size between 2 and 18. Using reference data, [8] has shown that the perturbations of the cross sections of the ${}^{56}$Fe, ${}^{1}$H and ${}^{16}$O isotopes are linearly related to the perturbation of the flux:

[TABLE]

Thus, if $\Sigma$ and $\beta$ are given, the Shapley effects are easily computable by Equations (6) and (7). We show the values of the Shapley effects in Figure 3.

We can remark that almost all the Shapley effects are close to 0. Now, we plot all the Shapley effects that are larger than $1\%$ in Figure 4 with the names of the corresponding cross sections. For example, "Fe56 $\_$ S4 $\_$ 950050" means the cross section for the isotope ${}^{56}Fe$ , the reaction scattering 4 and a level of energy larger than $950050eV$ (and smaller than 1353400 $eV$ ).

We remark that only 23 cross sections have a Shapley effect larger than $1\%$ . The latter are associated with the lower energies (around 1 to 6 $MeV$ ). Moreover, they all come from three different groups of $(iso,mt)$ : (${}^{56}$Fe, scattering 4), (${}^{56}$Fe, scattering 2) and (${}^{1}$H, scattering 2).

In fact, in PWR reactors, fission mostly results from the absorption of slow neutrons by nuclei of high atomic number such as uranium ${}^{235}$U and plutonium ${}^{239}$Pu, which are the main fissile isotopes. The nucleus splits into two lighter nuclei, called fission fragments, and produces an average of 2.5 neutrons with an energy of about 1 $MeV$ or more. Figure 5 illustrates the neutron fission spectrum which defines the probability for a neutron to be emitted in the energy group $g$ by the isotope $i$ . One can note that there are more neutrons produced in the first energy groups, between 1 and 6 $MeV$ . In that sense, those neutrons have a larger potential to interact with matter.

Moreover, most of the fast neutrons which escape from the core are scattered back by the reflector (essentially comprised of ${}^{56}$Fe) and slowed down by water (hydrogen and oxygen), which acts as a moderator, until they reach thermal equilibrium. The accuracy of the neutron flux received by the reflector is closely linked to the precision of the scattering cross sections.

On the other hand, we can notice that all the Shapley effects of the cross sections from (${}^{1}$H, scattering 2) are close, and that comes from the fact that the correlations between these different levels of energy are close to $1$ in this group.

Thus, when the true parameters $\Sigma$ and $\beta$ are known, we can compute the corresponding Shapley effects of the uncertainties of the cross sections on the uncertainty of the neutron flux $Y$ . Furthermore, the interpretation of these Shapley effects is insightful and consistent with the available expert knowledge.

4.3 Estimation of the Shapley effects

In order to assess the efficiency of our suggested estimation procedures of the Shapley effects, we now assume that the true covariance matrix $\Sigma$ is unknown and that we observe an i.i.d. sample $(X^{(l)})_{l\in[1:n]}$ with distribution $\mathcal{N}(\mu,\Sigma)$ (with $\mu$ unknown). We assume that the maximal group size is known to be smaller than or equal to $20$ and that the vector $\beta$ is known. Then, we estimate the block-diagonal structure by the block-diagonal structure $\widehat{B}$ that maximizes the penalized likelihood $\Phi$ among all the block-diagonal structures obtained by thresholding the empirical correlation matrix from its largest value to the smallest value such that the maximal size of the blocks is smaller than or equal to $20$ . Thus, our estimator $\widehat{B}$ is a mix of the estimators $\widehat{B}_{\widehat{C}}$ and $\widehat{B}_{s}$ detailed in Section 2.2.2.
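The construction of the candidate structures described above can be sketched as follows. This is a hypothetical illustration, not the code used for the application: the function names are ours, the thresholding is taken to be strict ($|\widehat{C}_{ij}|>\lambda$), and the final selection among the candidates by the penalized criterion is deliberately left out because that criterion is defined elsewhere in the paper.

```python
import numpy as np

def components(adj):
    """Connected components of a boolean adjacency matrix (depth-first search)."""
    p = adj.shape[0]
    label = -np.ones(p, dtype=int)
    blocks = []
    for s in range(p):
        if label[s] >= 0:
            continue
        label[s] = len(blocks)
        stack, block = [s], []
        while stack:
            i = stack.pop()
            block.append(i)
            # Unvisited neighbours of i join the current component.
            for j in np.flatnonzero(adj[i] & (label < 0)):
                label[j] = len(blocks)
                stack.append(int(j))
        blocks.append(sorted(block))
    return blocks

def candidate_partitions(C_hat, max_size):
    """Partitions obtained by thresholding |C_hat| from its largest off-diagonal
    value downwards, stopping as soon as a block would exceed max_size."""
    C_abs = np.abs(np.asarray(C_hat, dtype=float))
    iu = np.triu_indices(C_abs.shape[0], k=1)
    thresholds = np.unique(C_abs[iu])[::-1]  # distinct values, decreasing
    out = []
    for lam in thresholds:
        blocks = components(C_abs > lam)
        if max(len(b) for b in blocks) > max_size:
            break
        out.append((float(lam), blocks))
    return out

C_hat = np.array([[1.00, 0.90, 0.05, 0.02],
                  [0.90, 1.00, 0.10, 0.03],
                  [0.05, 0.10, 1.00, 0.80],
                  [0.02, 0.03, 0.80, 1.00]])
candidates = candidate_partitions(C_hat, max_size=2)
for lam, blocks in candidates:
    print(lam, blocks)
```

In this toy example, lowering the threshold below $0.1$ would merge the four variables into a single block of size $4>2$, so the enumeration stops at the partition $\{\{0,1\},\{2,3\}\}$.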

We plot the Frobenius error of the estimated covariance matrix and the sum of the absolute values of the errors of the estimated Shapley effects for different values of $y=p/n$ in Figure 6, where $p=292$ .

We can remark that the errors decrease globally when the value of $\frac{n}{p}$ increases. The larger value of the sum of the errors of the estimated Shapley effects for $n/p=50$ is due to the randomness of the estimated Shapley effects. Note that, even when $n=2p$ , the sum of the errors of the Shapley effects is less than $0.05$ (recall that, in comparison, the sum of the Shapley effects is $1$ ). We plot in Figure 7 the estimated Shapley effects that are larger than $1\%$ with $n/p=2$ . Remark that these estimated values are similar to the true ones displayed in Figure 4 and the physical interpretation is the same.

In conclusion, we implemented an estimator of the block-diagonal covariance matrix originating from nuclear data when we only observe an i.i.d. sample of the inputs. Then, the derived estimated Shapley effects are shown to be very close to the true Shapley effects, that quantify the impact of the uncertainties of cross sections on the uncertainty on the neutron flux. When the sample size $n$ is equal to $2p$ , the physical conclusions are the same as when the true covariance matrix is known.

5 Conclusion

In this work, we suggest an estimator of a block-diagonal covariance matrix for Gaussian data. We prove that in high dimension, this estimator converges to the same block-diagonal structure with complexity in $O(p^{2})$ . For fixed dimension, we also prove the asymptotic efficiency of this estimator, which performs asymptotically as well as if the true block-diagonal structure were known. Then, we deduce convergent estimators of the Shapley effects in high dimension for Gaussian linear models. These estimators are still available for thousands of input variables, as long as the maximal block is not too large. Moreover, we prove the convergence of the Shapley effects estimators when the observations of the output are noisy and so the parameter $\beta$ is estimated. Finally, we applied these estimators to real nuclear data.

In future work, it would be interesting to treat the higher-dimensional setting where $p/n$ goes to $+\infty$ .

Acknowledgements

We acknowledge the financial support of the Cross-Disciplinary Program on Numerical Simulation of CEA, the French Alternative Energies and Atomic Energy Commission. We would like to thank BPI France for co-financing this work, as part of the PIA (Programme d’Investissements d’Avenir) - Grand Défi du Numérique 2, supporting the PROBANT project. We thank Vincent Prost for his help.

Appendix

Notation

We will write $C_{\sup}$ for a generic non-negative finite constant (depending only on $\lambda_{\inf}$ , $\lambda_{\sup}$ and $m$ in Conditions 2 and 3). The actual value of $C_{\sup}$ is of no interest and can change in the same sequence of equations. Similarly, we will write $C_{\inf}$ for a generic strictly positive constant.

If $B,B^{\prime}\in\mathcal{P}_{p}$ , and $(i,j)\in[1:p]^{2}$ , we will write $(i,j)\in B\setminus B^{\prime}$ if $(i,j)\in B$ and $(i,j)\notin B^{\prime}$ , that is, if $i$ and $j$ are in the same group with the partition $B$ and are in different groups with the partition $B^{\prime}$ .

If $B,B^{\prime}\in\mathcal{P}_{p}$ , we define $B\cap B^{\prime}$ as the maximal partition $B^{\prime\prime}$ such that $B^{\prime\prime}\leq B$ and $B^{\prime\prime}\leq B^{\prime}$ .

If $\Gamma\in\mathcal{M}_{p}(\mathbb{R})$ (the set of the matrices of dimension $p\times p$ ), and if $u,v\subset[1:p]$ , we define $\Gamma_{u,v}:=(\Gamma_{i,j})_{i\in u,j\in v}$ and $\Gamma_{u}:=\Gamma_{u,u}$ .

Recall that $\operatorname{vec}:\mathcal{M}_{p}(\mathbb{R})\rightarrow\mathbb{R}^{p^{2}}$ is defined by $(\operatorname{vec}(M))_{p(j-1)+i}:=M_{i,j}$ .

If $M\in S_{p}(\mathbb{R})$ (the set of the symmetric matrices) and $i\in[1:p]$ , let $\phi_{i}(M)$ be the $i$ -th largest eigenvalue of $M$ . We also write $\lambda_{\max}(M)$ (resp. $\lambda_{\min}(M)$ ) for the largest (resp. smallest) eigenvalue of $M$ .

We define $\widehat{\Sigma}:=\frac{1}{n-1}\sum_{l=1}^{n}(X^{(l)}-\overline{X})(X^{(l)}-\overline{X})^{T}=\frac{n}{n-1}S$ , the unbiased empirical estimator of $\Sigma$ . Let $(\widehat{\sigma}_{ij})_{i,j\leq p}$ be the coefficients of $\widehat{\Sigma}$ and $(s_{ij})_{i,j\leq p}$ be the coefficients of $S$ .
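For readers who wish to check the normalisations numerically, the short sketch below computes both $S$ (normalised by $1/n$) and $\widehat{\Sigma}$ (normalised by $1/(n-1)$) on an arbitrary sample; the sample and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))      # rows are the observations X^{(l)}
Xc = X - X.mean(axis=0)          # centred observations X^{(l)} - X_bar

S = Xc.T @ Xc / n                # empirical covariance matrix S
Sigma_hat = Xc.T @ Xc / (n - 1)  # unbiased estimator, equal to n / (n - 1) * S

# numpy.cov uses the 1/(n-1) normalisation by default and 1/n with bias=True.
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False)))
print(np.allclose(S, np.cov(X, rowvar=False, bias=True)))
```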

Recall that when Condition 4 does not hold, we need to define $B({\alpha})$ as the partition given by thresholding $\Sigma$ by $n^{-\alpha}$ . We also define $K(\alpha):=|B(\alpha)|$ and write $B(\alpha)=\{B_{1}(\alpha),B_{2}(\alpha),...B_{K(\alpha)}(\alpha)\}$ .

Proof of Proposition 1

Proof.

Let us write

[TABLE]

which is the closure of $S_{p}^{++}(\mathbb{R},B)$ in $S_{p}^{++}(\mathbb{R})$ .

First, let us show that, for all $B$ , $S_{B}$ is the minimum of $\Gamma\mapsto l_{\Gamma}$ on $\overline{S_{p}^{++}(\mathbb{R},B)}$ . If $\Gamma_{B}\in\overline{S_{p}^{++}(\mathbb{R},B)}$ , we have

[TABLE]

The function $f:\mathbb{R}_{+}^{*}\rightarrow\mathbb{R}$ defined by $f(t):=-\log(t)+t-1$ has a unique minimum at $1$ . Thus, the function $g:S_{p}^{++}\rightarrow\mathbb{R}$ defined by $g(M):=\sum_{i=1}^{p}\left[-\log\left(\phi_{i}(M)\right)+\phi_{i}(M)-1\right]$ has a unique minimum at $I_{p}$ . Hence $\Gamma_{B}\in\overline{S_{p}^{++}(\mathbb{R},B)}\mapsto l_{\Gamma_{B}}-l_{S_{B}}$ has a unique minimum at $\Gamma_{B}=S_{B}$ .

Now, the penalisation term is constant on each $S_{p}^{++}(\mathbb{R},B)$ . Thus $\Phi$ has a global minimum (not necessarily unique) at $S_{B}$ , for some $B\in\mathcal{P}_{p}$ . ∎

Notation for Section 2.2

Here and in all the proofs of Section 2.2, we assume Conditions 1 to 3 of Section 2.2.1.

In the following, we introduce some notation.

We know that

[TABLE]

where $\mathcal{W}(n-1,\Sigma)$ is the Wishart distribution with parameter $n-1$ and $\Sigma$ [12]. Thus, if we write $(M^{(k)})_{k}$ i.i.d. with distribution $\mathcal{N}(0,\Sigma)$ , we have

[TABLE]

Lemma 1.

For all $C_{\inf}>0$ ,

[TABLE]

and

[TABLE]

Moreover, the same holds for $\widehat{\Sigma}$ instead of $S$ .

Proof.

Let $(A^{(k)})_{k}$ be i.i.d. with distribution $\mathcal{N}(0,I_{p})$ . Using the result in [32] which states that

[TABLE]

we have,

[TABLE]

and

[TABLE]

Thus,

[TABLE]

The proof is the same for $\lambda_{\min}$ . ∎

We also verify the assumptions of Bernstein’s inequality (see for example Theorem 2.8.1 in [36]). For all $i,j,k$ , let

[TABLE]

The random variables $(Z_{ij}^{(k)})_{k}$ are independent, mean zero, sub-exponential and we have $\|Z_{ij}^{(k)}\|_{\psi_{1}}\leq\|M_{i}\|_{\psi_{2}}\|M_{j}\|_{\psi_{2}}\leq C_{\sup}\sqrt{\sigma_{ii}\sigma_{jj}}\leq C_{\sup}$ , where $\|.\|_{\psi_{1}}$ is the sub-exponential norm (for example, see Definition 2.7.5 in [36]). So, we can use Bernstein’s inequality with $(Z_{ij}^{(k)})_{k}$ : there exists $C_{\inf}$ such that, for all $\varepsilon>0$ and $n\in\mathbb{N}$ ,

[TABLE]

Proof of Proposition 2

In this proof, we assume that Conditions 1 to 4 are satisfied. We first prove several lemmas.

Lemma 2.

For all symmetric positive definite $\Gamma$ and for all $B\in\mathcal{P}_{p}$ , if we write $\Delta=\Gamma-\Gamma_{B}$ , we have:

•

$v\mapsto\lambda_{\min}(\Gamma_{B}+v\Delta)$ decreases and so $\min_{v\in[0,1]}\lambda_{\min}(\Gamma_{B}+v\Delta)=\lambda_{\min}(\Gamma)$ .

•

$v\mapsto\lambda_{\max}(\Gamma_{B}+v\Delta)$ increases and so $\max_{v\in[0,1]}\lambda_{\max}(\Gamma_{B}+v\Delta)=\lambda_{\max}(\Gamma)$ .

Proof.

Let us show that $v\mapsto\lambda_{\max}(\Gamma_{B}+v\Delta)$ increases (the proof is the same for $\lambda_{\min}$ ).

For all $v\in[0,1]$ , let $\Gamma_{v}=\Gamma_{B}+v\Delta$ , $\lambda_{v}=\lambda_{\max}(\Gamma_{v})$ and $e_{v}$ an unit eigenvector of $\Gamma_{v}$ associated to $\lambda_{v}$ . Let $v,v^{\prime}\in[0,1],\;v<v^{\prime}$ . Thus

[TABLE]

If we show that $e_{v}^{T}\Delta e_{v}\geq 0$ , we will have proved that $v\mapsto\lambda_{v}$ increases. First, assume that $v=0$ . If $B_{k}$ denotes the group of $B$ associated with the largest eigenvalue of $\Gamma_{B}$ , then $(e_{0})_{i}$ is equal to zero for all $i\notin B_{k}$ , so $(e_{0}^{T}\Delta)_{j}$ is equal to zero for all $j\in B_{k}$ , and so $e_{0}^{T}\Delta e_{0}$ is equal to zero.

Assume now that $v>0$ and let us show that $e_{v}^{T}\Delta e_{v}\geq 0$ by contradiction. Assume that $e_{v}^{T}\Delta e_{v}<0$ . Then

[TABLE]

Furthermore, we have seen that $e_{0}^{T}\Delta e_{0}=0$ . Thus, we have

[TABLE]

which contradicts $e_{v}\in\operatorname{arg\,max}_{u,\;\|u\|=1}u^{T}(\Gamma_{B}+v\Delta)u$ . ∎
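The monotonicity stated in Lemma 2 can be checked numerically. The sketch below is only an illustration on one randomly generated matrix: the helper `block_part`, the partition `blocks` and the grid of values of $v$ are hypothetical choices and are not taken from the paper.

```python
import numpy as np

def block_part(gamma, blocks):
    """Return Gamma_B: keep the entries whose row and column lie in the same group."""
    out = np.zeros_like(gamma)
    for b in blocks:
        idx = np.ix_(b, b)
        out[idx] = gamma[idx]
    return out

rng = np.random.default_rng(1)
p = 6
G = rng.standard_normal((p, 2 * p))
gamma = G @ G.T / (2 * p)               # symmetric positive definite (almost surely)
blocks = [[0, 1], [2, 3, 4], [5]]       # hypothetical partition B
gamma_b = block_part(gamma, blocks)
delta = gamma - gamma_b                 # Delta = Gamma - Gamma_B

vs = np.linspace(0.0, 1.0, 101)
eigs = np.array([np.linalg.eigvalsh(gamma_b + v * delta) for v in vs])
lam_min, lam_max = eigs[:, 0], eigs[:, -1]

# Along the path from Gamma_B to Gamma, lam_max never decreases and lam_min never increases.
print(np.diff(lam_max).min(), np.diff(lam_min).max())
```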

In the following, let $\Delta_{B,B^{\prime}}:=S_{B}-S_{B^{\prime}}$ for all $B,B^{\prime}\in\mathcal{P}_{p}$ .

Lemma 3.

For all $B\in\mathcal{P}_{p}$ , we have

[TABLE]

Moreover, for all $B<B^{*}$ , we have

[TABLE]

Proof.

First, we prove Equation (10). Taking the Taylor expansion of $t\mapsto\log\circ\det\left(S_{B\cap B^{*}}+t\Delta_{B,B\cap B^{*}}\right)$ and using the integral form of the remainder (as in Equation (9) of [30] or in [20]), we have

[TABLE]

where $\otimes$ is the Kronecker product. The trace is equal to zero. Now,

[TABLE]

using Lemma 2 for the last two steps.

Now, we prove Equation (11) similarly. We have, using Lemma 2,

[TABLE]

∎

Lemma 4.

[TABLE]

Proof.

Using Lemma 3, we have

[TABLE]

We show that the two terms go to $0$ . The first term goes to $0$ with Lemma 1. For the second term, we have

[TABLE]

using Bernstein’s inequality, where $Z_{ij}^{(k)}$ is defined in Equation (9). That concludes the proof. ∎

Lemma 5.

[TABLE]

Proof.

Using Lemma 3, we have

[TABLE]

The first term goes to 0 with Lemma 1. The second term is

[TABLE]

Now, for all $k\in[1;K]$ and for all $\emptyset\varsubsetneq B_{1}\varsubsetneq B_{k}^{*}$ , let $(i^{*},j^{*})\in\operatorname{arg\,max}_{i\in B_{1},j\in B_{k}^{*}\setminus B_{1}}|\sigma_{ij}|$ (with an implicit dependence on $k$ and $B_{1}$ ). Remark that

[TABLE]

Thus,

[TABLE]

Now, by Condition 4, we know that $\min_{\begin{subarray}{c}k\in[1:K],\\ \emptyset\varsubsetneq B_{1}\varsubsetneq B_{k}^{*}\end{subarray}}|\sigma_{i^{*}j^{*}}|\geq an^{-1/4}$ , so, for $n$ large enough,

[TABLE]

Thus, by Bernstein’s inequality, for $n$ large enough,

[TABLE]

∎

Now, we can prove Proposition 2.

Proof.

We have

[TABLE]

The first two terms go to 0 thanks to Lemmas 4 and 5. For the last term, we have

[TABLE]

These last two terms go to 0 thanks to Lemmas 4 and 5.

∎

Proof of Proposition 3

In this proof, we assume that Conditions 1 to 3 hold.

Lemma 6.

For all $B\in\mathcal{P}_{p}$ , we have

[TABLE]

Moreover, for all $B<B(\alpha_{2})$ , we have

[TABLE]

Proof.

Same proof as Lemma 3 replacing $B^{*}$ by $B(\alpha_{2})$ . ∎

Lemma 7.

If $\alpha_{2}>\delta/2$ , then,

[TABLE]

Proof.

Following the proof of Lemma 4 (and using Lemma 6), it is enough to prove that the following term goes to 0:

[TABLE]

using again Bernstein’s inequality. That concludes the proof. ∎

Lemma 8.

If $\alpha_{1}<\delta/2$ , then,

[TABLE]

Proof.

Following the proof of Lemma 5, it suffices to prove that

[TABLE]

Now, by definition of $B(\alpha_{1})$ , we know that $\min_{\begin{subarray}{c}k\in[1:K(\alpha_{1})],\\ \emptyset\varsubsetneq B_{1}\varsubsetneq B_{k}(\alpha_{1})\end{subarray}}|\sigma_{i^{*}j^{*}}|\geq n^{-\alpha_{1}}$ , so, for $n$ large enough,

[TABLE]

Thus, by Bernstein’s inequality, for $n$ large enough,

[TABLE]

∎

We can now prove Proposition 3.

Proof.

We have

[TABLE]

First,

[TABLE]

from Lemma 8. Secondly,

[TABLE]

from Lemma 7. Finally,

[TABLE]

from Lemmas 7 and 8. ∎

Proofs of Propositions 4, 5 and 6

Proof.

In the three cases, the computation of $\widehat{B}$ requires carrying out the BFS algorithm for $B_{\lambda}$ and the computation of a determinant for $\Psi(B_{\lambda})$ . Recall that if $G=(V,E)$ is a graph (where $V$ is the set of vertices and $E$ the set of edges), the complexity of the BFS algorithm is $O(|V|+|E|)$ . Recall that, if $M$ is a square matrix of size $p$ , the complexity of $\det(M)$ is $O(p^{3})$ using the LU decomposition.

Now, we compute the complexity of the three estimators $\widehat{B}_{\widehat{C}}$ , $\widehat{B}_{\lambda}$ and $\widehat{B}_{s}$ .

•

For all $\lambda\in A_{\widehat{C}}$ , the complexity of $B_{\lambda}$ is $O(p^{2})$ , and the cardinality of $A_{\widehat{C}}$ is $O(p^{2})$ . Thus, the complexity of the computation of $\{B_{\lambda}\;|\;\lambda\in A_{\widehat{C}}\}$ is $O(p^{4})$ .

Now, for all $\lambda\in A_{\widehat{C}}$ , the complexity of $\Psi(B_{\lambda})$ is $O(p^{3})$ and the cardinality of $\{B_{\lambda}\;|\;\lambda\in A_{\widehat{C}}\}$ is $O(p)$ (because the function $\lambda\mapsto B_{\lambda}$ decreases). Thus, the complexity of the evaluations $\{\Psi(B),\;B\in\{B_{\lambda}\;|\;\lambda\in A_{\widehat{C}}\}\}$ is $O(p^{4})$ .

So the complexity of $\widehat{B}_{\widehat{C}}$ is $O(p^{4})$ .

•

For the threshold $n^{-1/3}$ , the complexity of $B_{n^{-1/3}}$ is $O(p^{2})$ .

So the complexity of $\widehat{B}_{\lambda}$ is $O(p^{2})$ .

•

One can divide the computation of $\widehat{B}_{s}$ into two steps.

For the first step, as we do not know the value of $s$ , we have to compute $B_{l/p}$ from $l=p$ to $l=s-1$ , checking each time whether the maximal group size is smaller than $m$ . First, for each value of $l$ decreasing from $p$ to $s$ , the complexity of the BFS algorithm for $B_{l/p}$ is $O(p\times m^{2})=O(p)$ , so the complexity of computing all these partitions is $O(p^{2})$ . Then, for $l=s-1$ , the complexity of $B_{(s-1)/p}$ is $O(p^{2})$ . So, the complexity of this first step is $O(p^{2})$ .

In the second step, we have to evaluate $\Psi(B_{l/p})$ for all $l\in[s:p]$ . The complexity of each evaluation is $O(pm^{3})=O(p)$ , and the number of evaluations is $O(p)$ . Thus, the complexity of this second step is $O(p^{2})$ .

∎
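The $O(p^{2})$ cost of one partition $B_{\lambda}$ can be made concrete with a short sketch. It assumes that $B_{\lambda}$ is the partition into connected components of the graph linking $i$ and $j$ whenever the empirical correlation exceeds $\lambda$ in absolute value (the definition is given earlier in the paper and is only paraphrased here). The function name `partition_from_threshold` and the toy matrix `C` are hypothetical.

```python
from collections import deque
import numpy as np

def partition_from_threshold(C, lam):
    """Connected components of the graph with an edge i~j when |C[i, j]| > lam."""
    p = C.shape[0]
    seen = [False] * p
    groups = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        queue, group = deque([start]), []
        while queue:                    # breadth-first search from `start`
            i = queue.popleft()
            group.append(i)
            for j in range(p):          # dense scan of row i: O(p) per vertex
                if not seen[j] and abs(C[i, j]) > lam:
                    seen[j] = True
                    queue.append(j)
        groups.append(sorted(group))
    return groups

C = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.05, 0.0],
              [0.0, 0.05, 1.0, 0.4],
              [0.0, 0.0, 0.4, 1.0]])
print(partition_from_threshold(C, 0.1))   # [[0, 1], [2, 3]]
print(partition_from_threshold(C, 0.01))  # [[0, 1, 2, 3]]
```

Each vertex is dequeued once and its row is scanned once, which gives the $O(p^{2})$ cost quoted above for a dense matrix. Lowering the threshold can only merge groups, consistent with the monotonicity of $\lambda\mapsto B_{\lambda}$.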

**Proof of Proposition 7

**

To prove the convergence of $\widehat{B}$ in the three cases, we need the three following Lemmas.

Lemma 9.

For every sequence $(\lambda_{n})_{n}$ such that for all $n$ , $\lambda_{n}\in[n^{-1/3},an^{-1/4}/3\lambda_{\sup}(1+\sqrt{y})^{2}]$ (we assume that $n$ is large enough for this interval to be nonempty), we have

[TABLE]

Proof.

Step 1: $B_{\lambda}\leq B^{*}$ with probability which goes to 1.

[TABLE]

using Lemma 1 and Bernstein’s inequality.

Step 2: $B_{\lambda}\geq B^{*}$ with probability which goes to 1.

For all $k\in[1:K]$ , and all $\emptyset\varsubsetneq B_{1}\varsubsetneq B_{k}^{*}$ , let $B_{2}:=B_{k}^{*}\setminus B_{1}$ and $(i^{*},j^{*}):=\operatorname{arg\,max}_{(i,j)\in B_{1}\times B_{2}}|\sigma_{ij}|$ , where the dependency on $k$ and $B_{1}$ is implicit. Thanks to Condition 4, we have $|\sigma_{i^{*}j^{*}}|\geq an^{-1/4}$ . Then, using Lemma 1,

[TABLE]

by Bernstein’s inequality. ∎

Lemma 10.

Let $c>0$ . Let $\tilde{A}:=\{a_{0},\;a_{1},...,\;a_{L}\}$ such that $a_{0}=0,\;a_{L}=1,\;0<a_{l+1}-a_{l}<c/\sqrt{p}$ for all $l\in[0:L-1]$ . Then,

[TABLE]

Proof.

Thanks to Lemma 9, it suffices to show that, for $n$ large enough, there exists $l\in[0:L]$ such that $a_{l}\in[n^{-1/3},an^{-1/4}/3\lambda_{\sup}(1+\sqrt{y})^{2}]$ . By contradiction, let us assume that there does not exist such $l$ . Let $j\in[0:L-1]$ be such that $a_{j}<n^{-1/3}$ and $a_{j+1}>an^{-1/4}/3\lambda_{\sup}(1+\sqrt{y})^{2}$ . Thus, we have

[TABLE]

which is in contradiction with the definition of $\tilde{A}$ .

∎

Lemma 11.

We have,

[TABLE]

Proof.

Let $\mathcal{P}_{p}(m)$ be the set of the partitions of $[1:p]$ whose elements all have cardinality smaller than $m$ . By assumption (Condition 3), $B^{*}\in\mathcal{P}_{p}(m)$ . Let $G:=\{l/p|\;l\in[0:p]\}$ . Thus $G$ satisfies the assumptions on $\tilde{A}$ in Lemma 10, so

[TABLE]

Thus

[TABLE]

To conclude, it suffices to prove that $\left\{B_{\lambda},\;\lambda\in G\right\}\cap\mathcal{P}_{p}(m)=\left\{B_{\lambda},\;\lambda\in A_{s}\right\}$ .

We have immediately $\left\{B_{\lambda},\;\lambda\in A_{s}\right\}\subset\left\{B_{\lambda},\;\lambda\in G\right\}\cap\mathcal{P}_{p}(m)$ . We have to prove the other inclusion. Assume that $B\in\left\{B_{\lambda},\;\lambda\in G\right\}\cap\mathcal{P}_{p}(m)$ . We know that there exists $\lambda=l/p\in G$ such that $B=B_{\lambda}$ . As $B_{l/p}\in\mathcal{P}_{p}(m)$ , we know by definition of $s$ that $l\geq s$ and thus $\lambda\in A_{s}$ . ∎

Now, we prove Proposition 7.

Proof.

•

Using Lemma 9, Proposition 2, and the fact that $\{B_{\lambda}\;|\;\lambda\in A_{\widehat{C}}\}=\{B_{\lambda}\;|\;\lambda\in[0,1[\}$ , we have $\mathbb{P}\left(\widehat{B}_{\widehat{C}}=B^{*}\right)\longrightarrow 1$ .

•

Using Lemma 9 and Proposition 2, we have $\mathbb{P}\left(\widehat{B}_{\lambda}=B^{*}\right)\longrightarrow 1$ .

•

Using Lemma 11 and Proposition 2, we have $\mathbb{P}\left(\widehat{B}_{s}=B^{*}\right)\longrightarrow 1$ .

∎

**Proof of Proposition 8

**

Proof.

We follow the proof of Lemma 9.

Step 1: $B_{n^{-\delta/2}}\leq B(\alpha_{2})$ with probability which goes to 1.

[TABLE]

using Lemma 1 and Bernstein’s inequality.

Step 2: $B_{n^{-\delta/2}}\geq\mathcal{B}(\alpha_{1})$ with probability which goes to 1.

For all $k\in[1:K(\alpha_{1})]$ , and all $\emptyset\varsubsetneq B_{1}\varsubsetneq B_{k}(\alpha_{1})$ , let $B_{2}:=B_{k}(\alpha_{1})\setminus B_{1}$ and $(i^{*},j^{*}):=\operatorname{arg\,max}_{(i,j)\in B_{1}\times B_{2}}|\sigma_{ij}|$ , where the dependency on $k$ and $B_{1}$ is implicit. Then, using Lemma 1,

[TABLE]

by Bernstein’s inequality.

∎

**Proof of Proposition 9

**

Proof.

First, we prove the results for $\widehat{\Sigma}_{B^{*}}$ . We have, using again the notation $M\sim\mathcal{N}(0,\Sigma)$ ,

[TABLE]

By Markov’s inequality, that proves

[TABLE]

Now, we want to prove that

[TABLE]

First, we have

[TABLE]

Now, the variance is

[TABLE]

Now,

[TABLE]

Remark that if $A_{1},...,A_{d}$ are random variables, we have

[TABLE]

Thus

[TABLE]

Let $i,j\in B_{k}^{*}$ for some $k$ . We want to upper-bound $\operatorname{Var}\left((\widehat{\sigma}_{ij}-\sigma_{ij})^{2}\right)$ . Let us define $a_{k}:=X_{i}^{(k)}X_{j}^{(k)}-\sigma_{ij}$ . We know that

[TABLE]

So, using the independence of $a_{1},...,a_{n}$ , we obtain

[TABLE]

where we observed that $\operatorname{cov}(a_{1}a_{2},a_{1}a_{3})=0$ . Now, by Isserlis’ theorem and using the fact that $\sigma_{ij}$ is upper-bounded by $\lambda_{\sup}$ , we have $\operatorname{cov}\left(a_{1}^{2},a_{1}^{2}\right)\leq C_{\sup}$ , $\operatorname{cov}\left(a_{1}^{2},a_{1}a_{2}\right)\leq C_{\sup}$ and $\operatorname{cov}\left(a_{1}a_{2},a_{1}a_{2}\right)\leq C_{\sup}$ (and these bounds do not depend on $k,i,j$ ). So

[TABLE]

Thus,

[TABLE]

and

[TABLE]

Thus, by Chebyshev’s inequality

[TABLE]

So, we proved that $\frac{1}{p}\|\widehat{\Sigma}_{B^{*}}-\Sigma\|_{F}^{2}$ is not $o_{p}(1/n)$ .

Now, we show that the same results hold for $S_{B^{*}}$ by proving that $\frac{1}{p}\|S_{B^{*}}-\Sigma\|_{F}^{2}-\frac{1}{p}\|\widehat{\Sigma}_{B^{*}}-\Sigma\|_{F}^{2}=o_{p}(1/n)$ . We have

[TABLE]

Yet, by Bernstein’s inequality,

[TABLE]

and

[TABLE]

That proves

[TABLE]

Now, on the one hand, we have

[TABLE]

and by Proposition 7,

[TABLE]

On the other hand,

[TABLE]

∎
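The benefit of knowing the block structure, which Proposition 9 quantifies, can be seen in a small simulation. The sketch below is an illustration only: the block size `m`, the number of blocks `K`, the sample size `n` and the way the blocks of $\Sigma$ are generated are hypothetical choices, and `np.cov` uses its default $1/(n-1)$ normalisation.

```python
import numpy as np

rng = np.random.default_rng(2)
m, K, n = 3, 20, 200                    # hypothetical block size, block count, sample size
p = m * K

# Block-diagonal Sigma with K random symmetric positive definite blocks of size m.
sigma = np.zeros((p, p))
blocks = [list(range(k * m, (k + 1) * m)) for k in range(K)]
for b in blocks:
    G = rng.standard_normal((m, 2 * m))
    sigma[np.ix_(b, b)] = G @ G.T / (2 * m) + 0.5 * np.eye(m)

X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(X, rowvar=False)             # empirical covariance matrix

S_block = np.zeros_like(S)              # S_{B*}: entries outside the true blocks set to 0
for b in blocks:
    S_block[np.ix_(b, b)] = S[np.ix_(b, b)]

err_block = np.sum((S_block - sigma) ** 2) / p   # (1/p) ||S_{B*} - Sigma||_F^2
err_full = np.sum((S - sigma) ** 2) / p          # (1/p) ||S - Sigma||_F^2
print(err_block, err_full)
```

The block-restricted error only accumulates the $O(pm)$ within-block entries, whereas the unrestricted error accumulates all $p^{2}$ entries, so the second quantity is larger by a factor of order $p/m$ in this setting.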

Proof of Proposition 10

Proof.

It suffices to prove that

[TABLE]

First,

[TABLE]

Secondly,

[TABLE]

∎

**Proof of Proposition 11

**

Proof.

We follow the proof of Proposition 9. Let $\delta\in]1/2,1[,\varepsilon>0$ , and $\alpha_{1}:=\delta/2-\varepsilon/4$ .

We have

[TABLE]

Thus,

[TABLE]

and thus

[TABLE]

We conclude using Proposition 8 and using that $B(\alpha_{2})\leq B^{*}$ . ∎

**Proof of Proposition 12

**

Proof.

The eigenvalues of $\Sigma$ are lower-bounded by $\varepsilon$ and upper-bounded by $mL$ , so $\Sigma$ verifies Condition 2. Condition 3 is verified by construction. It remains to prove the slightly modified Condition 4 given in Proposition 12. Let $a>0$ .

[TABLE]

using a union bound and the fact that all the blocks have a size larger than 10. Then, by independence of $\left(\sum_{l=1}^{L}U_{2k-1}^{(l)}U_{2k}^{(l)}\right)_{k\leq 5}$ , we have

[TABLE]

Let $U_{i}:=(U_{i}^{(l)})_{l\leq L}\in\mathbb{R}^{L}$ for $i=1,2$ . Then, $U_{1}$ and $U_{2}$ are independent and uniformly distributed on $[-1,1]^{L}$ . Thus

[TABLE]

Let $u_{2}\in[-1,1]^{L}\setminus\{0\}$ . The set $\{u_{1}\in[-1,1]^{L}|\;|\langle u_{1},u_{2}\rangle|\leq an^{-1/4}\}$ is a subset of $\{\sum_{l=1}^{L}x_{l}e_{l}|\;-an^{-1/4}\|u_{2}\|\leq x_{1}\leq an^{-1/4}\|u_{2}\|,\;|x_{l}|\leq\sqrt{L}\;\forall l\}$ where $e_{1}=u_{2}/\|u_{2}\|$ and $(e_{1},...,e_{L})$ is an orthonormal basis of $\mathbb{R}^{L}$ . The Lebesgue measure of this subset is $(2\sqrt{L})^{L-1}2an^{-1/4}\|u_{2}\|$ . Furthermore, (conditionally on $U_{2}=u_{2}$ ) the probability density function of $U_{1}$ on this set is either $0$ or $2^{-L}$ . So, for all $u_{2}\in[-1,1]^{L}\setminus\{0\}$ ,

[TABLE]

Thus

[TABLE]

Then

[TABLE]

Hence, it remains to prove that the conclusion of Proposition 2 holds. That will imply the same for Propositions 7 and 9. Let $a>0$ and $E:=\{\Gamma\in S_{p}^{++}(\mathbb{R},B^{*})|\;\forall B<B^{*},\;\|\Gamma_{B}-\Gamma\|_{\max}\geq an^{-1/4}\}$ , where the generation of $B^{*}$ is defined in Proposition 12. We have

[TABLE]

Yet, for all $\Gamma\in E$ , $\mathbb{P}\left(\widehat{B}_{tot}\neq B^{*}|\;\Sigma=\Gamma\right)\longrightarrow 0$ thanks to Proposition 2 (even if Condition 4 is not verified, the proof is still valid since the covariance matrix is in $E$ ). We conclude by the dominated convergence theorem. ∎

Notation for the proofs of Section 2.3

For all $i,j\in[1:p]$ , let $e_{i}\in\mathbb{R}^{p}$ be such that all coefficients are zero except the $i$ -th one which is equal to 1, and let $e_{ij}\in\mathcal{M}_{p}(\mathbb{R})$ be such that all coefficients are zero except the $(i,j)$ -th one which is equal to 1. Let $\gamma_{ij}$ be the $(i,j)$ -th coefficient of $\Sigma^{-1}$ . Finally, as we use matrices $M$ of size $p^{2}\times p^{2}$ , and vectors $v$ of size $p^{2}$ , we define $v_{ij}:=v_{(j-1)p+i}$ and $M_{ij,kl}:=M_{(j-1)p+i,(l-1)p+k}$ .

**Proof of Proposition 13

**We see that, for all $B\in\mathcal{P}_{p}$ , $l_{S_{B}}=\log(|S_{B}|)/p+\frac{n-1}{n}$ converges almost surely to $\log(|\Sigma_{B}|)/p+1$ . The following Lemma gives a central limit theorem for this convergence.

Lemma 12.

For all $B\in\mathcal{P}_{p}$ , we have

[TABLE]

with $2\operatorname{Tr}(\Sigma_{B}^{-1}\Sigma\Sigma_{B}^{-1}\Sigma)/p\leq 2p$ . In particular

[TABLE]

Proof.

Let $Z^{(k)}=M^{(k)}M^{(k)T}$ , where $M^{(k)}=(M_{i}^{(k)})_{i\leq p}\in\mathbb{R}^{p}$ . We know that $\mathbb{E}(Z)=\Sigma$ and $\operatorname{cov}(Z_{i,j},Z_{k,l})=\mathbb{E}(M_{i}M_{j}M_{k}M_{l})-\sigma_{ij}\sigma_{kl}=\sigma_{ij}\sigma_{kl}+\sigma_{ik}\sigma_{jl}+\sigma_{il}\sigma_{jk}-\sigma_{ij}\sigma_{kl}=\sigma_{ik}\sigma_{jl}+\sigma_{il}\sigma_{jk}$ . Let $\Gamma\in\mathcal{M}_{p^{2},p^{2}}$ be such that $\Gamma_{ij,kl}:=\sigma_{ik}\sigma_{jl}+\sigma_{il}\sigma_{jk}=\operatorname{cov}(Z_{i,j},Z_{k,l})$ . Using the central limit theorem,

[TABLE]

and by Slutsky's lemma,

[TABLE]

where $(\Gamma_{B})_{ij,kl}=\Gamma_{ij,kl}$ if $(i,j)\in B$ and $(k,l)\in B$ and $(\Gamma_{B})_{ij,kl}=0$ otherwise.

Let us apply the Delta-method to (16) with the function $\log\circ\det\circ\operatorname{mat}$ , where $\operatorname{mat}:\mathbb{R}^{p^{2}}\rightarrow\mathcal{M}_{p}(\mathbb{R})$ is the inverse function of $\operatorname{vec}$ . If we write $L$ for the Jacobian matrix of $\log\circ\det\circ\operatorname{mat}$ , we have:

[TABLE]

Let us compute the linear map $L(\operatorname{vec}(\Sigma_{B})):\mathbb{R}^{p^{2}}\rightarrow\mathbb{R}$ , that we identify with its matrix. Let us recall that, for the dot product $\langle A,B\rangle:=Tr(A^{T}B)$ , the gradient of $\log\circ\det$ at a symmetric positive definite matrix $A$ is $A^{-1}$ . Thus, if $v\in\mathbb{R}^{p^{2}}$ , we have

[TABLE]

So $L(\operatorname{vec}(\Sigma_{B}))=\operatorname{vec}(\Sigma_{B}^{-1})^{T}$ , then

[TABLE]

Now,

[TABLE]

Indeed, as $A:=\Sigma_{B}^{-\frac{1}{2}}\Sigma\Sigma_{B}^{-\frac{1}{2}}$ is symmetric positive definite, we have $\operatorname{Tr}(AA)\leq\operatorname{Tr}(A)^{2}$ . ∎
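The asymptotic variance appearing in Lemma 12 can be checked on a small example. The following sketch evaluates the quadratic form $\operatorname{vec}(\Sigma_{B}^{-1})^{T}\Gamma\operatorname{vec}(\Sigma_{B}^{-1})$ with $\Gamma_{ij,kl}=\sigma_{ik}\sigma_{jl}+\sigma_{il}\sigma_{jk}$ and compares it with $2\operatorname{Tr}(\Sigma_{B}^{-1}\Sigma\Sigma_{B}^{-1}\Sigma)$. The matrix `sigma` and the partition `blocks` are hypothetical. Since $\Sigma_{B}^{-1}$ is supported on the pairs $(i,j)\in B$, using $\Gamma$ or its restriction $\Gamma_{B}$ gives the same quadratic form.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 5
G = rng.standard_normal((p, 2 * p))
sigma = G @ G.T / (2 * p)               # a generic symmetric positive definite Sigma
blocks = [[0, 1], [2, 3, 4]]            # hypothetical partition B

sigma_b = np.zeros_like(sigma)          # Sigma_B
for b in blocks:
    sigma_b[np.ix_(b, b)] = sigma[np.ix_(b, b)]
prec_b = np.linalg.inv(sigma_b)         # Sigma_B^{-1}

# Gamma[i, j, k, l] = sigma_ik * sigma_jl + sigma_il * sigma_jk
gamma = (np.einsum("ik,jl->ijkl", sigma, sigma)
         + np.einsum("il,jk->ijkl", sigma, sigma))

quad = np.einsum("ij,ijkl,kl->", prec_b, gamma, prec_b)   # vec(.)^T Gamma vec(.)
A = prec_b @ sigma
trace_form = 2 * np.trace(A @ A)        # 2 Tr(Sigma_B^{-1} Sigma Sigma_B^{-1} Sigma)
print(quad, trace_form, np.trace(A))
```

The printed trace of $\Sigma_{B}^{-1}\Sigma$ equals $p$, because the diagonal blocks of $\Sigma$ and $\Sigma_{B}$ coincide; this is what turns the bound $\operatorname{Tr}(AA)\leq\operatorname{Tr}(A)^{2}$ into the bound $2p$ of the lemma.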

Lemma 13.

For all $\Gamma\in S_{p}^{++}(\mathbb{R})$ and for all $B\in\mathcal{P}_{p}$ such that $\Gamma\neq\Gamma_{B}$ , we have $\det(\Gamma_{B})>\det(\Gamma)$ .

Proof.

First, let us prove it for $|B|=K=2$ . We have $B=\{I,J\}$ .

[TABLE]

Now, $\det(\Gamma_{B})=\det(\Gamma_{I,I})\det(\Gamma_{J,J})$ . Thus, it suffices to show that $\det(\Gamma_{J,J})>\det(\Gamma_{J,J}-\Gamma_{J,I}\Gamma_{I,I}^{-1}\Gamma_{I,J})$ . We then write $A_{1}:=\Gamma_{J,J}-\Gamma_{J,I}\Gamma_{I,I}^{-1}\Gamma_{I,J}$ which is symmetric positive definite (Schur’s complement), and $A_{2}=\Gamma_{J,I}\Gamma_{I,I}^{-1}\Gamma_{I,J}$ which is symmetric positive semi-definite and nonzero, since $\Gamma\neq\Gamma_{B}$ . Then, we have

[TABLE]

because $\det(I_{p}+A_{1}^{-\frac{1}{2}}A_{2}A_{1}^{-\frac{1}{2}})=\prod_{i=1}^{p}(1+\phi_{i}(A_{1}^{-\frac{1}{2}}A_{2}A_{1}^{-\frac{1}{2}}))$ .

Now, we prove the lemma for any value of $|B|=K$ . Let $\Gamma\in S_{p}^{++}(\mathbb{R})$ and $B\in\mathcal{P}_{p}$ such that $\Gamma\neq\Gamma_{B}$ . Let $B^{(j)}:=\{\bigcup_{i=1}^{j}B_{i},\bigcup_{i=j+1}^{K}B_{i}\}$ for all $j\in[1:K-1]$ . We now define $(\Gamma^{(j)})_{j\in[1:K]}$ with the recurrence relation $\Gamma^{(j+1)}=\Gamma^{(j)}_{B^{(j)}}$ and with $\Gamma^{(1)}=\Gamma$ ; we then have $\Gamma_{B}=\Gamma^{(K)}$ . Thus

[TABLE]

Furthermore, as $\Gamma\neq\Gamma_{B}$ , there exists $j$ such that $\Gamma_{B^{(j)}}^{(j)}\neq\Gamma^{(j)}$ . Thus, at least one of the previous inequalities is strict, and so $\det(\Gamma_{B})>\det(\Gamma)$ . ∎
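Lemma 13 and the Schur complement identity used in its proof can be illustrated numerically. The sketch below uses one randomly generated matrix; the index sets `I`, `J` and the partition `blocks` are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 6
G = rng.standard_normal((p, 2 * p))
gamma = G @ G.T / (2 * p)               # generic symmetric positive definite matrix

# Two-group case B = {I, J}: det(Gamma) = det(Gamma_II) * det(Schur complement).
I, J = [0, 1, 2], [3, 4, 5]
g_ii, g_ij = gamma[np.ix_(I, I)], gamma[np.ix_(I, J)]
g_jj = gamma[np.ix_(J, J)]
schur = g_jj - g_ij.T @ np.linalg.solve(g_ii, g_ij)
det_full = np.linalg.det(gamma)
det_two_blocks = np.linalg.det(g_ii) * np.linalg.det(g_jj)   # det(Gamma_B)

# General partition: compare the log-determinants of Gamma_B and Gamma.
blocks = [[0, 1], [2, 3], [4, 5]]
logdet_b = sum(np.linalg.slogdet(gamma[np.ix_(b, b)])[1] for b in blocks)
logdet = np.linalg.slogdet(gamma)[1]
print(det_two_blocks, det_full, logdet_b, logdet)
```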

Using Lemmas 12 and 13, we can prove Proposition 13.

Proof.

It suffices to show that, for all $B\neq B^{*}$ ,

[TABLE]

We split the proof into two steps: for $B\not\geq B^{*}$ and for $B>B^{*}$ .

Step 1: $B\not\geq B^{*}$ .

Let $h:=\min\{\log(|\Sigma_{B}|)-\log(|\Sigma|)|\;B\not\geq B^{*}\}=\min\{\log(|\Sigma_{B}|)-\log(|\Sigma|),\;B<B^{*}\}$ , since $\Sigma_{B}=\Sigma_{B\cap B^{*}}$ . Thanks to Lemma 13, we know that $h>0$ .

Let $B\not\geq B^{*}$ . Using the convergence in probability of $l_{S_{B}}$ and $l_{S_{B^{*}}}$ , we know that $\mathbb{P}(l_{S_{B}}<\log|\Sigma_{B}|/p+1-h/3)\underset{n\rightarrow+\infty}{\longrightarrow}0$ and $\mathbb{P}(l_{S_{B^{*}}}>\log|\Sigma|/p+1+h/3)\underset{n\rightarrow+\infty}{\longrightarrow}0$ .

Now, we know that for $n>(3p/h)^{1/\delta}$ , the term of penalisation satisfies $\kappa\operatorname{pen}(B^{*})<h/3$ . Thus,

[TABLE]

Step 2: $B>B^{*}$ .

Let $B>B^{*}$ . We know that

[TABLE]

since $\Sigma_{B}=\Sigma_{B^{*}}$ for $B>B^{*}$ . Let $a_{n}$ be equal to $\sqrt{n}\kappa(\operatorname{pen}(B)-\operatorname{pen}(B^{*}))$ (which converges to $+\infty$ ), $b_{n}$ be equal to $\sqrt{n}(l_{S_{B}}-l_{\Sigma_{B}})$ (which converges to a zero mean normal distribution) and $c_{n}$ be equal to $\sqrt{n}(l_{S_{B^{*}}}-l_{\Sigma_{B^{*}}})$ (which converges to a zero mean normal distribution). We have

[TABLE]

Thus, $\mathbb{P}(\widehat{B}_{tot}=B)\underset{n\rightarrow+\infty}{\longrightarrow}0$ . ∎

**Proof of Proposition 14

**

Proof.

We follow the notation of [35].

An orthonormal basis of $S_{p}(\mathbb{R})$ is $\{\frac{1}{\sqrt{2}}(e_{ij}+e_{ji})|\;i<j\}\cup\{e_{ii}|\;i\leq p\}$ with the following total order on $\{(i,j)\in[1:p]^{2}|\;i\leq j\}$ : we write $(i,j)\leq(i^{\prime},j^{\prime})$ if $j<j^{\prime}$ or if $j=j^{\prime}$ and $i\leq i^{\prime}$ . We define $U\in\mathcal{M}_{p^{2},p(p+1)/2}(\mathbb{R})$ as the matrix whose columns are the vectorizations of the elements of this basis, which form a basis of $\operatorname{vec}(S_{p}(\mathbb{R}))$ . Thus $U_{ij,kl}=\frac{1}{\sqrt{2}}(\mathds{1}_{(i,j)=(k,l)}+\mathds{1}_{(i,j)=(l,k)})$ , for all $k<l$ and $U_{ij,kk}=\mathds{1}_{(i,j)=(k,k)}$ .

Thus, $U(U^{T}JU)^{-1}U^{T}$ is the Cramér-Rao bound, where $J$ is the standard Fisher information matrix in the model $\{\mathcal{N}(\mu,\Sigma),\;\Sigma\in\mathcal{M}_{p}(\mathbb{R})\}$ . As the sample is i.i.d., it suffices to prove it with $n=1$ . In the rest of the proof, we compute the Cramér-Rao bound, and we show that this bound is equal to $\mathbb{E}\left((S-\Sigma)(S-\Sigma)^{T}\right)$ . We split the proof into several lemmas.

Lemma 14.

Recall that $\Sigma^{-1}=(\gamma_{ij})_{i,j\leq p}$ . Let $A=(A_{mn,m^{\prime}n^{\prime}})_{m\leq n,m^{\prime}\leq n^{\prime}}\in\mathcal{M}_{p(p+1)/2}(\mathbb{R})$ defined by

[TABLE]

Then, $A=U^{T}JU$ .

Proof.

Differentiating the log-likelihood twice with respect to $\sigma_{ij}$ and $\sigma_{kl}$ (for $i,j,k,l\in[1:p]$ ) and taking the expectation, we get

[TABLE]

Thus, for all $m<n,\;m^{\prime}<n^{\prime}$ , we have

[TABLE]

Now, if $m^{\prime}<n^{\prime}$ , we have

[TABLE]

If $m<n$ , we have

[TABLE]

Finally,

[TABLE]

∎

Lemma 15.

Let $B=(B_{mn,m^{\prime}n^{\prime}})_{m\leq n,m^{\prime}\leq n^{\prime}}\in\mathcal{M}_{p(p+1)/2}(\mathbb{R})$ defined by

[TABLE]

then, $B=A^{-1}$ . Moreover $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=\sigma_{ii^{\prime}}\sigma_{jj^{\prime}}+\sigma_{ij^{\prime}}\sigma_{ji^{\prime}}$ for all $i,j,i^{\prime},j^{\prime}\in[1:p]$ .

Proof.

We compute the product $A\;B$ . First of all, let $m<n$ and $m^{\prime}<n^{\prime}$ . We have

[TABLE]

with

[TABLE]

and

[TABLE]

We then have

[TABLE]

Similarly,

[TABLE]

Now, if $m^{\prime}<n^{\prime}$

[TABLE]

If $m<n$ , then

[TABLE]

Finally,

[TABLE]

We proved that $B=A^{-1}$ . Let us show that $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=\sigma_{ii^{\prime}}\sigma_{jj^{\prime}}+\sigma_{ij^{\prime}}\sigma_{ji^{\prime}}$ for all $i,j,i^{\prime},j^{\prime}$ . First of all, assume $i\neq j$ and $i^{\prime}\neq j^{\prime}$ . Assume for example $i<j$ and $i^{\prime}<j^{\prime}$ . Then we have

[TABLE]

We apply the same method for $i<j$ and $i^{\prime}>j^{\prime}$ , for $i>j$ and $i^{\prime}<j^{\prime}$ , and for $i>j$ and $i^{\prime}>j^{\prime}$ . Then, let $i=j$ and $i^{\prime}\neq j^{\prime}$ , for example $i^{\prime}<j^{\prime}$ . We have

[TABLE]

The other cases are similar. ∎

We thus have the components of the Cramér-Rao bound:

[TABLE]

This matrix is equal to $\mathbb{E}\left((\operatorname{vec}(S)-\operatorname{vec}(\Sigma))(\operatorname{vec}(S)-\operatorname{vec}(\Sigma))^{T}\right)$ for $n=1$ and when the mean $\mu$ is known. ∎
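The conclusion of this proof can be checked numerically for a small $p$. The sketch below builds $U$ with the ordering described above and compares $U(U^{T}JU)^{-1}U^{T}$ with the matrix of entries $\sigma_{ii^{\prime}}\sigma_{jj^{\prime}}+\sigma_{ij^{\prime}}\sigma_{ji^{\prime}}$. It assumes the usual expression $J=\frac{1}{2}\Sigma^{-1}\otimes\Sigma^{-1}$ for the Fisher information of $\operatorname{vec}(\Sigma)$ with $n=1$ and known mean, which is not written out in the text; the matrix `sigma` is a hypothetical example.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3
G = rng.standard_normal((p, 2 * p))
sigma = G @ G.T / (2 * p)
prec = np.linalg.inv(sigma)

# Columns of U: column-major vectorisations of the orthonormal basis of S_p(R).
cols = []
for l in range(p):
    for k in range(l + 1):              # pairs (k, l) with k <= l, ordered by l, then k
        E = np.zeros((p, p))
        if k == l:
            E[k, k] = 1.0
        else:
            E[k, l] = E[l, k] = 1.0 / np.sqrt(2.0)
        cols.append(E.flatten(order="F"))
U = np.array(cols).T                    # shape (p^2, p(p+1)/2)

J = 0.5 * np.kron(prec, prec)           # assumed Fisher information for vec(Sigma), n = 1
crb = U @ np.linalg.inv(U.T @ J @ U) @ U.T

gamma4 = (np.einsum("ik,jl->ijkl", sigma, sigma)
          + np.einsum("il,jk->ijkl", sigma, sigma))
# Row index j * p + i (0-based) matches the convention v_{ij} := v_{(j-1)p+i}.
target = gamma4.transpose(1, 0, 3, 2).reshape(p * p, p * p)
print(np.abs(crb - target).max())
```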

**Proof of Proposition 15

**

Proof.

An orthonormal basis of $S_{p}(\mathbb{R},B^{*})$ is $\{\frac{1}{\sqrt{2}}(e_{ij}+e_{ji})|\;i<j,\;(i,j)\in B^{*}\}\cup\{e_{ii}|\;i\leq p\}$ with the following total order on $\{(i,j)\in[1:p]^{2}|\;i\leq j,\;(i,j)\in B^{*}\}$ : we write $(i,j)\leq(i^{\prime},j^{\prime})$ if $j<j^{\prime}$ or if $j=j^{\prime}$ and $i\leq i^{\prime}$ . Thus, we define $U$ as the matrix whose columns are the vectorizations of the elements of this basis of $S_{p}(\mathbb{R},B^{*})$ . We have $U_{ij,kl}=\frac{1}{\sqrt{2}}(\mathds{1}_{(i,j)=(k,l)}+\mathds{1}_{(i,j)=(l,k)})$ , for all $k<l$ with $(k,l)\in B^{*}$ and $U_{ij,kk}=\mathds{1}_{(i,j)=(k,k)}$ .

Thus, $U(U^{T}JU)^{-1}U^{T}$ is the Cramér-Rao bound. As the sample is i.i.d, it suffices to prove the proposition with $n=1$ .

Lemma 16.

Let $A=(A_{mn,m^{\prime}n^{\prime}})_{(m,n),\;(m^{\prime},n^{\prime})\in B^{*},\;m\leq n,\;m^{\prime}\leq n^{\prime}}$ defined by

[TABLE]

Then, $A=U^{T}JU$ .

Proof.

The proof is similar to the proof of Lemma 14, except that the values of $m,n,m^{\prime}$ and $n^{\prime}$ are more constrained. First of all

[TABLE]

Now, if $(m,n)\in B^{*},\;(m^{\prime},n^{\prime})\in B^{*},\;m<n,\;m^{\prime}<n^{\prime}$ ,

[TABLE]

If $m^{\prime}<n^{\prime}$ and $(m^{\prime},n^{\prime})\in B^{*}$ ,

[TABLE]

If $m<n$ and $(m,n)\in B^{*}$ , we have

[TABLE]

Finally,

[TABLE]

∎

Lemma 17.

Let $B=(B_{mn,m^{\prime}n^{\prime}})_{m\leq n,m^{\prime}\leq n^{\prime},\;(m,n)\in B^{*},(m^{\prime},n^{\prime})\in B^{*}}$ defined by

[TABLE]

then, $B=A^{-1}$ . Moreover $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=\sigma_{ii^{\prime}}\sigma_{jj^{\prime}}+\sigma_{ij^{\prime}}\sigma_{ji^{\prime}}$ for all $(i,j,i^{\prime},j^{\prime})\in B^{*}$ and $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=0$ for all $(i,j,i^{\prime},j^{\prime})\notin B^{*}$ . Recall that we write $(i,j,i^{\prime},j^{\prime})\in B^{*}$ if there exists $A\in B^{*}$ such that $\{i,j,i^{\prime},j^{\prime}\}\subset A$ .

Proof.

We introduce the following notation: if $l\in B_{k}^{*}$ , let $[l]_{k}$ be the index of $l$ in $B_{k}^{*}$ .

Step 1: Let us prove that $B=A^{-1}$ .

We compute the product $AB$ . Assume that $m,n\in B_{k}^{*}$ with $m\leq n$ and $m^{\prime},n^{\prime}\in B_{k^{\prime}}^{*}$ with $m^{\prime}\leq n^{\prime}$ and $k\neq k^{\prime}$ . We then have

[TABLE]

using that $B_{ab,m^{\prime}n^{\prime}}=0$ if $a,b\in B_{k}^{*}$ and $m^{\prime},n^{\prime}\in B_{k^{\prime}}^{*}$ because $\Sigma$ is block-diagonal, and using that $A_{mn,ab}=0$ if $m,n\in B_{k}^{*}$ and $a,b\in B_{k^{\prime}}^{*}$ because $\Sigma^{-1}$ is block-diagonal. Assume that $m,n,m^{\prime},n^{\prime}\in B_{k}^{*}$ with $m\leq n$ and $m^{\prime}\leq n^{\prime}$ . We have,

[TABLE]

thanks to Lemma 15 applied to the matrix $\Sigma_{B_{k}^{*}}$ . We proved that $B=A^{-1}$ .

Step 2.A : We show that $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=\sigma_{ii^{\prime}}\sigma_{jj^{\prime}}+\sigma_{ij^{\prime}}\sigma_{ji^{\prime}}$ for all $(i,j,i^{\prime},j^{\prime})\in B^{*}$ .

Assume that $(i,j,i^{\prime},j^{\prime})\in B^{*}$ . First, assume that $i\neq j$ and $i^{\prime}\neq j^{\prime}$ . Assume for example that $i<j$ and $i^{\prime}<j^{\prime}$ (the other cases are similar). We then have

[TABLE]

Let us take the case where $(i,j,i^{\prime},j^{\prime})\in B^{*}$ with either $i=j$ , or $i^{\prime}=j^{\prime}$ . For example $i=j$ and $i^{\prime}<j^{\prime}$ . We then have

[TABLE]

The same holds for $i=j$ and $i^{\prime}>j^{\prime}$ , and for $i\neq j$ and $i^{\prime}=j^{\prime}$ . We can also prove the equality similarly when $i=i^{\prime}$ and $j=j^{\prime}$ .

Step 2.B: Let us prove that $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=0$ for all $(i,j,i^{\prime},j^{\prime})\notin B^{*}$ .

Assume that $(i,j,i^{\prime},j^{\prime})\notin B^{*}$ . If $(i,j)\notin B^{*}$ , or if $(i^{\prime},j^{\prime})\notin B^{*}$ , we have

[TABLE]

because if $(i,j)\notin B^{*}$ , the term $(\mathds{1}_{(m,n)=(i,j)}+\mathds{1}_{(m,n)=(j,i)})$ is equal to 0. Similarly, if $(i^{\prime},j^{\prime})\notin B^{*}$ , the term $(\mathds{1}_{(m^{\prime},n^{\prime})=(i^{\prime},j^{\prime})}+\mathds{1}_{(m^{\prime},n^{\prime})=(j^{\prime},i^{\prime})})$ is equal to 0.

There remains the case where $i,j\in B_{k}^{*}$ and $i^{\prime},j^{\prime}\in B_{k^{\prime}}^{*}$ with $k\neq k^{\prime}$ . Then, $(UBU^{T})_{ij,i^{\prime}j^{\prime}}=\sigma_{ii^{\prime}}\sigma_{jj^{\prime}}+\sigma_{ij^{\prime}}\sigma_{ji^{\prime}}=0$ . ∎

To conclude the proof, we remark that, if $(i,j,i^{\prime},j^{\prime})\in B^{*}$ , then

[TABLE]

Now, assume that $(i,j,i^{\prime},j^{\prime})\notin B^{*}$ . If $(i,j)\notin B^{*}$ or if $(i^{\prime},j^{\prime})\notin B^{*}$ , then $\operatorname{cov}((S_{B^{*}})_{ij},(S_{B^{*}})_{i^{\prime}j^{\prime}})=0$ because one of the two terms is zero. Assume that $i,j\in B_{k}^{*}$ and $i^{\prime},j^{\prime}\in B_{k^{\prime}}^{*}$ with $k\neq k^{\prime}$ . Then

[TABLE]

Thus, the covariance matrix of $\operatorname{vec}(S_{B^{*}})$ is equal to the Cramér-Rao bound. ∎

**Proof of Proposition 16

**

Proof.

Using the central limit theorem and Proposition 15, we have

[TABLE]

Then, by Proposition 13, we have

[TABLE]

and by Slutsky's lemma,

[TABLE]

∎

**Proof of Proposition 17

**

Lemma 18.

Under Conditions 1 to 4, for all $\gamma>1/2$

[TABLE]

where $\|.\|_{2}$ is the operator norm, and it is equal to $\lambda_{max}(.)$ on the set of the symmetric positive semi-definite matrices.

Proof.

[TABLE]

Now, on the one hand,

[TABLE]

by Bernstein’s inequality. On the other hand,

[TABLE]

by Bernstein’s inequality. ∎

Lemma 19.

Under Conditions 1 to 5, for all $\gamma>\frac{1}{2}$ ,

[TABLE]

Proof.

We know that $\widehat{\beta}-\beta\sim\mathcal{N}\left(0,\sigma_{n}^{2}[(A^{T}A)^{-1}]_{-1,-1}\right)$ . To simplify notation, let $Q:=\frac{1}{n}A^{T}A$ . Remark that $Q_{1,1}=1,\;Q_{-1,1}=\frac{1}{n}\sum_{k=1}^{n}X^{(k)}$ and $Q_{-1,-1}=\frac{1}{n}\sum_{k=1}^{n}X^{(k)}X^{(k)T}$ . Now, we know that

[TABLE]

Thus, $\widehat{\beta}-\beta\sim\mathcal{N}\left(0,\frac{\sigma^{2}}{n}S^{-1}\right)$ .
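The block-inversion step behind this identity can be verified numerically. The sketch below assumes that the first column of $A$ is the intercept column and that the empirical covariance is normalised by $1/n$ (this normalisation is an assumption made for the example); the data `X` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))   # rows X^(k)
A = np.hstack([np.ones((n, 1)), X])     # design matrix with an intercept column

Q = A.T @ A / n                         # Q_{1,1} = 1, Q_{-1,1} = mean of the X^(k)
Q_inv_sub = np.linalg.inv(Q)[1:, 1:]    # [Q^{-1}]_{-1,-1}

x_bar = X.mean(axis=0)
S_n = (X - x_bar).T @ (X - x_bar) / n   # empirical covariance, 1/n normalisation (assumed)

# Schur complement of Q_{1,1}: Q_{-1,-1} - Q_{-1,1} Q_{1,-1} equals S_n.
schur = Q[1:, 1:] - np.outer(Q[1:, 0], Q[0, 1:])
print(np.abs(schur - S_n).max(), np.abs(Q_inv_sub - np.linalg.inv(S_n)).max())
```

Consequently $[(A^{T}A)^{-1}]_{-1,-1}$ equals the inverse of this empirical covariance divided by $n$, which is the matrix appearing in the distribution of $\widehat{\beta}-\beta$.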

Now, by Lemma 1,

[TABLE]

Let $\varepsilon>0$ and $\gamma>\frac{1}{2}$ . We have,

[TABLE]

∎

Lemma 20.

Under Conditions 1 to 5, for all $\gamma>1/2$ ,

[TABLE]

Proof.

We have

[TABLE]

by Lemmas 19 and 18. Now, with probability going to $1$ , by Lemmas 1 and 19, we have $\widehat{\beta}^{T}S_{B^{*}}\widehat{\beta}/p\geq\lambda_{\inf}(1-\sqrt{y})^{2}\beta_{\inf}^{2}/2$ . Moreover, $\beta^{T}\Sigma\beta/p\geq\lambda_{\inf}\beta_{\inf}^{2}\geq\lambda_{\inf}(1-\sqrt{y})^{2}\beta_{\inf}^{2}/2$ . Thus, with probability going to $1$ , we have

[TABLE]

∎

We can now prove Proposition 17.

Proof.

Let $\tilde{\eta}_{i}$ be the estimator of $\eta_{i}$ obtained replacing $\Sigma$ by $S_{B^{*}}$ and $\beta$ by $\widehat{\beta}$ in Equations (6) and (7). For all $\varepsilon>0$ and $\gamma>1/2$ , we have

[TABLE]

The term $\mathbb{P}(\widehat{B}\neq B^{*})$ goes to 0 from Proposition 2. It remains to prove that

[TABLE]

For all $k\in[1:K]$ and all $u\subset B_{k}^{*}$ , let us write

[TABLE]

Let, for all $C\subset[1:p]$ , $C\neq\emptyset$ ,

[TABLE]

We then have $(\eta_{i})_{i\in B_{k}^{*}}=L\left((\alpha_{u})_{u\subset B_{k}^{*}};B_{k}^{*}\right)$ and $(\tilde{\eta}_{i})_{i\in B_{k}^{*}}=L\left((\tilde{\alpha}_{u})_{u\subset B_{k}^{*}};B_{k}^{*}\right)$ .

As $L(.;C)$ is linear, it is Lipschitz continuous from $(\mathbb{R}^{2^{|C|}},\|.\|_{\infty})$ to $(\mathbb{R}^{|C|},\|.\|_{\infty})$ , with constant $l_{|C|}$ (we can show that $l_{|C|}=2$ ). Let $l:=\max_{j\in[1:m]}l_{j}<+\infty$ (we have in fact $l=2$ ). We then have,

[TABLE]
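The Lipschitz constant $2$ can be illustrated with a short sketch. It assumes that $L(.;C)$ is the usual Shapley weighting, $\eta_{i}=\frac{1}{|C|}\sum_{u\subset C\setminus\{i\}}\binom{|C|-1}{|u|}^{-1}(\alpha_{u\cup\{i\}}-\alpha_{u})$ (the display defining $L$ is not reproduced here, so this form is an assumption). The function name `shapley_from_values` and the toy value function are hypothetical.

```python
from itertools import combinations
from math import comb
import numpy as np

def shapley_from_values(alpha, C):
    """Standard Shapley formula; alpha maps frozenset(u), for u a subset of C, to a number."""
    C = list(C)
    d = len(C)
    eta = {}
    for i in C:
        rest = [j for j in C if j != i]
        total = 0.0
        for size in range(d):
            w = 1.0 / (d * comb(d - 1, size))   # the weights sum to 1 over all u
            for u in combinations(rest, size):
                u = frozenset(u)
                total += w * (alpha[u | {i}] - alpha[u])
        eta[i] = total
    return eta

C = [0, 1, 2]
subsets = [frozenset(u) for r in range(len(C) + 1) for u in combinations(C, r)]
w = {0: 1.0, 1: 2.0, 2: 3.0}
alpha = {u: sum(w[i] for i in u) for u in subsets}      # additive value function
eta = shapley_from_values(alpha, C)                     # recovers the weights w

rng = np.random.default_rng(7)
eps = 1e-3
alpha_pert = {u: a + rng.uniform(-eps, eps) for u, a in alpha.items()}
eta_pert = shapley_from_values(alpha_pert, C)
max_shift = max(abs(eta_pert[i] - eta[i]) for i in C)   # bounded by 2 * eps
print(eta, max_shift)
```

Each $\eta_{i}$ is a weighted average of differences $\alpha_{u\cup\{i\}}-\alpha_{u}$ with weights summing to one, so a sup-norm perturbation of size $\varepsilon$ on the $\alpha_{u}$ moves each $\eta_{i}$ by at most $2\varepsilon$.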

It suffices to show that

[TABLE]

Now,

[TABLE]

The term $\max_{k\in[1:K]}\max_{u\subset B_{k}^{*}}V_{u}^{k}$ is bounded from Conditions 2 and 5 and $\left|\frac{p}{\tilde{V}}-\frac{p}{V}\right|=o_{p}(\log(n)^{\gamma}/\sqrt{n})$ thanks to Lemma 20. The term $\frac{p}{\tilde{V}}$ is bounded in probability using Lemma 20, Conditions 2 and 5. Thus, it suffices to show that $\max_{k\in[1:K]}\max_{u\subset B_{k}^{*}}\left|\tilde{V}_{u}^{k}-V_{u}^{k}\right|=o_{p}(\log(n)^{\gamma}/\sqrt{n})$ . We will use that the operator norm of a sub-matrix is smaller than the operator norm of the whole matrix.

For all $k\in[1:K]$ and $u\subset B_{k}^{*}$ , we have

[TABLE]

Thus, we obtain a sum of three terms, and we have to prove that each term is $o_{p}(\log(n)^{\gamma}/\sqrt{n})$ . The first term is $o_{p}(\log(n)^{\gamma}/\sqrt{n})$ thanks to Lemmas 1 and 19.

For the second term, $\|S_{B_{k}^{*}-u}-\Sigma_{B_{k}^{*}-u}\|_{2}\leq\|S_{B_{k}^{*}}-\Sigma_{B_{k}^{*}}\|_{2}$ so is $o_{p}(\log(n)^{\gamma}/\sqrt{n})$ from Lemma 18.

Finally, for the third term,

[TABLE]

which does not depend on $k$ and $u$ . Finally, remark that $\|\Sigma\|_{2}$ and $\|\Sigma^{-1}\|_{2}$ are bounded from Condition 2, that $\|S\|_{2}$ and $\|S^{-1}\|_{2}$ are bounded in probability from Lemma 1, that $\|S-\Sigma\|_{2}=o_{p}(\log(n)^{\gamma}/\sqrt{n})$ from Lemma 18 and Proposition 2 and that

[TABLE]

Thus, we proved that

[TABLE]

∎

Proof of Proposition 18

Lemma 21.

Under Conditions 1, 2 and 3, for all penalization coefficient $\delta\in]0,1[$ and for all $\varepsilon>0$ ,

[TABLE]

where $\|.\|_{2}$ is the operator norm, and it is equal to $\lambda_{max}(.)$ on the set of the symmetric positive semi-definite matrices.

Proof.

Let $\alpha_{1}:=\delta/2-\varepsilon/4$ .

[TABLE]

that goes to 0 following the proof of 18. ∎

Lemma 22.

Under Conditions 1, 2, 3 and 5, for every penalization coefficient $\delta\in]0,1[$ and for all $\varepsilon>0$,

[TABLE]

Proof.

The proof is similar to the proof of Lemma 20. ∎

We can now prove Proposition 18.

Proof.

For all $B\in\mathcal{P}_{p}$, we define $\tilde{\eta}(B)_{i}$ as the estimator of $\eta_{i}$ obtained by replacing $B^{*}$ by $B$, $\Sigma$ by $S_{B}$ and $\beta$ by $\widehat{\beta}$ in Equations (6) and (7). We also define $\widehat{\eta}_{i}:=\tilde{\eta}(B_{n^{-\delta/2}})_{i}$.

[TABLE]

By Proposition 8, $\mathbb{P}(\{B(\alpha_{1})\leq B_{n^{-\delta/2}}\leq B^{*}\}^{c})\longrightarrow 0$ .

Finally, we prove that

[TABLE]

following the proof of Proposition 17. ∎

Proof of Proposition 19

Proof.

Note that $\Sigma$ satisfies Conditions 1 to 3. Let $a>0$. Let $\check{\Sigma}:=\Sigma$ if $\forall B<B^{*},\;\|\Sigma_{B}-\Sigma\|_{\max}\geq an^{-1/4}$ and $\check{\Sigma}:=I_{p}$ otherwise. Let $\check{\eta}$ and $\check{\hat{\eta}}$ be defined as $\eta$ and $\widehat{\eta}$ in Proposition 17, but with $\Sigma$ replaced by $\check{\Sigma}$. As $\check{\Sigma}$ satisfies Conditions 1 to 3 and the slightly modified Condition 4 given in Proposition 12, conditionally on $\check{\Sigma}$,

[TABLE]

Thus, for all $\varepsilon>0$ ,

[TABLE]

so, by the dominated convergence theorem,

[TABLE]

unconditionally to $\check{\Sigma}$ .

We conclude by noting that $\check{\Sigma}=\Sigma$ with probability converging to 1 by Proposition 12, so $\check{\hat{\eta}}=\widehat{\eta}$ and $\check{\eta}=\eta$ with probability converging to 1. ∎

Bibliography

[1] Bai, Z., and Silverstein, J. W. Spectral Analysis of Large Dimensional Random Matrices, 2 ed. Springer Series in Statistics. Springer-Verlag, New York, 2010.
[2] Bickel, P. J., and Levina, E. Regularized estimation of large covariance matrices. The Annals of Statistics 36, 1 (Feb. 2008), 199–227.
[3] Broto, B., Bachoc, F., and Depecker, M. Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. arXiv:1812.09168 (2018).
[4] Broto, B., Bachoc, F., Depecker, M., and Martinez, J.-M. Sensitivity indices for independent groups of variables. Mathematics and Computers in Simulation 163 (Sept. 2019), 19–31.
[5] Bühlmann, P., and Van De Geer, S. Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media, 2011.
[6] Chastaing, G. Indices de Sobol généralisés pour variables dépendantes. PhD thesis, Université de Grenoble, Sept. 2013.
[7] Chastaing, G., Gamboa, F., Prieur, C., et al. Generalized Hoeffding-Sobol decomposition for dependent variables-application to sensitivity analysis. Electronic Journal of Statistics 6 (2012), 2420–2448.
[8] Clouvel, L. Quantification de l’incertitude du flux neutronique rapide reçu par la cuve d’un réacteur à eau pressurisée. PhD thesis, Nov. 2019.
