Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis
Baptiste Broto (LADIS), Fran\c{c}ois Bachoc (IMT), Laura Clouvel,, Jean-Marc Martinez (DM2S)

TL;DR
This paper develops consistent estimators for block-diagonal covariance matrices in high-dimensional Gaussian data and applies them to efficiently estimate Shapley effects in sensitivity analysis, even with thousands of variables.
Contribution
It introduces new estimators for block-diagonal covariance matrices that are both consistent and efficient, enabling scalable sensitivity analysis in high dimensions.
Findings
Estimator converges at the same rate as if the true structure was known.
Estimator is asymptotically efficient in fixed dimension.
Allows estimation of Shapley effects for thousands of variables.
Abstract
In this paper, we aim to estimate block-diagonal covariance matrices for Gaussian data in high dimension and in fixed dimension. We first estimate the block-diagonal structure of the covariance matrix by theoretical and practical estimators which are consistent. We deduce that the suggested estimator of the covariance matrix in high dimension converges with the same rate than if the true decomposition was known. In fixed dimension , we prove that the suggested estimator is asymptotically efficient. Then, we focus on the estimation of sensitivity indices called "Shapley effects", in the high-dimensional Gaussian linear framework. From the estimated covariance matrix, we obtain an estimator of the Shapley effects with a relative error which goes to zero at the parametric rate up to a logarithm factor. Using the block-diagonal structure of the estimated covariance matrix, this estimator is…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis
Baptiste Broto
CEA, LIST, Université Paris-Saclay, F-91120, Palaiseau, France
François Bachoc
Institut de Mathématiques de Toulouse, Université Paul Sabatier, F-31062 Toulouse, France
Laura Clouvel
CEA, SERMA, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France
Jean-Marc Martinez
CEA, DEN-STMF, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France
(March 1, 2024)
Abstract
In this paper, we estimate a block-diagonal covariance matrix from Gaussian variables in high dimension. We prove that, under some mild assumptions, we find the block-diagonal structure of the matrix with probability that goes to one. We deduce estimators of the covariance matrix that are as accurate as if the block-diagonal structure where known, with numerical applications. We also prove the asymptotic efficiency of one of these estimators in fixed dimension. Then, we apply these estimators for the computation of sensitivity indices, namely the Shapley effects, in the Gaussian linear framework. We derive estimators of the Shapley effects in high dimension with a relative error that converges to 0 at the parametric rate, up to a logarithm factor. Finally, we apply the Shapley effects estimators on nuclear data.
1 Introduction
Sensitivity analysis, and particularly sensitivity indices, have became important tools in applied sciences. The aim of sensitivity indices is to quantify the impact of the input variables on the output of a model. This information improves the interpretability of the model. In global sensitivity analysis, the input variables are assumed to be random variables. In this framework, the Sobol indices [33] were the first suggested indices to be applicable to general classes of models. Nevertheless, one of the most important limitations of these indices is the assumption of independence between the input variables. Hence, many variants of the Sobol indices have been suggested for dependent input variables [22, 6, 23, 7].
Recently, Owen defined new sensitivity indices in [26] called "Shapley effects". These sensitivity indices have many advantages over the Sobol indices for dependent inputs [16]. For general models, [34] suggested an estimator of the Shapley effects. However, this estimation requires to be able to generate samples with the conditional distributions of the input variables. A consistent estimator has been suggested in [3], requiring only a sample of the inputs-output. This estimator uses nearest-neighbours methods to mimic the generation of samples with these conditional distributions.
In this paper, we focus on Gaussian linear models in large dimension. Gaussian linear models are widely used as numerical models of physical phenomena (see for example [19, 13, 29]). Indeed, uncertainties are often modelled as Gaussian variables and an unknown function is commonly approximated by its linear approximation around . Furthermore, high-dimensional Gaussian linear models are widely studied in statistics [5, 11]. In this particular case of Gaussian linear models, the theoretical values of the Shapley effects can be computed explicitly [27, 16, 4]. These values depend on the covariance matrix of the inputs and on the coefficients of the linear model.
In this paper, we assume that we observe an i.i.d. sample of the input Gaussian variables in high dimension and that the true covariance matrix and the vector are unknown. In this setting, the Shapley effects need to be estimated, replacing the true vector by its estimation and the theoretical covariance matrix by an estimated covariance matrix.
There exists a fair amount of work on high-dimensional covariance matrix estimation. Many researchers took an interest in the empirical covariance matrix in high dimension [24, 37, 32, 1]. For particular covariance matrices, different estimators than the empirical covariance can be preferred. For some well-conditioned families of covariance matrices, [2] suggests a banded version of the empirical covariance matrix, and several works address the problem of estimating a sparse covariance matrix [14, 20, 10].
However, in general, given a high-dimensional covariance matrix, the computation cost of the corresponding Shapley effects grows exponentially with the dimension. The only setting where a procedure to compute the Shapley effect with a non-exponential cost is the setting of block-diagonal matrices [4]. Hence, in high dimension, block-diagonal covariance matrices are a very favorable setting for the estimation of the Shapley effects. Thus, we address the estimation of high-dimensional block-diagonal covariance matrices in this paper. In contrasts, we remark that the above methods are not relevant for the estimation of the Shapley effects, since they do not provide block-diagonal matrices.
In our framework, we assume that the true covariance matrix is block-diagonal and we want to estimate this matrix with a similar structure to compute the deduced Shapley effects. Some works address the block-diagonal estimation of covariance matrices. [28] gives a numerical procedure to estimate such covariance matrices and [15] suggests a test to verify the independence of the blocks. A block-diagonal estimator of the covariance matrix is proposed in [9]. The authors of [9] choose a more general framework, without assuming that the true covariance matrix is block-diagonal. They obtain the estimated block-diagonal structure by thresholding the empirical correlation matrix. They also give theoretical guaranties by bounding the average of the squared Hellinger distance between the estimated probability density function and the true one. This bound depends on the dimension and the sample size . When converges to some constant , this bound is larger than and is no longer relevant as the Hellinger distance is always smaller than .
Here, we focus on the high dimension setting, when converges to some constant , and when the true covariance matrix is assumed to be block-diagonal. We give different estimators of the block-diagonal structure and we show that their complexity is small. Then, we provide new asymptotic results for these estimators. Under mild conditions, we show that the estimators of the block structure are equal to the true block structure, with probability converging to one. Furthermore, the square Frobenius distance between the estimated covariance matrices and the true one, normalized by , converge to zero at rate . Thus, our work complements the one of [9]. We also study the fixed-dimensional setting, where we show that one of our suggested estimators is asymptotically efficient.
From the estimated block-diagonal covariance matrices, we deduce estimators of the Shapley effects in the high-dimensional linear Gaussian framework, with reduced computational cost. We recall that in high dimension, the computation of the Shapley effects requires that the corresponding covariance matrix be block-diagonal. We show that the relative estimation error of these estimators goes to zero at the parametric rate , up to a logarithm factor, even if the linear model is estimated from noisy observations.
Our convergence results are confirmed by numerical experiments. We also apply our algorithm to semi-generated data from nuclear applications.
The rest of the paper is organized as follows. In Section 2, we focus on the block-diagonal estimation of the block-diagonal covariance matrix. In Section 3, we apply this block-diagonal estimation of the covariance matrix to deduce Shapley effects estimators. Section 4 is devoted to the numerical application on nuclear data, and the conclusion is given in Section 5. All the proofs are postponed to the appendix.
2 Estimation of block-diagonal covariance matrices
2.1 Problem and notation
We assume that we observe , an i.i.d. sample with distribution , where and are not known. Here, denotes the set of the integers from 1 to . We assume that (the set of the symmetric positive definite matrices) and has a block-diagonal decomposition. To be more precise on this block-diagonal decomposition, we need to introduce some notation.
Let us write the set of all the partitions of . We endow the set with the following partial order. If , we say that is finer than , and we write , if for all , there exists such that . We also compare the elements of a partition with their smallest element; that enables us to talk about "the -th element" of . If and , we write if there exists such that (in other words, if are in the same group of ). If with and if , we define by
[TABLE]
Let us define
[TABLE]
where we define if and if . Thus and for all , we can define an unique such that . Here, we assume that , i.e. is the finest decomposition of , i.e. . We say that has a block-diagonal decomposition .
We also write
[TABLE]
and
[TABLE]
which are the empirical estimators of and . To simplify notation, we write for and for (the dependency on is implicit). We know that, for all , maximizes the likelihood over the mean parameter , where
[TABLE]
and is the determinant of . Thus, for all , we define
[TABLE]
As we assume that the true covariance matrix is block-diagonal, we consider a block-diagonal promoting penalization of the form
[TABLE]
if and for all . We consider the penalized likelihood criterion
[TABLE]
where . In this work, we suggest to estimate by the minimizer of , for some choice of penalisation . First, we show in Proposition 1 that a minimizer of can only be a block-diagonal decomposition of .
Proposition 1**.**
If is a minimizer of , then, there exists such that .
Hence, the minimization problem on becomes a minimization problem on the finite set . So, we define and we suggest to estimate by
[TABLE]
as the minimum structure of the penalized log-likelihood. In this paper, we study theoretically this estimator of . However, it is unimplementable in high dimension since the number of partitions is too large. Hence, we will also define other estimators less costly, and study them theoretically.
2.2 Convergence in high dimension
2.2.1 Assumptions
In Section 2.2, we assume that and go to infinity. The true covariance matrix is not constant and depends on (or ). Nevertheless, to simplify notation, we do not write the dependency on . In all Section 2.2, we choose a penalisation coefficient for a fixed .
We also add the following assumptions on along Section 2.2.
Condition 1**.**
.
Condition 2**.**
There exist and such that, for all , the eigenvalues of are in .
Condition 3**.**
There exists such that for all , all the blocks of are smaller than , i.e. , we have .
For a matrix , we let .
Condition 4**.**
There exists such that for all and for all , we have .
These four mild assumptions are discussed in Section 2.2.4. However, we also focus on the case when Condition 4 does not hold. We will provide similar results, both when assuming Conditions 1 to 4, and when only Conditions 1, 2 and 3 hold.
2.2.2 Convergence of and reduction of the cost
Now that we defined our estimator of the true decomposition in Equation (1) and we added assumptions in Section 2.2.1, we give the convergence of in Proposition 2. Although is not computable in practice, its convergence remains interesting to strengthen the choice of the penalized likelihood criterion and will be useful to prove the convergence of more practical estimators. In Section 2.2, all the limits statements are given as .
Proposition 2**.**
Under Conditions 1 to 4 and for a fixed , we have
[TABLE]
Hence, under Conditions 1 to 4, the estimator is equal to the true decomposition with probability which goes to one. When Condition 4 does not hold, we can not state such a convergence result but we get a weaker result in Proposition 3. In this case, we need to define as the partition given by thresholding by . In other words, is the smallest (or finest) partition such that .
Proposition 3**.**
Under Conditions 1, 2 and 3, for all and , we have
[TABLE]
Thus, we defined a consistent estimator of that theoretically solves our problem of the lack of knowledge of the true decomposition . However, computing is very costly in practice. Indeed, the number of partitions of (the Bell number) is exponential in . As in [9], we suggest to restrict our estimates of to the partitions given by thresholding the empirical correlation matrix where , with . If , let be the finest partition of the thresholded empirical correlation matrix . In other words, . For some value , can be found by "Breath-First-Search" (BFS) [21]. Furthermore, we do not need to compute for all and we suggest in the following three different choices of grids for .
First, we suggest the grid and we define the estimator . This grid is the finest one because that gives all the partitions . Almost surely, the coefficients are all different. Thus, when we increase the threshold to the next value of , we only remove two symmetric coefficients from the empirical correlation matrix.
Proposition 4**.**
The computational complexity of is .
Using the rate of convergence of the estimated covariances and by Condition 4, we then suggest the estimator , the partition of the empirical correlation matrix thresholded by . With this threshold, we can not find all the partitions given by thresholded correlation matrix, but we only have to threshold by only one value.
Proposition 5**.**
The complexity of is .
One can see that reducing the grid of thresholds to one value reduces the complexity of the estimator of . Finally, we suggest a third grid, in the c case when the maximal size of the groups is known.
Let , where is the smallest integer such that all the groups of have a cardinal smaller than . The deduced estimator is . So, this grid is the set restricted to the thresholds that give fine enough partition (with groups of size smaller than ).
Proposition 6**.**
The complexity of is .
One can see that the complexity of this estimator is as small as the complexity of the previous estimator . Furthermore, it ensures that the estimated blocks are not too large, which was not the case with the previous estimator. However, the computation of requires the knowledge of while the other estimators do not.
Now that we have defined new estimators of , we give their convergence in the following proposition.
Proposition 7**.**
Let be either or indifferently. Under Conditions 1 to 4 and for a fixed , we have
[TABLE]
When Condition 4 is not satisfied, we do not study the convergence of the previous estimators. In this case, we suggest to estimate by , which is the partition given by the empirical correlation matrix thresholded by . The complexity of this estimator is , as for the previous estimator . We show the convergence of this estimator in Proposition 8.
Proposition 8**.**
Under Conditions 1, 2 and 3, if and ,
[TABLE]
As Condition 4 is not satisfied, the true partition is again not reached by this estimator. Nevertheless, we get stronger results for the practical estimator than for the theoretical estimator when Condition 4 is not verified. Indeed, the condition "to be larger or equal than" is stronger that "not to be smaller than".
2.2.3 Convergence of the estimator of the covariance matrix
We have seen in Propositions 7 and 8 how to estimate the decomposition by . Now to estimate the covariance matrix , it suffices to impose the block-diagonal decomposition to the empirical covariance matrix . We show in Proposition 9 that the resulting block-diagonal matrix estimator reaches the optimal rate of convergence under Conditions 1 to 4.
Proposition 9**.**
Let be the Frobenius norm defined by . Let be either or . Under Conditions 1 to 4 and for a fixed , we have
[TABLE]
and
[TABLE]
Moreover, it is the best rate that we can have because
[TABLE]
Thus, we see that the quantity decreases to 0 in probability with rate , which is the same rate as if we know the true decomposition . Thus, the lack of knowledge of does not deteriorate the convergence of our estimator.
Now that we gave the rate of convergence of our estimator , we compare it with that of the empirical estimator in the next proposition.
Proposition 10**.**
Under Conditions 1 and 2, the rate of the empirical covariance is
[TABLE]
and we have
[TABLE]
So, we know that is lower-bounded in average and is bounded in probability. Thus, the rate of convergence of our suggested estimator is better than the empirical covariance matrix .
If Condition 4 does not hold, the rate of convergence is given in the following proposition.
Proposition 11**.**
Under Conditions 1, 2 and 3, for all and for all , we have
[TABLE]
We remark that, for close to , this rate of convergence almost reaches the optimal rate of , whereas the partition estimator does not reach the true decomposition . That comes from the fact that the elements of such that the indices are not in the estimated partition are small (with high probability). Hence, estimating these values by [math] does not increase so much the error .
Theoretical guaranties for a block-diagonal estimator of the covariance matrix are also provided in [9]. Their framework is more general, with a true covariance matrix which is not necessarily block-diagonal. They bound the average of the square Hellinger distance between the true normal density and the density with the block-diagonal estimated covariance matrix. However, when does not go to [math], their theoretical results becom uninformative. Indeed, they give an upper-bound which is larger than one, while the square Hellinger distance remains always smaller than .
2.2.4 Discussion about the assumptions
For the previous results, we needed to make four assumptions on (Conditions 1 to 4, given in Section 2.2.1).
Condition 1 provides a standard setting for high-dimensional problems, in particular for estimation of covariance matrices [24, 32]. Studying an higher dimensional setting where would be interesting in future work.
Condition 2 is needed to bound the operator norm of and and the eigenvalues of the empirical covariance matrix (with high probability). It also enables to bound the diagonal terms of , which allow to derive the rate of convergence of each component of the empirical covariance (using in particular Bernstein’s inequality, see the proofs for more details).
Condition 3 states that the blocks of the true decomposition have a maximal size. It implies that the number of non-zero terms of is .
Condition 4 requires that a finer block decomposition is not too close to the true . This condition is needed to not confuse with a finer decomposition. However, Condition 4 seems to be less mild than the others. That is why we also focus on the case when Condition 4 is not satisfied.
Nevertheless, even Condition 4 is not so restrictive. Indeed, we suggest in Proposition 12 a reasonable example where is randomly generated and where a condition similar to Condition 4 holds.
Proposition 12**.**
Let and . Assume that for all , is generated in the following way:
- •
Let be a partition of such that all its elements have a cardinal between and . Let be the number of groups (the cardinal of ). For all , let be the cardinal of the "-th element" of .
- •
For all , let be i.i.d. with distribution . Let such that the coefficient is . Let , where is the sub-matrix of indexed by the elements of .
- •
Let for all .
Then, Conditions 2 and 3 are verified and the following slightly modified version of Condition 4 is satisfied for all :
[TABLE]
Thus, if , the conclusions of Propositions 2, 7 and 9 remain true when the probabilities are defined with respect to and which distribution conditionally to is .
2.2.5 Numerical applications
We present here numerical applications of the previous results with simulated data. We generate a covariance matrix as in Proposition 12 with blocks of random size distributed uniformly on , with and . We assume here that we know that the maximal size of the block is , so we can use the estimator given in Proposition 7 to reduce the complexity to and to prevent the blocks from being too large.
We plot in Figure 1 the Frobenius norm of the error of the empirical covariance matrix and the Frobenius norm of the error of the suggested estimator , with for different values of . We can remark that the error of is in (where is the number of groups) whereas the error of stays bounded as in Proposition 9. For , the Frobenius error of on Figure 1 is about 10 times smaller than the one of .
2.3 Convergence and efficiency in fixed dimension
In this section, and are fixed and goes to . We choose a different penalisation with (instead of in the previous setting). This framework enables to study the efficiency of estimators of . Contrary to the high-dimensional setting of Section 2.2, we do not assume particular condition in addition to the ones given in Section 2.1.
We first give the convergence of defined in Equation (1) in the next proposition.
Proposition 13**.**
We have
[TABLE]
Corollary 1**.**
Let , where as in Proposition 7. Then
[TABLE]
In the rest of Section 2.3, we write for or . The aim of this framework is to show that the suggested estimator is asymptotically efficient as if the true decomposition were known.
As the parameter is in the set or even , which are not open subsets of , the classical Cramér-Rao bound is no longer a lower-bound for the estimation error. Furthermore, as is not known, the number of parameters of is not constant. That is why the classical Cramér-Rao bound is not relevant in our setting. We remark that applying this classical Cramér-Rao bound to a subset of the matrix estimator does not solve this problem.
A specific Cramér-Rao bound is suggested in [35] for parameters and estimators which satisfy continuously differentiable constraints. We shall consider linear constraints here. We let be the parameter, that is assumed to be restricted to a linear subspace of dimension in . In this case, if is a matrix whose columns are the elements of an orthonormal basis of and if is the Fisher Information Matrix (FIM) of in the non-constraint case, [35] states that for unbiased estimator , we have
[TABLE]
where is the partial order on the symmetric positive semi-definite matrices.
In our setting, remark that is an open subset of the linear subspace of symmetric matrices and is an open subset of the linear subspace . We let be the column vectorization of . Hence, the parameter is and there are linear constraints arising from the symmetry and linear constraints arising from the block structure .
So, the Cramér-Rao bound of Equation (2) is adapted to our framework, by considering the parameter , and we say that an estimator is efficient if it reaches the Cramér-Rao bound (2) (meaning that there is an equality in this equation), where the constraints (symmetry only or symmetry and block structure) will be stated explicitly.
Proposition 14 states that, in general, the empirical covariance matrix is efficient with this Cramér-Rao bound. This supports this choice of Cramér-Rao Bound, since in fixed dimension, one would expect that the empirical matrix is the most appropriate estimator.
If the empirical covariance matrix did not reach the Cramér-Rao Bound, we could not hope that would be efficient in the model where was known, and this Cramér-Rao bound would not be well tuned to our problem.
Proposition 14**.**
If is known, the empirical estimator is an efficient estimator of in the model .
Remark 1**.**
In Proposition 14, we assume that is known to reach the Cramér-Rao bound for fixed (and not only asymptotically). This will be the same in Proposition 15.
Now, we deduce the efficiency of when is known.
Proposition 15**.**
If and are known, is an efficient estimator of in the model .
Finally, Proposition 16 states the asymptotic efficiency of our estimator (even for unknown )
Proposition 16**.**
[TABLE]
where is the Cramér-Rao bound of in the model .
The explicit expression of the matrix can be found in the appendix where Propositions 14, 15 and 16 are proved.
3 Application to the estimation of the Shapley effects
In this section, we apply the block-diagonal estimation of the covariance matrix to estimate the Shapley effects in high dimension and for Gaussian linear models. In Section 3.1, we recall the definition of the Shapley effects with their particular expression in the Gaussian linear framework with a block-diagonal covariance matrix. In Section 3.2, we address the problem of estimating the Shapley effects when the covariance matrix is estimated. We derive the convergence of the estimators of the Shapley effects from the results of Section 2.
3.1 The Shapley effects
Let be random inputs variables on and let be the real random output variable in . We assume that . Here, can be a numerical simulation model [31].
If and , we write . We can define the Shapley effects as in [26] for the input variable as:
[TABLE]
where is the set . One can see in Equation (3) that adding a to changes the conditional expectation of , and increases the variability of this conditional expectation. The Shapley effect is large when, on average, the variance of this conditional expectation increases significantly when is observed. Thus, a large Shapley effect corresponds to an important input variable .
The Shapley effects have interesting properties for global sensitivity analysis. Indeed, there is only one Shapley effect for each variable (contrary to the Sobol indices). Moreover, the sum of all the Shapley effects is equal to (see [26]) and all these values lie in even with dependent inputs. This is very convenient for the interpretation of these sensitivity indices.
Here, we assume that , that and that the model is linear, that is , for a fixed and a fixed vector . This framework is widely used to model physical phenomena (see for example [19, 13, 29]). Indeed, uncertainties are often modelled as Gaussian variables and an unknown function is commonly estimated by its linear approximation. Furthermore, the main focus on this paper is on the high-dimensional case, where is large. In high dimension, linear models are often considered, as more complex models are not necessarily more relevant. In this framework, the sensitivity indices can be calculated explicitly [27]:
[TABLE]
with
[TABLE]
where and . Thus, in the Gaussian linear framework, the Shapley effects are functions of the parameters and .
Despite the analytical formula (5), even in the case where and are known, the computational cost of the Shapley effects remains an issue when the number of input variables is too large (), as it is highlighted in [4]. Indeed, the Shapley effects depend on values, namely the . However, when the covariance matrix is block-diagonal, [4] showed that this high-dimensional computational problem boils down to a collection of lower dimensional problems.
Indeed, assume that with . If , let denotes the group of , that is . Using Corollary 2 of [4], we have for all ,
[TABLE]
where for all ,
[TABLE]
and where . Thus, when and are known, to compute all the Shapley effects , we only have to compute the values instead of all the values . Some numerical experiments highlighting this gain are given in [4]. The complexity of the computation of the Shapley effects is , where denotes the size of the maximal group in .
If is known, but the decomposition is unknown, we can compute from . We can for example use "Breath-First-Search" (BFS). The complexity of this algorithm is in .
To conclude, when the parameters and are known with , the computation of all the Shapley effects has a complexity .
3.2 Estimation of the Shapley effects in high dimension
We now address the problem when the parameters , and thus are unknown.
We assume that we just observe a sample where are noisy observations:
[TABLE]
for where are i.i.d. with distribution and where is unknown, where is a fixed finite constant.
Remark that the computation of the Shapley effects requires the parameters and (see Equations (4) and (5) or (6) and (7)). Here, as we do not know the parameters and , we will estimate them and replace the true parameters by their estimation in Equations (4) and (5) or (6) and (7).
First, we estimate as usual by
[TABLE]
where is defined by and , and where .
At first glance, we could estimate by the empirical covariance matrix and replace it in the computation of the Shapley effects given by Equations (4) and (5) or (6) and (7). However, is not known and we can not find it using BFS with the empirical covariance matrix (which usually has the simple structure with probability one). Thus, we can not use the formula (6) of the Shapley effects with independent groups. So, the only way to estimate the Shapley effects is using Equations (4) and (5), replacing by the empirical covariance matrix . However, as we have seen, the complexity of this computation would be exponential in and it would be no longer tractable for . Furthermore, in high dimension, the Frobenius error between and does not go to [math] (see Proposition 9). Thus, using the empirical covariance matrix could yield estimators of the Shapley effects that do not converge.
For that reason, to estimate , we suggest to estimate by (defined in Section 2.2.2) and by and to replace them in the analytical formula (6). We write the estimator of the Shapley effects obtained replacing by , by and by in Equations (6) and (7). We use our previous results on the estimation of the covariance matrix to obtain the convergence rate of .
We focus on the high-dimensional case, when and go to . In this case, and are not fixed but depend on (or ). As in Section 2.2, we choose with to compute . To prevent problematic cases, we also add an assumption on the vector .
Condition 5**.**
There exist and such that for all and for all , we have .
Proposition 17**.**
Under Conditions 1 to 5 and if , then for all , we have
[TABLE]
Recall that . Thus, to quantify the error estimation, the value of is a relative error. Proposition 17 states that this relative error goes to zero at the parametric rate , up to a logarithm factor.
We have seen in Section 3.1 that, once we have the block-diagonal covariance matrix, the computation of the Shapley effects has the complexity which is equal to under Condition 3. In Section 2.2, we gave four different choices of , with four different complexities, all larger than . Thus, the complexity of the whole estimation of the Shapley effects (including the estimation of ) is the same as the complexity of (see Section 2.2.2).
When Condition 4 is not satisfied, we still have the convergence of the relative error, with almost the same rate.
Proposition 18**.**
Under Conditions 1, 2, 3 and 5, for all , choosing the partition and for all , we have
[TABLE]
Remark 2**.**
When the dimension is fixed, the rate of convergence is , as if we estimated by the empirical covariance matrix. Moreover, we have seen in Proposition 16 that the computation of enables to reach asymptotically the Cramér-Rao bound of [35] as if were known. We then deduce the asymptotic efficiency of . If we define , let be the Cramér-Rao bound of in the model . Thus,
[TABLE]
3.3 Numerical application
We have seen in Proposition 12 a way to generate which verifies Conditions 1 to 3 and some slightly modified version of Condition 4. So, with this choice of , we derive in Proposition 19 the convergence of the Shapley effects estimation.
Proposition 19**.**
Under Condition 5, if is generated as in Proposition 12, then, for all ,
[TABLE]
where the probabilities are defined with respect to and , which distribution conditionally to is .
We now present a numerical application of Proposition 19. The matrix is generated by Proposition 12 as in Section 2.2.5, with blocks of random size distributed uniformly on , and . For all , the vector is generated with distribution , so that Condition 5 is satisfied. As in Section 2.2.5, we assume that we know that the maximal size of the block is , so we can use the estimator given in Proposition 7. As the computation of the Shapley effects is exponential in the maximal block size, the estimator is preferred. The complexity of the estimation of the Shapley effects is then in .
We plot in Figure 2 the sum of the Shapley effects estimation error , with for different values of . We can remark that the sum of the errors seems to be or order , which is confirmed by Proposition 19.
4 Application on real data
In this section, we consider a real application of our suggested estimators of block-diagonal covariance matrices and of the Shapley effects, to nuclear data.
4.1 The Shapley effects with nuclear data
Uncertainty propagation methods are increasingly being used in nuclear calculation (neutron dosimetry, reactor design, criticality assessments, etc.) to deduce the accuracy of safety parameters (fast fluence, reactivity coefficients, criticality, etc.) and to establish safety margins. In fact, the resulting output of a nuclear computer model can be considered with a random portion as far as the inputs are uncertain. In this context, sensitivity analysis evaluates the impact of input uncertainties in terms of their relative contributions to the uncertainty in the output. Therefore, it helps to prioritize efforts for uncertainty reduction, improving the quality of the data.
Of particular interest for us, the uncertain inputs, in nuclear applications, tend to be correlated because of the measurement processes and the different calculations made to obtain the variables of interest from the observable quantities. This is why the Shapley effects are particularly convenient as sensitivity indices in nuclear uncertainty quantification. Moreover, these uncertain inputs are easily modeled as a Gaussian vector and the output is often modeled as a linear function of the inputs [19]. Thus, the Shapley effects can be computed or estimated, as it is described in Section 3.
4.2 Details of the nuclear data
In this application, the output is the neutron flux which is a quantity of interest in nuclear studies. For example, it can be calculated to evaluate the vessel neutron irradiation which is in fact one of the limiting factors for pressurized water reactor (PWR) lifetime. The quality of radiation damage prediction depends in part on the calculation of the fast neutron flux for energy larger than ( means electron-volt). In that sense, a lack of knowledge on the fast neutron flux will require larger safety margins on the plant lifetime affecting operating conditions and the cost of nuclear installations. To make correct decisions when designing the plant lifetime and on safety margins for PWRs, it is therefore essential to assess the uncertainty in vessel flux calculations.
One of the major sources of uncertainties in fast flux calculations are the cross sections which are used to characterise the probability that a neutron will interact with a nucleus and are the inputs of our model. They are expressed in barn, where 1 . The values of the cross sections and their uncertainties are provided by international libraries as the American Library ENDF/B-VII [25], the European library JEFF-3 [17], and the Japan Library JENDL-4 [18]. Using the standardized format, each cross section is defined for an isotope of the target nuclei, an energy level of the target nuclei and a reaction number (see [25] for more informations on numbers).
We assume that if , then, for any . Thus, the covariance is block-diagonal, where each block corresponds to a value of . Here, we have 292 input variables divided in 50 groups of size between 2 and 18. Using reference data, [8] has shown that the perturbation of the cross sections of the 56Fe, 1H, 16O isotopes are linearly related to the perturbation of the flux:
[TABLE]
Thus, if and are given, the Shapley effects are easily computable by Equations (6) and (7). We show the values of the Shapley effects in Figure 3.
We can remark that almost all the Shapley effects are close to 0. Now, we plot all the Shapley effects that are larger than on Figure 4 with the names of the corresponding cross sections. For example, "Fe56S4950050" means the cross section for the isotope , the reaction scattering 4 and a level of energy larger than (and smaller than 1353400).
We remark than only 23 cross sections have a Shapley effect larger than . The latter are associated with the lower energies (around 1 to 6 ). Moreover, they all come from three different groups of : (56Fe, scattering 4), (56Fe, scattering 2) and (1H, scattering 2).
In fact, in PWR reactors, fission mostly results from the absorption of slow neutrons by nuclei of high atomic number as the uranium 235U and the plutonium 239Pu which are the main fissile isotopes. The nucleus splits into two lighter nuclei, called fission fragments and often produces an average of 2.5 neutrons with an energy of about 1 or more. Figure 5 illustrates the neutron fission spectrum which defines the probability for a neutron to be emitted in the energy group by the isotope . One can note that there are more neutrons produced in the first energy groups between 1 to 6 . In that sense, those neutrons have a larger potential to interact with matter.
Moreover, most of the fast neutrons which escape from the core are scattered back by the reflector (essentially comprised of 56Fe) and slowed down by water (hydrogen and oxygen), which acts as a moderator, until they reach thermal equilibrium. The accuracy of the neutron flux received by the reflector is closely linked to the precision of the scattering cross sections.
On the other hand, we can notice that all the Shapley effects of the cross sections from H, scattering 2) are close, and that comes from the fact that the correlations between these different levels of energy are close to in this group.
Thus, when the true parameters and are known, we can compute the corresponding Shapley effects of the uncertainties of the cross sections on the uncertainty of the neutron flux . Furthermore, the interpretation of these Shapley effects is insightful and consistent with the available expert knowledge.
4.3 Estimation of the Shapley effects
In order to assess the efficiency of our suggested estimation procedures of the Shapley effects, we now assume that the true covariance matrix is unknown and that we observe an i.i.d. sample with distribution (with unknown). We assume that the maximal group size is known to be smaller or equal to and that the vector is known. Then, we estimate the block-diagonal structure by the block-diagonal structure that maximizes the penalized likelihood among all the block-diagonal structures obtained by thresholding the empirical correlation matrix from its largest value to the smallest value such that the maximal size of the blocks is smaller or equal to . Thus, our estimator is a mix of the estimators and detailed in Section 2.2.2.
We plot the Frobenius error of the estimated covariance matrix and the sum of the absolute values of the errors of the estimated Shapley effects for different values of in Figure 6, where .
We can remark that the errors decrease globally when the value of increases. The larger value of the sum of the errors of the estimated Shapley effects for is due to the randomness of the estimated Shapley effects. Note that, even when , the sum of the errors of the Shapley effects is less than (recall that, in comparison, the sum of the Shapley effects is ). We plot in Figure 7 the estimated Shapley effects that are larger than with . Remark that these estimated values are similar to the true ones displayed in Figure 4 and the physical interpretation is the same.
In conclusion, we implemented an estimator of the block-diagonal covariance matrix originating from nuclear data when we only observe an i.i.d. sample of the inputs. Then, the derived estimated Shapley effects are shown to be very close to the true Shapley effects, that quantify the impact of the uncertainties of cross sections on the uncertainty on the neutron flux. When the sample size is equal to , the physical conclusions are the same as when the true covariance matrix is known.
5 Conclusion
In this work, we suggest an estimator of a block-diagonal covariance matrix for Gaussian data. We prove that in high dimension, this estimator converges to the same block-diagonal structure with complexity in . For fixed dimension, we also prove the asymptotic efficiency of this estimator, that performs asymptotically as well as as if the true block-diagonal structure were known. Then, we deduce convergent estimators of the Shapley effects in high dimension for Gaussian linear models. These estimators are still available for thousands input variables, as long as the maximal block is not too large. Moreover, we prove the convergence of the Shapley effects estimators when the observations of the output are noisy and so the parameter is estimated. Finally, we applied these estimator on real nuclear data.
In future works, it would be interesting to treat the higher dimension setting when goes to .
Acknowledgements
We acknowledge the financial support of the Cross-Disciplinary Program on Numerical Simulation of CEA, the French Alternative Energies and Atomic Energy Commission. We would like to thank BPI France for co-financing this work, as part of the PIA (Programme d’Investissements d’Avenir) - Grand Défi du Numérique 2, supporting the PROBANT project. We thank Vincent Prost for his help.
Appendix
Notation
We will write for a generic non-negative finite constant (depending only on , and in Conditions 2 and 3). The actual value of is of no interest and can change in the same sequence of equations. Similarly, we will write for a generic strictly positive constant.
If , and , we will wite if and , that is, if and are in the same group with the partition and are in different groups with the partition .
If , we define as the maximal partition such that and .
If (the set of the matrices of dimension ), and if , we define and .
Recall that is defined by .
If (the set of the symmetric positive definite matrices) and , let be the -th largest eigenvalue of . We also write (resp. ) for the largest (resp. smallest) eigenvalue of .
We define , the unbiased empirical estimator of . Let be the coefficients of and be the coefficients of .
Recall that when Condition 4 does not hold, we need to define as the partition given by thresholding by . We also define and write .
Proof of Proposition 1
Proof.
Let us write
[TABLE]
which is the closure of in .
First, let us show that, for all , is the minimum of on . If , we have
[TABLE]
The function defined by has an unique minimum at . Thus, the function defined by has an unique minimum at . Thus has an unique minimum at .
Now, the penalisation term is constant on each . Thus has a global minimum (not necessary unique) at , for some . ∎
**Notation for Section 2.2
**Here and in all the proofs of Section 2.2, we assume Conditions 1 to 3 of Section 2.2.1.
In the following, we introduce some notation.
We know that
[TABLE]
where is the Wishart distribution with parameter and [12]. Thus, if we write i.i.d. with distribution , we have
[TABLE]
Lemma 1**.**
For all ,
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
Moreover, that holds also for instead of .
Proof.
Let i.i.d. with distribution . Using the result in [32] which states that
[TABLE]
we have,
[TABLE]
and
[TABLE]
Thus,
[TABLE]
The proof is the same for . ∎
We also verify the assumptions of Bernstein’s inequality (see for example Theorem 2.8.1 in [36]). For all , let
[TABLE]
The random-variables are independent, mean zero, sub-exponential and we have , where is the sub-exponential norm (for example, see Definition 2.7.5 in [36]). So, we can use Bernstein’s inequality with : there exists such that, for all and ,
[TABLE]
**Proof of Proposition 2
**
In this proof, we assume that Conditions 1 to 4 are satisfied. We first show several Lemmas.
Lemma 2**.**
For all symmetric positive definite and for all , if we write , we have:
- •
* decreases and so .*
- •
* increases and so .*
Proof.
Let us show that increases (the proof if the same for ).
For all , let , and an unit eigenvector of associated to . Let . Thus
[TABLE]
If we show that , we proved that increases. First, assume that . If we write the group of the largest eigenvalue of , then is equal to zero for all , so is equal to zero for all , and so is equal to zero.
Assume now that and let us show that by contradiction. Assume that . Then
[TABLE]
Furthermore, we have seen that . Thus, we have
[TABLE]
that is in contradiction with . ∎
In the following, let for all .
Lemma 3**.**
For all , we have
[TABLE]
Moreover, for all , we have
[TABLE]
Proof.
First, we prove Equation (10). Doing the Taylor expansion of and using the integral form of the remainder (as Equation (9) of [30] or in [20]), we have
[TABLE]
where is the Kronecker product. The trace is equal to zero. Now,
[TABLE]
using Lemma 2 for the two last steps.
Now, we prove Equation (11) similarly. We have, using Lemma 2,
[TABLE]
∎
Lemma 4**.**
[TABLE]
Proof.
Using Lemma 3, we have
[TABLE]
We show that the two terms go to [math]. The first term goes to [math] with Lemma 1. For the second term, we have
[TABLE]
using Bernstein’s inequality, where is defined in Equation (9). That concludes the proof. ∎
Lemma 5**.**
[TABLE]
Proof.
Using Lemma 3, we have
[TABLE]
The first term goes to 0 with Lemma 1. The second term is
[TABLE]
Now, for all and for all , let (with an implicit dependence on and ). Remark that
[TABLE]
Thus,
[TABLE]
Now, by Condition 4, we know that , so, for large enough,
[TABLE]
Thus, by Bernstein’s inequality, for large enough,
[TABLE]
∎
Now, we can prove Proposition 2.
Proof.
We have
[TABLE]
The two first terms go to 0 tanks to Lemmas 4 and 5. For the last term, we have
[TABLE]
These two last terms go to 0 thanks to Lemmas 4 and 5.
∎
Proofs of Proposition 3
In this proof, we assume that Conditions 1 to 3 hold.
Lemma 6**.**
For all , we have
[TABLE]
Moreover, for all , we have
[TABLE]
Proof.
Same proof as Lemma 3 replacing by . ∎
Lemma 7**.**
If , then,
[TABLE]
Proof.
Following the proof of Lemma 4 (and using Lemma 6), it is enough to prove that the following term goes to 0:
[TABLE]
using again Bernstein’s inequality. That concludes the proof. ∎
Lemma 8**.**
If , then,
[TABLE]
Proof.
Following the proof of Lemma 5, it suffices to prove that
[TABLE]
Now, by definition of , we know that , so, for large enough,
[TABLE]
Thus, by Bernstein’s inequality, for large enough,
[TABLE]
∎
We can now prove Proposition 3.
Proof.
We have
[TABLE]
First,
[TABLE]
from Lemma 8. Secondly,
[TABLE]
from Lemma 7. Finally,
[TABLE]
Proofs of Propositions 4, 5 and 6
Proof.
In the three cases, the computation of requires carrying out the BFS algorithm for and the computation of a determinant for . Recall that if is a graph (where is the set of vertices and the set of edges), the complexity of the BFS algorithm is . Recall that, if is a squared matrix of size , the complexity of is using the LU decomposition.
Now, we compute the complexity of the three estimators , and .
- •
For all , the complexity of is , and the cardinal of is . Thus, the complexity of the computation of is .
Now, for all , the complexity of is and the cardinal of is (because the function decreases). Thus, the complexity of the evaluations is
So the complexity of is .
- •
For the threshold , the complexity of is .
So the complexity of is .
- •
One can divide the computation of into two steps.
For the first step, as we do not know the value of , we have to compute from to , verifying each time if the maximal size of group is smaller than or not. First, for each value of from decreasing to , the complexity of the BFS algorithm to is , thus, the complexity of all these partitions if . Then, for , the complexity of is . So, the complexity of this first step is .
In the second step, we have to evaluate for all . The complexity of each evaluation is , and the the number of evaluations is . Thus, the complexity of this second step is .
∎
**Proof of Proposition 7
**
To prove the convergence of in the three cases, we need the three following Lemmas.
Lemma 9**.**
For all sequence such that for all , (we assume that is large enough and that subset is not empty), we have
[TABLE]
Proof.
Step 1: with probability which goes to 1.
[TABLE]
using Lemma 1 and Bernstein’s inequality.
Step 2: with probability which goes to 1.
For all , and all , let and , where the dependency on and is implicit. Thanks to Condition 4, we have . Then, using Lemma 1,
[TABLE]
by Bernstein’s inequality. ∎
Lemma 10**.**
Let . Let such that for all . Then,
[TABLE]
Proof.
Thanks to Lemma 9, it suffices to show that, for large enough, there exists such that . By contradiction, let us assume that there does not exist such . Let such that and . Thus, we have
[TABLE]
which is in contradiction with the definition of .
∎
Lemma 11**.**
We have,
[TABLE]
Proof.
Let be the set of the partitions of such that all their elements have cardinal smaller than . By assumption (Condition 3), . Let . Thus verifies the assumption of in Lemma 10, so
[TABLE]
Thus
[TABLE]
To conclude, it suffices to prove that .
We have immediately . We have to prove the other inclusion. Assume that . We know that there exists such that . As , we know by definition of that and thus . ∎
Now, we prove Proposition 7.
Proof.
- •
Using Lemma 9, Proposition 2, and the fact that , we have .
- •
Using Lemma 9 and Proposition 2, we have .
- •
Using Lemma 11 and Proposition 2, we have .
∎
**Proof of Proposition 8
**
Proof.
We follow the proof of Lemma 9.
Step 1: with probability which goes to 1.
[TABLE]
using Lemma 1 and Bernstein’s inequality.
Step 2: with probability which goes to 1.
For all , and all , let and , where the dependency on and is implicit. Then, using Lemma 1,
[TABLE]
by Bernstein’s inequality.
∎
**Proof of Proposition 9
**
Proof.
First, we prove the results for . We have, using again the notation ,
[TABLE]
By Markov’s inequality, that proves
[TABLE]
Now, we want to prove that
[TABLE]
First, we have
[TABLE]
Now, the variance is
[TABLE]
Now,
[TABLE]
Remark that if are random variables, we have
[TABLE]
Thus
[TABLE]
Let for some . We want to upper-bound . Let us define . We know that
[TABLE]
So, using the independence of , we obtain
[TABLE]
where we observed that . Now, by Isserlis’ theorem and using the fact that is upper-bounded by , we have , and (and these bounds do not depend on ). So
[TABLE]
Thus,
[TABLE]
and
[TABLE]
Thus, by Chebyshev’s inequality
[TABLE]
So, we proved that is not an .
Now, we show that the same results hold for proving that . We have
[TABLE]
Yet, by Bernstein’s inequality,
[TABLE]
[TABLE]
and
[TABLE]
That proves
[TABLE]
Now, on the one hand, we have
[TABLE]
and by Proposition 7,
[TABLE]
On the other hand,
[TABLE]
∎
Proof of Proposition 10
Proof.
It suffices to prove that
[TABLE]
First,
[TABLE]
Secondly,
[TABLE]
∎
**Proof of Proposition 11
**
Proof.
We follow the proof of Proposition 9. Let , and .
We have
[TABLE]
Thus,
[TABLE]
and thus
[TABLE]
We conclude using Proposition 8 and using that . ∎
**Proof of Proposition 12
**
Proof.
The eigenvalues of are lower-bounded by and upper-bounded by , so verifies Condition 2. Condition 3 is verified by construction. It remains to prove the slightly modified Condition 4 given in Proposition 12. Let .
[TABLE]
using an union bound and the fact that all the blocks have a size larger that 10. Then, by independence of , we have
[TABLE]
Let for . Then, and are independent and uniformly distributed on . Thus
[TABLE]
Let . The set is a subset of where and is an orthonormal basis of . The Lebesgue measure of this subset is . Furthermore, (conditionally to ) the probability density function of on this set is either [math] or . So, for all ,
[TABLE]
Thus
[TABLE]
Then
[TABLE]
Hence, it remains to prove that the conclusion of Proposition 2 holds. That will imply the same for Propositions 7 and 9. Let and , where the generation of is defined in Proposition 12. We have
[TABLE]
Yet, for all , thanks to Proposition 2 (even in Condition 4 is not verified, the proof is still valid since the covariance matrix is in ). We conclude by dominated convergence theorem. ∎
Notation for the proofs of Section 2.3
For all , let be such that all coefficients are zero except the -th one which is equal to 1, and let be such that all coefficients are zero except the -th one which is equal to 1. Let be the -th coefficient of . Finally, as we use matrices of size , and vectors of size , we define and .
**Proof of Proposition 13
**We see that, for all , converges almost surely to . The following Lemma gives a central limit theorem for this convergence.
Lemma 12**.**
For all , we have
[TABLE]
with . In particular
[TABLE]
Proof.
Let , where . We know that and . Let , be such that . Using the central limit Theorem,
[TABLE]
and by Slutsky Lemma,
[TABLE]
where if and and otherwise.
Let us apply the Delta-method to (16) with the function , where is the inverse function of . If we write the Jacobian matrix of , we have:
[TABLE]
Let us compute the linear map , that we identify with its matrix. Let us recall that, for the dot product , the gradient of on is . Thus, if , we have
[TABLE]
So , then
[TABLE]
Now,
[TABLE]
Indeed, as is symmetric positive definite, we have . ∎
Lemma 13**.**
For all and for all such that , we have .
Proof.
First, let us prove it for . We have .
[TABLE]
Now, . Thus, it suffices to show that . We then write which is symmetric positive definite (Schur’s complement), and which is also symmetric positive definite. Then, we have
[TABLE]
because .
Now, we prove the lemma for any value of . Let and such that . Let for all . We now define with the recurrence relation and with , we then have . Thus
[TABLE]
Furthermore, as , there exists such that . Thus, at least one of the previous inequality is strict, and so . ∎
Using Lemmas 12 and 13, we can prove Proposition 13.
Proof.
It suffices to show that, for all ,
[TABLE]
We split the proof into two steps: for and for .
Step 1: .
Let , since . Thanks to Lemma 13, we know that .
Let . Using the convergence in probability of , we know that and .
Now, we know that for , the term of penalisation satisfies . Thus,
[TABLE]
Step 2: .
Let . We know that
[TABLE]
since for . Let be equal to (which converges to ), to be equal to (which converges to a zero mean normal distribution) and to be equal to (which converges to a zero mean normal distribution). We have
[TABLE]
Thus, . ∎
**Proof of Proposition 14
**
Proof.
We follow the notation of [35].
An othonormal basis of is with the following total order on : we write if or if and . We define as the matrix which columns are the vectorizations of the components of this basis of . Thus , for all and .
Thus, is the Cramér-Rao bound, where is the standard Fisher information matrix in the model . As the sample is i.i.d, it suffices to prove if with . In the rest of the proof, we compute the Cramér-Rao bound, and we show that this bound is equal to . We split the proof into several Lemmas.
Lemma 14**.**
Recall that . Let defined by
[TABLE]
Then, .
Proof.
Deriving twice the log-likelihood with respect to and (for ) and taking the expectation, we get
[TABLE]
Thus, for all , we have
[TABLE]
Now, if , we have
[TABLE]
If , we have
[TABLE]
Finally,
[TABLE]
∎
Lemma 15**.**
Let defined by
[TABLE]
then, . Moreover for all .
Proof.
We compute the product . First of all, let and . We have
[TABLE]
with
[TABLE]
and
[TABLE]
We then have
[TABLE]
Similarly,
[TABLE]
Now, if
[TABLE]
If , then
[TABLE]
Finally,
[TABLE]
We proved that . Let us show that for all . First of all, assume and . Assume for example and . Then we have
[TABLE]
We apply the same method for and , for and , and for and . Then, let and , for example . We have
[TABLE]
The other cases are similar. ∎
We thus have the component of the Cramér-Rao bound:
[TABLE]
This matrix is equal to for and when the mean is known. ∎
**Proof of Proposition 15
**
Proof.
An orthonormal basis of is with the following total order on : we write if or if and . Thus, we define as the matrix which the columns are the vectorizations of the components of this basis of . We have , for all with and .
Thus, is the Cramér-Rao bound. As the sample is i.i.d, it suffices to prove the proposition with .
Lemma 16**.**
Let defined by
[TABLE]
Then, .
Proof.
The proof is similar to the proof of Lemma 14, except that the values of and are more constraint. First of all
[TABLE]
Now, if ,
[TABLE]
If and ,
[TABLE]
If and , we have
[TABLE]
Finally,
[TABLE]
∎
Lemma 17**.**
Let defined by
[TABLE]
then, . Moreover for all and for all . Recall that we write if there exists such that .
Proof.
We introduce the following notation: if , let to be the index of in .
Step 1: Let us prove that .
We compute the product . Assume that with and with and . We then have
[TABLE]
using that if and because is block-diagonal, and using that if and because is block-diagonal. Assume that with and . We have,
[TABLE]
thanks to Lemma 15 applied to the matrix . We proved that .
Step 2.A : We show that for all .
Assume that . First, assume that and . Assume for example that and (the other cases are similar). We then have
[TABLE]
Let us take the case where with either , or . For example and . We then have
[TABLE]
It is the same for and , then for and . We also can prove the equality similarly when and .
Step 2.B: Let us prove that for all .
Assume that . If , or if , we have
[TABLE]
because if , the term is equal to 0. Similarly, if , the term is equal to 0.
It remains the case where and with . Then, . ∎
To conlude the proof, we remark that, if , then
[TABLE]
Now, assume that . If or if , then because one of the two terms is zero. Assume that and with . Then
[TABLE]
Thus, the covariance matrix of is equal to the Cramér-Rao bound. ∎
**Proof of Proposition 16
**
Proof.
Using the central limit Theorem and Proposion 15, we have
[TABLE]
Then, by Proposition 13, we have
[TABLE]
and by Slutsky,
[TABLE]
∎
**Proof of Proposition 17
**
Lemma 18**.**
Under Conditions 1 to 4, for all
[TABLE]
where is the operator norm, and it is equal to on the set of the symmetric positive semi-definite matrices.
Proof.
[TABLE]
Now, on the one hand,
[TABLE]
by Bernstein’s inequality. On the other hand,
[TABLE]
by Bernstein’s inequality. ∎
Lemma 19**.**
Under Conditions 1 to 5, for all ,
[TABLE]
Proof.
We know that . To simplify notation, let . Remark that and . Now, we know that
[TABLE]
Thus, .
Now, by Lemma 1,
[TABLE]
Let and . We have,
[TABLE]
∎
Lemma 20**.**
Under Conditions 1 to 5, for all ,
[TABLE]
Proof.
We have
[TABLE]
by Lemmas 19 and 18. Now, with probability which goes to one , by Lemmas 1 and 19, we have . Moreover, . Thus, with probability which goes to one , we have
[TABLE]
∎
We can now prove Proposition 17
Proof.
Let be the estimator of obtained replacing by and by in Equations (6) and (7). For all and , we have
[TABLE]
The term goes to 0 from Proposition 2. It remains to prove that
[TABLE]
For all and all , let us write
[TABLE]
Let, for all , ,
[TABLE]
We then have et .
As is linear, it is Lipschitz continuous from to , with constant (we can show that ). Let (we have in fact ). We then have,
[TABLE]
It suffices to show that
[TABLE]
Now,
[TABLE]
The term is bounded from Conditions 2 and 5 and thanks to Lemma 20. The term is bounded in probability using Lemma 20, Conditions 2 and 5. Thus, it suffices to show that . We will use that the operator norm of a sub-matrix is smaller than the operator norm of the whole matrix.
For all and , we have
[TABLE]
Thus, we obtain a sum a three terms, and we have to prove that each term is . The first term is thanks to Lemmas 1 and 19.
For the second term, so is from Lemma 18.
Finally, for the third term,
[TABLE]
which do not depend on and . Finally, remark that and are bounded from Condition 2, that are bounded in probability from Lemma 1, that from Lemma 18 and Proposition 2 and that
[TABLE]
Thus, we proved that
[TABLE]
∎
Proof of Proposition 18
Lemma 21**.**
Under Conditions 1, 2 and 3, for all penalization coefficient and for all ,
[TABLE]
where is the operator norm, and it is equal to on the set of the symmetric positive semi-definite matrices.
Proof.
Let .
[TABLE]
that goes to 0 following the proof of 18. ∎
Lemma 22**.**
Under Conditions 1, 2, 3 and 5, for all penalization coefficient and for all ,
[TABLE]
Proof.
The proof is similar to the proof of Lemma 20. ∎
We now can prove Proposition 18.
Proof.
For all , we define as the estimator of obtained replacing by , by and by in Equations (6) and (7). We also define .
[TABLE]
By Proposition 8, .
Finally, we prove that
[TABLE]
following the proof of Proposition 17. ∎
**Proof of Proposition 19
**
Proof.
Remark that verifies Conditions 1 to 3. Let . Let if and otherwise. Let and be defined as and in Proposition 17 but replacing by . As verify the Conditions 1 to 3 and the slightly modified Condition 4 given in Proposition 12, conditionally to
[TABLE]
Thus, for all ,
[TABLE]
so, by dominated convergence theorem,
[TABLE]
unconditionally to .
We conclude saying that with probability which converges to 1 from Proposition 12, so and with probability which converges to 1. ∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Bai, Z., and Silverstein, J. W. Spectral Analysis of Large Dimensional Random Matrices , 2 ed. Springer Series in Statistics. Springer-Verlag, New York, 2010.
- 2[2] Bickel, P. J., and Levina, E. Regularized estimation of large covariance matrices. The Annals of Statistics 36 , 1 (Feb. 2008), 199–227.
- 3[3] Broto, B., Bachoc, F., and Depecker, M. Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. ar Xiv:1812.09168 (2018).
- 4[4] Broto, B., Bachoc, F., Depecker, M., and Martinez, J.-M. Sensitivity indices for independent groups of variables. Mathematics and Computers in Simulation 163 (Sept. 2019), 19–31.
- 5[5] Bühlmann, P., and Van De Geer, S. Statistics for high-dimensional data: methods, theory and applications . Springer Science & Business Media, 2011.
- 6[6] Chastaing, G. Indices de Sobol généralisés pour variables dépendantes . phdthesis, Université de Grenoble, Sept. 2013.
- 7[7] Chastaing, G., Gamboa, F., Prieur, C., et al. Generalized hoeffding-sobol decomposition for dependent variables-application to sensitivity analysis. Electronic Journal of Statistics 6 (2012), 2420–2448.
- 8[8] Clouvel, L. Quantification de l’incertitude du flux neutronique rapide reçu par la cuve d’un réacteur à eau pressurisée . Ph D thesis, Nov. 2019.
