Qualitative robustness for bootstrap approximations
Katharina Strohriegl

TL;DR
This paper investigates the qualitative robustness of bootstrap approximations for non-i.i.d. data, extending existing theory to dependent processes such as $\alpha$-mixing processes, and establishes conditions under which robustness is preserved.
Contribution
It extends the theory of qualitative robustness of bootstrap methods to dependent data processes, introducing new conditions for robustness beyond i.i.d. assumptions.
Findings
Qualitative robustness holds for certain dependent processes under specific conditions.
Continuity of the statistical operator is crucial for robustness.
A convergence condition on the empirical measure, the Varadarajan property, is required of the underlying process in the results for non-i.i.d. data.
Katharina Strohriegl, University of Bayreuth
(March 8, 2024)
Abstract
An important property of statistical estimators is qualitative robustness, that is, small changes in the distribution of the data only result in small changes of the distribution of the estimator. Moreover, in practice, the distribution of the data is commonly unknown, therefore bootstrap approximations can be used to approximate the distribution of the estimator. Hence qualitative robustness of the statistical estimator under the bootstrap approximation is a desirable property. Currently most theoretical investigations on qualitative robustness assume independent and identically distributed pairs of random variables. However, in practice this assumption is often not fulfilled. Therefore, we examine the qualitative robustness of bootstrap approximations for non-i.i.d. random variables, for example $\alpha$-mixing and weakly dependent processes. In the i.i.d. case qualitative robustness is ensured via the continuity of the statistical operator representing the estimator, see Hampel (1971) and Cuevas and Romo (1993). We show that qualitative robustness of the bootstrap approximation is still ensured under the assumption that the statistical operator is continuous and under an additional assumption on the stochastic process. In particular, we require a convergence condition on the empirical measure of the underlying process, the so-called Varadarajan property.
Keywords: stochastic processes, qualitative robustness, bootstrap, $\alpha$-mixing, weakly dependent
AMS: 60G20, 62G08, 62G09, 62G35
1 Introduction
Most theoretical work in statistical machine learning assumes that the data are generated by independent and identically distributed (i.i.d.) random variables. However, this assumption is not fulfilled in many practical applications, so non-i.i.d. cases increasingly attract attention in machine learning. An important property of an estimator is robustness. It is well known that many classical estimators are not robust, which means that small changes in the distribution of the data generating process may strongly affect the results, see for example Huber (1981), Hampel (1968), Jurečková and Picek (2006) or Maronna et al. (2006) for some books on robust statistics. Qualitative robustness is a continuity property of the estimator and means, roughly speaking, that small changes in the distribution of the data only lead to small changes in the distribution (i.e. the performance) of the estimator. In this way the following kinds of "small errors" are covered: small errors in all data points (rounding errors) and large errors in only a small fraction of the data points (gross errors, outliers). Qualitative robustness of estimators was originally defined in Hampel (1968) and Hampel (1971) for the i.i.d. case and has been generalized to estimators for stochastic processes in various ways, for example in Papantoni-Kazakos and Gray (1979), Bustos (1980), whose definition is the one used here, Cox (1981), Boente et al. (1987), Zähle (2015), and Zähle (2016). For a more local consideration of qualitative robustness, see for example Krätschmer et al. (2017).
Often the finite sample distribution of the estimator or of the stochastic process of interest is unknown, hence an approximation of the distribution is needed. Commonly, the bootstrap is used to obtain an approximation of the unknown finite sample distribution by resampling from the given sample.
The classical bootstrap, also called the empirical bootstrap, was introduced by Efron (1979) for i.i.d. random variables. This concept is based on drawing a bootstrap sample $(Z_1^*,\ldots,Z_m^*)$ of size $m\in\mathbb{N}$ with replacement out of the original sample $(Z_1,\ldots,Z_n)$, $n\in\mathbb{N}$, and approximating the theoretical distribution $P_n$ of $(Z_1,\ldots,Z_n)$ using the bootstrap sample. For the empirical bootstrap the approximation of the distribution via the bootstrap is given by the empirical distribution of the bootstrap sample $(Z_1^*,\ldots,Z_n^*)$, hence $P_n^*=\otimes_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\delta_{Z_j^*}$, where $\delta_{Z_j}$ denotes the Dirac measure. The bootstrap sample itself has distribution $\otimes_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\delta_{Z_j}$.
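The resampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction used in the paper: the function name `empirical_bootstrap`, the choice of the median as estimator, the number of bootstrap replications, and the simulated Gaussian data are all hypothetical.

```python
import numpy as np

def empirical_bootstrap(sample, estimator, n_boot=1000, seed=0):
    """Approximate the distribution of `estimator` by Efron's bootstrap.

    Each bootstrap sample is drawn with replacement from `sample`,
    i.e. i.i.d. from the empirical measure (1/n) * sum_i delta_{z_i}.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    # One row of indices in {0, ..., n-1} per bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    # Evaluate the estimator on every bootstrap sample.
    return np.array([estimator(sample[row]) for row in idx])

# Hypothetical data: 200 draws from a standard normal distribution.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
boot_medians = empirical_bootstrap(z, np.median)
```

The array `boot_medians` is a Monte Carlo approximation of the bootstrap distribution of the median. It depends on the observed sample `z`, which is why the bootstrap distribution of an estimator is itself a random object.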
For an introduction to the bootstrap see for example Efron and Tibshirani (1993) and van der Vaart (1998, Chapter 3.6). Besides the empirical bootstrap many other bootstrap methods have been developed in order to find good approximations also for non-i.i.d. observations, see for example Singh (1981), Lahiri (2003), and the references therein. In Section 2.2 the moving block bootstrap introduced by Künsch (1989) and Liu and Singh (1992) is used to approximate the distribution of an $\alpha$-mixing stochastic process.
It is, also in the non-i.i.d. case, still desirable that the estimator is qualitatively robust even for the bootstrap approximation. That is, the distribution $\mathcal{L}_{P_n^*}(S_n)$ of the estimator under the bootstrap approximation $P_n^*$, $n\in\mathbb{N}$, of the assumed, ideal distribution $P_n$ should still be close to the distribution $\mathcal{L}_{Q_n^*}(S_n)$ of the estimator under the bootstrap approximation $Q_n^*$, $n\in\mathbb{N}$, of the real contaminated distribution $Q_n$. Remember that this is a random object, as $P_n^*$ respectively $Q_n^*$ are random. For notational convenience all bootstrap values are denoted, as usual, with an asterisk.
To show qualitative robustness often generalizations of Hampel’s theorem are used, as it is often hard to show qualitative robustness directly. For the i.i.d. case Hampel’s Theorem ensures qualitative robustness of a sequence of estimators, if these estimators are continuous and can be represented by a statistical operator which is continuous in the distribution of the data generating stochastic process. Accordingly we try to find results similar to Hampel’s theorem for the case of bootstrap approximations for non-i.i.d. cases.
Generalizations of Hampel’s theorem to non-i.i.d. cases can be found in Zähle (2015) and Zähle (2016). For a slightly different generalization of qualitative robustness, Hampel’s theorem has been formulated for strongly stationary and ergodic processes in Cox (1981) and Boente et al. (1982). In Strohriegl and Hable (2016) a generalization of Hampel’s theorem to a broad class of non-i.i.d. stochastic processes is given. Cuevas and Romo (1993) describe a concept of qualitative robustness of bootstrap approximations for the i.i.d. case and for real valued estimators, and also give a generalization of Hampel’s theorem to this case. In Christmann et al. (2013, 2011) qualitative robustness of Efron’s bootstrap approximation is shown for the i.i.d. case for a class of regularized kernel based learning methods, i.e. not necessarily real valued estimators. Moreover, Beutner and Zähle (2016) describe consistency of the bootstrap for plug-in estimators.
The next section contains a definition of qualitative robustness of the bootstrap approximation of an estimator and the main results. In Section 2.1, Theorem 2.2 shows qualitative robustness of the bootstrap approximation of an estimator for independent but not necessarily identically distributed random variables. Section 2.2 contains Theorems 2.6 and 2.7, which generalize the result in Christmann et al. (2013) to $\alpha$-mixing sequences with values in $\mathbb{R}^d$. All proofs are deferred to the appendix.
2 Qualitative robustness for bootstrap estimators
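Throughout this paper, let $(\mathcal{Z},d)$ be a Polish space with some metric $d$ and Borel-$\sigma$-algebra $\mathcal{B}$. Denote by $\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$ the set of all probability measures on $(\mathcal{Z}^{\mathbb{N}},\mathcal{B}^{\otimes\mathbb{N}})$. Let $(\mathcal{Z}^{\mathbb{N}},\mathcal{B}^{\otimes\mathbb{N}},\mathcal{M}(\mathcal{Z}^{\mathbb{N}}))$ be the underlying statistical model. If nothing else is stated, we always use Borel-$\sigma$-algebras for topological spaces. Let $(Z_i)_{i\in\mathbb{N}}$ be the coordinate process on $\mathcal{Z}^{\mathbb{N}}$, that is $Z_i:\mathcal{Z}^{\mathbb{N}}\to\mathcal{Z}$, $(z_j)_{j\in\mathbb{N}}\mapsto z_i$, $i\in\mathbb{N}$. Then the process has law $P_{\mathbb{N}}$ under $P_{\mathbb{N}}\in\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$. Moreover let $P_n:=(Z_1,\ldots,Z_n)\circ P_{\mathbb{N}}$ be the $n$-th order marginal distribution of $P_{\mathbb{N}}$ for every $n\in\mathbb{N}$ and $P_{\mathbb{N}}\in\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$. We are concerned with a sequence of estimators $(S_n)_{n\in\mathbb{N}}$ on the stochastic process $(Z_i)_{i\in\mathbb{N}}$. The estimator may take its values in any Polish space $H$ with some metric $d_H$; that is, $S_n:\mathcal{Z}^n\to H$ for every $n\in\mathbb{N}$.
Our work applies to estimators which can be represented by a statistical operator $S:\mathcal{M}(\mathcal{Z})\to H$, that is,
$$S_n(z_1,\ldots,z_n)=S\big(\mathbb{P}_{\mathbf{w}_n}\big)\quad\text{for all }\mathbf{w}_n=(z_1,\ldots,z_n)\in\mathcal{Z}^n,\ n\in\mathbb{N},\tag{1}$$
where $\mathbb{P}_{\mathbf{w}_n}$ denotes the empirical measure defined by $\mathbb{P}_{\mathbf{w}_n}(B):=\frac{1}{n}\sum_{i=1}^{n}I_B(z_i)$, $B\in\mathcal{B}$, for the observations $\mathbf{w}_n=(z_1,\ldots,z_n)\in\mathcal{Z}^n$. Examples of such estimators are M-estimators, R-estimators, see Huber (1981, Theorem 2.6), or Support Vector Machines, see Hable and Christmann (2011).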
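To make the idea of an estimator "represented by a statistical operator" concrete, the following Python sketch evaluates an operator on the empirical measure and compares it with the estimator computed directly from the data. Everything here is an illustrative assumption: the names `empirical_measure`, `mean_operator`, and `sample_mean` are hypothetical, and the mean is chosen only because it is simple. Representability alone does not imply robustness, and the sample mean is in fact a standard example of a non-robust estimator.

```python
import numpy as np

def empirical_measure(observations):
    """Return the empirical measure as (atoms, weights); weights sum to 1."""
    atoms, counts = np.unique(np.asarray(observations, dtype=float),
                              return_counts=True)
    return atoms, counts / counts.sum()

def mean_operator(measure):
    """Hypothetical statistical operator S: maps a discrete measure to its mean."""
    atoms, weights = measure
    return float(np.dot(atoms, weights))

def sample_mean(observations):
    """The estimator S_n, computed directly from the observations."""
    return float(np.mean(observations))

z = [2.0, 1.0, 2.0, 5.0]
via_operator = mean_operator(empirical_measure(z))  # S applied to the empirical measure
direct = sample_mean(z)                             # S_n applied to the sample
```

Both routes give the same value, and the result depends on the data only through the empirical measure, so it is invariant under permutations of the observations.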
Based on the generalization of Hampel’s concept of -robustness from Bustos (1980), we define qualitative robustness for bootstrap approximations for non-i.i.d sequences of random variables. The stronger concept of -robustness is needed here, as we do not assume to have i.i.d. random variables, which are used in Cuevas and Romo (1993).
Therefore the definition of qualitative robustness stated below is stronger than the definition in Cuevas and Romo (1993), i.e. if we use this definition for the i.i.d. case, the assumption $d_{BL}(P_n,Q_n)<\delta$ implies $d_{BL}(P,Q)<\delta$, where $d_{BL}$ denotes the bounded Lipschitz metric. This can be seen similarly to the proof of Lemma 3.1 in Section 3.1.
Now, let be the approximation of with respect to the bootstrap. Define the bootstrap sample as the first coordinate projections , where the law of the stochastic process has to be chosen according to the bootstrap procedure. For the empirical bootstrap, for example, the bootstrap sample is chosen via drawing with replacement from the given observations , . Hence the distribution of the bootstrap sample is , with finite sample distributions .
Contrarily to the classical case of qualitative robustness the distribution of the estimator under , is a random probability measure, as the distribution , , is random. Hence the mapping , , is itself a random variable with values in , i. e. on the space of probability measures on , equipped with the weak topology on . The measurability of this mapping is ensured by Beutner and Zähle (2016, Lemma D1).
In contrast to the original definitions of qualitative robustness in Bustos (1980), the bounded Lipschitz metric $d_{BL}$ is used instead of the Prohorov metric $\pi$ for the definition of qualitative robustness of the bootstrap approximation below. This is the same choice as in Cuevas and Romo (1993). Let $(\mathcal{X},d_{\mathcal{X}})$ be a separable metric space, then the bounded Lipschitz metric on the space $\mathcal{M}(\mathcal{X})$ of probability measures on $\mathcal{X}$ is defined by:
$$d_{BL}(P,Q):=\sup\Big\{\Big|\int f\,dP-\int f\,dQ\Big| \;:\; f\in BL(\mathcal{X}),\ \|f\|_{BL}\le 1\Big\},$$
where $\|\cdot\|_{BL}:=|\cdot|_1+\|\cdot\|_\infty$ denotes the bounded Lipschitz norm with $|f|_1:=\sup_{x\neq y}\frac{|f(x)-f(y)|}{d_{\mathcal{X}}(x,y)}$ and $\|\cdot\|_\infty$ the supremum norm, and the space of bounded Lipschitz functions is defined as $BL(\mathcal{X}):=\{f:\mathcal{X}\to\mathbb{R}\mid f\text{ Lipschitz and }\|f\|_{BL}<\infty\}$. The choice of the bounded Lipschitz metric is due to technical reasons only. Both metrics metrize the weak topology on the space of all probability measures $\mathcal{M}(\mathcal{X})$ for Polish spaces $\mathcal{X}$, see, for example, Huber (1981, Chapter 2, Corollary 4.3) or Dudley (1989, Theorem 11.3.3), and therefore can be replaced by each other while adapting $\delta$ on the left-hand side of implication (2). If $\mathcal{X}$ is a Polish space, so is $\mathcal{M}(\mathcal{X})$ with respect to the weak topology, see Huber (1981, Chapter 2, Theorem 3.9). Hence the bounded Lipschitz metric on the right-hand side of implication (2) operates on a space of probability measures on the Polish space $\mathcal{M}(H)$. Therefore the Prohorov metric and the bounded Lipschitz metric can again be replaced by each other while adapting $\varepsilon$ in (2). Similar to Cuevas and Romo (1993), the proofs of the theorems below rely on the fact that the set of bounded Lipschitz functions BL is a uniform Glivenko-Cantelli class, which implies uniform convergence of the bounded Lipschitz metric of the empirical measure to a limiting distribution, see Dudley et al. (1991). Therefore the definition is given with respect to the bounded Lipschitz metric.
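Definition 2.1 (Qualitative robustness for bootstrap approximations)
Let $P_{\mathbb{N}}\in\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$ and let $P_{\mathbb{N}}^*$ be the bootstrap approximation of $P_{\mathbb{N}}$. Let $\mathcal{P}\subset\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$ with $P_{\mathbb{N}}\in\mathcal{P}$. Let $S_n:\mathcal{Z}^n\to H$, $n\in\mathbb{N}$, be a sequence of estimators. Then the sequence of bootstrap approximations $(\mathcal{L}_{P_n^*}(S_n))_{n\in\mathbb{N}}$ is called qualitatively robust at $P_{\mathbb{N}}$ with respect to $\mathcal{P}$ if, for every $\varepsilon>0$, there is $\delta>0$ such that there is $n_0\in\mathbb{N}$ such that for every $n\ge n_0$ and for every $Q_{\mathbb{N}}\in\mathcal{P}$,
$$d_{BL}(P_n,Q_n)<\delta\quad\Rightarrow\quad d_{BL}\big(\mathcal{L}(\mathcal{L}_{P_n^*}(S_n)),\,\mathcal{L}(\mathcal{L}_{Q_n^*}(S_n))\big)<\varepsilon.\tag{2}$$
Here $\mathcal{L}(\mathcal{L}_{P_n^*}(S_n))$ (respectively $\mathcal{L}(\mathcal{L}_{Q_n^*}(S_n))$) denotes the distribution of the bootstrap approximation of the estimator $S_n$ under $P_n$ (respectively $Q_n$).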
This definition of qualitative robustness with respect to the subset $\mathcal{P}$ indicates that we do not show (2) for arbitrary probability measures $Q_{\mathbb{N}}\in\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$. All of our results require the contaminated process to at least have the same structure as the ideal process. This is due to the use of the bootstrap procedure. The empirical bootstrap, which is used below, only works well for a few processes, see for example Lahiri (2003), hence the assumptions on the contaminated process are necessary. To the best of our knowledge there are no results concerning qualitative robustness of the bootstrap approximation for general stochastic processes without any assumptions on the second process, and it is probably very hard to show this for every $P_{\mathbb{N}}$, respectively $Q_{\mathbb{N}}$, in $\mathcal{M}(\mathcal{Z}^{\mathbb{N}})$. Another difference to the classical definition of qualitative robustness in Bustos (1980) is the restriction to $n\ge n_0$. As the results for the bootstrap are asymptotic results, we cannot achieve the equicontinuity for every $n\in\mathbb{N}$, but only asymptotically.
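As the estimators can be represented by a statistical operator which depends on the empirical measure, it is crucial to consider stochastic processes which at least provide convergence of their empirical measure. Therefore, Strohriegl and Hable (2016) proposed to choose Varadarajan processes. Let $(\Omega,\mathcal{A},\mu)$ be a probability space. Let $(Z_i)_{i\in\mathbb{N}}$, $Z_i:\Omega\to\mathcal{Z}$, $i\in\mathbb{N}$, be a stochastic process and $\mathbf{W}_n:=(Z_1,\ldots,Z_n)$. Then the stochastic process is called a (strong) Varadarajan process if there exists a probability measure $P\in\mathcal{M}(\mathcal{Z})$ such that
$$\pi\big(\mathbb{P}_{\mathbf{W}_n},P\big)\longrightarrow 0\quad\text{almost surely},\ n\to\infty,$$
where $\pi$ denotes the Prohorov metric. The stochastic process is called a weak Varadarajan process if
$$\pi\big(\mathbb{P}_{\mathbf{W}_n},P\big)\longrightarrow 0\quad\text{in probability},\ n\to\infty.$$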
Examples of Varadarajan processes are certain Markov chains, some mixing processes, ergodic processes, and processes which satisfy a law of large numbers for events in the sense of Steinwart et al. (2009, Definition 2.1), see Strohriegl and Hable (2016) for details.
2.1 Qualitative robustness for independent not identically distributed processes
In this section we relax the i.i.d. assumption with respect to the identical distribution. We assume the random variables $Z_i$, $i\in\mathbb{N}$, to be independent, but not necessarily identically distributed.
The result below generalizes Christmann et al. (2013, Theorem 3) and Christmann et al. (2011), as the assumptions on the stochastic process are weaker, as are those on the statistical operator. Compared to Theorem 3 in Cuevas and Romo (1993), which shows qualitative robustness of the sequence of bootstrap estimators with values in $\mathbb{R}$, we have to strengthen the assumptions on the sample space, but do not need the estimator to be uniformly continuous. Keep in mind, however, that the assumption $d_{BL}(P_n,Q_n)<\delta$ implies $d_{BL}(P,Q)<\delta$, which is used for the i.i.d. case in Christmann et al. (2013) and Cuevas and Romo (1993).
Theorem 2.2
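*Let the sequence of estimators $(S_n)_{n\in\mathbb{N}}$ be represented by a statistical operator $S:\mathcal{M}(\mathcal{Z})\to H$ via (1) for a Polish space $H$ and let $(\mathcal{Z},d)$ be a totally bounded metric space.
Let $P_{\mathbb{N}}=\otimes_{i\in\mathbb{N}}P^i$, $P^i\in\mathcal{M}(\mathcal{Z})$, be an infinite product measure such that the coordinate process $(Z_i)_{i\in\mathbb{N}}$, $Z_i:\mathcal{Z}^{\mathbb{N}}\to\mathcal{Z}$, $i\in\mathbb{N}$, is a strong Varadarajan process with limiting distribution $P$. Moreover define $\mathcal{P}:=\{Q_{\mathbb{N}}\in\mathcal{M}(\mathcal{Z}^{\mathbb{N}}) : Q_{\mathbb{N}}=\otimes_{i\in\mathbb{N}}Q^i,\ Q^i\in\mathcal{M}(\mathcal{Z})\}$. Let $S$ be continuous at $P$ with respect to $d_{BL}$ and let the estimators $S_n:\mathcal{Z}^n\to H$, $n\in\mathbb{N}$, be continuous.
Then the sequence of bootstrap approximations $(\mathcal{L}_{P_n^*}(S_n))_{n\in\mathbb{N}}$ is qualitatively robust at $P_{\mathbb{N}}$ with respect to $\mathcal{P}$. *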
Remark 2.3
*The required properties on the statistical operator and on the sequence of estimators in Theorem 2.2 ensure the qualitative robustness of , as long as the assumptions on the underlying stochastic processes are fulfilled.
The proof shows that the bootstrap approximation of every sequence of estimators which is qualitatively robust in the sense of the definitions in Bustos (1980) and Strohriegl and Hable (2016, Definition 1) is qualitatively robust in the sense of Theorem 2.2.*
Hence Hampel’s theorem for the i.i.d. case can be generalized to bootstrap approximations and to the case of not necessarily identically distributed random variables if qualitative robustness is based on the definition of -robustness.
Unfortunately, the assumption that the space $\mathcal{Z}$ is totally bounded seems to be necessary. In the proof of Theorem 2.2 we use a result of Dudley et al. (1991) to show uniformity on the space of probability measures $\mathcal{M}(\mathcal{Z})$. This result needs the bounded Lipschitz functions to be a uniform Glivenko-Cantelli class, which is equivalent to $\mathcal{Z}$ being totally bounded, see Dudley et al. (1991, Proposition 12). In order to weaken the assumption on $\mathcal{Z}$, another way to show uniformity on the space of probability measures would probably have to be found.
A short look at the metrics used on $\mathcal{Z}^n$ is advisable. We consider $\mathcal{Z}^n$ as the $n$-fold product space of the Polish space $(\mathcal{Z},d)$. The product space $\mathcal{Z}^n$ is again a Polish space (in the product topology) and it is tempting to use a $p$-product metric $d_{n,p}$ on $\mathcal{Z}^n$, that is,
$$d_{n,p}\big((z_1,\dots,z_n),(z_1',\dots,z_n')\big)=\big\|\big(d(z_1,z_1'),\dots,d(z_n,z_n')\big)\big\|_p,$$
where $\|\cdot\|_p$ is a $p$-norm on $\mathbb{R}^n$ for $1\le p\le\infty$. For example, $d_{n,2}$ is the Euclidean metric on $\mathbb{R}^n$ and $d_{n,\infty}\big((z_1,\dots,z_n),(z_1',\dots,z_n')\big)=\max_i d(z_i,z_i')$; all these metrics are strongly equivalent. However, these common metrics do not cover the intuitive meaning of qualitative robustness, as the distance between two points in $\mathcal{Z}^n$ (i.e., two data sets) is small only if all coordinates are close together (small rounding errors). So points where only a small fraction of the coordinates are far off (gross errors) are excluded. Using these metrics, the qualitative robustness of the sample mean at every $P_{\mathbb{N}}$ can be shown, see e.g. Strohriegl and Hable (2016, Proposition 1). But the sample mean is a highly non-robust estimator, as gross errors have great impact on the estimate. Following Boente et al. (1987), we use the metric $d_n$ on $\mathcal{Z}^n$:
$$d_n\big((z_1,\dots,z_n),(z_1',\dots,z_n')\big)=\inf\Big\{\varepsilon>0 \;:\; \tfrac{1}{n}\,\#\{i : d(z_i,z_i')\ge\varepsilon\}\le\varepsilon\Big\}.$$
This metric on $\mathcal{Z}^n$ covers both kinds of "small errors". Though $d_n$ is not strongly equivalent to $d_{n,p}$ in general, it is topologically equivalent to the $p$-product metrics $d_{n,p}$, see Strohriegl and Hable (2016, Lemma 1). Hence, $\mathcal{Z}^n$ is metrizable also with metric $d_n$. Moreover the continuity of $S_n$ on $\mathcal{Z}^n$ is with respect to the product topology on $\mathcal{Z}^n$, which can, due to the topological equivalence of these two metrics, be seen with respect to the common metrics $d_{n,p}$.
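The difference between the product metrics and the metric of Boente et al. (1987) can be checked numerically. The sketch below assumes real-valued observations with $d(z,z')=|z-z'|$ and assumes the metric has the form $d_n(\mathbf{w},\mathbf{w}')=\inf\{\varepsilon>0 : \#\{i : d(z_i,z_i')\ge\varepsilon\}/n\le\varepsilon\}$; the function names `d_n` and `d_n_inf` and the toy data are hypothetical. The infimum is computed exactly as $\min_{k=0,\dots,n}\max(k/n,\,d_{(k+1)})$, where $d_{(k+1)}$ is the $(k+1)$-th largest coordinate distance and $d_{(n+1)}:=0$.

```python
import numpy as np

def d_n(z, z_prime):
    """Metric of Boente et al. (1987) for real-valued samples (assumed form)."""
    dist = np.abs(np.asarray(z, dtype=float) - np.asarray(z_prime, dtype=float))
    n = len(dist)
    # Entry k holds the (k+1)-th largest distance; the appended 0 covers k = n.
    desc = np.append(np.sort(dist)[::-1], 0.0)
    ks = np.arange(n + 1)
    # Allowing k "bad" coordinates costs k/n; the rest must be closer than eps.
    return float(np.min(np.maximum(ks / n, desc)))

def d_n_inf(z, z_prime):
    """Maximum product metric d_{n,infinity} for real-valued samples."""
    return float(np.max(np.abs(np.asarray(z, dtype=float)
                               - np.asarray(z_prime, dtype=float))))

clean = np.zeros(10)
gross = clean.copy()
gross[0] = 100.0          # one gross error in 10 observations
rounded = clean + 0.01    # small rounding error in every observation
```

With these inputs, `d_n(clean, gross)` equals 0.1 (one tenth of the coordinates is contaminated), whereas `d_n_inf(clean, gross)` equals 100. For the rounding errors both metrics give 0.01, so `d_n` treats both kinds of contamination as small.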
The next part gives two examples of stochastic processes of independent, but not necessarily identically distributed random variables, which are Varadarajan processes. In particular these stochastic processes even satisfy a strong law of large numbers for events (SLLNE) in the sense of Steinwart et al. (2009) and therefore are, due to Strohriegl and Hable (2016, Theorem 2), strong Varadarajan processes. The first example is rather simple and describes a sequence of univariate normal distributions.
Example* 1 Let be a sequence with and let , for some constant for all . Let , be a stochastic process where , , are independent and . Then the process is a strong Varadarajan process.*
The second example consists of stochastic processes where the distributions of the random variables $Z_i$, $i\in\mathbb{N}$, lie in a so-called shrinking $\varepsilon$-neighbourhood of a probability measure $P$.
Example* 2 Let be a measurable space and let be a stochastic process with independent random variables , , where*
[TABLE]
for a sequence , , and , . Then the process is a strong Varadarajan process.
The next corollary shows, that Support Vector Machines are qualitatively robust. For a detailed introduction to Support Vector Machines see e.g., Schölkopf and Smola (2002) and Steinwart and Christmann (2008). Let be a given dataset.
Corollary 2.4
*Let , closed, be a totally bounded, metric space and let be a stochastic process where the random variables , , are independent and , . Moreover let be a sequence of positive real valued numbers with , for some . Let be a reproducing kernel Hilbert space with continuous and bounded kernel and let be the SVM estimator, which maps to for a continuous and convex loss function . It is assumed that for every and that is additionally Lipschitz continuous in the last argument.
Then we have for every there is such that there is such that for all and for every process , where are independent and have distribution , :*
[TABLE]
That is, the sequence of bootstrap approximations is qualitatively robust if the second (contaminated) process is still of the same kind, i.e. still independent, as the original uncontaminated process .
2.2 Qualitative robustness for the moving block bootstrap of $\alpha$-mixing processes
Dropping the independence assumption we now focus on real valued mixing processes, in particular on strongly stationary $\alpha$-mixing or strong mixing stochastic processes. The mixing notion is an often used and well-accepted dependence notion which quantifies the degree of dependence of a stochastic process. There exist several types of mixing coefficients, but all of them are based on differences between probabilities $\mu(A_1\cap A_2)-\mu(A_1)\mu(A_2)$. There is a large literature on this dependence structure. For a detailed overview on mixing, see Bradley (2005), Bradley (2007a, b, c), and Doukhan (1994) and the references therein. The $\alpha$-mixing structure has been introduced in Rosenblatt (1956). Also examples of relations between dependence structures and mixing coefficients can be found in the references above. Let $\Omega$ be a set equipped with two $\sigma$-algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ and a probability measure $\mu$. Then the $\alpha$-mixing coefficient is defined by
$$\alpha(\mathcal{A}_1,\mathcal{A}_2,\mu):=\sup\big\{\,|\mu(A_1\cap A_2)-\mu(A_1)\mu(A_2)| \;:\; A_1\in\mathcal{A}_1,\ A_2\in\mathcal{A}_2\big\}.$$
By definition the coefficients equal zero if the $\sigma$-algebras are independent.
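For two random variables with finitely many values the supremum in the $\alpha$-mixing coefficient is a finite maximum and can be computed by brute force. The sketch below assumes the standard form $\alpha(\mathcal{A}_1,\mathcal{A}_2,\mu)=\sup|\mu(A_1\cap A_2)-\mu(A_1)\mu(A_2)|$ with $\mathcal{A}_1=\sigma(X)$ and $\mathcal{A}_2=\sigma(Y)$; the function names and the two joint distributions are hypothetical toy choices.

```python
from itertools import chain, combinations
import numpy as np

def nonempty_subsets(k):
    """All non-empty subsets of {0, ..., k-1} as tuples."""
    return chain.from_iterable(combinations(range(k), r) for r in range(1, k + 1))

def alpha_coefficient(joint):
    """alpha(sigma(X), sigma(Y), mu) for a joint pmf given as a 2-D array.

    joint[i, j] = mu(X = i, Y = j). Events in sigma(X) are unions of rows and
    events in sigma(Y) are unions of columns, so the supremum is a finite max.
    The empty event contributes 0 and is skipped.
    """
    joint = np.asarray(joint, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    best = 0.0
    for a in nonempty_subsets(joint.shape[0]):
        for b in nonempty_subsets(joint.shape[1]):
            p_ab = joint[np.ix_(a, b)].sum()
            best = max(best, abs(p_ab - px[list(a)].sum() * py[list(b)].sum()))
    return best

independent = np.outer([0.3, 0.7], [0.5, 0.5])   # X and Y independent
identical = np.array([[0.5, 0.0], [0.0, 0.5]])   # Y = X, fair coin
alpha_indep = alpha_coefficient(independent)
alpha_ident = alpha_coefficient(identical)
```

For the independent table the coefficient is zero up to floating point error, in line with the remark above. For the fully dependent fair coin it equals 0.25, which is the largest value an $\alpha$-mixing coefficient can take.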
Moreover mixing can be defined for stochastic processes. We follow Steinwart et al. (2009, Definition 3.1):
Definition 2.5
Let be a stochastic process, , , and let be the -algebra generated by , . Then the -- and the -mixing coefficients are defined by
[TABLE]
A stochastic process is called $\alpha$-mixing with respect to $\mu$ if
[TABLE]
It is called weakly $\alpha$-bi-mixing with respect to $\mu$ if
[TABLE]
Instead of Efron’s empirical bootstrap another bootstrap approach is used in order to represent the dependence structure of an $\alpha$-mixing process. Künsch (1989) and Liu and Singh (1992) introduced the moving block bootstrap (MBB). Resampling single observations often cannot preserve the dependence structure of the process; therefore they take blocks of $b$ consecutive observations instead. The dependence structure of the process is preserved within these blocks. The block length $b$ increases with the number of observations $n$ for asymptotic considerations. A slight modification of the original moving block bootstrap, see for example Politis and Romano (1990) and Shao and Yu (1993), is used in the next two theorems in order to avoid edge effects.
The proofs are based on central limit theorems for empirical processes. There are several results concerning the moving block bootstrap of the empirical process in case of mixing processes, see for example Bühlmann (1994), Naik-Nimbalkar and Rajarshi (1994), and Peligrad (1998, Theorem 2.2) for $\alpha$-mixing sequences and Radulović (1996) and Bühlmann (1995) for $\beta$-mixing sequences. To the best of our knowledge there are so far no results concerning qualitative robustness for bootstrap approximations of estimators for $\alpha$-mixing stochastic processes. Therefore, Theorem 2.6 shows qualitative robustness for a stochastic process with values in $\mathbb{R}$. The proof is based on Peligrad (1998, Theorem 2.2), which provides a central limit theorem under assumptions on the process which are weaker than those in Bühlmann (1994) and Naik-Nimbalkar and Rajarshi (1994). In the case of $\mathbb{R}^d$-valued, $d>1$, stochastic processes, stronger assumptions on the stochastic process are needed, as the central limit theorem in Bühlmann (1994) requires stronger assumptions, see Theorem 2.7.
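Let $Z_1,\ldots,Z_n$, $n\in\mathbb{N}$, be the first $n$ projections of a real valued stochastic process $(Z_i)_{i\in\mathbb{N}}$ and let $b\in\mathbb{N}$, $b<n$, be the block length. Then, for fixed $n$, the sample can be divided into blocks $B_{i,b}:=(Z_i,\ldots,Z_{i+b-1})$, $i\in\{1,\ldots,n\}$. If $i>n-b+1$, we define $Z_{n+j}:=Z_j$ for the missing elements of the blocks. To get the MBB bootstrap sample $\mathbf{W}_n^*=(Z_1^*,\ldots,Z_n^*)$, $\ell$ numbers $I_1,\ldots,I_\ell$ from the set $\{1,\ldots,n\}$ are randomly chosen with replacement. Without loss of generality it is assumed that $n=\ell b$; if $n$ is not a multiple of $b$ we simply cut the last block, as is usually done in the literature. Then the sample consists of the blocks $B_{I_1,b},\ldots,B_{I_\ell,b}$, that is $(Z_1^*,\ldots,Z_n^*)=(Z_{I_1},\ldots,Z_{I_1+b-1},\ldots,Z_{I_\ell},\ldots,Z_{I_\ell+b-1})$.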
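The block resampling scheme described above can be sketched as follows. This is a minimal illustration under assumptions, not the exact procedure of the cited references: the function name `moving_block_bootstrap` is hypothetical, blocks that run past the end of the sample are completed by wrapping around to the beginning (one reading of the edge-effect modification), and if the sample size is not a multiple of the block length the bootstrap sample is simply shortened to $\lfloor n/b\rfloor\cdot b$ observations.

```python
import numpy as np

def moving_block_bootstrap(sample, block_length, seed=0):
    """Draw one block bootstrap sample with wrap-around blocks (assumed variant)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    n_blocks = n // block_length  # drop the incomplete last block, if any
    # Block start positions, drawn with replacement from {0, ..., n-1}.
    starts = rng.integers(0, n, size=n_blocks)
    # Block i is (z_i, ..., z_{i+b-1}); indices past the end wrap around.
    offsets = np.arange(block_length)
    idx = (starts[:, None] + offsets[None, :]) % n
    # Concatenate the selected blocks into one bootstrap sample.
    return sample[idx].reshape(-1)

z = np.arange(12)  # toy sample whose values equal their positions
boot = moving_block_bootstrap(z, block_length=4)
```

Because the toy sample stores positions as values, each group of four consecutive entries of `boot` is a run of consecutive positions modulo 12, which shows that the ordering, and hence the dependence structure, is kept within blocks.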
As we are interested in estimators , , which can be represented by a statistical operator via , for a Polish space , see (1), the empirical measure of the bootstrap sample should approximate the empirical measure of the original sample . Contrarily to qualitative robustness in the case of independent and not necessarily identically distributed random variables (Theorem 2.2), the assumptions on the statistical operator are strengthened for the case of -mixing sequences. In particular the statistical operator is assumed to be uniformly continuous for all . For the first theorem we assume the random variables , to be real valued and bounded. Without loss of generality we assume , otherwise a transformation leads to this assumption. For the bootstrap for the true as well as for the contaminated process, we assume the block length and the number of blocks to be sequences of integers satisfying
[TABLE]
for , and , .
Theorem 2.6
Let be a probability measure on such that the coordinate process , is bounded, strongly stationary, and -mixing with
[TABLE]
*Let be the set of probability measures such that the coordinate process fulfils the properties above for the same . Let be a Polish space, with some metric , let be a sequence of estimators which can be represented by a statistical operator via (1). Moreover let be continuous and let be additionally uniformly continuous with respect to . Then the sequence of estimators is qualitatively robust at with respect to . *
The assumptions on the stochastic process are on the one hand, together with the assumptions on the block length, used to ensure the validity of the bootstrap approximation and on the other hand, together with the assumptions on the statistical operator, respectively the sequence of estimators, to ensure the qualitative robustness.
The next theorem generalizes this result to stochastic processes with values in , , instead of . Therefore, for example, the bootstrap version of the SVM estimator is qualitatively robust under weak conditions. The proof of the next theorem follows the same lines as the proof of the theorem above, but another central limit theorem, which is shown in Bühlmann (1994), is used. Therefore the assumptions on the mixing property of the stochastic process are stronger and the random variables are assumed to have continuous marginal distributions. Again the bootstrap sample results of a moving block bootstrap where blocks of length are chosen, again assuming . Moreover, let be a sequences of integers satisfying
[TABLE]
Theorem 2.7
Assume , . Let be a probability measure such that the coordinate process , is strongly stationary and -mixing with
[TABLE]
*Assume that has continuous marginal distributions for all . Define the set of probability measures such that the coordinate process is strongly stationary and -mixing as in (6).
Let be a Polish space, wit some metric , be a sequence of estimators such that is continuous and assume that can be represented by a statistical operator via (1) which is additionally uniformly continuous with respect to .*
*Then the sequence of estimators is qualitatively robust at with respect to . *
Although the assumptions on the statistical operator $S$, compared to Theorem 2.2, were strengthened in order to generalize the qualitative robustness to $\alpha$-mixing sequences in Theorems 2.6 and 2.7, M-estimators are still an example of qualitatively robust estimators if the sample space $\mathcal{Z}\subset\mathbb{R}^d$, $d\in\mathbb{N}$, is compact. The compactness of $\mathcal{Z}$ implies the compactness of the space $\mathcal{M}(\mathcal{Z})$, see Parthasarathy (1967, Theorem 6.4). As the statistical operator $S$ is continuous, the compactness of $\mathcal{M}(\mathcal{Z})$ implies the uniform continuity of $S$. Another example of M-estimators which are uniformly continuous even if the input space is not compact is given in Cuevas and Romo (1993, Theorem 4).
Acknowledgements: This research was partially supported by the DFG Grant 291/2-1 "Support Vector Machines bei stochastischer Unabhängigkeit". Moreover I would like to thank Andreas Christmann for helpful discussions on this topic.
3 Proofs
This section contains the proofs of the main theorems and corollaries.
3.1 Proofs of Section 2.1
Before proving Theorem 2.2, we state a rather technical lemma, connecting the product measure $\otimes_{i=1}^{n}P^i$ of independent random variables to their mixture measure $\frac{1}{n}\sum_{i=1}^{n}P^i$. Let $\mathcal{Z}$ be a Polish space.
Lemma 3.1
Let such that and , . Then for all :
[TABLE]
Proof: Let be the set of bounded Lipschitz functions with . By assumption we have . Moreover for a function :
[TABLE]
Then,
[TABLE]
Now every function can be identified as a function , . This function is also Lipschitz continuous on
[TABLE]
where induces the product topology on . That is . Note that this is also true for every -product metric in , , as they are strongly equivalent. Hence,
[TABLE]
which yields the assertion.
Proof of Theorem 2.2: To prove Theorem 2.2 we first use the triangle inequality to split the bounded Lipschitz distance between the distribution of the estimator , , into two parts regarding the distribution of the estimator under the joint distribution of :
[TABLE]
Then the representation of the estimator by the statistical operator and the continuity of this operator in together with the Varadarajan property and the independence assumption on the stochastic process yield the assertion.
First we regard part I: Define the distribution and let be the bootstrap approximation of . Define, for , the random variables
, , and
, ,
such that and .
Denote the bootstrap sample by , , .
As Efron’s empirical bootstrap is used, the bootstrap sample, which is chosen via resampling with replacement out of , , has distribution , , respectively . The bootstrap approximation of , , is the empirical measure of the bootstrap sample .
Further denote the joint distribution of , , and by . Then, has marginal distributions for all , for all , and for all .
Then,
[TABLE]
and therefore
[TABLE]
By assumption the coordinate process consists of independent random variables, hence we have , for , .
Moreover is assumed to be a totally bounded metric space. Then, due to Dudley et al. (1991, Proposition 12), the set is a uniform Glivenko-Cantelli class. That is, if i.i.d. , we have for all :
[TABLE]
Applying this to the bootstrap sample , , which is found by resampling with replacement out of the original sample , we have, for all ,
[TABLE]
Let be arbitrary but fixed. Then, for every there is such that for all and all :
[TABLE]
And, using the same argumentation for the sequence of random variables , , which are i.i.d. and have distribution :
[TABLE]
Respectively, for every there is such that for all and all :
[TABLE]
As the process is a strong Varadarajan process by assumption, there exists a probability measure such that
[TABLE]
That is, for every there is such that for all :
[TABLE]
The continuity of the statistical operator in yields: for every there exists such that for all :
[TABLE]
As the Prohorov metric is bounded by the Ky Fan metric, see Dudley (1989, Theorem 11.3.5), we conclude:
[TABLE]
Due to the definition of the statistical operator , this is equivalent to
[TABLE]
The triangle inequality
[TABLE]
and the continuity of the statistical operator , see (11), then yield, for all ,
[TABLE]
Using the triangle inequality,
[TABLE]
gives for all :
[TABLE]
Hence, for all there are such that for all , the infimum in (12) is bounded by . Therefore
[TABLE]
The equivalence between the Prohorov metric and the bounded Lipschitz metric for Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), yields the existence of such that for all
[TABLE]
To prove the convergence of the term in part II, consider the distribution and let be the bootstrap approximation of . Define, for , the random variables
, with distribution ,
, , with distribution , and
the bootstrap sample , , with distribution
Moreover let denote the joint distribution of , , , and . Then, has marginal distributions , , , and .
First, similar to the argumentation for part I, Efron’s bootstrap and Dudley et al. (1991, Proposition 12) give for :
[TABLE]
Hence, for arbitrary, but fixed , for every there is such that for all and all :
[TABLE]
Further,
[TABLE]
Respectively, for every there is such that for all and all :
[TABLE]
Moreover, as the random variables , , are independent, the bounded Lipschitz distance between the empirical measure and can be bounded, due to Dudley et al. (1991, Theorem 7). As totally bounded spaces are particularly separable, see Denkowski et al. (2003, below Corollary 1.4.28), Dudley et al. (1991, Proposition 12) provides that is a uniform Glivenko-Cantelli class. The proof of this proposition does not depend on the distributions of the random variables , and is therefore also valid for independent and not necessarily identically distributed random variables. Hence Dudley et al. (1991, Theorem 7) yields for all :
[TABLE]
as long as the assumptions of Proposition 12 in Dudley et al. (1991) apply. As is bounded, we have , see Dudley et al. (1991, page 499, before Proposition 10), hence it is sufficient to show that is image admissible Suslin. By assumption is totally bounded, hence is separable with respect to , see Strohriegl and Hable (2016, Lemma 3). As implies , the space is a bounded subset of , which is due to Dudley (1989, Theorem 2.4.9) a complete space. Now, is a closed subset of with respect to . Hence is complete, due to Denkowski et al. (2003, Proposition 1.4.17). Therefore is separable and complete with respect to and particularly a Suslin space, see Dudley (2014, p.229). As Lipschitz continuous functions are also equicontinuous, Dudley (2014, Theorem 5.28 (c)) gives that is image admissible Suslin.
Hence, Dudley et al. (1991, Theorem 7) yields
[TABLE]
and
[TABLE]
That is, there is such that for all
[TABLE]
Moreover, due to Lemma 3.1, we have
[TABLE]
Then the strong Varadarajan property of yields that there is such that for all
[TABLE]
Similar to the argumentation for part I we conclude, using again the boundedness of the Prohorov metric by the Ky Fan metric, see Dudley (1989, Theorem 11.3.5):
[TABLE]
Due to the definition of the statistical operator , this is equivalent to
[TABLE]
Moreover the triangle inequality yields
[TABLE]
Hence, for all , we obtain
[TABLE]
The continuity of the statistical operator in , see (11), gives
[TABLE]
Further, the triangle inequality yields
[TABLE]
Therefore we conclude, for all ,
[TABLE]
[TABLE]
Now, assume , then (20) yields , therefore this term can be omitted. Note that this is only proven for the -product metrics on and not for the metric from (4). For this metric we need a different argumentation, which is stated below the next calculation.
Hence, for all ,
[TABLE]
In order to show the above bound for the metric , see (4), on , we use another variant of the triangle inequality in (22):
[TABLE]
Assume . Then, the strong equivalence between the Prohorov metric and the bounded Lipschitz metric on Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), yields . Due to Dudley (1989, Theorem 11.6.2), implies the existence of a probability measure with marginal distributions and , such that . By a simple calculation implies and we have:
[TABLE]
Again the equivalence between the metrics and yields:
[TABLE]
Now we choose the joint distribution of , , , and such that the distribution of is . Then we conclude:
[TABLE]
Now, adapting the inequalities in (16), (17), and (21) in respectively yields the boundedness of the above term by for and for all .
From here on, the proof proceeds in the same way for both kinds of metrics on .
The equivalence between the Prohorov metric and the bounded Lipschitz metric on Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), yields the existence of such that for all , (respectively ) implies
[TABLE]
Now, (15) and (24) yield for all :
[TABLE]
Recall that and are random quantities with values in . Hence (25) is equivalent to
[TABLE]
respectively
[TABLE]
Therefore, for all and for all :
[TABLE]
by a variant of Strassen’s Theorem, see Huber (1981, Chapter 2, Theorem 4.2, (2)(1)). That is,
[TABLE]
Hence for every we find and such that for all :
[TABLE]
which yields the assertion.
Proof of Example 1:
Without loss of generality we assume . Otherwise regard the process , . By assumption, the random variables , , are independent. Hence , , are independent, see for example Hoffmann-Jørgensen (1994, Theorem 2.10.6), for all measurable , as is a measurable function. According to Steinwart et al. (2009, Proposition 2.8), satisfies the SLLNE if there is a probability measure in such that for all measurable . Hence:
[TABLE]
where denotes the density of the normal distribution with respect to the Lebesgue measure . Moreover define by
[TABLE]
Therefore , for all , is integrable and due to Lebesgue’s Theorem, see for example Hoffmann-Jørgensen (1994, Theorem 3.6):
[TABLE]
We have , where for all , as and therefore the Lemma of Kronecker, see for example Hoffmann-Jørgensen (1994, Theorem 4.9, Equation 4.9.1) yields: for all .
Now (26) yields the SLLNE:
[TABLE]
By Strohriegl and Hable (2016, Theorem 2), the Varadarajan property follows.
Proof of Example 2:
Similar to the proof of Example 1, we first show the SLLNE, that is there exists a probability measure such that
[TABLE]
Now let be an arbitrary measurable set. Then:
[TABLE]
As, and , we have
[TABLE]
and similarly
[TABLE]
Hence (27) yields
[TABLE]
and therefore, due to Strohriegl and Hable (2016, Theorem 2), the assertion.
Proof of Corollary 2.4:
Due to Example 2, the stochastic process is a Varadarajan process. Hable and Christmann (2011, Theorem 3.2) ensures the continuity of the statistical operator for a fixed value . Moreover Hable and Christmann (2011, Corollary 3.4) yields the continuity of the estimator for every fixed . Hence for fixed the bootstrap approximation of the SVM estimator is qualitatively robust under the given assumptions. Moreover the proof of Theorem 2.2, equation (25), and the equivalence between the bounded Lipschitz metric and the Prohorov metric yield: for every there is such that there is such that for all and if :
[TABLE]
Similarly to the proof of the qualitative robustness in Strohriegl and Hable (2016, Theorem 4) we get: for every there is , such that for all :
[TABLE]
And the same argumentation as in the proof of the qualitative robustness of the SVM estimator for the non-i.i.d. case in Strohriegl and Hable (2016, Theorem 4) for the cases and yields the assertion.
3.2 Proofs of Section 2.2
Proof of Theorem 2.6: Let be the bootstrap approximations of the true distribution and the contaminated distribution . First, the triangle inequality yields:
[TABLE]
First, we regard the term in part II. Let , be the -algebra generated by . Due to the assumptions on the mixing process , the sequence is a null sequence. Moreover it is bounded by the definition of the -mixing coefficient which, due to the strong stationarity, does not depend on . Therefore
[TABLE]
Hence, the process is weakly --mixing with respect to , see Definition 2.5. Due to the stationarity assumption, the process is additionally asymptotically mean stationary, that is for all for a probability measure . Therefore the process satisfies the WLLNE, see Steinwart et al. (2009, Proposition 3.2), and therefore is a weak Varadarajan process, see Strohriegl and Hable (2016, Theorem 2).
As the process is assumed to be a Varadarajan process and due to the assumptions on the sequence of estimators qualitative robustness of is ensured by Strohriegl and Hable (2016, Theorem 1). Together with the equivalence between the Prohorov metric and the bounded Lipschitz metric for Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), it follows:
For every there is such that for all and for all we have:
[TABLE]
This implies
[TABLE]
Hence the convergence of the term in part II is shown.
To prove the convergence of the term in part I, consider the distribution and let be the bootstrap approximation of , via the blockwise bootstrap. Define, for , the random variables
, , and
, ,
such that and .
Moreover denote the bootstrap sample by , , , and the distribution of by . The blockwise bootstrap approximation of , , is , . Note that the sample depends on the block length and on the number of blocks .
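To make the role of the block length and the number of blocks concrete, the following is a minimal sketch of a moving block bootstrap. It assumes the standard overlapping-block variant, in which blocks of consecutive observations are drawn uniformly with replacement and concatenated; the exact specification used in Section 2.2 is not restated in this section, so the details here are an assumption. The function name and the toy series are hypothetical.

```python
import random


def moving_block_bootstrap(sample, block_length, num_blocks, rng):
    """Concatenate `num_blocks` overlapping blocks drawn with replacement.

    Each block consists of `block_length` consecutive observations, so the
    dependence structure within a block is preserved.
    """
    n = len(sample)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must lie between 1 and len(sample)")
    num_starts = n - block_length + 1   # number of admissible block starts
    resampled = []
    for _ in range(num_blocks):
        start = rng.randrange(num_starts)
        resampled.extend(sample[start:start + block_length])
    return resampled


rng = random.Random(1)        # fixed seed, for reproducibility only
z = list(range(10))           # hypothetical observed series
z_star = moving_block_bootstrap(z, block_length=3, num_blocks=4, rng=rng)
```

The bootstrap sample has length `block_length * num_blocks`, which need not equal the original sample size. This is why the sample, and hence its empirical measure, depends on both quantities.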
Further denote the joint distribution of , , and by . Then, has marginal distributions for all , for all , and for all .
Then,
[TABLE]
and therefore
[TABLE]
By assumption we have , . Hence , i. e. , which is a totally bounded metric space. Therefore the set is a uniform Glivenko-Cantelli class, due to Dudley et al. (1991, Proposition 12). Similar to part I of the proof of Theorem 2.2, the blockwise bootstrap structure and the Glivenko-Cantelli property yield:
[TABLE]
Respectively, for fixed , for every there is such that for all and all :
[TABLE]
Regard the process , . Due to the assumptions on the process and on the moving block bootstrap, Theorem 2.3 in Peligrad (1998) yields the almost sure convergence in distribution to a Brownian bridge :
[TABLE]
almost surely with respect to , , in the Skorohod topology on . Here indicates convergence in distribution and denotes the space of cadlag functions on , for details see for example Billingsley (1999, p. 121).
This is equivalent to
[TABLE]
for all continuity points of , see Billingsley (1999, (12.14), p. 124).
Multiplying by yields for any fixed continuity point
[TABLE]
As convergence in distribution to a finite constant implies convergence in probability, see for example van der Vaart (1998, Theorem 2.7(iii)), and as in probability, for all :
[TABLE]
for all continuity points of , where denotes the convergence in probability.
Hence, Dudley (1989, Theorem 11.12) yields the convergence of the corresponding probability measures:
[TABLE]
Respectively
[TABLE]
Define the set . Hence,
[TABLE]
and, for all , there is such that for all :
[TABLE]
By assumption we have , . Hence the space of probability measures is a subset of and therefore tight, as [0,1] is a compact space, see e.g. Klenke (2013, Example 13.28). Then Prohorov’s Theorem, see for example Billingsley (1999, Theorem 5.1), yields relative compactness of and in particular the relative compactness of the set . As is a complete space, see Dudley (1989, Theorem 11.5.5), relative compactness is equivalent to total boundedness. That is, there exists a finite dense subset of such that for all and there is such that
[TABLE]
The triangle inequality yields:
[TABLE]
Define . Then (32) yields for every the existence of an integer such that, for all and all :
[TABLE]
Hence, for all and for all , we have:
[TABLE]
Due to the uniform continuity of the operator , for every there is such that for all :
[TABLE]
Moreover, the triangle inequality yields:
[TABLE]
Again we use the relation between the Prohorov metric and the Ky Fan metric, Dudley (1989, Theorem 11.3.5):
[TABLE]
Due to the definition of the statistical operator , this is equivalent to
[TABLE]
Due to the uniform continuity of , see (35), we obtain, for all
[TABLE]
The triangle inequality, (36), then yields for all :
[TABLE]
The equivalence between the Prohorov metric and the bounded Lipschitz metric on Polish spaces, see Huber (1981, Chapter 2, Corollary 4.3), yields the existence of such that for every
[TABLE]
And therefore
[TABLE]
For the convergence of the term in part III the same argumentation as for part I can be applied, as the assumptions on and are the same as for and . In particular for every there is such that for all :
[TABLE]
respectively
[TABLE]
Hence, (28), (37), and (38) yield, for all :
[TABLE]
As and are random variables themselves, we have, due to Huber (1981, Chapter 2, Theorem 4.2, (2)(1)), for all :
[TABLE]
Hence, for all there is such that there is such that, for all :
[TABLE]
and therefore the assertion.
Proof of Theorem 2.7: The proof follows the same lines as the proof of Theorem 2.6 and therefore we only state the different steps. Again we start with the triangle inequality:
[TABLE]
To prove the convergence of the term in part II, we need the weak Varadarajan property of the stochastic process. Due to the definition for all , , and obviously:
[TABLE]
Hence, due to the strong stationarity of the stochastic process, we have:
[TABLE]
Now, the same argumentation as in the proof of Theorem 2.6 yields the weak Varadarajan property and therefore, for all ,
[TABLE]
Regarding the term in part I, we use a central limit theorem for the blockwise bootstrapped empirical process by Bühlmann (1994, Corollary 1 and remark) to show its convergence. Again, regard the distribution and let be the bootstrap approximation of , via the blockwise bootstrap. Define, for all , the random variables
, , and
, ,
such that and .
Moreover denote the bootstrap sample by , , , and the distribution of by . The bootstrap approximation of is , , by definition of the bootstrap procedure. Note that the sample depends on the block length and on the number of blocks .
Further denote the joint distribution of , , and by . Then, has marginal distributions for all , for all , and for all .
Then,
[TABLE]
and therefore
[TABLE]
As is compact, it is in particular totally bounded. Hence the set is a uniform Glivenko-Cantelli class, due to Dudley et al. (1991, Proposition 12). Similar to part I of the proof of Theorem 2.6, the bootstrap structure and the Glivenko-Cantelli property given above yield for arbitrary, but fixed :
for every there is such that, for all and all ,
[TABLE]
Now, regard the empirical process of . Set . Moreover means for all . Hence we can define the empirical process and the blockwise bootstrapped empirical process by
[TABLE]
Regard the process , . Now, due to the assumptions on the stochastic process and on the moving block bootstrap, Bühlmann (1994, Corollary 1 and remark) yields the almost sure convergence in distribution to a Gaussian process :
[TABLE]
almost surely with respect to , , in the (extended) Skorohod topology on .
The space is a generalization of the space of cadlag functions on , see Billingsley (1999, Chapter 12), and consists of functions . A detailed description of this space and the extended Skorohod topology can be found in Straf (1972, 1969) and Bickel and Wichura (1971). The definition of the space can, for example, be found in Bickel and Wichura (1971, Chapter 3).
Straf (1972, Lemma 5.4) yields that the above convergence in the Skorohod topology is equivalent to the convergence for all continuity points of . Hence,
[TABLE]
for all continuity points of .
Multiplying by yields, for every continuity point of ,
[TABLE]
As convergence in distribution to a constant implies convergence in probability, see e. g. van der Vaart (1998, Theorem 2.7(iii)) and as converges in probability to [math], for all fixed continuity points of :
[TABLE]
This yields the convergence of the corresponding probability measures, see for example Billingsley (1995, Chapter 29) for a theory on :
[TABLE]
respectively
[TABLE]
As the space is compact, we can use an argumentation similar to the proof of Theorem 2.6. Then, for every , there is such that for all
[TABLE]
respectively,
[TABLE]
The convergence of the term in part III follows analogously to part I for the distributions and . Hence, for every , there is such that for all
[TABLE]
The combination of (40), (41), and (42) yields for all :
[TABLE]
As and are random variables themselves, we have, due to Huber (1981, Chapter 2, Theorem 4.2, (2)(1)), for all
[TABLE]
Hence, for all there is such that there is such that for all
[TABLE]
This yields the assertion.
- E. Beutner and H. Zähle. Functional delta-method for the bootstrap of quasi-Hadamard differentiable functionals. Electron. J. Stat., 10, 2016.
- P. J. Bickel and M. J. Wichura. Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist., 42:1656–1670, 1971.
- P. Billingsley. Probability and measure. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, third edition, 1995.
- P. Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, second edition, 1999.
- G. Boente, R. Fraiman, and V. J. Yohai. Qualitative robustness for general stochastic processes. Technical report, Department of Statistics, University of Washington, 1982.
- G. Boente, R. Fraiman, and V. J. Yohai. Qualitative robustness for stochastic processes. The Annals of Statistics, 15(3):1293–1312, 1987.
- R. C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv., 2:107–144, 2005.
- R. C. Bradley. Introduction to strong mixing conditions. Vol. 1. Kendrick Press, Heber City, UT, 2007a.
