Inference on Functionals under First Order Degeneracy
Qihui Chen, Zheng Fang

TL;DR
This paper develops a second order asymptotic framework for inference on functionals with null first order derivatives, identifying bootstrap limitations and proposing corrections for degenerate and nondifferentiable cases.
Contribution
It introduces a unified second order inference framework, analyzes bootstrap failures under degeneracy, and proposes correction methods for reliable inference in such settings.
Findings
Standard bootstrap is inconsistent when second order derivative is nonzero.
The correction procedure from Babu (1984) can be extended to this setting.
Modified bootstrap methods achieve local size control under certain conditions.
| Nondifferentiability (1st or 2nd order) | First Order Degeneracy: Yes | First Order Degeneracy: No |
| --- | --- | --- |
| Yes | This paper | Fang_Santos2014HDD |
| No | This paper | Standard |
| Design | # of Assets | # of Factors | GARCH Parameters | Factor Loadings |
| --- | --- | --- | --- | --- |
| D1 | | | | |
| D2 | | | | |
| D3 | | | | |
| D4 | | | | |
| D5 | | | | |
| CF1 | CF2 | DG1 | DG2 | M-DG1 | M-DG2 | DR | M-DR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0850 | 0.0640 | 0.0395 | 0.0100 | 0.0140 | 0.0160 | 0.1740 | 0.0075 |
| 0.0940 | 0.0715 | 0.0550 | 0.0120 | 0.0290 | 0.0315 | 0.2855 | 0.0125 |
| 0.1010 | 0.0740 | 0.0505 | 0.0075 | 0.0485 | 0.0510 | 0.3805 | 0.0185 |
| 0.1010 | 0.0820 | 0.0550 | 0.0090 | 0.0480 | 0.0545 | 0.4005 | 0.0240 |
| 0.1005 | 0.0725 | 0.0495 | 0.0115 | 0.0425 | 0.0550 | 0.4405 | 0.0225 |
| 0.1180 | 0.0900 | 0.0700 | 0.0165 | 0.0635 | 0.0625 | 0.4710 | 0.0400 |
| 0.1070 | 0.0830 | 0.0665 | 0.0145 | 0.0425 | 0.0515 | 0.4430 | 0.0335 |
| CF1 | CF2 | DG1 | DG2 | M-DG1 | M-DG2 | DR | M-DR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0605 | 0.0390 | 0.0660 | 0.0430 | 0.0025 | 0.0030 | 0.0305 | 0.0000 |
| 0.0645 | 0.0385 | 0.0655 | 0.0380 | 0.0040 | 0.0040 | 0.0565 | 0.0005 |
| 0.0520 | 0.0385 | 0.0505 | 0.0275 | 0.0025 | 0.0015 | 0.0715 | 0.0000 |
| 0.0690 | 0.0565 | 0.0830 | 0.0320 | 0.0030 | 0.0040 | 0.0960 | 0.0000 |
| 0.0660 | 0.0600 | 0.0850 | 0.0335 | 0.0070 | 0.0065 | 0.1145 | 0.0005 |
| 0.0520 | 0.0460 | 0.0645 | 0.0225 | 0.0025 | 0.0040 | 0.1175 | 0.0000 |
| 0.0745 | 0.0670 | 0.0920 | 0.0395 | 0.0065 | 0.0040 | 0.1540 | 0.0005 |
| CF1 | CF2 | DG1 | DG2 | M-DG1 | M-DG2 | DR | M-DR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0715 | 0.0445 | 0.1305 | 0.0415 | 0.0240 | 0.0240 | 0.1795 | 0.0010 |
| 0.0895 | 0.0515 | 0.1485 | 0.0330 | 0.0345 | 0.0335 | 0.3210 | 0.0055 |
| 0.1055 | 0.0720 | 0.1590 | 0.0300 | 0.0400 | 0.0400 | 0.4625 | 0.0075 |
| 0.1135 | 0.0615 | 0.1440 | 0.0290 | 0.0445 | 0.0370 | 0.4840 | 0.0080 |
| 0.1155 | 0.0715 | 0.1530 | 0.0290 | 0.0565 | 0.0460 | 0.5555 | 0.0170 |
| 0.1280 | 0.0810 | 0.1655 | 0.0300 | 0.0635 | 0.0700 | 0.5650 | 0.0145 |
| 0.1150 | 0.0775 | 0.1650 | 0.0260 | 0.0535 | 0.0685 | 0.5980 | 0.0125 |
| CF1 | CF2 | DG | DR | ||||||
| M-DG1 | M-DG2 | M-DR | |||||||
| 0.6450 | 0.5915 | 0.7255 | 0.5570 | 0.2420 | 0.2170 | 0.3740 | |||
| 0.9410 | 0.9185 | 0.9530 | 0.8785 | 0.4935 | 0.3945 | 0.8325 | |||
| 0.9975 | 0.9975 | 0.9995 | 0.9950 | 0.9070 | 0.9180 | 0.9940 | |||
| 0.9980 | 0.9980 | 0.9985 | 0.9985 | 0.9995 | 0.9995 | 0.9985 | |||
| 0.9985 | 0.9990 | 0.9995 | 0.9985 | 1.0000 | 1.0000 | 0.9985 | |||
| 0.9995 | 0.9995 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9950 | |||
| 0.9995 | 0.9995 | 0.9995 | 0.9995 | 1.0000 | 1.0000 | 0.9995 | |||
| CF1 | CF2 | DG | DR | ||||||
| M-DG1 | M-DG2 | M-DR | |||||||
| 0.1240 | 0.0740 | 0.3990 | 0.3000 | 0.0385 | 0.0395 | 0.0140 | |||
| 0.3520 | 0.2710 | 0.6975 | 0.5570 | 0.1065 | 0.0870 | 0.1295 | |||
| 0.8250 | 0.7710 | 0.9610 | 0.8885 | 0.3470 | 0.3365 | 0.6675 | |||
| 0.9865 | 0.9850 | 0.9995 | 0.9955 | 0.5945 | 0.6765 | 0.9420 | |||
| 0.9980 | 0.9970 | 1.0000 | 1.0000 | 0.6385 | 0.6005 | 0.9665 | |||
| 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.7225 | 0.7135 | 0.9710 | |||
| 0.9995 | 0.9995 | 1.0000 | 1.0000 | 0.7755 | 0.7445 | 0.9765 | |||
| for some constant that is universal in the proof. | |
| For in a metric space , . | |
| The space of real matrices. | |
| For a set , . | |
| For a set , . | |
| For a set , is the set of continuously differentiable functions on . | |
| For sets , is the Hausdorff distance between and . |
Inference on Functionals under First Order Degeneracy
Qihui Chen
School of Management and Economics
The Chinese University of Hong Kong, Shenzhen
Zheng Fang
Department of Economics
Texas A&M University
We would like to thank Brendan Beare, Andres Santos, Yixiao Sun and anonymous referees for valuable suggestions that have helped greatly improve this paper. We are also grateful to Xiaohong Chen, Qi Li and seminar participants for helpful discussions and comments.
Abstract
This paper presents a unified second order asymptotic framework for conducting inference on parameters of the form $\phi(\theta_0)$, where $\theta_0$ is unknown but can be estimated by $\hat\theta_n$, and $\phi$ is a known map that admits a null first order derivative at $\theta_0$. For a large number of examples in the literature, the second order Delta method reveals a nondegenerate weak limit for the plug-in estimator $\phi(\hat\theta_n)$. We show, however, that the “standard” bootstrap is consistent if and only if the second order derivative $\phi''_{\theta_0}=0$ under regularity conditions, i.e., the standard bootstrap is inconsistent if $\phi''_{\theta_0}\neq 0$, and provides degenerate limits unhelpful for inference otherwise. We thus identify a source of bootstrap failures distinct from that in Fang_Santos2014HDD because the problem (of consistently bootstrapping a nondegenerate limit) persists even if $\phi$ is differentiable. We show that the correction procedure in Babu1984bootstrap can be extended to our general setup. Alternatively, a modified bootstrap is proposed when the map $\phi$ is in addition second order nondifferentiable. Both are shown to provide local size control under some conditions. As an illustration, we develop a test of common conditional heteroskedastic (CH) features, a setting with both degeneracy and nondifferentiability – the latter is because the Jacobian matrix is degenerate at zero and we allow the existence of multiple common CH features.
Keywords: First order degeneracy, Second order Delta method, Bootstrap consistency, Babu correction, Common CH features, $J$-test.
JEL Classification: C12, C15
1 Introduction
There is a large number of inference problems in economics and statistics in which the parameter of interest is of the form $\phi(\theta_0)$, where $\theta_0$ is an unknown parameter depending on the underlying distribution of the data and $\phi$ is a known map. In these settings, it is common practice to employ the plug-in estimator $\phi(\hat\theta_n)$, where $\hat\theta_n$ is an estimator for $\theta_0$, as a building block for conducting inference on $\phi(\theta_0)$. The Delta method asserts that if $r_n\{\hat\theta_n-\theta_0\}\overset{L}{\to}\mathbb{G}$ for some sequence $r_n\uparrow\infty$, then
$$r_n\{\phi(\hat\theta_n)-\phi(\theta_0)\}\overset{L}{\to}\phi'_{\theta_0}(\mathbb{G}), \tag{1}$$
provided $\phi$ is at least Hadamard directionally differentiable at $\theta_0$, where $\phi'_{\theta_0}$ is the derivative of $\phi$ at $\theta_0$ (Shapiro1991; Dumbgen1993). As powerful as the Delta method has proven to be (Vaart1998; Fang_Santos2014HDD), an implicit and yet crucial assumption for the convergence (1) to be useful for inferential purposes is that $\phi'_{\theta_0}$ or $\phi'_{\theta_0}(\mathbb{G})$ is nondegenerate, i.e., $\phi'_{\theta_0}\neq 0$. Unfortunately, first order degeneracy (i.e., $\phi'_{\theta_0}=0$) arises frequently in asymptotic analysis, with applications including Wald tests or Wald type functionals (Wald1943tests; Engle1984Handbook), unconditional and conditional moment inequality models (AndrewsandSoares2010; Andrews_Shi2013CMI), Cramér-von Mises functionals (Darling1957KSCvM), the study of stochastic dominance (Linton2010), and the $J$-test for overidentification in GMM settings (Hall_Horowitz1996bootstrap).
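In the presence of first order degeneracy, one may resort to a higher order analysis for the sake of a nondegenerate limiting distribution. Shapiro2000inference established that if $\phi$ is second order Hadamard directionally differentiable (see Definition 2.2) – a feature shared by the aforementioned examples – then
$$r_n^2\{\phi(\hat\theta_n)-\phi(\theta_0)-\phi'_{\theta_0}(\hat\theta_n-\theta_0)\}\overset{L}{\to}\phi''_{\theta_0}(\mathbb{G}), \tag{2}$$
where $\phi''_{\theta_0}$ denotes the second order derivative of $\phi$ at $\theta_0$. Thus, when first order degeneracy occurs, (2) suggests that we may base our asymptotic analysis on
$$r_n^2\{\phi(\hat\theta_n)-\phi(\theta_0)\}\overset{L}{\to}\phi''_{\theta_0}(\mathbb{G}). \tag{3}$$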
The usefulness of the limiting distribution in (3), however, relies on our ability to consistently estimate it. In this regard, Efron1979’s bootstrap seems to be a potential option. Specifically, if $\hat\theta_n^*$ is a bootstrap analog of $\hat\theta_n$ that works for estimating the law of $\mathbb{G}$, then in view of (3) one may hope that
$$r_n^2\{\phi(\hat\theta_n^*)-\phi(\hat\theta_n)\} \tag{4}$$
can be employed as an estimator for the law of $\phi''_{\theta_0}(\mathbb{G})$, at least when $\phi$ is smooth. Unfortunately, there are simple examples where the law of (4) conditional on the data, referred to as the standard bootstrap, fails to provide consistent estimates (Babu1984bootstrap).
As the first contribution of this paper, we show that the standard bootstrap (4) is consistent if and only if $\phi''_{\theta_0}=0$ under mild conditions. Thus, the standard bootstrap is necessarily inconsistent when $\phi''_{\theta_0}$ is nondegenerate, while when $\phi''_{\theta_0}$ is degenerate, the resulting asymptotic distribution is degenerate and hence not useful for inference. Therefore, the failure of the standard bootstrap is an inherent implication of first order degeneracy. It is worth noting that the failure of the standard bootstrap persists even when $\phi$ is differentiable. Hence, we identify a source of bootstrap inconsistency distinct from that in Fang_Santos2014HDD, i.e., nondifferentiability of the map $\phi$, as explained further towards the end of this section.
Heuristically, the reason why the standard bootstrap fails is that even though $\phi'_{\theta_0}=0$ in the “real world”, its bootstrap counterpart is nondegenerate, i.e., $\phi'_{\hat\theta_n}\neq 0$, echoing Efron1979’s point that the bootstrap provides approximate frequency statements rather than approximate likelihood statements. This observation was picked up by Babu1984bootstrap who provided a consistent resampling procedure by including the first order correction term:
$$r_n^2\{\phi(\hat\theta_n^*)-\phi(\hat\theta_n)-\phi'_{\hat\theta_n}(\hat\theta_n^*-\hat\theta_n)\}. \tag{5}$$
As the second contribution, we generalize the above modified bootstrap (5), referred to as the Babu correction, to settings that accommodate infinite dimensional models and a wide range of bootstrap schemes for $\hat\theta_n$. However, we stress that the Babu correction is inappropriate when $\phi$ is only Hadamard directionally differentiable.
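The contrast between the standard bootstrap (4) and the Babu correction (5) can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the paper's procedure in its general form: it takes the squared-mean functional $\phi(\theta)=\theta^2$ with $\theta_0=E[X]=0$, the sample mean as the estimator, $r_n=\sqrt{n}$, and the nonparametric bootstrap. The function names, sample size, and number of bootstrap draws are hypothetical choices made for illustration.

```python
import numpy as np

def phi(theta):
    """Squared-mean functional phi(theta) = theta**2 (assumed for this sketch)."""
    return theta ** 2

def phi_prime(theta, h):
    """First order derivative of phi at theta in direction h."""
    return 2.0 * theta * h

def bootstrap_draws(x, n_boot, rng):
    """Return standard-bootstrap and Babu-corrected draws for phi(sample mean)."""
    n = x.size
    theta_hat = x.mean()
    # Nonparametric bootstrap: resample with replacement, recompute the mean.
    idx = rng.integers(0, n, size=(n_boot, n))
    theta_star = x[idx].mean(axis=1)
    # Standard bootstrap with r_n = sqrt(n), so r_n**2 = n.
    standard = n * (phi(theta_star) - phi(theta_hat))
    # Babu correction: subtract the estimated first order term.
    babu = standard - n * phi_prime(theta_hat, theta_star - theta_hat)
    return theta_hat, theta_star, standard, babu

rng = np.random.default_rng(0)
n, n_boot = 500, 2000
x = rng.standard_normal(n)  # theta_0 = E[X] = 0, so the first order derivative vanishes
theta_hat, theta_star, standard, babu = bootstrap_draws(x, n_boot, rng)

print("95% quantile, standard bootstrap:", np.quantile(standard, 0.95))
print("95% quantile, Babu correction:   ", np.quantile(babu, 0.95))
print("chi-square(1) 95% quantile:       about 3.84")
```

In this special case the corrected draws reduce algebraically to $n(\bar X^*-\bar X)^2$, which is nonnegative like the chi-square type limit of $n\bar X^2$, whereas the uncorrected draws retain the cross term $2n\bar X(\bar X^*-\bar X)$, which does not vanish asymptotically.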
As the third contribution, we follow Fang_Santos2014HDD and provide a modified bootstrap which is consistent regardless of the presence of first order degeneracy and nondifferentiability of $\phi$. The insight we exploit is that the weak limit in (3) is a composition of the limit $\mathbb{G}$ and the derivative $\phi''_{\theta_0}$. Therefore, we may estimate the law of $\phi''_{\theta_0}(\mathbb{G})$ by composing a suitable estimator $\hat\phi''_n$ for $\phi''_{\theta_0}$ with a bootstrap approximation for $\mathbb{G}$. Since the conditions on $\hat\phi''_n$ proposed by Fang_Santos2014HDD in order for this approach to work are either demanding or hard to check in our setup, we provide a high level condition that is easy to verify. We further demonstrate that numerical differentiation provides a desirable estimator in general; alternatively, we show how to estimate $\phi''_{\theta_0}$ by exploiting its structure in particular examples. Our inference procedures are also shown to enjoy the local size control property under a key condition that is algebraically simple.
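The composition idea described above can be sketched in a few lines. This is a hedged illustration rather than the paper's estimator: it assumes a functional of the moment-inequality type, $\phi(\theta)=(\max\{\theta,0\})^2$, uses the finite-difference quotient $\{\phi(\hat\theta_n+t_nh)-\phi(\hat\theta_n)\}/t_n^2$ as one plausible numerical derivative under first order degeneracy, and picks the step size $t_n=n^{-1/4}$ purely for illustration. All function names are hypothetical.

```python
import numpy as np

def phi(theta):
    """Moment-inequality type functional max(theta, 0)**2 (assumed for this sketch)."""
    return np.maximum(theta, 0.0) ** 2

def second_derivative_estimate(phi, theta_hat, h, t):
    """Finite-difference estimate of the second order derivative in direction h.

    No first order term is subtracted because the first order derivative is
    assumed to vanish at the true parameter (an assumption of this sketch).
    """
    return (phi(theta_hat + t * h) - phi(theta_hat)) / t ** 2

def modified_bootstrap(x, n_boot, t, rng):
    """Compose the derivative estimate with bootstrap draws of sqrt(n) * (mean* - mean)."""
    n = x.size
    theta_hat = x.mean()
    idx = rng.integers(0, n, size=(n_boot, n))
    g_star = np.sqrt(n) * (x[idx].mean(axis=1) - theta_hat)
    return second_derivative_estimate(phi, theta_hat, g_star, t)

rng = np.random.default_rng(1)
n = 400
x = rng.standard_normal(n)   # boundary case: E[X] = 0
t_n = n ** (-0.25)           # hypothetical step size: t_n -> 0 while sqrt(n) * t_n -> infinity
draws = modified_bootstrap(x, n_boot=1000, t=t_n, rng=rng)
critical_value = np.quantile(draws, 0.95)
print("bootstrap 95% critical value:", critical_value)
```

The design choice worth noting is that the bootstrap is used only to approximate the law of $\mathbb{G}$, while the possibly nonlinear map is handled by a separate derivative estimate, so the two pieces can be swapped independently.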
Finally, to further demonstrate the applicability of our framework, we develop a test of common conditional heteroskedastic (CH) features studied by Dovonon_Renault2013testing but under weaker assumptions that allow more than one common CH feature. Thus, in addition to the first order identification failure they focused on, we further allow second order (and hence global) identification failures, which render the functional involved highly (second-order) nondifferentiable as well as first order degenerate. Such a generalization is important because it is unknown a priori how many common features there are, and in the context of asset pricing the number can be large (Engle_Ng_Rothschild1990asset). Moreover, the linear normalization in Dovonon_Renault2013testing can falsely exclude the existence of common features even when there does exist a unique common CH feature, a deficiency which we avoid by the unit-length normalization. Monte Carlo simulations indicate that our tests substantially alleviate size distortion and have good power performance. We stress that first order degeneracy is of a nature different from that of the degeneracy of Jacobian matrices which is the focus of Dovonon_Renault2013testing; see Section 4 for details. Our approach may also be used to develop tests for other common features (Engle_Kozicki1993CF).
There have been extensive studies on bootstrap consistency (Hall1992bootstrap; HorowitzBoot). It was realized soon after Efron1979 that the bootstrap is not always successful (BickelandFreedman1981bootstrap); see also Andrews2000Bootstrap for a summary. Babu1984bootstrap provided a simple example of bootstrap failure due to first order degeneracy, and established the validity of the Babu correction for the special case studied there. Shao1994bootstrap and Bertail_Politis_Romano1999subsampling showed that $m$ out of $n$ resampling and subsampling can serve as alternative remedies. There are, however, three reasons we choose not to use these methods. First, they entail the choice of tuning parameters while our proposal can work without such nuisances when $\phi$ is differentiable. Second, when $\phi$ is nondifferentiable, both can lead to invalid tests due to lack of uniform approximations (AndrewsandGuggen2010ET). We provide a simple algebraic condition which, together with regularity of $\hat\theta_n$, delivers local uniformity of our inferential procedure. Third, they have been shown to be dominated by other inferential methods, for example, in moment inequality models (AndrewsandSoares2010) which our framework includes as special cases. Datta1995bootstrap revisited Babu’s example and offered a bias correction procedure that depends on a first stage shrinkage type estimator. Somewhat similar methods were later proposed in Andrews2000Bootstrap and Giurcanu2012bootsrtap. These methods are not easily extendable to more general settings.
Bootstrap inconsistency due to nondifferentiability of $\phi$ was studied in Dumbgen1993 and recently in Fang_Santos2014HDD who formally established that (first order) differentiability of $\phi$ is a necessary as well as sufficient condition for the standard bootstrap to work under regularity conditions. Our work complements theirs by identifying a different source of bootstrap failure. Specifically, given bootstrap consistency of $\hat\theta_n^*$ and if $\phi$ is first order degenerate (and hence fully differentiable!), then Fang_Santos2014HDD implies that the standard bootstrap is consistent for the law of $\phi'_{\theta_0}(\mathbb{G})$ which is degenerate (and unhelpful for inference). We further show that the law of the second order limit $\phi''_{\theta_0}(\mathbb{G})$ cannot be consistently estimated by the second order standard bootstrap (4) unless $\phi''_{\theta_0}(\mathbb{G})$ itself is degenerate – this remains true regardless of whether $\phi$ is (second order) differentiable or not! Moreover, extra work is needed in order to show our bootstrap inferential procedures work well in the local uniformity sense. In applications, first order degeneracy and second order nondifferentiability are often mixed together, for example, in Romano_Shaikh2010, AndrewsandSoares2010, Linton2010, and Andrews_Shi2013CMI. The numerical differentiation approach of estimating derivatives was somewhat implicit in Dumbgen1993’s rescaled bootstrap, recently employed by Song2014minimax and studied by Hong_Li2015numericaldelta. We provide a more general condition that may be used to verify “consistency” of derivative estimators (not necessarily constructed via numerical differentiation). Our theory has been utilized in ChenFang2016Rank to develop a rank test where, unlike previous studies, the true rank is potentially strictly less than the hypothesized value, a longstanding problem in the literature.
We now introduce some notation. For a set , we let denote the space of bounded real-valued functions defined on and the space of real-valued continuous functions on a compact set (endowed with some topology). Both and are equipped with the uniform norm, i.e., . For a normed space endowed with norm and , we equip the product space with the product norm , denoted with some abuse of notation, for , where and are the th coordinates of and respectively. For a subset , we write for the indicator function of .
The remainder of the paper is structured as follows. Section 2 formalizes the general setup, shows the wide applicability of our framework by introducing related examples, and establishes the asymptotic framework by presenting a mild extension of the second order Delta method. Section 3 characterizes the inherent difficulties caused by first order degeneracy, extends the Babu correction to our general setup, and offers a flexible modified bootstrap procedure. Section 4 develops a test for common CH features that allows multiple common CH features, while Section 5 concludes. Appendix A demonstrates that our inferential procedure is robust to local perturbations of the distribution of the data under regularity conditions. The remaining appendices collect all the proofs and additional discussions.
2 Setup and Background
In this section, we formalize the general setup, introduce related examples, and review notions of differentiability based on which we present the second order Delta method.
2.1 General Setup
The treatment in this paper is general in the sense that we allow both the parameter $\theta_0$ and the map $\phi$ to take values in infinite dimensional spaces, though attention is confined to real-valued $\phi$ when studying tests. In particular, we assume $\theta_0\in\mathbb{D}_\phi$ and $\phi:\mathbb{D}_\phi\subseteq\mathbb{D}\to\mathbb{E}$, where $\mathbb{D}$ and $\mathbb{E}$ are normed spaces with norms $\|\cdot\|_{\mathbb{D}}$ and $\|\cdot\|_{\mathbb{E}}$ respectively. Moreover, the data generating process is general as well in that the model can be parametric, semiparametric or nonparametric and that the data need not be i.i.d. However, we do impose an i.i.d. assumption in our local analysis, but only for simplicity. The results there can presumably be extended to general asymptotically normal experiments (Vaart_Wellner1990prohorov).
The common probability space on which all (random) maps are defined is the canonical one. For example, in the simplest i.i.d. setup, we think of the data $\{X_i\}_{i=1}^n$ as the coordinate projections on the first $n$ coordinates in the product probability space $\big(\prod_{i=1}^{\infty}\mathscr{X},\bigotimes_{i=1}^{\infty}\mathcal{A},\prod_{i=1}^{\infty}P\big)$, where $\mathscr{X}$ is the sample space each $X_i$ lives in and $P$ is the common Borel probability measure that governs each $X_i$. In the presence of bootstrap weights $\{W_i\}_{i=1}^n$, we further think of the product space as the “first” coordinates of the even “larger” product space $\big((\prod_{i=1}^{\infty}\mathscr{X})\times\mathscr{W},(\bigotimes_{i=1}^{\infty}\mathcal{A})\otimes\mathcal{W},(\prod_{i=1}^{\infty}P)\times Q\big)$, where $Q$ governs the infinite sequence of bootstrap weights.
Given the generality of our setup, weak convergence throughout the paper is meant in the Hoffmann-Jørgensen sense (Vaart1996). Expectations and probabilities should therefore be interpreted as outer expectations and outer probabilities respectively defined relative to the canonical probability space, though we obviate the distinction in the notation. The notation is made explicit in the appendices whenever differentiating between inner and outer expectations is necessary.
2.2 Related Examples
To fix ideas, we now turn to related examples that serve to illustrate the wide applicability of our framework. The first example is taken from Babu1984bootstrap, which provides an easy illustration of bootstrap inconsistency in the presence of first order degeneracy even if the transformation is smooth.
Example 2.1 (Wald Functional: Squared Mean).
Let $X\in\mathbb{R}$ be a random variable, and suppose that we are interested in conducting inference on
$$\phi(\theta_0)=(E[X])^2.$$
Here, $\theta_0=E[X]$, $\mathbb{D}=\mathbb{E}=\mathbb{R}$, and $\phi:\mathbb{R}\to\mathbb{R}$ is defined by $\phi(\theta)=\theta^2$. In fact, $\phi$ is a special case of the more general quadratic functionals of the form $\phi(\theta)=\theta^{\intercal}W\theta$ for $\theta\in\mathbb{R}^k$ and $W$ a $k\times k$ weighting matrix. This seemingly toy example also arises in VAR models for inference on impulse responses (Benkwitz_Neumann_Lutekpohl2000) and in some nonseparable models with structural measurement errors (Hoderlein_Winter2010). ∎
The second example is a special case of the unconditional moment inequality models studied in CHT2007, Romano_Shaikh2008; Romano_Shaikh2010, AndrewsandGuggen2009ET, and AndrewsandSoares2010.
Example 2.2 (Unconditional Moment Inequalities).
Let be a scalar random variable and suppose we want to test the moment inequality . The modified method of moments approach is based on estimating the functional
[TABLE]
where , , and is defined by . The functional can be easily adapted to handle general moment inequality models.∎
The third example concerns the classical Cramér-von Mises functional employed to test goodness of fit (Darling1957KSCvM; Vaart1998).
Example 2.3 (Cramér-von Mises Functional).
Suppose that we are interested in testing if the distribution function of a random vector is a given function . The Cramér-von Mises approach considers the functional
[TABLE]
Here, , , , and is defined to be . More generally, it is possible to test if belongs to a parametric family by studying . ∎
The fourth example, closely related to but significantly different from Example 2.3, is based on Linton2010 for testing stochastic dominance.
Example 2.4 (Stochastic Dominance).
Let be continuously distributed, and define the marginal cdfs for . For a weighting function , Linton2010 estimate
[TABLE]
to construct a test of whether first order stochastically dominates . In this example, we set , , and for any . We note that the Cramér-von Mises type functionals in Andrews_Shi2013CMI; Andrews_Shi2014CMI shares the common structure of the functional in (8) and hence can be taken care of by our framework as well.∎
The fifth example is a special case of the Kolmogorov-Smirnov type functionals for inference on conditional moment inequalities studied by Andrews_Shi2013CMI.
Example 2.5 (Conditional Moment Inequalities).
Let and be random vectors satisfying and . For a suitably chosen class of nonnegative functions on , the above conditional moment inequality is equivalent to and for all . Andrews_Shi2013CMI propose testing the above restriction by estimating the functional
[TABLE]
Here, satisfies for all , , , and is given by . ∎
Our final example is concerned with the -test of overidentification in GMM settings proposed by Sargan1958iv; Sargan1959IV and further developed in Hansen1982.
Example 2.6 (Overidentification Test).
Let be a random vector and consider the model defined by the moment restriction for some where is a known function with . The conventional -test can be recast by estimating the functional defined as: for some known symmetric positive definite matrix ,
[TABLE]
Here, is defined by , , , and is defined by . The bootstrap for the statistic has been studied by Hall_Horowitz1996bootstrap and Andrews2002higher. Note that is always identified even though is potentially partially identified, which makes second order nondifferentiable as will be shown below. ∎
2.3 Concepts of Differentiability
All examples in the previous subsection exhibit first order degeneracy, i.e., there exist points $\theta$ in $\mathbb{D}_\phi$ such that the first order derivative satisfies $\phi'_{\theta}=0$, and in some cases $\phi$ is not even differentiable at $\theta$, which can be seen from Examples 2.1 and 2.2 respectively. As such, we resort to a second order expansion that handles first order degeneracy and meanwhile accommodates potential nondifferentiability of $\phi$. Let us proceed by recalling notions of first order differentiability (Shapiro1990; Fang_Santos2014HDD).
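Definition 2.1.
Let $\mathbb{D}$ and $\mathbb{E}$ be normed spaces equipped with norms $\|\cdot\|_{\mathbb{D}}$ and $\|\cdot\|_{\mathbb{E}}$ respectively, and $\phi:\mathbb{D}_\phi\subseteq\mathbb{D}\to\mathbb{E}$.
- (i)
The map $\phi$ is said to be Hadamard differentiable at $\theta\in\mathbb{D}_\phi$ tangentially to a set $\mathbb{D}_0\subseteq\mathbb{D}$, if there is a continuous linear map $\phi'_\theta:\mathbb{D}_0\to\mathbb{E}$ such that:
$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_nh_n)-\phi(\theta)}{t_n}-\phi'_\theta(h)\Big\|_{\mathbb{E}}=0, \tag{11}$$
for all sequences $\{h_n\}\subset\mathbb{D}$ and $\{t_n\}\subset\mathbb{R}$ such that $t_n\to 0$, $h_n\to h\in\mathbb{D}_0$ as $n\to\infty$ and $\theta+t_nh_n\in\mathbb{D}_\phi$ for all $n$.
- (ii)
The map $\phi$ is said to be Hadamard directionally differentiable at $\theta\in\mathbb{D}_\phi$ tangentially to a set $\mathbb{D}_0\subseteq\mathbb{D}$, if there is a continuous map $\phi'_\theta:\mathbb{D}_0\to\mathbb{E}$ such that:
$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_nh_n)-\phi(\theta)}{t_n}-\phi'_\theta(h)\Big\|_{\mathbb{E}}=0, \tag{12}$$
for all sequences $\{h_n\}\subset\mathbb{D}$ and $\{t_n\}\subset\mathbb{R}_+$ such that $t_n\downarrow 0$, $h_n\to h\in\mathbb{D}_0$ as $n\to\infty$ and $\theta+t_nh_n\in\mathbb{D}_\phi$ for all $n$. [Footnote 1: We note that the “tangential set” in Shapiro1991 refers to the domain of $\phi$ (i.e., $\mathbb{D}_\phi$ in our context), whereas here it refers to the domain of the derivative $\phi'_\theta$.]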
Inspecting Definition 2.1, we see that the main difference between Hadamard differentiability and directional differentiability lies in the linearity of the derivative. This turns out to be the exact gap between these two notions of differentiability. In particular, (12) ensures that the directional derivative is necessarily continuous and positively homogeneous of degree one, though potentially nonlinear (Shapiro1990).
Given the introduced notions of differentiability and in view of the remarkable fact that the Delta method is valid even under Hadamard directional differentiability in terms of deriving asymptotic distributions (Shapiro1991; Dumbgen1993), it seems a natural next step to invoke the Delta method. However, in the presence of first order degeneracy, the resulting limiting distribution is degenerate at zero, posing substantial challenges for inferential purposes. In essence, the Delta method is a stochastic version of the Taylor expansion. Therefore, one could go one step further to explore the quadratic term when the linear term is degenerate. We thus follow Shapiro2000inference and define:
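Definition 2.2.
Let $\phi:\mathbb{D}_\phi\subseteq\mathbb{D}\to\mathbb{E}$ be a map as in Definition 2.1.
- (i)
Suppose that $\phi$ is Hadamard differentiable tangentially to $\mathbb{D}_0$ such that the derivative $\phi'_\theta:\mathbb{D}\to\mathbb{E}$ is well defined on $\mathbb{D}$. We say that $\phi$ is second order Hadamard differentiable at $\theta\in\mathbb{D}_\phi$ tangentially to $\mathbb{D}_0$ if there is a bilinear map $\Phi''_\theta:\mathbb{D}_0\times\mathbb{D}_0\to\mathbb{E}$ such that: for $\phi''_\theta(h)\equiv\Phi''_\theta(h,h)$,
$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_nh_n)-\phi(\theta)-t_n\phi'_\theta(h_n)}{t_n^2}-\phi''_\theta(h)\Big\|_{\mathbb{E}}=0, \tag{13}$$
for all sequences $\{h_n\}\subset\mathbb{D}$ and $\{t_n\}\subset\mathbb{R}$ such that $t_n\to 0$, $h_n\to h\in\mathbb{D}_0$ as $n\to\infty$ and $\theta+t_nh_n\in\mathbb{D}_\phi$ for all $n$.
- (ii)
Suppose that $\phi$ is Hadamard directionally differentiable tangentially to $\mathbb{D}_0$ such that the derivative $\phi'_\theta:\mathbb{D}\to\mathbb{E}$ is well defined on $\mathbb{D}$. We say that $\phi$ is second order Hadamard directionally differentiable at $\theta\in\mathbb{D}_\phi$ tangentially to $\mathbb{D}_0$ if there is a map $\phi''_\theta:\mathbb{D}_0\to\mathbb{E}$ such that:
$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_nh_n)-\phi(\theta)-t_n\phi'_\theta(h_n)}{t_n^2}-\phi''_\theta(h)\Big\|_{\mathbb{E}}=0, \tag{14}$$
for all sequences $\{h_n\}\subset\mathbb{D}$ and $\{t_n\}\subset\mathbb{R}_+$ such that $t_n\downarrow 0$, $h_n\to h\in\mathbb{D}_0$ as $n\to\infty$ and $\theta+t_nh_n\in\mathbb{D}_\phi$ for all $n$. [Footnote 2: Compared with Shapiro2000inference, we omitted $1/2$ in the denominator for notational compactness.]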
The second order derivative $\phi''_\theta$ in both cases is necessarily continuous on $\mathbb{D}_0$, which can be shown in a straightforward manner as in the proof of Proposition 3.1 in Shapiro1990. Similar in spirit to Definition 2.1, the key difference between the above two notions of second order differentiability is that the former is a quadratic form corresponding to a bilinear map while the latter is in general only positively homogeneous of degree two, i.e., $\phi''_\theta(th)=t^2\phi''_\theta(h)$ for all $t\ge 0$ and all $h\in\mathbb{D}_0$. Note that it is possible that $\phi$ is first order Hadamard differentiable but only second order Hadamard directionally differentiable (see Example 2.2). In all our examples, $\phi$ is first order Hadamard differentiable though $\phi'_\theta$ may be degenerate; see Subsection 2.3.1. We stress that requiring $\phi'_\theta$ to be well defined on the entirety of $\mathbb{D}$ does not demand differentiability on $\mathbb{D}$. Instead, it just means that $\phi'_\theta$ can take elements potentially not in $\mathbb{D}_0$ as arguments. Finally, we note that first and second order (directional) derivatives share the same domain $\mathbb{D}_0$.
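The distinction between a quadratic form and a map that is merely positively homogeneous of degree two can be checked numerically. The sketch below is an illustration under assumptions, not part of the paper: it uses the hypothetical functional $\phi(\theta)=(\max\{\theta,0\})^2$ at $\theta=0$, where the first order derivative vanishes, and evaluates the second order difference quotient along a shrinking step. Function names are made up for this example.

```python
import numpy as np

def phi(theta):
    """Hypothetical functional max(theta, 0)**2, used only for illustration."""
    return np.maximum(theta, 0.0) ** 2

def second_order_quotient(phi, theta, h, t, first_derivative=0.0):
    """Difference quotient [phi(theta + t h) - phi(theta) - t * phi'(h)] / t**2."""
    return (phi(theta + t * h) - phi(theta) - t * first_derivative) / t ** 2

theta0 = 0.0  # point of first order degeneracy: phi'(h) = 0 for every h
for t in (1e-1, 1e-2, 1e-3):
    up = second_order_quotient(phi, theta0, 1.0, t)
    down = second_order_quotient(phi, theta0, -1.0, t)
    print(f"t={t:g}: quotient at h=+1 is {up:.6f}, at h=-1 is {down:.6f}")

def d2(h):
    """Limit of the quotient at theta0 = 0, namely max(h, 0)**2."""
    return np.maximum(h, 0.0) ** 2

# Positive homogeneity of degree two: d2(c h) == c**2 * d2(h) for c >= 0.
c, h = 3.0, 0.7
homogeneous = np.isclose(d2(c * h), c ** 2 * d2(h))
# A quadratic form q(h) = B(h, h) must satisfy q(-h) == q(h); d2 does not.
symmetric = np.isclose(d2(-h), d2(h))
print("degree-two homogeneity holds:", homogeneous)
print("symmetry q(-h) = q(h) holds: ", symmetric)
```

Since any quadratic form induced by a bilinear map is even in $h$, the failure of symmetry shows that this second order derivative is directional only.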
If in turn $\phi''_{\theta_0}$ is degenerate, one can go beyond the second order, a possibility we do not pursue at length in this paper; see Remark 2.1.
Remark 2.1.
Suppose that is -th order Hadamard directionally differentiable tangentially to such that the derivative is well defined on for all , where . Then we say that is th order Hadamard directionally differentiable at tangentially to if there is a map such that:
[TABLE]
for all sequences and such that , as and for all . Note that, similar to the treatment of , the factors are incorporated in the definition of the derivatives to reflect the nature of them as approximating maps. Demyanov1974Minimax established the above high order expansion for with and ; see also Demyanov2009Minimax. [Footnote 3: We thank an anonymous referee for bringing this reference to our attention.] ∎
2.3.1 Examples Revisited
From now on, we shall focus on Examples 2.1 and 2.6 exclusively for conciseness; Examples 2.2, 2.3, 2.4 and 2.5 will be treated in Appendix C.
Example 2.1 (Continued).
In this example, the functional involved is second order Hadamard differentiable. Trivially we have
$$\phi'_\theta(h)=2\theta h,\qquad \phi''_\theta(h)=h^2.$$
Note that the first order derivative $\phi'_{\theta_0}$ is degenerate when $\theta_0=0$, whereas $\phi''_{\theta_0}$ is everywhere nondegenerate. The bilinear map here is given by $\Phi''_\theta(h,g)=hg$. ∎
In Example 2.6, the domain of the derivative is a strict subset of .
Example 2.6 (Continued).
Consider such that for some . Then is Hadamard differentiable at and for all . Suppose further that is compact and that is in the interior of . For the space of continuously differentiable functions on , if , then by Lemma C.3, under additional regularity conditions, is second order Hadamard directionally differentiable at tangentially to with the derivative given by: for any ,
[TABLE]
where with $J(\gamma_{0})\equiv\frac{d\theta(\gamma)}{d\gamma^{\intercal}}\big|_{\gamma=\gamma_{0}}$ the Jacobian matrix and the identity matrix of size . Here, invertibility of is an implied requirement in Lemma C.3; see Remark C.2. Note that if is point identified, then becomes second order Hadamard differentiable with
[TABLE]
which in turn yields as the asymptotic distribution of the -statistic under optimal weighting. We emphasize that the regularity conditions in Lemma C.3 are sufficient for applying our framework but by no means necessary – as explained in Section 4, those sufficient conditions exclude the setup of Dovonon_Renault2013testing, and so we shall provide an alternative set of sufficient conditions there. ∎
2.4 Second Order Delta Method
The Delta method for potentially directionally differentiable maps as well as differentiable ones has proven powerful in asymptotic analysis (Vaart1998; Shapiro1991; Fang_Santos2014HDD; Hansen2015regression). Unfortunately, it is insufficient to handle substantial challenges for inference arising from first order degeneracy. Heuristically, if and , then the Delta method implies that
[TABLE]
For real-valued , the usual confidence interval for at asymptotic level is
[TABLE]
where the is the -th quantile of and is zero for all . Clearly, if, for example, is a continuous random variable.
To circumvent the above difficulty, we impose the following conditions in order to obtain a suitable second order Delta method.
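Assumption 2.1.
(i) $\mathbb{D}$ and $\mathbb{E}$ are normed spaces with norms $\|\cdot\|_{\mathbb{D}}$ and $\|\cdot\|_{\mathbb{E}}$ respectively; (ii) $\phi:\mathbb{D}_\phi\subseteq\mathbb{D}\to\mathbb{E}$ is second order Hadamard directionally differentiable at $\theta_0\in\mathbb{D}_\phi$ tangentially to $\mathbb{D}_0\subseteq\mathbb{D}$; (iii) $\phi'_{\theta_0}(h)=0$ for all $h\in\mathbb{D}_0$.
Assumption 2.2.
(i) There is $\hat\theta_n:\{X_i\}_{i=1}^n\to\mathbb{D}_\phi$ such that $r_n\{\hat\theta_n-\theta_0\}\overset{L}{\to}\mathbb{G}$ in $\mathbb{D}$ for some $r_n\uparrow\infty$; (ii) $\mathbb{G}$ is tight and its support is in $\mathbb{D}_0$; (iii) $\mathbb{D}_0$ is closed under vector addition, i.e., $h_1+h_2\in\mathbb{D}_0$ whenever $h_1,h_2\in\mathbb{D}_0$. [Footnote 4: The support of $\mathbb{G}$ is the set of points in $\mathbb{D}$ all of whose open neighborhoods have positive probability.]
Assumption 2.1 formalizes the requirement that the map $\phi$ be second order Hadamard directionally differentiable at $\theta_0$, and the defining feature of this paper, namely, degeneracy of the first order derivative. Assumption 2.2(i) defines another key ingredient: there is an estimator $\hat\theta_n$ for $\theta_0$ that admits a weak limit $\mathbb{G}$ at a potentially non-$\sqrt{n}$ rate $r_n$; see Remark 3.1. Assumption 2.2(ii) ensures that the support of $\mathbb{G}$ is included in the domain of the derivative $\phi''_{\theta_0}$ so that $\phi''_{\theta_0}(\mathbb{G})$ is well defined, while tightness of $\mathbb{G}$ is only a minimal requirement. Assumption 2.2(iii) is a mild condition, which shall play a technical role in the proof of our bootstrap results.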
Given Assumptions 2.1 and 2.2, we now present a second order Delta method building upon Shapiro2000inference and Romish2004delta but without requiring to be convex.
Theorem 2.1.
If Assumptions 2.1(i)(ii) and 2.2(i)(ii) hold, then [Footnote 5: The term is interpreted as some continuous extension of (which always exists in our setup) evaluated at whenever ; see the comment preceding the proof of Theorem 2.1. Since (18) is an asymptotic result, the choice of the continuous extension is irrelevant.]
[TABLE]
and hence
[TABLE]
The essence of Theorem 2.1 is in complete accord with that underlying the first order Delta method. In particular, the definition of second order Hadamard directional differentiability is engineered so that the second order Delta method is nothing more than a stochastic version of the Taylor expansion of order two, i.e.,
[TABLE]
where corresponds to , and to . Note that Theorem 2.1 is valid regardless of the nature of the differentiability (i.e., fully differentiable or directionally differentiable) and the presence of first order degeneracy. When is degenerate, the convergence (19) simplifies to
[TABLE]
Finally, we note that higher order versions of the Delta method can be developed along the lines of Remark 2.1; see Remark 2.2.
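The content of Theorem 2.1 under first order degeneracy can be checked numerically in the squared mean setting. The following is a minimal Python sketch, assuming i.i.d. standard normal data, $\phi(\theta)=\theta^2$, $\theta_0=0$ and $r_n=\sqrt{n}$; the function name and the simulation design are illustrative choices and not part of the paper. Under these assumptions $n\,\phi(\hat\theta_n)$ is exactly $\chi^2_1$ distributed, while $\sqrt{n}\,\phi(\hat\theta_n)$ collapses to zero.

```python
import numpy as np

def simulate_squared_mean(n=400, reps=2000, seed=0):
    """Draw `reps` samples of size n from N(0, 1) and evaluate phi(mean) = mean**2."""
    rng = np.random.default_rng(seed)
    xbar = rng.standard_normal((reps, n)).mean(axis=1)
    phi_hat = xbar ** 2                  # phi(theta_0) = 0 because theta_0 = 0 (assumed design)
    first_order = np.sqrt(n) * phi_hat   # first order scaling: degenerate at zero in the limit
    second_order = n * phi_hat           # second order scaling: nondegenerate limit
    return first_order, second_order

first, second = simulate_squared_mean()
print("average of sqrt(n) * phi_hat:", first.mean())
print("average of n * phi_hat      :", second.mean())
```

In this design the first average shrinks toward zero as the sample size grows, whereas the second stays close to one, the mean of a $\chi^2_1$ variable.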
Remark 2.2.
Suppose that Assumptions 2.1(i) and 2.2(i)(ii) hold and is -th order Hadamard directionally differentiable at tangentially to . It follows that
[TABLE]
and hence
[TABLE]
3 The Bootstrap
Establishing asymptotic distributions as in Theorem 2.1 is the first step towards conducting statistical inference on , the usefulness of which relies on our ability to accurately estimate the limiting law. In this section, we discuss how first order degeneracy of can complicate inference using the standard bootstrap based on first and especially second order asymptotics, and provide alternative consistent resampling schemes.
3.1 Bootstrap Setup
Throughout, we let denote a “bootstrapped version” of , which is defined as a function mapping the data and random weights that are independent of into the domain of . This general definition allows us to include diverse resampling schemes, such as the nonparametric, Bayesian, block and score bootstraps and, more generally, the multiplier and exchangeable bootstraps, as special cases. Next, making sense of bootstrap consistency requires a metric that quantifies distances between probability measures. As is standard in the literature, we employ the bounded Lipschitz metric formalized by Dudley1966Baire; Dudley1968distance: for two Borel probability measures and on , define
[TABLE]
where we recall that denotes the set of Lipschitz functionals whose absolute level and Lipschitz constant are bounded by one, i.e.,
[TABLE]
Since weak convergence in the Hoffmann-Jørgensen sense to separable limits can be metrized by (Dudley1990nonlinear; Vaart_Wellner1990prohorov), we may now measure the distance between the “conditional law” of given and the limiting law of by
[TABLE]
where denotes expectation with respect to the bootstrap weights holding the data fixed. Employing the distribution of conditional on the data as an approximation to the distribution of is then asymptotically justified if their distance, equivalently (21), converges in probability to zero.
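To make the bounded Lipschitz metric concrete, the sketch below computes a lower bound on the distance between two empirical laws on the real line. It is only an illustration under stated assumptions: the supremum over the whole class of bounded Lipschitz functionals is replaced by a maximum over the hypothetical finite family $f_c(t)=\min\{|t-c|,1\}$, each member of which is bounded by one and has Lipschitz constant one, so the returned value can understate the true distance.

```python
import numpy as np

def bl_lower_bound(x, y, centers):
    """Lower bound on the bounded Lipschitz distance between two empirical laws on R.

    Uses the illustrative test functions f_c(t) = min(|t - c|, 1), which are
    bounded by one and 1-Lipschitz; the family is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gaps = []
    for c in centers:
        mean_fx = np.minimum(np.abs(x - c), 1.0).mean()
        mean_fy = np.minimum(np.abs(y - c), 1.0).mean()
        gaps.append(abs(mean_fx - mean_fy))
    return max(gaps)

rng = np.random.default_rng(0)
sample_a = rng.standard_normal(500)
sample_b = rng.standard_normal(500) + 0.5     # shifted law, for comparison
grid = np.linspace(-3.0, 3.0, 25)
print(bl_lower_bound(sample_a, sample_b, grid))
```

A finer grid of centers or a richer family of test functions tightens the bound at additional computational cost.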
We formalize the above discussion by imposing the following assumptions on .
Assumption 3.1.
(i) with independent of ; (ii) satisfies .
Assumption 3.2.
(i) for all where and denote minimal measurable majorant and maximal measurable minorant (with respect to jointly) respectively; (ii) is a measurable function of outer almost surely in for any continuous and bounded .
Assumption 3.1(i) formally defines the bootstrap analog of , while Assumption 3.1(ii) simply imposes the consistency of the “law” of conditional on the data for the law of , i.e., the bootstrap “works” for the estimator . Assumption 3.2 is of technical concern. In particular, Assumption 3.2(i) can often be established as a result of bootstrap consistency (Vaart1996), while Assumption 3.2(ii) is easy to verify for particular resampling schemes. For example, if is continuous, then Assumption 3.2(ii) is fulfilled. When is Euclidean-valued, i.e., with , one can dispense with Assumption 3.2.
3.2 Failures of the Standard Bootstrap
We now turn to the challenges for inference using the standard bootstrap caused by first order degeneracy. As is well known in the literature, the law of
[TABLE]
conditional on the data provides a consistent estimator of the law of provided is Hadamard differentiable (Vaart1996), which in particular includes the case when . In other words, the standard bootstrap, meaning the law of (22) conditional on the data, is consistent for the law of regardless of the presence of first order degeneracy.
Substantial difficulties, however, arise from using (22) for inferential purposes when first order degeneracy does occur. Ignoring the first order degeneracy or perhaps as a way to avoid ridiculous confidence intervals such as (17), one might consider the following confidence interval for real-valued :
[TABLE]
where is the -th bootstrapped quantile for defined as
[TABLE]
However, establishing the validity of (23) as a level confidence interval for is problematic because for all and $0$ is a discontinuity point of the cdf of the limit (see Lemma B.1).
In fact, simple algebra reveals that (23) is numerically identical to
[TABLE]
where is defined as
[TABLE]
In other words, is the -th bootstrapped quantile of the standard bootstrap based on second order asymptotics:
[TABLE]
As illustrated by Babu1984bootstrap for the squared mean example, the conditional law of (25) is inconsistent for the law of when , the point at which first order degeneracy arises. We next demonstrate that the bootstrap failure in this simple example is a reflection of a deeper principle: the second order standard bootstrap is consistent if and only if is degenerate, under regularity conditions.
Theorem 3.1.
Suppose that Assumptions 2.1, 2.2, 3.1 and 3.2 hold, and that is centered Gaussian. Then on the support of if and only if
[TABLE]
If, in addition, is second order Hadamard differentiable, then the conclusion holds without requiring to be centered Gaussian.
The sufficiency part of the theorem is somewhat expected and not a deep result, while the necessity is perhaps surprising and has far-reaching implications for statistical inference as we shall detail shortly. The proof of the latter consists of two steps: in the first step, we show that bootstrap consistency as in (26) implies existence of a bilinear map corresponding to , in a fashion similar to the proof of Theorem 3.1 in Fang_Santos2014HDD; in the second step, we establish that and hence is necessarily degenerate. Both steps rely on the insight of equating distributions through their characteristic functionals as in Vaart1991differentibility and Hirano_Porter2012.
Theorem 3.1 implies that, in the presence of first order degeneracy, if the second order derivative is nondegenerate, then the standard bootstrap based on second order asymptotics is necessarily inconsistent whenever is centered Gaussian. If is degenerate, we have a degenerate limiting distribution that cannot be directly used for inference. We thus conclude that bootstrap failure is an inherent implication of models with first order degeneracy.
Heuristically, the reason why the standard bootstrap fails is that even though in the “real world”, its bootstrap counterpart is non-negligible. To see this, consider the squared mean example. If , then
[TABLE]
This is an emphatic reflection of Efron1979’s caveat that the bootstrap, as well as other resampling schemes, provides frequency approximations rather than likelihood approximations. These heuristics suggest that the standard bootstrap might work if the first order term is included, which turns out to be true for sufficiently smooth maps; see Theorem 3.2.
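The heuristic can be illustrated by simulation. The sketch below assumes the squared mean map $\phi(\theta)=\theta^2$, i.i.d. data and Efron's nonparametric bootstrap; all function and variable names are illustrative. It uses the algebraic identity $n\{(\bar X_n^*)^2-\bar X_n^2\}=n(\bar X_n^*-\bar X_n)^2+2\sqrt{n}\bar X_n\cdot\sqrt{n}(\bar X_n^*-\bar X_n)$ and returns both pieces, so that the size of the cross term can be inspected directly.

```python
import numpy as np

def standard_second_order_bootstrap(x, B=2000, seed=0):
    """Bootstrap draws of n * (mean*^2 - mean^2), split into two components."""
    rng = np.random.default_rng(seed)
    n = x.size
    xbar = x.mean()
    xbar_star = x[rng.integers(0, n, size=(B, n))].mean(axis=1)
    naive = n * (xbar_star ** 2 - xbar ** 2)          # standard second order bootstrap
    quad = n * (xbar_star - xbar) ** 2                # mimics the second order limit
    cross = 2.0 * np.sqrt(n) * xbar * np.sqrt(n) * (xbar_star - xbar)  # first order term
    return naive, quad, cross

x = np.random.default_rng(1).standard_normal(200)     # population mean zero (assumed design)
naive, quad, cross = standard_second_order_bootstrap(x)
print("share of negative naive draws:", np.mean(naive < 0))
print("std of the cross term        :", cross.std())
```

When the population mean is zero, the target limit is nonnegative, yet naive draws are negative whenever the bootstrap mean is smaller in absolute value than the sample mean, because the cross term does not vanish conditionally on the data.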
It is worth noting that Theorem 3.1 holds even if is smooth. Consequently, first order degeneracy is a source of bootstrap inconsistency completely different from that discussed in Fang_Santos2014HDD, i.e., nondifferentiability of . In addition, we note that, without the qualifier that is centered Gaussian, bootstrap consistency (26) holds if and only if for all under mild support conditions; see Theorem A.1 in Fang_Santos2014HDD.
Finally, to further articulate the relations between the current work and that of Fang_Santos2014HDD, we present a table that describes the scopes we work in.
3.3 The Babu Correction
We now extend the Babu correction under our more general setup. We proceed by imposing the following assumption.
Assumption 3.3.
(i) The map is second order Hadamard differentiable at tangentially to ; (ii) is first order Hadamard differentiable at every point in some neighborhood of tangentially to such that [Footnote 6: The appearance of the factor 2 is due to omission of the factor $1/2$ in Definition 2.2.]
[TABLE]
for all sequences and such that , as and for all sufficiently large , where is the bilinear map underlying .
Assumption 3.3(i) defines the scope of the Babu correction: it shall be applied to smooth maps, which excludes, for example, the functional associated with the $J$-test in GMM settings when first order or global identification fails – see Section 4. Assumption 3.3(ii) is stronger than being simply second order Hadamard differentiable, in that it requires the existence of the first order derivative at all points in a neighborhood of such that the displayed condition in Assumption 3.3(ii) holds. Assumption 3.3 is fulfilled for the setup considered in Babu1984bootstrap and for Examples 2.1 and 2.3, but violated for the remaining examples.
Under Assumption 3.3, the corrected bootstrap
[TABLE]
is consistent for the law of regardless of the degeneracy of .
Theorem 3.2.
If Assumptions 2.1(i)(ii), 2.2, 3.1, 3.2 and 3.3 hold, then
[TABLE]
Theorem 3.2 generalizes Babu1984bootstrap considerably in that it accommodates semiparametric and nonparametric models, and allows wider resampling schemes beyond the nonparametric bootstrap of Efron1979. The Babu correction works nicely with smooth maps in the sense of Assumption 3.3, but unfortunately is inadequate to handle nonsmooth ones. This is because when is only second order directionally differentiable, the derivative is often not “continuous” in , implying that the Babu correction (28) is unable to estimate properly and in this way results in inconsistent estimates. For this reason, we next provide yet another resampling method which accommodates (second order) nondifferentiable maps.
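For the squared mean map the corrected statistic reduces to $n(\bar X_n^*-\bar X_n)^2$, since $\phi'_{\theta}(h)=2\theta h$. A minimal sketch of the correction follows, assuming that the correction subtracts the estimated first order term, that the estimator is a scalar sample mean with $r_n=\sqrt{n}$, and that resampling is done by the nonparametric bootstrap; the helper `babu_corrected_draws` and its arguments are hypothetical names introduced for illustration.

```python
import numpy as np

def babu_corrected_draws(x, phi, dphi, B=2000, seed=0):
    """Draws of n * {phi(m*) - phi(m) - phi'_m(m* - m)} for the sample mean m.

    phi  : the map, applied elementwise
    dphi : dphi(t, h) = first order derivative of phi at t in direction h
    """
    rng = np.random.default_rng(seed)
    n = x.size
    m = x.mean()
    m_star = x[rng.integers(0, n, size=(B, n))].mean(axis=1)
    return n * (phi(m_star) - phi(m) - dphi(m, m_star - m))

phi = lambda t: t ** 2
dphi = lambda t, h: 2.0 * t * h

x = np.random.default_rng(1).standard_normal(200)
draws = babu_corrected_draws(x, phi, dphi)
print("smallest draw:", draws.min())       # nonnegative up to rounding error
print("average draw :", draws.mean(), "sample variance:", x.var())
```

The average of the corrected draws tracks the sample variance, which is the mean of the limit $\mathbb{G}^2$ when $\mathbb{G}$ is centered Gaussian with that variance.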
3.4 A Modified Bootstrap
In this subsection, we shall present a modified bootstrap following Fang_Santos2014HDD that is consistent for the law of , and adaptive to both the presence of first order degeneracy and nondifferentiability of .
The heuristics underlying our proposal, however, are connected to those in Fang_Santos2014HDD in a subtle way. In the context of first order asymptotics where is only directionally differentiable, inconsistency of the standard bootstrap arises from its inability to properly estimate the directional derivative . In our setup, however, there are examples in which the derivative is a known map; see Examples 2.1 and 2.3, both of which involve differentiable maps. The standard bootstrap in these settings fails because there is a non-negligible term being neglected. However, in all other examples where is not smooth enough, Fang_Santos2014HDD’s arguments will come into play as well.
In any case, the second order weak limit is a composition of the derivative and the limit of , as is the first order limit . Thus, the law of can be estimated by composing a suitable estimator for with a consistent bootstrap approximation for the law of , in exactly the same fashion as the resampling scheme proposed by Fang_Santos2014HDD. That is, we propose employing the law of
[TABLE]
conditional on the data as an approximation for the law of , where is a suitable estimator of . Certainly, we would like to converge to in some sense as . This can be made precise as follows.
Assumption 3.4.
* is a function of satisfying that for every sequence and every such that as ,*
[TABLE]
Assumption 3.4 says that converges in probability to along any convergent sequence as . In cases when is a known map, we may simply set for all . It is worth noting that Assumption 3.4 is equivalent to requiring: for every compact set and every ,
[TABLE]
where ; see Lemma B.2. Condition (32) was employed in Fang_Santos2014HDD who also provided several sufficient conditions for it to hold. For example, if is Lipschitz continuous, then pointwise consistency of suffices for (32). Unfortunately, second order derivatives often lack uniform continuity and hence those sufficient conditions are inapplicable. Nonetheless, condition (31) is straightforward to verify in all our examples.
Given the equivalence of conditions (31) and (32), consistency of our modified bootstrap (30) follows from Theorem 3.2 in Fang_Santos2014HDD.
Theorem 3.3.
Under Assumptions 2.1(i)(ii), 2.2, 3.1, 3.2 and 3.4, it follows that
[TABLE]
Theorem 3.3 shows that the law of conditional on the data is indeed consistent for the law of , regardless of the degree of smoothness of and degeneracy of . Interestingly, the resampling scheme in Theorem 3.3 is a mixture of the classical bootstrap and analytical asymptotic approximations. Finally, we note that Assumption 3.4 allows us to think of Theorem 3.3 as a variant of the extended continuous mapping theorem.
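The composition idea behind Theorem 3.3 can be sketched in a few lines. The example assumes a scalar sample mean with $r_n=\sqrt{n}$, the nonparametric bootstrap, and the map $\phi(\theta)=\max\{\theta,0\}^2$ at $\theta_0=0$, which is chosen here purely for illustration and is not one of the paper's examples. For this map the first order derivative vanishes and the second order directional derivative is $h\mapsto\max\{h,0\}^2$, which is known and not a quadratic form.

```python
import numpy as np

def modified_bootstrap_draws(x, second_derivative_hat, B=2000, seed=0):
    """Draws of phi''_hat(sqrt(n) * (mean* - mean)) under the nonparametric bootstrap."""
    rng = np.random.default_rng(seed)
    n = x.size
    m_star = x[rng.integers(0, n, size=(B, n))].mean(axis=1)
    h_star = np.sqrt(n) * (m_star - x.mean())   # bootstrap approximation to the law of G
    return second_derivative_hat(h_star)        # compose with the derivative estimator

# Known derivative for the illustrative map phi(t) = max(t, 0)**2 at theta_0 = 0.
known_derivative = lambda h: np.maximum(h, 0.0) ** 2

x = np.random.default_rng(1).standard_normal(200)
draws = modified_bootstrap_draws(x, known_derivative)
print("share of draws equal to zero:", np.mean(draws == 0.0))
```

The derivative estimator is passed as a callable, so an estimated map can be substituted when the derivative is unknown.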
Theorems 3.2 and 3.3 are useful for hypothesis testing. Specifically, consider
[TABLE]
Under first order degeneracy, as is the case in all our examples, we employ the test of rejecting if where is the critical value constructed from the Babu correction or our proposed bootstrap, i.e.,
[TABLE]
or
[TABLE]
Note that is generally infeasible but can be estimated by Monte Carlo simulations (Efron1979; Hall1992bootstrap; HorowitzBoot). The pointwise size control of our test then follows according to Theorems 3.2 and 3.3. In fact, under additional restrictions, it can provide local size control. This property is particularly attractive because of the irregularity arising from nondifferentiability of . In this case, pointwise asymptotic approximations can be misleading (Imbens_Manski; AndrewsandGuggen2009ETA). Interestingly, it turns out that there is another source of irregularity due to the nature of first order degeneracy (see Lemma A.1). We relegate the detailed discussions to Appendix A in order to make our presentation concise.
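The Monte Carlo approximation of the critical value and the resulting test can be sketched as follows. The example assumes the squared mean map, for which the second order derivative $h\mapsto h^2$ is known, so that the null hypothesis is a zero population mean. The design (standard normal errors, $n=100$, 200 bootstrap draws per sample) and all function names are assumptions made for illustration.

```python
import numpy as np

def degenerate_test(x, alpha=0.05, B=200, rng=None):
    """Reject when n * mean^2 exceeds the bootstrap (1 - alpha) quantile of n * (mean* - mean)^2."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    stat = n * x.mean() ** 2
    m_star = x[rng.integers(0, n, size=(B, n))].mean(axis=1)
    draws = n * (m_star - x.mean()) ** 2          # composition with the known derivative
    crit = np.quantile(draws, 1.0 - alpha)        # Monte Carlo critical value
    return stat > crit, stat, crit

def rejection_rate(mean, n=100, reps=300, seed=0):
    """Share of simulated samples with population mean `mean` in which the test rejects."""
    rng = np.random.default_rng(seed)
    rejections = [degenerate_test(mean + rng.standard_normal(n), rng=rng)[0]
                  for _ in range(reps)]
    return float(np.mean(rejections))

print("rejection rate under the null       :", rejection_rate(0.0))
print("rejection rate under the alternative:", rejection_rate(1.0, reps=50))
```

The number of bootstrap draws and of simulated samples is kept small for speed; larger values reduce simulation noise in both the critical value and the reported frequencies.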
We now briefly compare the Babu correction, the above composition procedure and the recentered bootstrap (Hall_Horowitz1996bootstrap; HorowitzBoot). In some cases (for instance, Example 2.1 and the regular $J$-test), they coincide with each other. However, the Babu correction applies to general smooth functionals, rather than just quadratic forms, and hence can be thought of as a generalization of the recentered bootstrap. The composition procedure, which works for an even larger class of functionals, is a direct approach that exploits the structure of the limits, and hence is more tractable.
Remark 3.1.
Examples where the convergence rate is not $\sqrt{n}$ include inference based on kernel estimators with undersmoothing (Hall1992bootstrap), smoothed maximum score estimators (Horowitz2002maxscore), and cointegration regressions (ChangParkSong2006BootCoint). For nonstandard convergence rates, however, the bootstrap process can fail to consistently estimate the law of , violating Assumption 3.1(ii). Fortunately, as far as Theorem 3.3 is concerned, any consistent estimator, which need not satisfy Assumption 3.1(ii), will do. For example, in cube-root estimation problems, one could instead employ some smoothed bootstrap where and are some smoothed estimators, or $m$ out of $n$ resampling (or subsampling) where is a bootstrap estimator based on subsamples of size $m$. In the context of estimating nonincreasing density functions, see Kosorok2008Grenander and Sen_Banerjee_Woodroofe2010; for bootstrapping the maximum score estimators, see Delgado_Poo_Wolf2001 and Patra_Seijo_Sen2015. ∎
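The mechanics of $m$ out of $n$ resampling can be sketched as follows. This is a generic illustration under stated assumptions: the helper name, the choice $m=100$ and the use of the sample mean with rate $\sqrt{m}$ are all hypothetical, and the sketch does not address the choice of $m$ or any particular cube-root problem.

```python
import numpy as np

def m_out_of_n_draws(x, estimator, m, rate, B=500, seed=0):
    """Draws of rate(m) * (estimator(resample of size m) - estimator(x))."""
    rng = np.random.default_rng(seed)
    theta_hat = estimator(x)
    draws = np.empty(B)
    for b in range(B):
        resample = x[rng.integers(0, x.size, size=m)]   # m draws with replacement
        draws[b] = rate(m) * (estimator(resample) - theta_hat)
    return draws

x = np.random.default_rng(1).standard_normal(1000)
draws = m_out_of_n_draws(x, np.mean, m=100, rate=np.sqrt)
print("std of draws:", draws.std(), "sample std:", x.std())
```

For an estimator with a nonstandard rate, the `rate` argument would be replaced accordingly, for example by `lambda m: m ** (1 / 3)`; drawing without replacement instead gives subsampling.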
3.5 Estimation of the Derivative
Given the posited bootstrap consistency for the law of , the remaining crucial piece towards consistent bootstrap for the law of based on Theorem 3.3 is then an estimator of the derivative that satisfies Assumption 3.4. There are two general approaches for estimation of : one by exploiting the structure of , and the other one based on numerical differentiation as we describe now.
When first order degeneracy occurs, we have
[TABLE]
We may thus estimate via numerical differentiation as follows: for any ,
[TABLE]
If tends to zero at a suitable rate, the sense of which is made precise by the following assumption, then is a good estimator for in the sense of Assumption 3.4.
Assumption 3.5.
* is a sequence of scalars such that and .*
Assumption 3.5 allows a wide range of tuning parameters that can deliver first order validity of our method. The optimal choice of is challenging and beyond the scope of the present paper, which we hope to address in future work. The next proposition confirms the validity of the numerical estimator (38).
Proposition 3.1 (Hong_Li2015numericaldelta).
If Assumptions 2.1, 2.2(i)(ii), and 3.5 hold, then the numerical estimator in (38) satisfies Assumption 3.4.
The numerical differentiation approach of estimating the derivatives, in the context of the Delta method, dates back to at least Dumbgen1993 in his proposal of the rescaled bootstrap. However, the original presentation leaves this connection implicit, and so the bootstrap procedure is sometimes misunderstood as the $m$ out of $n$ resampling. Effectively, the rescaled bootstrap amounts to estimating the derivative numerically and the law of using bootstrap samples; see Beare_Fang2016Grenander for more details. The recent work of Hong_Li2015numericaldelta provided a range of extensions of the numerical Delta method that have wide applications in econometrics.
Proposition 3.1 provides a way of estimating the derivative that is tractable in the sense that there is no need to explore the particular structures of or as long as the tuning parameter is properly chosen. On the other hand, the expression of itself often suggests an intuitive estimator as we elaborate in the next subsection.
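A minimal sketch of the numerical derivative estimator follows. It assumes the estimator takes the difference-quotient form $\hat\phi''_n(h)=\{\phi(\hat\theta_n+t_nh)-\phi(\hat\theta_n)\}/t_n^2$, that Assumption 3.5 amounts to $t_n\to0$ with $r_nt_n\to\infty$, and that $r_n=\sqrt{n}$. The map $\phi(\theta)=\max\{\theta,0\}^2$ at $\theta_0=0$, the value of `theta_hat` and the choice $t_n=n^{-1/4}$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def numerical_second_derivative(phi, theta_hat, t_n):
    """Return the map h -> {phi(theta_hat + t_n * h) - phi(theta_hat)} / t_n**2."""
    def derivative_hat(h):
        h = np.asarray(h, dtype=float)
        return (phi(theta_hat + t_n * h) - phi(theta_hat)) / t_n ** 2
    return derivative_hat

phi = lambda t: np.maximum(t, 0.0) ** 2   # illustrative map with zero first derivative at 0
n = 10 ** 8
theta_hat = 1.0 / np.sqrt(n)              # stand-in estimate of order n**(-1/2)
t_n = n ** (-0.25)                        # t_n -> 0 while sqrt(n) * t_n -> infinity
d_hat = numerical_second_derivative(phi, theta_hat, t_n)

h = np.array([-1.0, 0.5, 1.0])
print("numerical estimate:", d_hat(h))
print("target max(h, 0)^2:", np.maximum(h, 0.0) ** 2)
```

The estimate is close to the target because the estimation error in `theta_hat` is of smaller order than the step size; a step size that shrinks too quickly would let that error dominate.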
3.5.1 Examples Revisited
Example 2.1 is trivial since is a known map and hence one can simply set for all . Example 2.6 is more complicated.
Example 2.6 (Continued).
In the classical case when is a singleton, we may estimate based on the GMM estimator and the estimated Jacobian matrix . Generally, there are two unknown objects involved in the second order derivative: the identified set and . Let be the space of matrices. Suppose that is a -consistent estimator for , and an estimator for such that . Then we may estimate by
[TABLE]
where for satisfying . Consistency of can be established by appealing to CHT2007, while uniform consistency of can be derived using Glivenko-Cantelli type arguments. Following the proof of Lemma D.3, it is straightforward to show that satisfies Assumption 3.4. ∎
4 Application: Testing for Common CH Features
In this section, we apply our framework to develop a robust test of common conditionally heteroskedastic (CH) factor structure by allowing multiple common CH features. Let be a -dimensional time series. According to Engle_Kozicki1993CF, a feature that is present in each component of is said to be common to if there exists a linear combination of that fails to have the feature. A canonical example is the notion of cointegration developed by Engle_Granger1987Co-In in order to characterize the common feature of stochastic trend.
4.1 The Setup
Following Engle_Ng_Rothschild1990asset and Dovonon_Renault2013testing, suppose that the -dimensional process satisfies
[TABLE]
where is a matrix of full column rank with , a diagonal matrix with diagonal (random) elements for , a positive semidefinite matrix, and a filtration to which and are adapted. By Engle_Kozicki1993CF, we say that has a common CH feature if there exists some nonzero such that is constant. The conditional covariance structure (40) has some attractive properties that help to understand, for example, asset excess returns in a parsimonious way (Engle_Ng_Rothschild1990asset). Thus, tests of common CH features can be used to detect the underlying common factor structures that simplify capturing interrelations of economic and financial variables under consideration.
With the help of instrumental variables, a common CH feature can be reformulated by unconditional moments that fit into the classical GMM framework. The following assumption is taken directly from Dovonon_Renault2013testing.
Assumption 4.1.
(i) is of full column rank; (ii) is nonsingular for ; (iii) ; (iv) is an -measurable random vector such that is nonsingular; (v) has full column rank ; (vi) is stationary and ergodic such that and .
Assumption 4.1(i)-(ii) ensure that there are exactly linearly independent vectors , spanning the null space of , such that is constant. In other words, the common CH features are nonzero solutions of the equation . [Footnote 7: If is a common CH feature, so is for any nonzero . For mathematical purposes, however, the number of common CH features is defined to be the dimension of the null space of $\Lambda^{\intercal}$.] Assumption 4.1(iii) is a normalization condition that helps to simplify the exposition. Assumption 4.1(iv) defines the instrument formed from the information set , while Assumption 4.1(v) implicitly requires that the number of instruments is no less than that of factors. Assumption 4.1(vi) further specifies the data generating process. We refer the reader to Dovonon_Renault2013testing for further details on Assumption 4.1.
Assumption 4.1 allows us to characterize common CH features as nonzero satisfying the vector of unconditional moment equalities (Dovonon_Renault2013testing):
[TABLE]
where . It is then tempting to employ Hansen’s $J$ statistic to test the existence of common CH features (Engle_Kozicki1993CF). Unfortunately, as noted by Dovonon_Renault2013testing, the Jacobian matrix evaluated at the truth is degenerate at zero, rendering standard theory inapplicable. As we shall illustrate, however, such degeneracy differs in nature from first order degeneracy. By expanding the moment function to the second order, Dovonon_Renault2013testing showed that the asymptotic distribution of the $J$ statistic is highly nonstandard. Nonetheless, Dovonon_Goncalves2017bootstrapping developed a corrected bootstrap that can consistently estimate the limiting law when the bootstrap of Hall_Horowitz1996bootstrap fails to do so.
However, a key assumption in previous studies is that there exists a unique nonzero such that (41) is satisfied, ensured by exclusion restrictions and linear normalization (Dovonon_Renault2013testing; Dovonon_Goncalves2017bootstrapping; Lee_Liao2017LocalIDfailure). This is undesirable for the following reasons. First, it is unknown a priori how many (linearly independent) CH features are common to the series under consideration. Second, as pointed out by Engle_Ng_Rothschild1990asset in the context of asset pricing, empirical work often considers large numbers of assets and the number of common CH features is expected to be large as well. Third, the linear normalization may in fact lead to no satisfying (41) (i.e. non-existence). For example, suppose . Then any common CH feature must satisfy , contradicting the linear normalization proposed in Dovonon_Renault2013testing. Fourth, in addition to the possibility that exclusion restrictions may be hard to form, the linear normalization need not deliver a unique common CH feature (i.e. non-uniqueness). To see this, suppose . Then for any common CH feature satisfying the normalization, we must have and , which admit infinitely many solutions, so that uniqueness fails in this case. These arguments motivate us to modify the $J$-test in a way that accommodates partial identification as well as degenerate Jacobian matrices. Such an extension is nontrivial because the second order (and hence global) identification, a condition that Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping heavily rely on, fails. [Footnote 8: Given first order identification failure, second order identification is equivalent to global identification in the current context because the moment function is quadratic in .]
4.2 A Modified Test
To exclude the zero solution and avoid falsely excluding the existence of CH features, we employ the following normalization
[TABLE]
Next, to map the current setup into our developed framework, we define a function by: for any ,
[TABLE]
Then in view of the moment conditions (41), the hypothesis that there exists at least one common CH feature can be reformulated as
[TABLE]
where is defined as . In this formulation, we have taken the identity matrix as the weighting matrix for simplicity.
Given our treatment of Example 2.6, one might next try appealing to the results developed there. Unfortunately, they are not directly applicable. First, the parameter space of is required to have nonempty interior (see Lemma C.3), whereas in the current context which has empty interior. Second, there is a technical condition there that prevents the Jacobian matrix from being degenerate even when there does exist a unique common CH feature; see Remark C.2 for details. Consequently, we have to re-verify the differentiability conditions for the map (43). By Lemma D.1, under the null, is Hadamard differentiable with degenerate derivative, and second order Hadamard directionally differentiable at tangentially to with the derivative
[TABLE]
for any , where is the identified set of , and with the th row given by and
[TABLE]
We now make some remarks before proceeding further. First, we stress that first order degeneracy refers to the first order derivative of the functional , mapping from the function space to , being degenerate, while the degeneracy Dovonon_Renault2013testing focused on refers to degeneracy of the Jacobian matrix of the moment function that maps from the parameter space of to . Thus, the two types of degeneracy are conceptually different. Second, perhaps more importantly, they are also different in terms of the consequences. By Theorem 3.1 and in view of (45), being first order degenerate means that the second order standard bootstrap is inconsistent regardless of whether the Jacobian matrix is degenerate or not, while degeneracy of the Jacobian matrix generates the additional complication that is second order nondifferentiable as reflected by the inside minimization in (45). Third, further allowing multiple (linearly independent) common CH features reinforces the nondifferentiability of as can be seen from the outside minimization in (45).
Next, let the estimator be defined by with . Given the established differentiability of , the asymptotic distribution of is then an immediate consequence of Theorem 2.1 provided converges weakly. Towards this end, we impose the following assumption as in Dovonon_Renault2013testing.
Assumption 4.2.
, and satisfy a central limit theorem (CLT). [Footnote 9: The symbol denotes the Kronecker product.]
Assumptions 4.1 and 4.2 together imply that
[TABLE]
where is a zero mean Gaussian process with the covariance functional satisfying: for any , and ,
[TABLE]
The proposition below delivers the limiting distribution of the test statistic .
Proposition 4.1.
Let Assumptions 4.1 and 4.2 hold. Then we have under
[TABLE]
The asymptotic distribution in (47) is a highly nonlinear functional of the Gaussian process in general, which turns out to be consistent with the limits obtained in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping whenever their second order (and global) identification condition holds; see Remark 4.1. In the latter setting, Dovonon_Goncalves2017bootstrapping showed that the recentered bootstrap of Hall_Horowitz1996bootstrap is inconsistent and thus proposed corrected versions of the standard GMM bootstrap. Unfortunately, their methods are not directly applicable to our setup that allows multiple common CH features (i.e. partial identification), because they crucially rely on second order and global identification.
We next demonstrate how our bootstrap works. First, let be a bootstrap sample, which can be obtained by block bootstrap, nonoverlapping or overlapping (Carlstein1986subseries; Kunsch1989Jackknife). Because the limiting process is determined by a martingale difference sequence indexed by , the dependence structure of the data does not enter into the limit and we may thus employ Efron1979’s nonparametric bootstrap or more general bootstrap schemes. In any case, we set
[TABLE]
To accommodate diverse resampling schemes, we simply impose the high level condition that satisfies Assumptions 3.1 and 3.2 (DehlingMikoschSorensen2002EPDep).
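For concreteness, the two block resampling schemes mentioned above can be sketched as index generators. This is a generic illustration, assuming a fixed block length supplied by the user; the function name and its arguments are hypothetical, and block length selection is not addressed.

```python
import numpy as np

def block_bootstrap_indices(n, block_len, rng, overlapping=True):
    """Resampled time indices of length n built from randomly drawn blocks.

    overlapping=True  : blocks may start at any admissible time point (moving blocks)
    overlapping=False : blocks start only at multiples of block_len (nonoverlapping blocks)
    """
    step = 1 if overlapping else block_len
    starts_pool = np.arange(0, n - block_len + 1, step)
    n_blocks = -(-n // block_len)                         # ceil(n / block_len)
    starts = rng.choice(starts_pool, size=n_blocks, replace=True)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()
    return idx[:n]                                        # trim to the original length

rng = np.random.default_rng(0)
series = np.arange(50.0)                                  # placeholder data
resampled = series[block_bootstrap_indices(series.size, block_len=5, rng=rng)]
print(resampled[:10])
```

Setting `block_len=1` recovers the i.i.d. nonparametric bootstrap, which the text notes is also admissible here because the dependence structure of the data does not enter the limit.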
It remains to estimate the derivative (45). The numerical differentiation approach can be implemented as in the beginning of Section 3.5. That is, we estimate by
[TABLE]
where satisfies Assumption 3.5. We now describe how to estimate by exploiting its structure. Let and , where is to be specified. [Footnote 10: One can theoretically ignore in the expression of . As pointed out by CHT2007, however, such a modification helps avoid an empty set of solutions and improves power.] Then we may estimate by:
[TABLE]
where with its th row given by for
[TABLE]
In fact, we may further restrict the bounded set to reduce the computation burden for ; see Remark D.1. Clearly, the sequence should tend to zero at a suitable rate as . This is made precise as follows.
Assumption 4.3.
* satisfies (i) , and (ii) .*
Assumption 4.3 regulates the rates at which the tuning parameters should approach zero, in order to deliver first order validity of our bootstrap inference procedure. The optimal choice of is concerned with higher order accuracy of our method, which we do not pursue in this paper. Combining the bootstrap in (48) and the derivative estimator, we are then able to consistently estimate the law of the weak limit in (47) following Theorem 3.3, which in turn allows us to construct critical values. Specifically, let be the quantile of conditional on the data: [Footnote 11: As usual, denotes the probability taken with respect to the bootstrap weights , though in the current setup they are implicitly defined. Alternatively, one can think of as the probability with respect to the bootstrap sample holding data fixed.]
[TABLE]
The following proposition confirms that the test of rejecting the existence of common CH features when is valid.
Proposition 4.2.
Suppose Assumptions 3.1, 3.2, 4.1, 4.2, and 4.3 hold. If the cdf of the limit in (47) is continuous and strictly increasing at its quantile for , then we have under ,
[TABLE]
Proposition 4.2 implies our test has pointwise asymptotic exact size and thus is not conservative (in the pointwise sense). Establishing local size control, unfortunately, is challenging in this case because, as far as we can tell, the asymptotic distributions of the statistic under local perturbations bear no definitive relation to the corresponding pointwise limits in terms of first order dominance. It appears that the problem of developing (at least) locally valid and non-conservative overidentification tests is prevalent in the literature of partial identification (CHT2007; AndrewsandSoares2010).
Finally, we stress that the quadratic structure of the moment function plays no essential role in our framework. Building upon Example 2.6, one may work with a general moment function that admits a zero Jacobian matrix, but without the requirement that the parameter space have nonempty interior. It is also possible to deal with GMM problems with a rank deficient but possibly nonzero Jacobian matrix. For example, consider testing whether a matrix with has rank . This amounts to testing
[TABLE]
Here, the moment function is which is non-quadratic and whose Jacobian matrix, namely, , may have rank less than or equal to . Note also that the parameter space of has empty interior. We refer the reader to ChenFang2016Rank for more detailed discussions.
Remark 4.1.
The weak limit in Proposition 4.1 is consistent with the one in Dovonon_Renault2013testing, when there does exist a unique common CH feature which satisfies their linear normalization and when the weighting matrix is the identity matrix (for reasons we have mentioned at the beginning of this section); otherwise the two are not comparable. At first sight, our test statistic is different from Dovonon_Renault2013testing’s because we adopted a different normalization, resulting in a different parameter space. [Footnote 12: Dovonon_Renault2013testing also recentered in their construction, though this does not change the statistic numerically.] Close inspection, however, shows that the asymptotic distributions are in fact identical, up to a multiplicative constant. Specifically, let be the (nonzero) unique CH feature such that . Then and so by Proposition 5.1, the asymptotic distribution of our -statistic is simply the law of
[TABLE]
where we simply replaced with . By Theorem 3.1 and Corollary 3.1 in Dovonon_Renault2013testing (see also Dovonon_Goncalves2017bootstrapping), their -statistic (with being the identity matrix) converges in law to
[TABLE]
where with the th row for and the vector of ones. By Lemma D.6, however, the two limits in (53) and (54) differ only by the multiplicative constant , establishing the claimed consistency. If the common CH feature also satisfies our normalization, i.e., , then the two limits are identical. We reiterate that our main motivation is to build upon Dovonon_Renault2013testing by allowing multiple common CH features and adopting a normalization that would not falsely exclude the existence of any common features. [Footnote 13: Any other linear normalization for known and would share the same deficiency as the linear normalization, which includes, for example, – see our next section.] ∎
4.3 Simulation Studies
In this section, we examine the finite sample performance of our framework based on Monte Carlo simulations, and show how the identification assumption in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping may suffer from their linear normalization. One may then try the multiple testing versions of these tests by testing a few linearly independent linear restrictions, but we show they may be too conservative.
As in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping, we consider the following CH factor model:
[TABLE]
where is a vector that can be thought of as asset returns, is a vector of CH factors, is a matrix of factor loadings, and is a vector of idiosyncratic shocks independent of . Following Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping, we let be an i.i.d. sequence from , and the th component of follow a Gaussian-GARCH(1,1) model such that
[TABLE]
where , i.i.d. across both and , and are independent across and of . It follows that are independent across for each . The remaining specifications are detailed in Table 2. Our designs are the same as those in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping except that different values for are used to illustrate the restrictiveness of the linear normalization. Designs D1 and D2 generate two assets while Designs D3, D4 and D5 generate three assets. In Designs D1, D3 and D4, the factor loading matrices ensure the existence of common CH features and thus serve for investigation of size performance, while no common CH features exist in Designs D2 and D5, which help us inspect power performance.
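To make the data generating process concrete, the following is a minimal simulation sketch of a CH factor model with Gaussian-GARCH(1,1) factors. The GARCH recursion is written in a standard form, and the loadings, GARCH parameters, shock variance `u_var` and burn-in length are hypothetical values chosen for illustration only; they are not the values in Table 2.

```python
import numpy as np

def simulate_ch_factor_model(n, loadings, omega, alpha, beta,
                             u_var=1.0, burn_in=500, seed=0):
    """Simulate Y_{t+1} = Lambda f_{t+1} + u_{t+1} with GARCH(1,1) factors.

    Assumed recursion for each factor k (standard GARCH(1,1) form):
        f_{k,t+1} = sigma_{k,t} * eps_{k,t+1},
        sigma_{k,t}^2 = omega_k + alpha_k * f_{k,t}^2 + beta_k * sigma_{k,t-1}^2.
    """
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)      # shape (p, K)
    omega = np.asarray(omega, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    p, K = loadings.shape
    sigma2 = omega / (1.0 - alpha - beta)             # start at unconditional variance
    f = np.zeros(K)
    Y = np.empty((n + burn_in, p))
    for t in range(n + burn_in):
        sigma2 = omega + alpha * f**2 + beta * sigma2  # conditional variances
        f = np.sqrt(sigma2) * rng.standard_normal(K)   # Gaussian innovations
        u = np.sqrt(u_var) * rng.standard_normal(p)    # idiosyncratic shocks (assumed N(0, u_var I))
        Y[t] = loadings @ f + u
    return Y[burn_in:]                                 # drop burn-in to reduce initial value effects

# Hypothetical one-factor, two-asset design.
Lambda = [[1.0], [0.5]]
Y = simulate_ch_factor_model(n=1000, loadings=Lambda,
                             omega=[0.2], alpha=[0.2], beta=[0.6])
gamma = np.array([1.0, -2.0])   # gamma' Lambda = 0, so gamma' Y carries no factor
```

In this toy design the vector `gamma` annihilates the loadings, so the portfolio `Y @ gamma` consists of idiosyncratic shocks only and has no conditional heteroskedasticity.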
The tests are implemented with and instruments for Designs D1 and D2, and with and for Designs D3, D4 and D5. For derivative estimation, we set the tuning parameters for both the derivative estimator in (50) and the numerical derivative estimator as in (49) respectively. These choices are meant to satisfy Assumption 4.3. Again, we do not touch the issue of optimality in this paper, but instead hope to make the point that, even with these crude choices, our methods show substantial improvement over existing ones. The results corresponding to the two sets of choices are denoted as CF1 and CF2. To show the restrictiveness of the linear normalization as in Dovonon_Renault2013testing, Dovonon_Goncalves2017bootstrapping and Lee_Liao2017LocalIDfailure, we report the results based on Dovonon_Goncalves2017bootstrapping’s corrected and continuously-corrected bootstrap as well as those based on the asymptotic test of Dovonon_Renault2013testing, denoted as DG1, DG2 and DR respectively. The sample sizes are , , , and . To minimize the initial value effect, the data are obtained by generating samples and dropping the first samples. We conduct Monte Carlo replications with empirical bootstrap repetitions for each replication. The nominal level is throughout.
The results are summarized in Tables 3-7. As expected, Dovonon_Goncalves2017bootstrapping’s resampling methods exhibit substantial size distortion, often close to or over ; so does the asymptotic test DR. This does not appear to be a finite sample issue as the distortion is especially severe in large samples. Rather, it is because the linear normalization excludes common CH features that actually exist in the data and in this way leads to wrong conclusions. Our tests considerably reduce the null rejection rates for all the chosen tuning parameters, though both CF1 and CF2 exhibit some degrees of over- and under-rejection, due to the issue of tuning parameters. Another interesting finding is that our bootstrap based on numerical differentiation (CF2) appears to be more sensitive to the choice of tuning parameters, which is somewhat expected because the structural method (CF1) exploits more information of the derivative. We leave a thorough comparison between these two methods for future study.
Alternatively, one may test a few linearly independent linear restrictions by adopting multiple testing versions of the DG and the DR tests, so as to avoid falsely excluding the existence of common CH features. One then rejects the existence of common CH features if all the restrictions are rejected at level . [Footnote 14: Since the null is a union of “sub-nulls”, no Bonferroni-type correction is needed.] However, the resulting tests, though valid, may be too conservative. To illustrate, we test the null that satisfies (i) or (ii) for D1 and D2, and satisfies (i) , (ii) , or (iii) for D3, D4 and D5. We implement the multiple testing procedures based on the two bootstrap procedures of Dovonon_Goncalves2017bootstrapping with the identity weighting matrix and on Dovonon_Renault2013testing with the optimal weighting matrix, and label them M-DG1, M-DG2 and M-DR respectively. As expected, the M-DR test suffers from substantial under-rejection for D1, D3 and D4 even in large samples. M-DG1 and M-DG2 improve the situation somewhat, but the under-rejection is still significant for D3. Tables 6 and 7 indicate that our tests are more powerful than M-DG1, M-DG2 and M-DR in all cases. In particular, for D5 the rejection rates of our tests are close to one when is large while those of M-DG1 and M-DG2 are not. Results for multiple testing procedures based on Dovonon_Goncalves2017bootstrapping with optimal weighting matrix share similar patterns and are available upon request. We reiterate that the multiple testing procedure would not help with partial identification, and both Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping crucially rely on point identification.
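The decision rule for the multiple testing versions can be sketched as follows: because the null is a union of sub-nulls, the overall null is rejected only when every sub-null is rejected at the nominal level, with no Bonferroni-type adjustment. The function name and the p-values below are hypothetical.

```python
def reject_union_null(p_values, alpha=0.05):
    """Reject the union null only if every sub-null is rejected at level alpha."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one sub-test")
    return all(p < alpha for p in p_values)

# Hypothetical p-values from tests of two linear normalizations.
decision_a = reject_union_null([0.01, 0.03])   # both sub-nulls rejected
decision_b = reject_union_null([0.01, 0.20])   # one sub-null survives
```

The rule also shows where conservativeness comes from: a single sub-test with low power is enough to prevent rejection of the overall null.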
5 Conclusion
In this paper, we developed a general statistical framework for conducting inference on functionals exhibiting first order degeneracy, i.e., the first order derivative of the parameter is zero. Our first contribution implies that the standard bootstrap necessarily fails to work in these settings. In light of this failure, we provided two general solutions: one generalizes the Babu correction, and the other one is a modified bootstrap following Fang_Santos2014HDD. Our framework includes many existing results as special cases. To further demonstrate the applicability of our theory, we developed a test of common CH features studied by Dovonon_Renault2013testing but under weaker assumptions that allow the existence of more than one common CH feature.
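The contrast between the standard bootstrap and a Babu-type correction can be seen in a toy case. The sketch below assumes the scalar map phi(theta) = theta^2 with theta the mean of i.i.d. standard normal data (so the first order derivative vanishes at the truth), and uses the nonparametric bootstrap of the sample mean. These choices are illustrative assumptions, not the paper's examples. The standard bootstrap statistic equals the corrected one plus the cross term 2 * sqrt(n) * theta_hat * sqrt(n) * (theta_star - theta_hat), which does not vanish because sqrt(n) * theta_hat remains random in the limit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 2000, 2000
x = rng.standard_normal(n)           # true mean is 0, so phi(theta) = theta^2 is degenerate
theta_hat = x.mean()

# Nonparametric bootstrap of the sample mean: resample with replacement B times.
idx = rng.integers(0, n, size=(B, n))
theta_star = x[idx].mean(axis=1)

# Standard bootstrap analogue of n * {phi(theta_hat) - phi(theta_0)}.
standard = n * (theta_star**2 - theta_hat**2)

# Babu-type correction: subtract the first order term phi'(theta_hat) * (theta_star - theta_hat).
corrected = standard - n * 2.0 * theta_hat * (theta_star - theta_hat)

q_corrected = float(np.quantile(corrected, 0.95))
```

In this toy case `corrected` equals `n * (theta_star - theta_hat)**2` algebraically, so its upper quantile should be in the vicinity of the chi-square(1) quantile scaled by the sample variance, while `standard` can take negative values and is not centered correctly.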
References
Online Supplemental Appendix to “Inference on Functionals under First Order Degeneracy”
Qihui Chen
School of Management and Economics
The Chinese University of Hong Kong, Shenzhen
[email protected]
Zheng Fang
Department of Economics
Texas A&M University
The following list includes notation that will be used throughout the supplement.
Appendix A Local Analysis
In this appendix, we show how our bootstrap procedures can provide local size control. We start by characterizing local perturbations of the data generating process and their implications for the testing statistic .
A.1 Local Perturbations
We first introduce relevant concepts following BKRW993Efficient. In what follows we specialize our setup to the i.i.d. setting for simplicity. [Footnote 15: Generally, we may consider models that are locally asymptotically quadratic (Vaart1998; Ploberger_Phillips2012optimal).] In particular, the data is presumed to have a common probability measure , where is a collection of Borel probability measures that possibly generate the data. Further, we think of the parameter as a map , i.e., . Formally, we impose the following:
Assumption A.1.
(i) is an i.i.d. sequence with each distributed according to ; (ii) for some known map and .
Given the model defined in Assumption A.1, we now formalize the notion of local perturbations to the true probability measure . Intuitively, a local perturbation can be thought of as a sequence of probability measures contained in that approaches . Since the set of probability measures is not a vector space, an appropriate embedding is needed to make precise sense of this idea. This is simplified by considering one dimensional parametric models containing and contained in (Stein1956efficient).
Definition A.1.
A function mapping a neighborhood of zero into is called a differentiable path passing through if and for some ,
[TABLE]
Intuitively, a differentiable path is just a parametric model in and indexed by such that it is getting close to sufficiently fast as . The function is referred to as the score function of and satisfies and .
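As a numerical illustration of a differentiable path, the sketch below takes the normal location model N(t, 1) passing through N(0, 1), whose score is h(x) = x, and evaluates the quadratic-mean remainder on a grid. It assumes the usual form of the condition, namely that the integral of [(sqrt(p_t) - sqrt(p_0)) / t - 0.5 * h * sqrt(p_0)]^2 tends to zero as t tends to zero; the model, grid and function name are illustrative choices.

```python
import numpy as np

def dqm_remainder(t, half_width=12.0, num_points=200001):
    """Riemann-sum value of the quadratic-mean remainder for the path t -> N(t, 1).

    Integrand: [ (sqrt(p_t) - sqrt(p_0)) / t - 0.5 * h * sqrt(p_0) ]^2 with h(x) = x.
    """
    grid = np.linspace(-half_width, half_width, num_points)
    dx = grid[1] - grid[0]
    norm = (2.0 * np.pi) ** 0.25
    sqrt_p0 = np.exp(-grid**2 / 4.0) / norm          # square root of the N(0, 1) density
    sqrt_pt = np.exp(-(grid - t) ** 2 / 4.0) / norm  # square root of the N(t, 1) density
    integrand = ((sqrt_pt - sqrt_p0) / t - 0.5 * grid * sqrt_p0) ** 2
    return float(integrand.sum() * dx)

remainders = [dqm_remainder(t) for t in (0.5, 0.1, 0.02)]
```

The remainders shrink as t decreases, which is the sense in which the path approaches the baseline measure sufficiently fast.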
The perturbations on are fundamental in that they affect everything that is built on the model, which in particular includes the parameter and the estimator . In this paper, we shall only consider and that are well behaved with respect to these local perturbations. This is formalized by the following assumption.
Assumption A.2.
(i) For every differentiable path in with score function , is regular in the sense that there exists such that (as ); (ii) is a regular estimator for . [Footnote 16: Formally, is a regular estimator if for every differentiable path in with score function , we have , where and denotes the law under .]
Assumption A.2(i) is a smoothness condition on the parameter and the model , which rules out parameters defined by, for example, densities or conditional densities with jumps (Ibragimov_Hasminskii1981; Chernozhukov_Hong2004nonregular). In our examples, takes the form of expectations, so Assumption A.2(i) is met under standard conditions as long as the model is sufficiently rich to include differentiable paths (BKRW993Efficient; Brown_Newey1998expectation). Assumption A.2(ii) means that is asymptotically invariant to local perturbations, excluding superefficient estimators such as Hodges’s estimator or Stein’s estimator (Vaart1997Superefficiency). Since are population means in all our examples, Assumption A.2(ii) is satisfied if we take to be the corresponding sample averages; see, for example, Theorem 3.10.12 in Vaart1996 and Jeganathan1995LANtimeseries. Assumptions A.2(i) and (ii) are in fact closely related, though neither by itself implies the other. In particular, regularity of plus a mild condition implies regularity of , and vice versa (Vaart1991differentibility; Hirano_Porter2012).
The local behaviors of our test statistic can now be characterized as follows.
Lemma A.1.
Let be a differentiable path with score function . Suppose that Assumptions 2.1, 2.2, A.1 and A.2 hold. Then,
[TABLE]
where denotes the law under with by abuse of notation.
Lemma A.1 indicates that the asymptotic distribution of varies as a function of the score , and in this sense exhibits second order irregularity, even if the map is both first and second order differentiable and is regular. This is perhaps surprising ex ante and yet somewhat expected ex post. One important implication of Lemma A.1 is that one should carefully evaluate how sensitive the statistical procedures under consideration are in the presence of first order degeneracy.
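The dependence of the limit on the local perturbation can be illustrated with a toy simulation. The sketch assumes phi(theta) = theta^2, data that are i.i.d. N(c / sqrt(n), 1), and the sample mean as the estimator, so that n * phi(theta_hat) is distributed as (Z + c)^2 with Z standard normal. The drift parameter `c` and the function name are hypothetical and stand in for the effect of the score.

```python
import numpy as np

def simulate_local_stat(c, n=400, reps=20000, seed=0):
    """Draws of n * phi(theta_hat) with phi(theta) = theta^2 under the local drift c / sqrt(n)."""
    rng = np.random.default_rng(seed)
    # The mean of n i.i.d. N(mu, 1) draws is exactly N(mu, 1/n), so we draw it directly.
    theta_hat = c / np.sqrt(n) + rng.standard_normal(reps) / np.sqrt(n)
    return n * theta_hat**2

# The mean of (Z + c)^2 is 1 + c^2, so the law shifts with the drift.
means = {c: float(simulate_local_stat(c).mean()) for c in (0.0, 1.0, 2.0)}
```

Even though the map is smooth and the estimator is regular in this toy case, the distribution of the scaled statistic changes with `c`, which is the second order irregularity described above.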
A.2 Local Size and Power
Having derived the asymptotic distributions of under local perturbations, we are now in a position to establish local power performance and local size control of our test. We consider differentiable paths in that also belong to the set
[TABLE]
Thus, a path is such that satisfies the null hypothesis whenever , but switches to satisfying the alternative hypothesis at all . One can think of as a simple device to study local size and power in a compact way. Further, we denote the power function at sample size for the test that rejects whenever by
[TABLE]
where we write and . The following additional assumption ensures local size control of our test.
Assumption A.3.
(i) ; (ii) The cdf of is strictly increasing and continuous at its -th quantile ; (iii) There exists a strictly increasing function such that and is subadditive.
Assumption A.3(i) formalizes the requirement that be scalar valued. Assumption A.3(ii) requires strict monotonicity of the cdf of at which ensures consistency of the critical value , and continuity which ensures the test controls size at least pointwise in . Subadditivity of as required in Assumption A.3(iii) is crucial for establishing local size control of our test. This condition was imposed directly on the first order derivative in Fang_Santos2014HDD. In our setup, itself often violates subadditivity because it is closely related to quadratic forms. Nonetheless, in all but Example 2.6, is subadditive for given by . [Footnote 17: For Example 2.6, it turns out that is subadditive when is point identified, though the main motivation for us being general there is to accommodate partial identification as well as the Jacobian matrix being degenerate.]
The following theorem derives the asymptotic limits of the power function .
Theorem A.1.
Let Assumptions 2.1, 2.2, 3.1, 3.2, 3.4, A.1, A.2 and A.3(i)(ii) hold. It then follows that for any differentiable path in with score function , and every we have
[TABLE]
If in addition Assumption A.3(iii) also holds, then we can conclude that for any
[TABLE]
The first claim of the theorem establishes a lower bound for the power function under local perturbations to the null which includes in particular local alternatives. In fact, the lower bound is sharp whenever is a continuity point of the cdf of , in which case (A.3) holds with equality. The role of Assumption A.3(iii) can be seen from (A.3) and the inequalities
[TABLE]
where the second equality is due to and . [Footnote 18: This is because by Assumption 2.1 and being a local perturbation under the null.]
To conclude this section, we note that it is possible to develop a testing procedure adaptive to potential first order degeneracy, that is, in settings where is not always first order degenerate under the null. We emphasize that fails to be a valid statistic since it diverges to infinity at those nondegenerate points, and so does
[TABLE]
because might not be identified given . By introducing an appropriate selection rule, we can combine first and second order asymptotics to provide a more general testing procedure; see Remark A.1. Development of adaptiveness not only serves to maintain generality of our theory, but also is necessary when constructing confidence sets for ; see Remark A.2.
Remark A.1.
If is only degenerate at some but not all points under the null, then one may employ the statistic
[TABLE]
where satisfying as . Heuristically, if is nondegenerate, then and thus with probability approaching one which has nondegenerate weak limit . If is degenerate, then and therefore with probability approaching one which has nondegenerate weak limit . Accordingly we may construct the corresponding critical value as
[TABLE]
where for and some estimator of ,
[TABLE]
The indicator functions above serve as a rule for selecting proper statistics based on degeneracy of (a finite sample analogue of) . ∎
Remark A.2.
Confidence regions for can be constructed by test inversion based on the statistic
[TABLE]
where is given by . Critical values can be constructed in a similar fashion as in Remark A.1. By the chain rule (Shapiro1990, Proposition 3.6), it is straightforward to see that and so if and only if . Moreover, when . In general, confidence regions thus constructed are less conservative than the plug-in type confidence regions with some level confidence region for . Pointwise validity of is straightforward to establish, but the local properties appear to be challenging to develop. ∎
Finally, we present the proofs of Lemma A.1 and Theorem A.1.
Proof of Lemma A.1: By Assumptions 2.2(i)(ii), A.1 and A.2, we have for ,
[TABLE]
Combination of Assumptions 2.1(i)(ii), , and result (A.7) allows us to invoke the second order Delta method to conclude that
[TABLE]
This completes the proof of the lemma.∎
Proof of Theorem A.1: Under the assumptions in Theorem 3.3 and Assumptions A.3(i)(ii), we can show following the proof of Corollary 3.2 in Fang_Santos2014HDD that under . By Theorem 12.2.3 and Corollary 12.3.1 in TSH2005, and are mutually contiguous. It follows that
[TABLE]
Lemma A.1, Assumption A.3(i)(ii) and result (A.9) allow us to conclude by the portmanteau theorem that
[TABLE]
This establishes the first claim of the theorem.
For the second claim, note that if , then
[TABLE]
where we exploited for all and Assumption 2.1(iii). Hence,
[TABLE]
where the second inequality is due to Lemma A.1, result (A.9) and the portmanteau theorem, the second equality is by being strictly increasing, the third inequality is by being subadditive, and the third equality is due to result (A.11), and being strictly increasing. This proves the second claim of the theorem.∎
Appendix B Proofs of Main Results
By Assumption 2.2(ii), the support of satisfies . Since only the differentiability of on is relevant, we may assume without loss of generality that in what follows. Moreover, by Proposition I.3.3 in Vakhania_Tarieladze_Chobanyan1987probability, the support of is closed. It then follows from Theorem 4.1 in Dugundji1951extension and Assumption 2.1(i) that can be continuously extended from to . Throughout the appendix, we thus interpret as its continuous extension whenever it takes arguments with being the support of .
Proof of Theorem 2.1: The second claim follows from the first by the Slutsky theorem and the continuous mapping theorem, in view of Assumption 2.2(i)(ii) and continuity of on (interpreted as some continuous extension). Nonetheless, for pedagogical purposes, we go backwards and start by proving the second claim first. For each , let and define by
[TABLE]
By Assumption 2.1(ii), whenever . Moreover, (almost surely) is separable since it is tight by Assumption 2.2(ii). The second claim then follows by Theorem 1.11.1(i) in Vaart1996.
As for the first claim, define by
[TABLE]
Assumption 2.1(ii) then allows us to conclude again by Theorem 1.11.1(i) in Vaart1996 that
[TABLE]
By the continuous mapping theorem applied to result (B.1), we have
[TABLE]
The first claim then follows from result (B.2) and Lemma 1.10.2(iii) in Vaart1996. ∎
Proof of Theorem 3.1: Inspecting the structure of the problem, we see that the bootstrap consistency (26) is equivalent to for all by exactly the same arguments as the proof of Theorem A.1 in Fang_Santos2014HDD. Thus, it boils down to showing that for all if and only if for . One direction is immediate since if the latter holds, then both and are degenerate at [math] for all , and hence are equal in distribution. The converse consists of two steps.
To begin with, note that by Assumption 2.2(ii), being centered Gaussian and Lemma A.7 in Fang_Santos2014HDD, we may assume without loss of generality that the support of is and that is separable. Since is separable, it follows that the Borel -algebra, the -algebra generated by the weak topology, and the cylindrical -algebra coincide by Theorem 2.1 in Vakhania_Tarieladze_Chobanyan1987probability. Furthermore, by Theorem 7.1.7 in Bogachev2007, is Radon with respect to the Borel -algebra, and hence also with respect to the cylindrical -algebra. Finally, let be the law of on .
Step 1: Show that corresponds to a bilinear map if for all .
For completeness, we introduce additional notation following Section 3.7 in Davydov1998local. First, let denote the dual space of , and for any and . Similarly denote the dual space of by and the corresponding bilinear form by . Since is Gaussian, (Bogavcev1998gaussian, p.42). We may thus embed into . Denote by the closure of , viewed as a subset of . By some abuse of notation write for any and . Finally, for each we let denote the law of , write whenever is absolutely continuous with respect to , and define the set:
[TABLE]
Since is Radon with respect to the cylindrical -algebra of , it follows by Theorem 7.1 in Davydov1998local that there exists a continuous linear map satisfying for every :
[TABLE]
Fix an arbitrary and . Since for all , it follows that and must be equal in distribution for all . [Footnote 19: The proof of Lemma A.3 in Fang_Santos2014HDD never exploits that is a first order derivative beyond continuity of and which are satisfied by .] In particular, their characteristic functions must equal each other, and hence for all and :
[TABLE]
where in the second equality we have exploited being positively homogeneous of degree two. Setting , we have by (B.4) that
[TABLE]
for all and .
We next aim to equate second order right derivatives of both sides in the identity (B.5). The second order right derivative of the left hand side at is given by
[TABLE]
On the other hand, exploiting result (B.3), linearity of and that implies for all and in particular for all , we may rewrite the right hand side of (B.5) as
[TABLE]
The integrand on the right hand side of (B) is differentiable with respect to for all and the resulting derivative is dominated by which is integrable against since by Proposition 2.10.3 in Bogavcev1998gaussian and . Thus by Theorem 2.27(ii) in Folland1999, the first order derivative of the right hand side in (B) at exists and is given by
[TABLE]
In turn, result (B.8) allows us to conclude that the second order right derivative of the right hand side in (B) at exists and is given by
[TABLE]
Since equation (B.5) holds for all and , it follows from results (B.6) and (B.9) that for all :
[TABLE]
Note that is the characteristic function of and hence it is continuous. Thus, since there exists a such that . For such it follows from (B.10) that
[TABLE]
Define a map by
[TABLE]
It then follows from (B.11) that, for any and any ,
[TABLE]
where . Since is linear, is bilinear on . Moreover, is continuous on due to continuity of (and hence ) and . We thus conclude from being a dense subspace of by Proposition 7.4(ii) in Davydov1998local that is continuous and bilinear on . Since is arbitrary, it follows from Lemma A.2 in Vaart1991differentibility that is bilinear and continuous. By identity (B.12), we have for all . Hence, is a quadratic form corresponding to the bilinear map .
Step 2: Conclude that on the support of . Note that if is second order Hadamard differentiable, then one can directly start with Step 2.
By Lemma A.3 in Fang_Santos2014HDD, for all ,
[TABLE]
where the third equality exploited bilinearity of . Fix an arbitrary . By result (B), we have for all and ,
[TABLE]
where the last step used linearity of in its second argument. We now equate second derivatives of both sides at . The second derivative of the left hand side is trivially zero, while that of the right hand side, by the recursive use of dominated convergence arguments, is given by . Thus we have
[TABLE]
for all , which in turn implies that for all ,
[TABLE]
Picking a sequence , replacing with in (B.16) and letting leads to, by the dominated convergence theorem: for all and all ,
[TABLE]
Consequently, for all and -almost surely . Since is arbitrary, we conclude by Lemma 6.10 in AliprantisandBorder2006 that for all and -almost . Hence, for -almost .
Finally, denote by the collection of all such that . Then we have by Assumption 2.2(ii) and the above discussion. We claim that is dense in . To see this, suppose otherwise and then there must exist some and some such that . Note that i) since , and ii) for all by the definition of . These contradict the fact . Since is continuous , we may conclude from being dense in and on that on . ∎
Proof of Theorem 3.2: Let and define for each the map by
[TABLE]
If satisfies as , then Assumption 3.3 allows us to conclude that
[TABLE]
Since admits a continuous extension on , by the corresponding extension of according to equation (B.12), it follows from (B) that
[TABLE]
Next, let , and . By Assumptions 2.1, 2.2, 3.1 and 3.2(i), it follows from Lemma A.2 in Fang_Santos2014HDD that for independently distributed according to ,
[TABLE]
By the continuous mapping theorem and result (B.20) we have
[TABLE]
Combining the separability of and by Assumption 2.2(ii), results (B.19) and (B.21), we conclude by Theorem 1.11.1(i) in Vaart1996 that
[TABLE]
By Lemma 1.10.2 in Vaart1996 we have from (B.22) that
[TABLE]
Now fix . Note that
[TABLE]
By Lemma 1.2.6 in Vaart1996,
[TABLE]
Results (B.23), (B) and (B.25), together with being arbitrary, then yield
[TABLE]
Result (B.21) and Assumption 2.2(ii) imply that is asymptotically measurable and asymptotically tight. In turn, Lemmas 1.4.3 and 1.4.4 in Vaart1996 imply that is asymptotically tight and asymptotically measurable. Fix an arbitrary subsequence . Then Theorem 1.3.9 in Vaart1996 implies that converges weakly along a further subsequence of to a tight Borel law in , which is equal to by marginal convergence. This is a weak limit where the dependence structure between the first two components and last two components is known and in fact unique. Since is arbitrary, it follows that
[TABLE]
Since and hence is continuous, it follows from result (B.27) and the continuous mapping theorem that
[TABLE]
Combination of the continuous mapping theorem and Lemma 1.10.2(iii) in Vaart1996 yields that
[TABLE]
By the triangle inequality, we have
[TABLE]
By Lemma 1.2.6 in Vaart1996 and result (B.29)
[TABLE]
Combination of (B.26), (B), (B) and the triangle inequality leads to
[TABLE]
The theorem follows by combining (B.26) and (B.32) and noticing that
[TABLE]
where the second equality is due to bilinearity of .∎
Proof of Theorem 3.3: Inspecting the proof of Theorem 3.2 in Fang_Santos2014HDD, we see that being a first order derivative is actually never exploited there. The conclusion of the theorem then follows in view of Lemma B.2 when combined with exactly the same arguments in Fang_Santos2014HDD. ∎
Proof of Proposition 3.1: Let and such that . Since by Assumption 2.1(iii), we may rewrite :
[TABLE]
where . By Assumptions 2.2(i), 3.5, Lemma 1.10.2 in Vaart1996 and , we have . By Assumptions 2.1(ii), 2.2(ii) and Theorem 1.11.1(ii) in Vaart1996, we thus have
[TABLE]
By Assumptions 2.1 and 2.2, it follows from Theorem 2.1 and that
[TABLE]
Combining results (B), (B.34) and (B.35) we thus arrive at the desired conclusion. ∎
Lemma B.1.
Suppose that Assumptions 2.2(i)(ii) and 3.1(ii) hold, and that is Hadamard differentiable at tangentially to with satisfying Assumption 2.1(iii). Then , where for ,
[TABLE]
Proof: This lemma is somewhat similar to Lemma 5 in AndrewsandGuggen2010ET and we include the proof here only for completeness. Fix and let . Note that for all . Since is Hadamard differentiable at tangentially to , it follows by Theorem 3.9.15 in Vaart1996 that
[TABLE]
This, together with Lemma 10.11 in Kosorok2008, gives us: for all ,
[TABLE]
Fix . Clearly, for all and all . Hence, by (B.37),
[TABLE]
By definition of , it follows from (B.38) that
[TABLE]
Since is arbitrary, the conclusion of the lemma then follows from result (B.39).∎
Lemma B.2.
Let Assumption 2.1 hold, and be an estimator depending on . Then the following are equivalent:
- (i)
For every compact set and every ,
[TABLE]
- (ii)
For every compact set , every and every ,
[TABLE]
- (iii)
For every sequence and every such that as ,
[TABLE]
Proof: The equivalence between (i) and (ii) is intuitive and straightforward to establish. Suppose that (i) holds. Fix a compact set , a sequence with , and . We want to show that there exists some such that for all ,
[TABLE]
But from (i) we know that there is some such that
[TABLE]
which in turn implies that there is some satisfying for all
[TABLE]
Since , there exists some such that for all and hence
[TABLE]
Setting , we see that (B.43) follows from (B.45) and (B.46).
Conversely, suppose that (ii) holds, fix a compact set and , and we aim to establish (i) or equivalently, there exists some such that (B.45) holds. Pick a sequence . Then there exists some such that (B.43) holds with “” replaced by “”. Setting , we may then conclude (B.45) from (B.43).
Now suppose (ii) (and hence (i)) holds again and let such that . Fix . There must be some such that for all . By the triangle inequality we have: for all ,
[TABLE]
Part (iii) then follows from (B) and part (i).
Finally, suppose that (iii) holds. Fix a compact set and . Let . Note that if , then there must exist some such that and this is true for all . It follows that
[TABLE]
Note that is possibly random and satisfies as . Fix an arbitrary subsequence . Since is compact, it follows by Lemma A.6 in Fang2014Plugin that there exists a further subsequence and some deterministic such that as . By the triangle inequality,
[TABLE]
Since as , the first term on the right hand side above tends to zero along by (iii) and Lemma B.3, while the second term tends to zero along by Theorem 1.9.5 in Vaart1996. Since is arbitrary, combination of results (B.48) and (B) then leads to (ii). ∎
Lemma B.3 (Extended Continuous Mapping Theorem).
Let and be metric spaces equipped with metrics and respectively, a possibly random map for each , and a nonrandom map. Suppose that whenever for and . If such that is Borel measurable, separable and satisfies , then .
Proof: We closely follow the proof of Proposition A.8.6 in BKRW993Efficient (see also Vaart_Wellner1990prohorov). Fix throughout. First, we show that is continuous. By assumption, for each we have
[TABLE]
where for . This can be easily seen by the triangle inequality:
[TABLE]
Notice that again by assumption, the triangle inequality and result (B.50) we have
[TABLE]
as followed by . Since is a nonrandom function, we must have as and hence is continuous on .
Next, for define
[TABLE]
This is well defined by a simple reductio ad absurdum argument as in BKRW993Efficient. We now show that is measurable. This is done by proving that is lower semicontinuous, i.e., for implies
[TABLE]
Fix and such that as . Then there must exist some subsequence of such that . Since is integer valued, we further have for all sufficiently large. If , then the inequality (B.52) follows trivially. Otherwise, suppose that . For any with , there exists an such that for all . By definition of , it follows that for all ,
[TABLE]
Letting , we have by and continuity of and that for all ,
[TABLE]
Hence, and hence is Borel measurable.
Since , we may assume without loss of generality that takes values in . In turn, it follows that is a Borel -valued random variable. Thus there exists some such that
[TABLE]
Since , there exists some such that for all ,
[TABLE]
Now define
[TABLE]
It follows that for all ,
[TABLE]
by definition of , results (B.55) and (B.56), and we are done since is arbitrary.∎
Appendix C Results for Examples 2.1 - 2.6
Example 2.2: Moment Inequalities
In this example, it is a simple exercise to show that
[TABLE]
Thus, is Hadamard differentiable with the derivative degenerate at . Moreover, is second order Hadamard directionally differentiable. The derivative is nondegenerate at 0, though degenerate whenever . Exploiting the structure in (C.1), we may easily estimate the derivative by
[TABLE]
where satisfies , and . Interestingly, construction of as above amounts to the generalized moment selection procedure as in AndrewsandSoares2010 for conducting inference in moment inequalities models.
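The selection idea can be illustrated with a minimal sketch. It assumes the test functional is the sum of squared positive parts, $\phi(\theta)=\sum_j \max\{\theta_j,0\}^2$ (an assumption, since the display is not reproduced here), so that at a point with no positive coordinates the second order quotient converges to $\sum_{j:\theta_j=0}\max\{h_j,0\}^2$. The function names, the threshold `kappa_n`, and the selection rule `|theta_hat_j| <= kappa_n` are hypothetical choices made for illustration only.

```python
import numpy as np

def phi(theta):
    """Assumed test functional: sum of squared positive parts."""
    return float(np.sum(np.maximum(theta, 0.0) ** 2))

def second_derivative_estimate(theta_hat, h, kappa_n):
    """Hypothetical selection-based estimate of the second order derivative.

    Coordinates whose estimate lies within kappa_n of zero are treated as
    binding; only those contribute max(h_j, 0)^2.
    """
    binding = np.abs(theta_hat) <= kappa_n
    return float(np.sum(np.maximum(h[binding], 0.0) ** 2))

# Coordinates 0 and 2 bind; coordinate 1 is strictly slack.
theta0 = np.array([0.0, -1.0, 0.0])
h = np.array([0.5, 2.0, -0.3])

# Second order difference quotient at theta0 (the first order derivative is zero there).
t = 1e-3
quotient = (phi(theta0 + t * h) - phi(theta0)) / t**2

# A noisy estimate of theta0 still selects the same binding coordinates.
theta_hat = theta0 + np.array([0.01, -0.02, 0.015])
estimate = second_derivative_estimate(theta_hat, h, kappa_n=0.1)

print(quotient, estimate)
```

In this toy configuration the difference quotient and the selection-based estimate agree because the threshold separates the binding coordinates from the slack one; with a threshold that is too small or too large the selected set, and hence the estimate, would differ.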
Example 2.3: Cramer-von Mises Functionals
Cramer-von Mises functionals can be viewed as generalized Wald functionals. It is straightforward to show that is first and second order Hadamard differentiable at any with derivatives satisfying:
[TABLE]
for all . Note that the first order derivative is degenerate when , while the second order derivative is nowhere degenerate. The corresponding bilinear map is given by . In this example, there is no need for derivative estimation because is a known map.
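A small numerical check may help fix ideas about first order degeneracy in this example. The sketch assumes the functional $\phi(F)=\int (F-F_0)^2\,dF_0$ with $F_0$ the Uniform[0,1] cdf, approximates integrals on a grid, and normalizes the second order quotient by $t^2$ (an assumed convention; with a $t^2/2$ normalization the limit doubles). The direction `h` and all variable names are illustrative.

```python
import numpy as np

# Grid approximation of integrals against F0 = Uniform[0, 1] (assumed design).
grid = np.linspace(0.0, 1.0, 1001)
weights = np.full(grid.size, 1.0 / grid.size)
F0 = grid.copy()  # cdf of Uniform[0, 1] evaluated on the grid

def phi(F):
    """Discretized Cramer-von Mises functional: integral of (F - F0)^2 dF0."""
    return float(np.sum(weights * (F - F0) ** 2))

h = np.sin(np.pi * grid)                  # an arbitrary bounded direction
limit = float(np.sum(weights * h ** 2))   # integral of h^2 dF0

first_order, second_order = [], []
for t in (1e-1, 1e-2, 1e-3):
    diff = phi(F0 + t * h) - phi(F0)
    first_order.append(diff / t)          # shrinks to zero: degenerate first derivative
    second_order.append(diff / t**2)      # stays at the integral of h^2 dF0

print(first_order, second_order, limit)
```

Because the functional is exactly quadratic, the second order quotient equals the limit for every step size, while the first order quotient is proportional to the step size and vanishes.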
Example 2.4: Stochastic Dominance
Lemma C.1.
Let satisfy and be given by for any . Then it follows that
- (i)
is first order Hadamard differentiable at any with satisfying for any
[TABLE]
where .
- (ii)
is second order Hadamard directionally differentiable at any and the derivative is given by: for any
[TABLE]
where .
Proof: Fix . Further, let , be a sequence in satisfying for some , and
[TABLE]
Observe that since for all , and due to , the dominated convergence theorem yields that:
[TABLE]
and
[TABLE]
Combining results (C.3) - (C) yields
[TABLE]
which establishes the first claim of the lemma.
Next fix and let and be as before. Therefore, by the dominated convergence theorem we have
[TABLE]
and
[TABLE]
It follows from results (C.6)-(C) that
[TABLE]
This completes the proof of the second claim and we are done.∎
Note that if has Lebesgue measure zero, i.e., almost everywhere, then and simplifies to . If in addition the contact set has Lebesgue measure zero, then in turn is degenerate, corresponding to the degenerate limits obtained in Theorem 1 of Linton2010. Let be an estimator of . Then we may estimate by
[TABLE]
It is a simple exercise to verify that Assumption 3.4 is satisfied provided
[TABLE]
where denotes the set difference between sets and . Such a construction corresponds to the bootstrap procedure studied in Linton2010.
Example 2.5: Conditional Moment Inequalities
Lemma C.2.
Let be compact under some metric and be given by . Then it follows that:
- (i)
is Hadamard differentiable at any satisfying and , and its derivative for any
- (ii)
is second order Hadamard directionally differentiable at any satisfying and tangentially to , and the derivative is given by: for any ,
[TABLE]
where , and .
Remark C.1.
Note that if , then simplifies to .∎
Proof: Let satisfying and , such that , and . Combining , so that and the triangle inequality, we have
[TABLE]
as desired in part (i), where in the last step we used the fact that .
As for the second claim, let satisfying and , such that , and . By and , Lipschitz continuity of the sup operator and the triangle inequality we have
[TABLE]
Since and , it follows that
[TABLE]
and that
[TABLE]
Combination of results (C), (C) and (C.14) leads to
[TABLE]
Next, fix . By definition of , compactness of and continuity of , we see that . Since also and , it follows that for all and for all large. In turn we have
[TABLE]
where the last step is due to . On the other hand, we have,
[TABLE]
where the first inequality is due to for all and , the second inequality exploits the definition and compactness of , and the equality is due to uniform continuity of on since and is compact.
Finally, combining results (C), (C), and we have:
[TABLE]
It follows from , (C.15) and (C) that
[TABLE]
as desired for the second claim of the lemma. ∎
Suppose that and are respectively estimators of and that satisfy the following condition (footnote 20: we note that for two generic sets and in a metric space, neither controls nor controls ; see Lemenant_Milakis_Spinolo2014):
[TABLE]
Based on and and in view of Lemma B.3 in Fang_Santos2014HDD, we may estimate the derivative as follows:
[TABLE]
The estimation of and is in accordance with the generalized moment selection in Andrews_Shi2013CMI; see also Kaido_Santos2013.
Example 2.6: Overidentification Test
Lemma C.3.
Let be a compact set, and be given by where and is a symmetric positive definite matrix. Then we have
- (i)
is Hadamard differentiable at any satisfying for some with the derivative given by for all .
- (ii)
If is in the interior of , satisfies , and for all small , for some and some , then is second order Hadamard directionally differentiable at tangentially to with the derivative given by: for any
[TABLE]
where is the Jacobian matrix defined by $J(\gamma_{0})\equiv\frac{d\theta(\gamma)}{d\gamma^{\intercal}}\big|_{\gamma=\gamma_{0}}$.
Proof: Fix and let and such that . For a vector , define the norm . It follows that
[TABLE]
where the second inequality is because for all and the last step is due to by assumption. This establishes part (i).
For part (ii), fix with and let and such that . First of all, note that for ,
[TABLE]
where the first inequality is by Lipschitz continuity of the operator and the triangle inequality, and the last inequality follows from and for .
Next, for each fixed with and the smallest eigenvalue of , by assumption and the triangle inequality we have: for all sufficiently large so that ,
[TABLE]
where the strict inequality is due to . This in turn implies that for all large,
[TABLE]
Now for , set and . Note that . Since and are continuous, it then follows that
[TABLE]
In turn, notice that
[TABLE]
where the first inequality follows from the formula and that is any fixed element in , and the last step follows from uniform continuity of on because is continuous on and is compact.
Since , we further have,
[TABLE]
By the mean value theorem applied entry-wise to , there exist all between and such that
[TABLE]
where by abuse of notation we write
[TABLE]
Since and is compact, is uniformly continuous on and hence
[TABLE]
Since all norms in finite dimensional spaces are equivalent, it follows from results (C), (C), (C.29), (C) and for all that
[TABLE]
By assumption, is in the interior of and so for all sufficiently large. It follows that
[TABLE]
where the second equality exploits the fact that is symmetric. For each , by the projection theorem there is some such that
[TABLE]
Thus, by choosing large if necessary so that , we have from results (C.31), (C) and (C.33) that
[TABLE]
Combining (C.34), and part (i), we then arrive at part (ii).∎
Remark C.2.
The condition that “for all small , for some and some ” in Lemma C.3 effectively imposes restrictions on the Jacobian matrix that prevent one from directly applying Lemma C.3 to the setup of Dovonon_Renault2013testing where is a singleton. To see this, let be the moment function in Dovonon_Renault2013testing. Then, for any with for , we have by Dovonon_Renault2013testing,
[TABLE]
for some constant depending on the eigenvalues of the Hessian matrices (evaluated at ) of the maps , where for the second equality we exploited the facts that (i) , (ii) the Jacobian matrix is degenerate, and (iii) . But by assumption, for the same ,
[TABLE]
for all sufficiently small since , a contradiction. The conclusion holds more generally: the condition in fact excludes Jacobian matrices of deficient rank, regardless of whether is point or partially identified. To see this, let for some nonzero . Then we may choose for some suitable and for all small – this is possible since is required to be in the interior of . Then the previous arguments apply with such a choice of and any . ∎
Appendix D Proofs for Section 4
Lemma D.1.
Let be given by . Then
- (i)
is Hadamard differentiable at any satisfying for some and the derivative satisfies for all .
- (ii)
is second order Hadamard directionally differentiable at any under Assumption 4.1 tangentially to with the derivative given by: for all ,
[TABLE]
where is the (nonempty) identified set of , and with the th row given by and
[TABLE]
Proof: Fix satisfying for some , such that , and . It follows that
[TABLE]
where in the last step we used the fact that . So for any , as desired for the first claim of the lemma.
Now consider and suppose that Assumption 4.1 holds. Pick such that , and . Note that under Assumption 4.1. Then first, we have
[TABLE]
Next, let and . By Equation (7) in Dovonon_Renault2013testing, CovDiag), where for a matrix , denotes the vector consisting of diagonal entries. Also, let and denote the smallest and the smallest positive singular values, respectively. We then have for ,
[TABLE]
where the first inequality follows from a simple application of the singular value decomposition of , the second inequality exploits the generalized mean inequality, and the last inequality is by Lemma D.4. Note that by Assumption 4.1(v). Let for the nontrivial case . Then it follows by the triangle inequality that for sufficiently large such that ,
[TABLE]
and therefore
[TABLE]
For , let and and and . Then we have
[TABLE]
where the first equality is due to the definition of and the second follows by
[TABLE]
where in the first inequality is any fixed element in , and the last equality follows by the uniform continuity of over . Noting that $(\gamma^{\intercal}Y_{t+1})^{2}=\gamma^{\intercal}Y_{t+1}Y_{t+1}^{\intercal}\gamma$ and so $c(\gamma)=\gamma^{\intercal}E[Y_{t+1}Y_{t+1}^{\intercal}]\gamma$, we may write
[TABLE]
where we made use of some facts on the vec operator (AbadirandMagnus, p.282). In turn, by (D.4) and the definition of , we have
[TABLE]
where the second equality follows by the fact that converges to uniformly in with respect to the Hausdorff metric by Lemma D.5 and Lemma B.3 in Fang_Santos2014HDD, and the third equality by the facts that for all and all (to be proved shortly) and that the inside minimum can be attained in for all large enough. Combining (D.2), (D.3) and (D.5) yields
[TABLE]
as desired. It remains to show for all and all . Fix and . By similar arguments (in reverse order) that led to (D.4), we obtain
[TABLE]
Next, note that, by the law of iterated expectations, we have
[TABLE]
where the third inequality follows by the model specified in display (40) and Assumption 4.1(ii). Result (D) in turn implies that, for all ,
[TABLE]
where because which is equal to the intersection of and the null space of – see our discussions below Assumption 4.1. The claim now follows by combining (D.6) and (D.8). ∎
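The quadratic-form and vec manipulations used in the proof above rest on two standard identities: $(\gamma^{\intercal}Y)^2=\gamma^{\intercal}YY^{\intercal}\gamma$ and $\gamma^{\intercal}A\gamma=(\gamma\otimes\gamma)^{\intercal}\mathrm{vec}(A)$. The following sketch checks them numerically on randomly drawn vectors; the dimension, the seed, and the variable names are arbitrary illustrative choices and are not taken from the model in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
gamma = rng.normal(size=k)
Y = rng.normal(size=k)

# Squared linear combination, written three equivalent ways.
lhs = (gamma @ Y) ** 2
A = np.outer(Y, Y)                       # Y Y^T
quad = gamma @ A @ gamma                 # gamma^T (Y Y^T) gamma
vec_A = A.reshape(-1, order="F")         # vec stacks columns
vec_form = np.kron(gamma, gamma) @ vec_A # (gamma kron gamma)^T vec(Y Y^T)

print(lhs, quad, vec_form)
```

The same identities carry over after taking expectations, since both the quadratic form and the vec operator are linear in the matrix argument.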
Remark D.1.
The derivative (45) can be rewritten as:
[TABLE]
where denotes the orthogonal complement of . Then for and , we may estimate by
[TABLE]
Lemma D.2.
Under Assumptions 4.1 and 4.2, we have
[TABLE]
where is a zero mean Gaussian process with the covariance functional satisfying: for any , and ,
[TABLE]
Proof: By elementary rearrangements we have
[TABLE]
where , , and
[TABLE]
By Assumptions 4.1(vi) and 4.2, and the law of large numbers for stationary and ergodic sequences and the compactness of , we have
[TABLE]
Once again by Assumptions 4.1(vi) and 4.2, together with where having its th row given by for
[TABLE]
we have by the compactness of that
[TABLE]
for some Gaussian process . In particular, for the summand in is a martingale difference sequence, so for any , , the covariance functional satisfies
[TABLE]
This completes the proof of the lemma. ∎
Lemma D.3.
Suppose Assumptions 4.1, 4.2 and 4.3 hold. Let be constructed as in (50). Then we have: whenever as for a sequence and , it follows that
[TABLE]
Proof: Pick a sequence and such that as . Define
[TABLE]
Then we have
[TABLE]
where “” follows from , and the last step is by Assumptions 4.2 and 4.3.
Next, under Assumptions 4.1, 4.2 and 4.3, we have by Theorem 3.1 in CHT2007 that as , with , , and . Let
[TABLE]
Since and is compact, together with , it follows that
[TABLE]
Since is monotonically decreasing as , we further have
[TABLE]
The lemma then follows from results (D), (D) and (D.12). ∎
Proof of Proposition 4.2: By Lemmas D.2 and D.3, Assumptions 3.1 and 3.2, and the cdf of the weak limit being strictly increasing at , we have following exactly the same proof of Corollary 3.2 in Fang_Santos2014HDD (footnote 21: note that trivially admits a continuous extension on with the first min replaced by ). Then under , the conclusion follows from combining Proposition 4.1, Slutsky's theorem, being a continuity point of the weak limit and the portmanteau theorem. ∎
Lemma D.4.
Let and be given as in the proof of Lemma D.1. Then under Assumption 4.1 and , for all sufficiently small , we have
[TABLE]
where denotes the smallest positive singular value of .
Proof: To begin with, note that i) by Assumption 4.1, ii) under the null, iii) is well-defined by Assumption 4.1(i) so that . Let be the singular value decomposition of , where and are orthonormal, and is a diagonal matrix with diagonal entries in descending order. Since is of full column rank, is equal to the th diagonal entry of with .
Fix . Let and write for and . Suppose first that . Then we have
[TABLE]
since by direct calculations. In turn, result (D.13) implies
[TABLE]
Moreover, we know from being orthonormal and that
[TABLE]
Combining results (D.13) and (D.14) we may thus conclude that
[TABLE]
implying that . This also holds for all sufficiently small when in which case in view of (D.15). Consequently, we have
[TABLE]
for all sufficiently small . This completes the proof of the lemma. ∎
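Lemma D.4 is stated in terms of the smallest positive singular value. As a generic numerical illustration of why this quantity, rather than the smallest singular value, is the relevant one for rank deficient matrices, the sketch below checks the standard bound $\|Ax\|\ge\sigma^{+}_{\min}(A)\|x\|$ for $x$ orthogonal to the null space of $A$. The matrix is randomly generated and is not the matrix appearing in the lemma; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5 x 4 matrix of rank 2, so its smallest singular value is (numerically) zero.
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))

U, s, Vt = np.linalg.svd(A)          # singular values in descending order
rank = int(np.linalg.matrix_rank(A))
sigma_min_plus = s[rank - 1]         # smallest positive singular value

# Project an arbitrary vector onto the row space (orthogonal complement of the null space).
x = rng.normal(size=4)
row_basis = Vt[:rank]
x_row = row_basis.T @ (row_basis @ x)

ratio = np.linalg.norm(A @ x_row) / np.linalg.norm(x_row)
print(rank, s, sigma_min_plus, ratio)
```

On the row space the ratio is bounded below by the smallest positive singular value, whereas no positive lower bound is available for vectors with a component in the null space.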
Lemma D.5.
Let and be defined as in the proof of Lemma D.1. Then uniformly in as .
Proof: First, note that . For , set . It is a simple exercise to verify that . It follows that
[TABLE]
In turn, result (D.18) implies that: for all ,
[TABLE]
On the other hand, for , set for if , and for and if . In any case, by direct calculations. Therefore,
[TABLE]
uniformly in , where we exploited the facts that uniformly in and that is bounded. The lemma then follows from (D.19) and (D.20). ∎
Our final lemma shows that the work in Section 4 is consistent with Dovonon_Renault2013testing in the case they studied, in which the weighting matrix is the identity matrix. We note that the essential difference between and in (53) and (54) is the following: the former consists of the second order derivatives of the moment function with respect to all entries of , whereas the latter consists of the second order derivatives of the moment function with the -th entry of substituted by .
Lemma D.6.
The limit with in Theorem 3.1 of Dovonon_Renault2013testing can be represented as: for and defined in Section 4,
[TABLE]
Proof: First, note that by Dovonon_Renault2013testing, with can be represented as in (54) where is centered Gaussian with variance . Next, simple algebra shows that
[TABLE]
where , where is the vector of ones. It follows that
[TABLE]
as desired, where the second equality exploited the facts that and that for any , and the third equality follows from the fact that the columns in and form a basis for . To see this last fact, note first that the columns of are clearly linearly independent; moreover, if for some nonzero , then by simple algebra, contradicting the linear normalization that . ∎
References
