Detecting Identification Failure in Moment Condition Models
Jean-Jacques Forneron

TL;DR
This paper introduces a new method to detect identification failure in moment condition models using a quasi-Jacobian matrix, enabling robust inference regardless of the identification strength.
Contribution
It proposes a novel quasi-Jacobian matrix approach and a simple chi-squared test for detection of identification failure in moment models.
Findings
The quasi-Jacobian is asymptotically singular when identification fails.
The test works for strong, semi-strong, and weak identification.
Monte Carlo simulations and empirical application validate the method.
Abstract
This paper develops an approach to detect identification failure in moment condition models. This is achieved by introducing a quasi-Jacobian matrix computed as the slope of a linear approximation of the moments on an estimate of the identified set. It is asymptotically singular when local and/or global identification fails, and equivalent to the usual Jacobian matrix which has full rank when the model is point and locally identified. Building on this property, a simple test with chi-squared critical values is introduced to conduct subvector inferences allowing for strong, semi-strong, and weak identification without a priori knowledge about the underlying identification structure. Monte-Carlo simulations and an empirical application to the Long-Run Risks model illustrate the results.
| Rank Failure | Near Rank Failure | Full Rank | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n | AR1 | AR2 | AR3 | AR1 | AR2 | AR3 | AR1 | AR2 | AR3 | |||||||
| 100 | 0.01 | 0.01 | 0.03 | 0.02 | 1.00 | 0.01 | 0.01 | 0.03 | 0.02 | 1.00 | 0.04 | 0.02 | 0.05 | 0.07 | 0.14 | |
| | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.01 | 0.04 | 0.06 | 0.00 | |
| 250 | 0.02 | 0.02 | 0.05 | 0.09 | 1.00 | 0.02 | 0.02 | 0.03 | 0.07 | 1.00 | 0.04 | 0.02 | 0.04 | 0.05 | 0.01 | |
| | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.02 | 0.04 | 0.06 | 0.00 | |
| 500 | 0.02 | 0.02 | 0.04 | 0.17 | 1.00 | 0.01 | 0.01 | 0.04 | 0.08 | 1.00 | 0.05 | 0.02 | 0.05 | 0.05 | 0.00 | |
| | 0.04 | 0.02 | 0.04 | 0.00 | 0.00 | 0.04 | 0.02 | 0.04 | 0.00 | 0.00 | 0.06 | 0.02 | 0.06 | 0.05 | 0.00 | |
| 1000 | 0.02 | 0.02 | 0.05 | 0.22 | 1.00 | 0.02 | 0.02 | 0.04 | 0.05 | 0.98 | 0.05 | 0.02 | 0.05 | 0.05 | 0.00 | |
| | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.02 | 0.04 | 0.00 | 0.00 | 0.05 | 0.02 | 0.05 | 0.05 | 0.00 | |
| 255 | 22 | 1.68 | 0.30 | 0.04 | ||||||||
| 208 | 0.94 | 0.42 | 0.06 | 0.01 | ||||||||
| 208 | 0.95 | 0.06 | 0.04 | 0.01 | 0.00 | 0.00 |
| Rank Failure | Near Rank Failure | Full Rank | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n | AR1 | AR2 | AR3 | AR1 | AR2 | AR3 | AR1 | AR2 | AR3 | |||||||
| 100 | 0.01 | 0.01 | 0.03 | 0.02 | 1.00 | 0.01 | 0.01 | 0.03 | 0.02 | 1.00 | 0.02 | 0.02 | 0.05 | 0.07 | 0.97 | |
| | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.01 | 0.04 | 0.06 | 0.00 | |
| 250 | 0.02 | 0.02 | 0.05 | 0.09 | 1.00 | 0.02 | 0.02 | 0.03 | 0.07 | 1.00 | 0.03 | 0.02 | 0.04 | 0.05 | 0.35 | |
| | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.02 | 0.04 | 0.06 | 0.00 | |
| 500 | 0.02 | 0.02 | 0.04 | 0.17 | 1.00 | 0.01 | 0.01 | 0.04 | 0.08 | 1.00 | 0.05 | 0.02 | 0.05 | 0.05 | 0.00 | |
| | 0.04 | 0.02 | 0.04 | 0.00 | 0.00 | 0.04 | 0.02 | 0.04 | 0.00 | 0.00 | 0.06 | 0.02 | 0.06 | 0.05 | 0.00 | |
| 1000 | 0.02 | 0.02 | 0.05 | 0.22 | 1.00 | 0.02 | 0.02 | 0.04 | 0.05 | 1.00 | 0.05 | 0.02 | 0.05 | 0.05 | 0.00 | |
| | 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.02 | 0.04 | 0.00 | 0.00 | 0.05 | 0.02 | 0.05 | 0.05 | 0.00 | |
| Rank Failure | Near Rank Failure | Full Rank | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n | AR1 | AR2 | AR3 | AR1 | AR2 | AR3 | AR1 | AR2 | AR3 | |||||||
| 100 | 0.01 | 0.01 | 0.04 | 0.01 | 1.00 | 0.01 | 0.01 | 0.04 | 0.01 | 1.00 | 0.04 | 0.01 | 0.05 | 0.05 | 0.38 | |
| 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.05 | 0.01 | 0.05 | 0.00 | 0.00 | 0.05 | 0.01 | 0.05 | 0.06 | 0.00 | ||
| 250 | 0.02 | 0.02 | 0.04 | 0.03 | 1.00 | 0.01 | 0.01 | 0.03 | 0.05 | 1.00 | 0.05 | 0.02 | 0.05 | 0.04 | 0.04 | |
| 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.04 | 0.01 | 0.04 | 0.00 | 0.00 | 0.05 | 0.01 | 0.05 | 0.05 | 0.00 | ||
| 500 | 0.02 | 0.02 | 0.05 | 0.08 | 1.00 | 0.01 | 0.01 | 0.04 | 0.11 | 1.00 | 0.05 | 0.01 | 0.05 | 0.04 | 0.00 | |
| 0.05 | 0.02 | 0.05 | 0.00 | 0.00 | 0.05 | 0.01 | 0.05 | 0.00 | 0.00 | 0.06 | 0.02 | 0.06 | 0.05 | 0.00 | ||
| 1000 | 0.01 | 0.01 | 0.05 | 0.09 | 1.00 | 0.01 | 0.01 | 0.04 | 0.14 | 1.00 | 0.06 | 0.01 | 0.06 | 0.05 | 0.00 | |
| 0.05 | 0.01 | 0.05 | 0.00 | 0.00 | 0.04 | 0.01 | 0.04 | 0.00 | 0.00 | 0.05 | 0.01 | 0.05 | 0.05 | 0.00 | ||
| 145 | 20 | 0.61 | 0.20 | 0.02 | ||||||||
| 169 | 0.46 | 0.33 | 0.12 | 0.01 | ||||||||
| 169 | 0.43 | 0.32 | 0.02 | 0.00 | 0.00 |
Detecting Identification Failure in
Moment Condition Models
Jean-Jacques Forneron Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215 USA.
Email: [email protected], Website: http://jjforneron.com.
I would like to thank Serena Ng for discussions that initiated this project. I thank Francesca Molinari for suggestions that greatly improved this paper. I also greatly benefited from comments and discussions with Tim Christensen, Pavel Cizek, Greg Cox, Iván Fernández-Val, Hiro Kaido, Nour Meddahi, Arthur Lewbel, Demian Pouzo, Zhongjun Qu, Eric Renault, Yichong Zhang and the participants of the BU-BC econometric workshop, the seminar participants at Brown, Chicago, CREST, NUS, NYU, UC Berkeley, Université de Montréal, University of Rochester, SMU, Toulouse School of Economics and conferences. I would also like to thank Joachim Grammig for kindly sharing his replication files for the long-run risks model.
Abstract
This paper develops an approach to detect identification failure in moment condition models. This is achieved by introducing a quasi-Jacobian matrix computed as the slope of a linear approximation of the moments on an estimate of the identified set. It is asymptotically singular when local and/or global identification fails, and equivalent to the usual Jacobian matrix which has full rank when the model is globally and locally identified. Building on this property, a simple test with chi-squared critical values is introduced to conduct subvector inferences allowing for strong, semi-strong, and weak identification without a priori knowledge about the underlying identification structure. Monte-Carlo simulations and an empirical application to the Long-Run Risks model illustrate the results.
JEL Classification: C11, C12, C13, C32, C36.
Keywords: Asset Pricing, Uniform Inference, Global Identification, Indirect Inference.
1 Introduction
The Generalized Method of Moments (GMM) of Hansen and Singleton (1982) is a powerful estimation framework which does not require the model to be fully specified parametrically. Under regularity conditions, the estimates are consistent and asymptotically Gaussian. In particular, the moments should uniquely identify the finite-dimensional parameters. This is very difficult to verify in practice and, as noted in Newey and McFadden (1994), is often assumed. Yet, when identification fails or nearly fails, the Central Limit Theorem provides a poor finite-sample approximation for the distribution of the estimates. This has motivated a vast amount of research on tests which are robust to identification failure. An empirically relevant problem, which remains less explored, is determining, for a given set of estimating moments, whether local and global identification actually hold.
The contribution of this paper is two-fold: first, it introduces a quasi-Jacobian matrix which is singular under both local (first-order) and global identification failure and is informative about the coefficients involved in the identification failure. This is the main contribution of the paper as it provides an approach similar to Cragg and Donald (1993) and Stock and Yogo (2005) but in a non-linear setting. Second, the information is used to construct an identification robust subvector test which does not require a priori knowledge of the identification structure. The test is asymptotically non-conservative under strong identification. It is asymptotically efficient for strongly just-identified models.
The quasi-Jacobian matrix is the best linear approximation of the sample moment function over a region of the parameters where these moments are close to zero. To find the best linear approximation, a sup-norm (or $\ell_\infty$-norm) loss is used to minimize the largest deviation from the linear approximation. This is known as a Chebyshev approximation problem which can be solved fairly quickly using convex optimization software. In the population, the quasi-Jacobian has full rank if, and only if, the parameters are both globally and locally identified. When either global or local identification fails, it is singular in all directions associated with the identification failure. (Non)-singularity of the quasi-Jacobian can be used to check numerically whether identification holds when an analytical check is not feasible.
The asymptotic behaviour of the quasi-Jacobian matrix is studied under three identification regimes: strong, semi-strong, and weak (or set) identification. Under strong identification, the moment conditions are informative and have a unique solution; under semi-strong identification, they are less informative but sufficiently so for estimates to be consistent and asymptotically Gaussian. Antoine and Renault (2009) and Andrews and Cheng (2012) showed that, under (semi)-strong identification, standard inference methods such as the t-test with standard normal critical values are asymptotically valid.¹ Under weak and set identification, the moments are insufficiently informative compared to sampling uncertainty and multiple distant solutions to the moment conditions appear plausible, even in large samples, so that the parameters cannot be consistently estimated and standard inference methods are not asymptotically valid. The Supplement also considers higher-order local identification, where the solution is unique but not locally identified; it can be consistently estimated but with a non-Gaussian limiting distribution. Under (semi)-strong identification, the quasi-Jacobian is shown to be asymptotically equivalent to the usual Jacobian: after re-scaling, it is asymptotically non-singular. Under higher-order and weak identification, the quasi-Jacobian is asymptotically singular with eigenvalues vanishing in directions where identification fails. It is thus informative about the presence of identification failures and about which directions are not identified.
¹ The term (semi)-strong will refer to cases where identification can be either strong or semi-strong. Antoine and Renault (2009) further distinguish between nearly-strong and nearly-weak identification. Under the latter, the limiting distribution may be non-Gaussian. Here, when this is the case, it will be referred to as higher-order local identification.
Building on these results, this paper constructs a simple test procedure for subvector hypotheses on the parameters of the form:
[TABLE]
Subvector inference as described in (1) is quite prevalent in empirical work where only a few structural parameters are typically of interest. The remaining nuisance parameters describe other features of the data generating process needed for estimation. For instance, in the empirical application only preference parameters are of interest while the remaining coefficients parameterize the law of motion for consumption and dividends which is not of immediate interest. The paper relies on the Anderson and Rubin (1949, AR) test statistic for simplicity. The critical values take the form where is the number of moments and is determined using an Identification Category Selection (ICS) procedure based on the singular values of the quasi-Jacobian matrix. This is a projection inference procedure where the ICS step estimates the number of (semi)-strongly identified nuisance parameters to reduce the degrees of freedom.
Monte-Carlo simulations illustrate the results for a simple consumption-based asset pricing model. In the empirical application, the procedure is used to conduct joint inference on risk-aversion and the inverse elasticity of substitution in the long-run risks model of Bansal and Yaron (2004). The results suggest that several nuisance parameters are weakly identified but not all; some are (semi)-strongly identified. This implies that standard inferences based on t or Wald statistics are not asymptotically valid and full projection inference is valid, but conservative. Given the number of parameters in the application, the standard approach of performing test inversion using a grid search is very computationally demanding. Instead, an adaptive sampling procedure based on the Population Monte Carlo (PMC) principle draws uniformly on level sets of the objective function. This makes it possible to conduct robust inference on more complex models like the empirical application: the quasi-Jacobian and 5,000 uniform draws on the confidence set are computed in about 4 hours on a desktop computer.
Structure of the Paper
After a review of the literature and an overview of the notation, Section 2 introduces the setting, the procedure and provides more details about the quasi-Jacobian, the test, and the identification regimes. Section 3 derives the asymptotic behaviour of the quasi-Jacobian matrix, and Section 4 results for the test. Section 5 gives Monte-Carlo evidence for the results, and Section 6 the empirical application. Appendices A, B provide proofs for the main results. The Supplement includes sample R code to compute the quasi-Jacobian and for inference, a description of the PMC algorithm used to generate draws, and additional results for higher-order identification.
Related Literature
The literature on the identification of economic models is quite vast, and an extensive review is given in Lewbel (2018). Within this literature, this paper mainly relates to three topics: local and global identification of finite-dimensional parameters in the population, detection of identification failure in finite samples, and identification robust inference.
Koopmans and Reiersol (1950) provide one of the earliest general formulations of the identification problem at the population level. To paraphrase the authors, the main problem is to determine whether the distribution of the data, assumed to be generated from a given class of models, is consistent with a unique set of structural parameters. In the likelihood setting, Fisher (1967), Rothenberg (1971) introduced sufficient conditions for local and global identification. Komunjer (2012) provides weaker global identification conditions for GMM.
In linear models, global identification amounts to a rank condition on the slope of the moments. This insight was used in pre-testing linear IV models for identification failure using a first-stage F-statistic or rank tests, Cragg and Donald (1993), Stock and Yogo (2005), Kleibergen and Paap (2006). Pre-tests based on the null of strong identification appear in Hahn and Hausman (2002) for linear IV and Inoue and Rossi (2011), Bravo et al. (2012) for non-linear models. Pre-testing for strong identification could make size control difficult when the pre-test has low power. For non-linear models, Wright (2003) uses a rank test and Antoine and Renault (2020) a distorted J-statistic to detect local identification failure. Arellano et al. (2012) develop a test for underidentification of a single coefficient.
Given the impact of (near) identification failure on standard inferences, a large body of literature has developed identification robust tests. Much of the literature is concerned with inference on the full parameter vector, e.g. Anderson and Rubin (1949), Stock and Wright (2000), Kleibergen (2005), Andrews and Mikusheva (2016). Projection inference can be used to conduct subvector inference from these tests (Dufour, 1997). Alternatively, Bonferroni methods combined with a test can be used, Chaudhuri and Zivot (2011), Andrews (2017). For homoskedastic linear IV models, Guggenberger et al. (2012) propose critical values for a subset Anderson-Rubin test which improve power over full projection inference. In the same setting, Guggenberger et al. (2019) propose a data-driven choice of critical values based on a measure of identification strength of the nuisance parameters, and Kleibergen (2021) considers subvector conditional Likelihood-Ratio inference. This paper relies on the Anderson-Rubin statistic for inference, which is the simplest to implement. More powerful test statistics exist such as the conditional quasi-Likelihood Ratio. The main challenge there is in computing the critical values by simulation, which requires repeatedly minimizing non-linear and potentially multi-modal objective functions.²
² This is difficult for non-convex problems, see e.g. Nemirovsky and Yudin (1983, Section 1.6.2) for the complexity of the minimization problem and Nesterov (2018, p14-16) for the practical implications and software limitations.
Given knowledge about the source of a potential identification failure and a specific structure in the underlying model, Andrews and Cheng (2012, 2013, 2014), Cheng (2015), Han and McCloskey (2019), and Cox (2020) propose identification robust tests which are asymptotically non-conservative and powerful under strong identification. These papers rely on a data-driven choice of critical value; it is determined by an ICS statistic built from model-specific knowledge about the source and form of the identification failure. This paper proposes and studies an ICS statistic which does not rely on model-specific information to determine identification status. The choice of robust critical values can coincide with Andrews and Cheng (2012)’s least-favorable critical value, see Appendix H.2 for an example. Andrews (2017) proposes an ICS based on the singular values of the sample Jacobian, which measures local but not global identification strength. His test applies to GMM and likelihood problems.
Under higher-order identification, estimates are consistent but the delta-method is not valid. The limiting distribution is non-standard (Rotnitzky et al., 2000; Dovonon and Hall, 2018). This issue is known but much less studied than weak and set identification. Dovonon et al. (2019) study identification robust tests under second-order identification, and Lee and Liao (2018) conduct standard inference under a known second-order identification structure.
Notation
For any matrix (or vector) , is the Frobenius (Euclidean) norm of . For any square matrix , refers to the j-th eigenvalue of , in increasing order if is symmetric positive semi-definite; and refer to its largest and smallest eigenvalue, respectively, and are the first d eigenvalues of in increasing order. For a weighting matrix , the norm is computed as . The abbreviation wpa 1 stands for “with probability approaching 1.” For , is a closed -ball around .
2 Setting and Assumptions
Following Hansen and Singleton (1982), the econometrician wants to estimate the solution vector to the system of unconditional moment equations:
[TABLE]
where , a compact subset of , . is the sample vector of moment conditions, is a sample of iid or stationary random variables. The parameter indexes the true distribution of the data , including the true . It has the form . indexes features of the data generating process beyond that are relevant to identification and weak convergence. is a compact subset of a metric space with a metric between and that induces weak convergence for for any .³ The operator denotes the expectation under . is then the population vector of moment conditions evaluated at the true and a coefficient . Throughout, it is assumed that is such that . The function is assumed to be continuously differentiable on for all .
³ To reduce the number of coefficients involved in the notation below, this distance will be written as . See Andrews and Cheng (2012, p2162) for a discussion of these conditions.
Given the sample moments and a sequence of positive definite weighting matrices converging to , the GMM estimator solves the sample minimization problem:
[TABLE]
where is the optimization space.
Assumption 1 (Parameter Space, Sample Moments, Weighting Matrix).
i. and are compact; is a convex, compact subset of such that and for some ; for all and all , is non-singleton and connected, ii. for any sequence : and where is finite and non-singular, iii. . and are Lipschitz continuous in ; there exists such that , for all .
Assumption 1 i. implies that strictly contains so that issues arising when a parameter is on the boundary are not considered here.⁴ The connected neighborhood condition plays the role of Assumption ACP iv. in Andrews and Cheng (2012, p2165). It implies that we can find sequences along a continuous path in leading to such that . Together with a continuity condition in Assumption 3 below, it allows interpolating converging subsequences into converging sequences of parameters in one of the desired identification categories. This is similar to Assumption B2 in Andrews et al. (2020) and Assumption 14 in Cox (2020). Condition ii. is a uniform convergence condition, implied by a uniform CLT. Condition iii. ensures that is equivalent to so that the choice of does not alter the identifiability of the parameters.
⁴ See Cox (2020) for results on identification and boundary robust inference.
2.1 Outline of the Procedure
The following steps provide a general overview of the computation of the quasi-Jacobian matrix, the ICS, and the test procedure used in the paper. In the following, the matrix is an orthogonal projection matrix, projecting on the space orthogonal to . It can be written as so that it only selects elements associated with . The matrix is a weighted average of estimates of , with weights proportional to , described in more detail below.⁵
⁵ For iid data, is approximated using ; for dependent data a HAC estimator is used.
Computing the quasi-Jacobian and the test statistic:
1. Inputs: bandwidth , kernel , cutoff , number of draws
2. Quasi-Jacobian matrix
   i. Draw uniformly on the level set
   ii. Compute the intercept and slope in the $\ell_\infty$-norm regression (4), where .
   iii. Compute the variance (5).
3. Identification Category Selection
   i. Compute the singular values of
   ii. Compute , the number of singular values greater than
4. Subvector Inference
   i. Compute the test statistic:
   ii. Reject at the confidence level if
In the procedure, is the quantile of a chi-squared distribution with degrees of freedom, and is the number of moment conditions. In the following, the number of draws is assumed to be sufficiently large for the finite- approximation error to be negligible. The regression (4) is known as a Chebyshev (or minimax) approximation problem and can be cast as a linear programming problem (Boyd and Vandenberghe, 2004, p293). It can be solved with a few lines of code using the cvx convex optimization toolkit.⁶ (5) is also solved using cvx. Finally, note that, in the procedure, the intercept and the mean are nuisance parameters; only and are used in steps 3-4. On the computation side, Appendix F outlines a sequential algorithm to sample on the level set (step 2i.). The quasi-Jacobian is only computed once, and it is defined whether or not the sample moments are differentiable. By contrast, the standard Jacobian requires differentiability and needs to be evaluated at every grid point. Instead of the $\ell_\infty$ loss, one could use the $\ell_2$-norm which yields least-squares solutions . Some technical difficulties arise because the identified set typically has measure zero, and stronger assumptions are required to derive the properties of compared to . The re-scaling in step 3 is discussed below. The following provides further details about the steps outlined above.
⁶ See Supplemental Appendix G for sample R code which implements the method.
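For reference, the linear-programming form of a weighted Chebyshev fit can be written out explicitly. The notation here is an assumption of this sketch rather than the paper's own: $\theta_b$ ($b = 1,\dots,N$) are the draws, $\bar g_{n,j}(\theta_b)$ is the $j$-th of $m$ sample moments at draw $b$, $w_b \ge 0$ are the kernel weights, $d$ is the dimension of $\theta$, and the vector norm inside the supremum is taken to be the max-norm. Introducing an auxiliary bound $t$ on the largest weighted deviation gives the standard epigraph form:

```latex
\begin{aligned}
\min_{A \in \mathbb{R}^{m},\; B \in \mathbb{R}^{m \times d},\; t \in \mathbb{R}} \quad & t \\
\text{subject to} \quad
&  w_b \bigl( \bar g_{n,j}(\theta_b) - A_j - B_{j\cdot}\,\theta_b \bigr) \le t, \\
& -w_b \bigl( \bar g_{n,j}(\theta_b) - A_j - B_{j\cdot}\,\theta_b \bigr) \le t,
\qquad b = 1,\dots,N,\quad j = 1,\dots,m .
\end{aligned}
```

The objective and all $2Nm$ constraints are linear in $(A, B, t)$, which is why an off-the-shelf LP or convex solver suffices. Draws with zero weight drop out of the constraint set, so only points on the estimated level set matter.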
2.2 Linear Approximations and the quasi-Jacobian Matrix
The quasi-Jacobian matrix is defined as the slope of a local linear approximation for over an estimate of the identified set.
Definition 1 (Sup-Norm Approximation). Let be a kernel function and a bandwidth. The sup-norm approximation solves:
[TABLE]
where . The quasi-Jacobian refers to the slope matrix .
In practice, the minimization problem (6) is solved over a finite grid as in (4). The grid can be generated using Monte-Carlo or quasi-Monte-Carlo methods (Robert and Casella, 2004; Lemieux, 2009). In the simulations, the Sobol sequence was used. In the empirical application, is relatively large, and the set of where is fairly narrow; the acceptance rate is very low. A very large number of draws would be needed to find sufficiently many with non-zero weight, i.e. . The empirical application relies on a sequential sampling principle called Population Monte Carlo (Cappé et al., 2004). It constructs a sequence of proposal distributions that approximate the target distribution with increasing accuracy, see Appendix F for details. These proposals can be re-purposed to compute confidence sets, reducing the additional time required for test inversion. It can also be used to compute for different values of as a sensitivity analysis.
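The low acceptance rate mentioned above is easy to see in a toy setting. The following is a minimal sketch, not the paper's Sobol or Population Monte Carlo implementation: it uses plain pseudo-random uniform draws with rejection, a hypothetical scalar moment `xbar - theta**2`, a hypothetical sample size `n = 1000`, and a bandwidth of iterated-logarithm order, all of which are assumptions made for illustration.

```python
import math
import random


def moment(theta, xbar=1.0):
    """Hypothetical scalar sample moment: xbar - theta**2 (zero at theta = +/-1)."""
    return xbar - theta ** 2


def draw_on_level_set(kappa, n_draws, lower=-2.0, upper=2.0, seed=0):
    """Keep uniform draws on [lower, upper] whose moment is within kappa of zero."""
    rng = random.Random(seed)
    draws = (rng.uniform(lower, upper) for _ in range(n_draws))
    return [theta for theta in draws if abs(moment(theta)) <= kappa]


n = 1000                                              # hypothetical sample size
kappa = math.sqrt(2.0 * math.log(math.log(n)) / n)    # assumed bandwidth choice
n_draws = 20000
kept = draw_on_level_set(kappa, n_draws)
acceptance_rate = len(kept) / n_draws                 # share of draws with non-zero weight
```

In this example the level set is two short intervals around -1 and +1, so only a few percent of the draws are kept. With more parameters the accepted share shrinks quickly, which is the motivation for a sequential sampler.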
Assumption 2 (Kernel, Bandwidth).
i. if , if . is continuous on , ii. , .
The kernel is assumed to have compact support. The uniform kernel, , was used in the simulations and empirical results.⁷ The first condition ensures that selects the identified set with wpa 1 under weak identification. The second ensures that only captures the first-order Jacobian term in local expansions under (semi)-strong identification. Otherwise, it would also capture nonlinear terms from the remainder.
⁷ The estimated is nearly numerically identical using the cosine or Epanechnikov kernels.
2.3 Test Procedure
To illustrate the usefulness of detecting identification failure, consider the following simple data-driven test procedure. It is based on the Anderson-Rubin statistic for non-linear GMM models as described in Stock and Wright (2000). To test null hypotheses of the form , compute the sample statistic:
[TABLE]
where consistently estimates the asymptotic variance . The test rejects at a nominal level if where is the quantile of a chi-square distribution with degrees of freedom. is computed using an identification category selection (ICS) procedure based on the quasi-Jacobian and its singular values. The procedure, described below, evaluates the number of nuisance parameters in which are potentially weakly/set identified. Using yields the largest critical value and amounts to full projection inference (Dufour and Taamouti, 2005). Using yields the smallest critical value which provides valid, non-conservative inferences when all of the nuisance parameters are strongly identified. Intermediate values of improve power compared to full projection while ensuring robustness if a subset of the nuisance parameters is weakly identified. A confidence set for collects all values of for which using the same .
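As an illustration of the statistic, here is a minimal sketch of a profiled Anderson-Rubin statistic. It assumes the usual Stock and Wright (2000) quadratic form, n times the sample moments weighted by the inverse variance estimate, minimized over the nuisance parameter on a grid. The two-moment model `E[x] = theta1`, `E[y] = theta1 * theta2`, the data, and the grid are hypothetical choices made for this example, not taken from the paper.

```python
def ar_statistic(theta1, theta2_grid, x, y):
    """Profiled AR statistic for H0: theta1 = theta1_0 in the toy model
    E[x] = theta1, E[y] = theta1 * theta2 (theta2 is the nuisance parameter)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # Sample covariance of the moment contributions (it does not depend on theta here).
    vxx = sum((a - xbar) ** 2 for a in x) / n
    vyy = sum((b - ybar) ** 2 for b in y) / n
    vxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
    det = vxx * vyy - vxy ** 2
    best = float("inf")
    for theta2 in theta2_grid:
        g1, g2 = xbar - theta1, ybar - theta1 * theta2
        # n * g' V^{-1} g, using the explicit inverse of the 2x2 matrix V.
        quad = n * (vyy * g1 * g1 - 2.0 * vxy * g1 * g2 + vxx * g2 * g2) / det
        best = min(best, quad)
    return best


x = [0.0, 2.0, 1.0, 1.0]                    # hypothetical data, mean 1
y = [1.0, 3.0, 3.0, 1.0]                    # hypothetical data, mean 2
grid = [0.5 * k for k in range(-10, 11)]    # grid for the nuisance parameter
ar_at_mean = ar_statistic(1.0, grid, x, y)
ar_far = ar_statistic(3.0, grid, x, y)
```

Here the statistic is 0 at `theta1 = 1` (the grid contains `theta2 = 2`, which sets both moments to zero) and 34 at `theta1 = 3`. The latter exceeds 5.99, the 95% chi-squared quantile with two degrees of freedom, so `theta1 = 3` is rejected even under full projection.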
The choice of should be invariant to rescaling the sample moments and/or the parameters . To this end, the procedure relies on two normalization matrices: and , where . The matrix is an average of asymptotic variance estimators for . It is used to ensure the procedure is invariant to re-scaling and rotating the sample moments. is the -covariance matrix minimizing over and . These quantities are readily available from the steps required to compute . It is important to use an estimate of the variance of on rather than the variance of or of the sample Jacobian. When the model is set or weakly identified, the variance - which measures the size of the set - does not go to zero in directions where identification fails.⁸ The variance of or the Jacobian could be arbitrarily small, however.⁹ Hence, is vanishing in directions where identification fails and is invariant to rescaling the coefficients .
⁸ Lemma D4 shows that is bounded above under weak identification in directions where identification fails.
⁹ Take , for all , a.s. The variances of both and the Jacobian are zero; yet, .
Let be the projection matrix on the orthogonal of the span of , compute the singular values of the normalized :
[TABLE]
where denotes the j-th eigenvalue in increasing order so that . By projection, the smallest singular values are equal to zero. Take , a decreasing sequence such that , and compute:
[TABLE]
where counts the number of singular values which are greater than the threshold .
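The counting step is simple enough to sketch directly. The function names, singular values, and threshold below are hypothetical. The degrees-of-freedom rule, number of moments minus the count, is an inference from the description of full projection (count of zero, largest critical value) and of the strongly identified case (largest count, smallest critical value), not a formula quoted from the paper.

```python
def ics_count(singular_values, threshold):
    """Number of singular values strictly greater than the threshold."""
    return sum(1 for s in singular_values if s > threshold)


def ar_degrees_of_freedom(n_moments, singular_values, threshold):
    """Assumed rule: chi-squared degrees of freedom = moments minus ICS count."""
    return n_moments - ics_count(singular_values, threshold)


# Hypothetical example: 5 moments, 1 parameter of interest (its singular value
# is zero by projection) and 3 nuisance parameters, one of which looks weak.
singular_values = [0.0, 0.02, 3.5, 8.1]
threshold = 1.0
d_hat = ics_count(singular_values, threshold)
dof = ar_degrees_of_freedom(5, singular_values, threshold)
```

With these inputs two nuisance directions pass the threshold, so the critical value would come from a chi-squared distribution with 3 rather than 5 degrees of freedom. A count of zero recovers full projection.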
Choice of Tuning Parameters:
A default choice is the uniform kernel . Then, the role of the pair is to estimate the solution set of parameter(s) such that . For this choice of kernel, if a law of the iterated logarithm applies, then, pointwise, almost surely using and .¹⁰ In that sense, efficient weighting and are asymptotically optimal and make invariant to linear transformations of the moments.
¹⁰ A law of the iterated logarithm implies almost surely, also for all , see e.g. Petrov (1995, Ch7); and Kosorok (2008, p31), van der Vaart and Wellner (1996, p379, footnote b) for references applying to empirical processes, which are not pointwise.
The role of the normalized and the threshold is analogous to the ICS procedure in Andrews and Cheng (2012, Section 5.2), and the subsequent literature. Here, it is shown that if nuisance parameters are weakly identified then at least singular values are . Hence, wpa 1, they are smaller than , if . As a result, is no greater than the number of (semi)-strongly identified nuisance parameters wpa 1, which leads to valid inferences under weak identification. Typically, using larger values of in an ICS procedure is desirable for robust inference since it correctly detects identification failures with greater probability in finite samples. However, it also makes the test more conservative under semi-strong identification since it incorrectly detects identification failure with greater probability. This implies a trade-off between power for semi-strongly identified models and robustness for weakly identified models. The normalization in the procedure improves on this by making the behaviour of the ICS statistic more distinct between these two regimes. The normalization preserves the asymptotic singularity under weak identification but the normalized matrix diverges at a rate when identification is strong.
2.4 The quasi-Jacobian
The main component of the procedure is the quasi-Jacobian. To better understand the main differences with the Jacobian, the following derives its properties for , using a positive definite and the uniform kernel . Take
[TABLE]
where the is taken over with . To compute the Jacobian, , one would use the set ; the main difference is the choice of neighborhood.
This difference suggests that, unlike the Jacobian, the properties of the quasi-Jacobian depend on the set , which collects all solutions to the moment condition. For any given value , there are three possibilities, either: i. is non-singleton, ii. is singleton and is singular, or iii. is singleton and has full rank. Under i., is not globally identified. Under ii. and iii. is globally identified but only locally identified under iii. Consistency and asymptotic normality require iii., i.e. strong identification, and standard inference need not be asymptotically valid under i. or ii. The following Theorem relates the rank of to identifications i., ii., and iii.
Theorem 1 (quasi-Jacobian, ).
Take . Suppose and is continuously differentiable for all . Suppose there are and such that when is singleton: for all . Then the quasi-Jacobian is such that:
- (1) singular if, and only if: non-singleton or, singleton and singular,
- (2) For singleton and full rank: ,
- (3) For non-singleton: for all ,
- (4) For singleton and singular: whenever .
The dependence of , on is omitted to simplify notation. The condition for globally identified models holds with if is twice continuously differentiable with bounded second derivative around . Theorem 1 shows that is singular as soon as is such that global or local identification fails (1). An immediate implication of Theorem 1 is that for all such that , ; i.e. the parameter is point identified in direction . This contrasts with the Jacobian which can have full rank without global identification. is singular in all directions in which global identification fails (3), or local identification fails (4); these directions may vary depending on . has full rank only if is such that both global and local identification hold (2). Theorem 1 holds for both just and over-identified models. The identified set can be arbitrary, e.g. discrete. The results do require correct specification, non-empty, for the quasi-Jacobian to be well defined. Even though Theorem 1 is fairly general, the main results will be restricted to settings where either i. is non-singleton, or iii. is singleton and has full rank. Additional results for ii. are given in Appendix I.
The Jacobian generally does not have property (1) or (3). When using projection methods for subvector inference, one can concentrate out nuisance parameters that are both globally and locally identified. The Jacobian can only determine the latter which is not sufficient for consistency. The following illustrates (3) and gives a sketch of the proof using a simple non-linear model where is non-singleton but the Jacobian has full rank for all .
Intuition for linear models.
For linear models, the sup-norm approximation is exact with and for OLS and IV, respectively. The quasi-Jacobian coincides with the Jacobian and it is singular when the regressors are multicollinear or the instruments are not relevant. Both are singular in directions where the rank condition fails.
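As a minimal numerical sketch of the linear case, consider OLS moments of the form g_n(θ) = X'(y − Xθ)/n, whose Jacobian is −X'X/n. The simulated design, the variable names and the seed are illustrative assumptions, not taken from the paper. With an exactly collinear regressor, the smallest singular value is numerically zero and the associated right-singular vector recovers the direction in which the rank condition fails.

```python
import numpy as np

# Hypothetical simulated design: the third regressor is exactly x1 + x2.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1 + x2])

# For OLS moments g_n(theta) = X'(y - X theta) / n, the Jacobian is -X'X / n.
jacobian = -(X.T @ X) / n

# Singular values are returned in decreasing order; the last row of vt is the
# right-singular vector associated with the smallest singular value.
_, sing_vals, vt = np.linalg.svd(jacobian)
failure_direction = vt[-1]

print("singular values:", sing_vals)
print("direction of rank failure:", failure_direction)
```

Here the direction is proportional to (1, 1, −1), the linear combination of coefficients that the collinear design cannot pin down.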
Non-linear models: a pen and pencil example.
Consider a simple MA(1) process:
[TABLE]
where are the parameters of interest. The model is estimated using the following set of moment conditions (the dependence on is omitted in this example):
[TABLE]
Whenever and , this system of equations has two distinct solutions: and . Imposing invertibility (i.e. ), or non-invertibility (i.e. ) restores identification so that, intuitively, only one dimension is unidentified. Both solutions are locally identified: the Jacobian has full rank at both values; it is uninformative about the global identification failure in this example. The goal of this example is to show that is informative about the lack of global identification and the direction in which identification fails. Without the quasi-Jacobian, one would need to check with pen and pencil whether has multiple solutions, or not.
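The following sketch checks this numerically under an assumed parameterization. The moment conditions are taken to match the variance σ²(1 + θ²) and the first-order autocovariance σ²θ of the MA(1) process, so that (θ, σ²) and (1/θ, θ²σ²) are observationally equivalent. The function names and the numerical values are hypothetical.

```python
import numpy as np

def ma1_moments(params, gamma0, gamma1):
    """Assumed moments: variance and first autocovariance minus their targets."""
    theta, sigma2 = params
    return np.array([sigma2 * (1 + theta**2) - gamma0, sigma2 * theta - gamma1])

def ma1_jacobian(params):
    """Analytical Jacobian of the assumed moments with respect to (theta, sigma2)."""
    theta, sigma2 = params
    return np.array([[2 * sigma2 * theta, 1 + theta**2],
                     [sigma2, theta]])

# Invertible solution and its non-invertible mirror image.
sol_a = np.array([0.5, 1.0])
sol_b = np.array([1 / sol_a[0], sol_a[0] ** 2 * sol_a[1]])

# Population moments implied by the first solution.
gamma0 = sol_a[1] * (1 + sol_a[0] ** 2)
gamma1 = sol_a[1] * sol_a[0]

for sol in (sol_a, sol_b):
    g = ma1_moments(sol, gamma0, gamma1)
    smallest_sv = np.linalg.svd(ma1_jacobian(sol), compute_uv=False)[-1]
    print(sol, "moments:", g, "smallest singular value of the Jacobian:", smallest_sv)
```

Both points solve the moment conditions and the Jacobian has full rank at each, so a local rank check alone would not flag the global identification failure.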
The first step is to find a one-to-one linear reparameterization such that is uniquely identified but is not. Let and pick any orthogonal such that . By construction: and . This implies that and are equal in direction but distinct in direction . Pick , . As desired: the mapping is one-to-one, with uniquely and set identified. Property (3) in Theorem 1 implies that directions in which is non-singular must be associated with a unique value for . This first step illustrates how these directions can be constructed from the set . Importantly, the linear reparametrization need not be computed explicitly in practice, as explained below.
The second step is to show that is informative about the identification failure and contains information about the reparametrization above. In the MA(1) model, the set has two points. Take and compute the intercept and slope :
[TABLE]
here using the uniform kernel , and for analytical simplicity. Notice that for , . Also, because , the solution is such that for . Using the triangle inequality and its reverse, this implies . Now, express this in terms of the direction vector constructed above:
[TABLE]
In the limit, the quasi-Jacobian is singular in the direction where identification fails. This implies that is a right-singular vector associated with the singular value zero. The singular value decomposition of is informative about the directions of identification failure and the linear reparameterization from the first step. While the linear reparameterization requires knowledge of and computing all possible with , Theorem 1 implies that the right-singular vectors of associated with the singular value zero span all directions of identification failure .
In large samples and under Assumptions 1-2, wpa (using the same , ). As a result, for any sequence such that , wpa which signals the identification failure, as desired. To illustrate, Figure 1 compares the distribution of the largest and smallest singular values of the Jacobian , quasi-Jacobian , and scaled quasi-Jacobian with the same cutoff . The scaling makes the singular values scale invariant. The Jacobian fails to detect the lack of identification, even for large (left panel) and also with the scaling . The quasi-Jacobian detects the identification failure since the smallest singular value is below the cutoff. However, the largest singular value is also close to the cutoff. With the scaling, the largest singular value diverges while the smallest one shrinks to zero (right panel).
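The contrast between the Jacobian and the quasi-Jacobian can be sketched numerically with a least-squares variant. This is an illustration under stated assumptions, not the procedure of Section 2.3: the population MA(1) moments are assumed to match the variance and first autocovariance, the bandwidth `kappa`, the sampling box and the number of draws are hypothetical choices, and the linear approximation is fitted by ordinary least squares with a uniform kernel rather than in sup-norm.

```python
import numpy as np

def ma1_moments(theta, sigma2, gamma0=1.25, gamma1=0.5):
    """Assumed population moments; both (0.5, 1.0) and (2.0, 0.25) solve them."""
    return np.stack([sigma2 * (1 + theta**2) - gamma0,
                     sigma2 * theta - gamma1], axis=-1)

rng = np.random.default_rng(0)
n_draws = 1_000_000
kappa = 0.02  # hypothetical bandwidth defining the neighbourhood of the solution set

# Draw parameters over a box containing both solutions; keep the draws whose
# moments are within kappa of zero (uniform kernel).
theta = rng.uniform(0.2, 2.5, size=n_draws)
sigma2 = rng.uniform(0.1, 1.5, size=n_draws)
g = ma1_moments(theta, sigma2)
keep = np.linalg.norm(g, axis=1) <= kappa

# Least-squares linear approximation g ~ A + B (theta, sigma2)' on the kept draws.
design = np.column_stack([np.ones(keep.sum()), theta[keep], sigma2[keep]])
coef, *_ = np.linalg.lstsq(design, g[keep], rcond=None)
quasi_jacobian = coef[1:].T  # rows: moments, columns: parameters

_, sing_vals_qj, vt = np.linalg.svd(quasi_jacobian)
direction = vt[-1]  # right-singular vector for the smallest singular value

gap = np.array([2.0, 0.25]) - np.array([0.5, 1.0])
print("kept draws:", int(keep.sum()))
print("singular values of the least-squares quasi-Jacobian:", sing_vals_qj)
print("alignment with the gap between solutions:",
      abs(direction @ gap) / np.linalg.norm(gap))
```

Because the kept draws cluster around two distinct solutions at which the moments are approximately zero, the fitted slope is nearly flat along the line joining them. The smallest singular value is therefore close to zero even though the Jacobian has full rank at each solution.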
2.5 Drifting Sequences of Parameters, Identification Regimes
The test procedure described above is said to be robust to identification failure if it has asymptotic null rejection probability bounded above by the nominal size, i.e.:
[TABLE]
In the limit, the worst-case rejection rate should be no greater than the nominal size . Following Andrews and Cheng (2012), this can be determined from the asymptotic properties of the test for specific sequences of parameters .
Assumption 3 (Identification).
There exists a continuous function and a strictly positive function such that for any where and :
[TABLE]
There exists a and a constant such that for :
[TABLE]
The function indicates whether the solution to the moment condition is unique for a given . The second part of the assumption implies that when , there is at least one such that . Sequences such that , , satisfy since is continuous. The properties of depend on the rate at which converges to zero. Under Assumptions 1 and 3, Lemma A1 shows that , the estimator is consistent, if . When , the estimator is generally not consistent, see e.g. Stock and Wright (2000).
In the MA(1) example, pick , then regardless of as long as the two distinct solutions . The second inequality only holds for , with , which implies .
To give another example, consider a linear IV regression: using and . Here so that and . The inequality holds with equality, i.e. , when is the right singular vector of associated with the smallest singular value. Here implies singular, and the model is underidentified. (Footnote 11: Additional derivations for a non-linear regression model are given in Appendix H.)
The dichotomy between and in Assumption 3 makes it possible to construct a measure of global identification strength used to categorize the sequences . (Footnote 12: A similar decomposition can be found in Chen (2007, p5589) to isolate the effect of the sieve dimension on the shape of the objective in nonparametric estimation.) Let and . collects all DGPs such that is not uniquely identified, and those that are point identified. Let , . In the following, any converging sequence will be assumed to belong to one of for some , , or converges in . These will be referred to as weak, semi-strong, and strong sequences.
Assumption 4 (Strong and Semi-Strong Sequences).
Let where , or . Let . For any , suppose the following holds: i. is continuous in and ; has full rank for all , ii. , , iii. , iv. there exists , such that for , , and , v. , where is a full rank matrix.
Assumption 4 provides sufficient conditions to establish asymptotic normality of at a potentially slower than -rate. Condition i. is standard and ensures the model is locally identified. Condition ii. allows the Jacobian to be vanishing at a slower than -rate in some directions. Condition iii. is a stochastic equicontinuity condition. Condition iv. implies that the Taylor remainder is quadratic under the weaker norm , which is the relevant norm for convergence when . Indeed, Lemma A2 establishes that . Condition iv. excludes settings where the non-linear remainder dominates the first-order term. (Footnote 13: These second or higher-order identification issues are not considered in the main text; additional results for the quasi-Jacobian under higher-order identification are given in the Supplement.) Condition v. is analogous to Assumption 3iv in Antoine and Renault (2012). It requires a rescaling for which the Jacobian is non-singular in the limit. For instance, under a singular value decomposition of the form , we have . The rescaling corrects for the possibly vanishing, but non-zero, terms in the diagonal . Antoine and Renault (2021, Sec2.2) discuss conditions relating to Assumption 4 in more detail.
Proposition 1 (Asymptotic Distribution for (Semi)-Strong Sequences).
Let . Let , if Assumptions 1, 3 and 4 hold then:
[TABLE]
Proposition 1 implies that the test is asymptotically valid for any choice of and asymptotically non-conservative if wpa 1. Furthermore, for just-identified models , and the test is asymptotically efficient if wpa 1.
Linear reparameterization.
As in the MA(1) example, the derivations rely on a one-to-one linear reparameterization with uniquely and set identified. The following steps construct the reparameterization, which is not implemented in practice: the span of right-singular vectors associated with singular values below consistently estimates the span of identification failure. The following applies to just and over-identified models.
First, take , collect all solutions to the moment conditions . Let and . If , then which implies that is a singleton; i.e. the parameters are uniquely identified. This is the case when . If strictly, then strictly; i.e. the parameters are set identified. This is the case when . As in the MA(1) example, by projection for any two ; i.e. the solution is unique on . In contrast, for any non-zero , there exist two distinct s.t. , by construction. Define as the projection of on and the projection on . The matrix combines the bases of and . As illustrated by the MA(1) example, it may not be possible to improve on this linear reparameterization with a non-linear one without some further structure on the moments or the model. The reparameterization is defined up to a rotation on and , respectively.
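The construction can be sketched for a finite solution set. In this hedged illustration, the two MA(1) solutions (0.5, 1.0) and (2.0, 0.25) stand in for the set of solutions, and the names `basis_fail`, `basis_ident`, `alpha` and `beta` are hypothetical labels for the two bases and the reparameterized coordinates. The span of the differences between solutions gives the directions of identification failure, and its orthogonal complement gives the directions in which all solutions coincide.

```python
import numpy as np

# Hypothetical finite solution set (one solution per row).
solutions = np.array([[0.5, 1.0],
                      [2.0, 0.25]])

# Differences relative to the first solution span the same space as all
# pairwise differences between solutions.
diffs = solutions[1:] - solutions[0]

# The leading rows of vt span the differences; the remaining rows span the
# orthogonal complement.
_, sv, vt = np.linalg.svd(diffs, full_matrices=True)
rank = int(np.sum(sv > 1e-10))
basis_fail = vt[:rank].T    # directions in which the solutions differ
basis_ident = vt[rank:].T   # directions in which all solutions agree

# Reparameterized coordinates: alpha is common to all solutions, beta is not.
alpha = solutions @ basis_ident
beta = solutions @ basis_fail
rotation = np.hstack([basis_ident, basis_fail])

print("alpha (uniquely identified):", alpha.ravel())
print("beta (set identified):", beta.ravel())
```

The matrix `rotation` is orthogonal, so the map from the original parameters to (`alpha`, `beta`) is one-to-one. As noted in the text, this construction is not needed in practice; it only makes the directions explicit.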
For testing , the identification status of the nuisance parameters matters. Consider a further sub-decomposition where only is unidentified under the restriction . To find it, take and follow the same steps as above. By construction, is a subset of , also is in and is the subset of which is unidentified under . (Footnote 14: Note that, by linearity and by construction, .)
Now consider sequences with . Combine the linear reparameterization with the continuity of with respect to and to find, using the Maximum Theorem (Footnote 15: To apply the Maximum Theorem, note that by continuity of and compactness of , both and are compact subsets of and , respectively. Similar equations can be derived for with the added constraint .), that for all , any , and letting :
[TABLE]
where is the identified set for when . The first limit implies is consistently estimable, while the second and third imply that the population objective function becomes flat (only) on . The decomposition so far separates point identified from set unidentified when . (Footnote 16: Note that for the class of models considered in Andrews and Cheng (2012), their parameter which is point identified and determines identification strength is included in the vector constructed here.)
If there is a single source of identification failure, then is determined by a scalar subset of , and is bounded above for weak sequences. To illustrate, consider the linear IV example again with a single endogenous regressor and one instrument . In this case depends on the scalar being bounded which characterizes weak sequences. In the case with multiple sources of identification failure, there may be mixed identification strength, and some components of may be (semi)-strongly identified, so the reparameterization needs to be further refined. This is deferred to Appendix E.
Assumption 5 (Weak Sequences).
Let with . Let be the null-constrained space for . There exists continuous satisfying and strictly positive, and two non-empty and non-singleton sets and such that for any :
- i. and ,
- ii. , , and , where the infs are taken over the constrained space .
Assumption 5 adds this additional structure to (7)-(9), where are assumed semi-strongly and weakly identified. The first part, Assumption 5i., implies is consistently estimable, allowing for some components to be semi-strongly identified. The second and third parts imply the objective function is flat with respect to but only on the identified set . For the quasi-Jacobian, the implies that uniformly in with increasing probability so that Step 2.i of the procedure consistently estimates the identified set and all directions of identification failure. Similarly, condition ii. repeats the conditions under the restriction that . The parameters correspond to the directions that are consistently estimable. To simplify notation, the Proposition below denotes as these coefficients that are consistently estimable and semi-strongly identified under .
Proposition 2 (Asymptotic Distribution for Weak Sequences).
Let . Suppose there is a linear reparameterization invertible, , such that the moment function satisfies Assumptions 1, 3 and 4, then:
[TABLE]
Proposition 2 implies that the test procedure has limiting null rejection probability bounded by the nominal size for weak sequences as long as wpa 1, since . Note that Assumption 3 with respect to is implied by Assumption 5 ii.
3 Asymptotic Behaviour of the quasi-Jacobian
As discussed above, the properties of the ICS and test procedure are tied to those of the quasi-Jacobian under different identification regimes. The following derives the large sample behaviour of the sup-norm and least-squares quasi-Jacobian matrices under strong, semi-strong, and weak identification.
3.1 Strong and Semi-Strong Sequences
Theorem 2 (quasi-Jacobian and Jacobian Equivalence).
Let denote the quasi-Jacobian. Suppose with or . Suppose that Assumptions 1, 2, and 4 hold. If and , then:
[TABLE]
where by assumption and .
The proof is given in Appendix B. Theorem 2 implies that, for (semi)-strong sequences, the quasi-Jacobian and the Jacobian are asymptotically equivalent after re-scaling to a non-singular limit. For non-smooth moments, where the sample Jacobian is not defined, as in quantile-IV regression or SMM estimation of discrete choice models, can be used in the sandwich formula to compute standard errors for . Assumption 4 v. implies , hence:
[TABLE]
For sufficiently strong sequences such that , where is the cutoff in Section 2.3, this implies that wpa 1.
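The use of the quasi-Jacobian in place of the sample Jacobian for standard errors can be sketched with the textbook GMM sandwich formula, (B'WB)^{-1} B'WVWB (B'WB)^{-1}. This is assumed here to be the formula the text refers to. The matrices `B`, `V`, `W` and the sample size `n` are made-up inputs: `B` plays the role of an estimated quasi-Jacobian, `V` the variance of the moments, and `W` the weighting matrix.

```python
import numpy as np

def sandwich_variance(B, V, W):
    """Textbook GMM sandwich (B'WB)^{-1} B'W V W B (B'WB)^{-1}."""
    bread = np.linalg.inv(B.T @ W @ B)
    meat = B.T @ W @ V @ W @ B
    return bread @ meat @ bread

# Hypothetical inputs: 3 moments, 2 parameters.
B = np.array([[1.0, 0.2],
              [0.3, 1.5],
              [0.5, -0.4]])
V = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.3],
              [0.0, 0.3, 1.5]])
n = 250

avar_identity = sandwich_variance(B, V, np.eye(3))
avar_efficient = sandwich_variance(B, V, np.linalg.inv(V))

# Standard errors under the usual root-n normalization.
se_identity = np.sqrt(np.diag(avar_identity) / n)
se_efficient = np.sqrt(np.diag(avar_efficient) / n)
print("standard errors, identity weighting:", se_identity)
print("standard errors, efficient weighting:", se_efficient)
```

With the efficient weighting matrix W = V^{-1}, the sandwich collapses to (B'V^{-1}B)^{-1}, which is a convenient check on the implementation.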
3.2 Weak Sequences
Theorem 3 (Asymptotic Singularity of the quasi-Jacobian).
Suppose with and Assumptions 1, 2, 3, 5 hold. For any , with in the identified set for , . Let denote the eigenvalues of in increasing order, then:
[TABLE]
In particular, .
Theorem 3 shows that when is not uniquely identified, the quasi-Jacobian vanishes at a rate in all directions associated with the identification failure. The span of these directions has dimension so that vanishes on a subspace of dimension . Hence, small singular values are indicative of an identification failure, and the number of weakly identified coefficients. The constants involved in the terms are made explicit in the proof. The following Proposition extends these results to , which focuses on the identification status of the nuisance parameters only. For both results, the proof is similar to the derivations used for the MA(1) example.
Proposition 3 (quasi-Jacobian after Projection).
Suppose with and Assumptions 1, 2, 3, 5 hold. For any , with the identified set for under the null, . Let denote the eigenvalues of in increasing order:
[TABLE]
4 Asymptotic Properties of the Test Procedure
As discussed above, the ICS procedure used to compute relies on two normalizations that ensure invariance to rescaling of the sample moments and/or the parameters. The first normalizing matrix is computed in the procedure outlined above. is shown to be bounded above in directions associated with the identification failure in Lemma D4, so that Proposition 3 extends to the normalized . Under strong identification, Lemma D3 implies that so that diverges at a rate in directions. As a result, vanishes at a -rate in directions where identification fails, and diverges at a -rate when all parameters are strongly identified.
The second normalizing matrix is , where is an estimator of the asymptotic variance . The Assumption below requires consistent and asymptotically non-singular so that the normalization does not alter the asymptotic properties of .
Assumption 6.
is non-singular and for all , , .
Theorem 4 (Asymptotic Size).
Suppose Assumptions 1-5 hold. Let such that . Let $\hat{d}_{n}=\#\big\{j\in\{d_{\theta_{1}}+1,\dots,d_{\theta}\},\,\lambda_{j}(P_{\theta_{1}}^{\perp}\Sigma_{n}^{-1/2}P_{\theta_{1}}^{\perp}B_{n,\infty}^{\prime}\overline{V}_{n}^{-1}B_{n,\infty}P_{\theta_{1}}^{\perp}\Sigma_{n}^{-1/2}P_{\theta_{1}}^{\perp})>\underline{\lambda}_{n}^{2}\big\}$, then for any :
[TABLE]
For any sequence such that :
[TABLE]
Theorem 4 establishes the uniform validity of the test procedure described in Section 2.3 under strong, semi-strong, and weak sequences. First, it is shown that the normalizations do not affect the predictions of Theorems 2, 3, and Proposition 3. Then, since and are compact, the worst-case rejection probability is attained by a converging subsequence which, using the stated assumptions, can be interpolated into a converging sequence in either for some , , or converging in . The result then relies on two properties. The first is that under weak identification, and the second is that which has a standard chi-squared limiting distribution with degrees of freedom that only depend on the dimension of , and the number of identified nuisance parameters. For just-identified models, the resulting procedure is efficient under strong identification since it uses the smallest valid critical value, and is equivalent to a quasi-Likelihood ratio test. For over-identified models, the test uses the smallest valid critical value for the projected AR test so it is non-conservative within that class. The results above can be extended to some other existing robust test statistics. For instance, the K-statistic of Kleibergen (2005) is such that, under additional regularity conditions, which also has a chi-squared limiting distribution with reduced degrees of freedom.
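The count in the statement of Theorem 4 can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the parameters of interest are assumed to be the first `d_theta1` coordinates, so that the projection simply zeroes out those coordinates, and the inputs `B`, `V`, `Sigma` and `cutoff` are hypothetical stand-ins for the quasi-Jacobian, the variance estimate, the first normalizing matrix and the threshold.

```python
import numpy as np

def inv_sqrt_psd(S):
    """Inverse symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def count_identified_nuisance(B, V, Sigma, d_theta1, cutoff):
    """Number of eigenvalues above cutoff**2 among the d_theta - d_theta1 largest."""
    d_theta = B.shape[1]
    # Assumption: theta_1 occupies the first d_theta1 coordinates, so the
    # projection on its orthogonal complement sets those coordinates to zero.
    P = np.eye(d_theta)
    P[:d_theta1, :d_theta1] = 0.0
    S = P @ inv_sqrt_psd(Sigma) @ P
    M = S @ B.T @ np.linalg.inv(V) @ B @ S
    eigenvalues = np.linalg.eigvalsh(M)  # increasing order
    return int(np.sum(eigenvalues[d_theta1:] > cutoff**2))

# Hypothetical example: one parameter of interest and two nuisance parameters,
# one of which has a nearly flat slope.
B = np.diag([5.0, 3.0, 0.01])
V = np.eye(3)
Sigma = np.eye(3)
d_hat = count_identified_nuisance(B, V, Sigma, d_theta1=1, cutoff=0.1)
print("nuisance parameters classified as (semi)-strongly identified:", d_hat)
```

In this example only one of the two nuisance directions clears the threshold. Lowering `cutoff` to 0.001 classifies both as identified, which illustrates the trade-off between robustness and power discussed for the ICS threshold.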
5 Monte-Carlo Simulations
The finite-sample properties of the quasi-Jacobian matrix and the test procedure are illustrated using a consumption capital asset pricing model (CAPM) as in Wright (2003, Sec3).
Let , measure time preference and relative risk aversion. are real consumption, dividends, and the gross asset return at time . The Euler equation is: , where measures consumption growth. depends endogenously on , where and , which follows a first-order vector autoregressive (VAR) process: , where . The sample moments are:
[TABLE]
where . Tauchen (1986) illustrates how affects the finite-sample properties of . The following considers three DGPs: Rank Failure (RF), Near Rank Failure (NRF) and Full Rank (FR). (Footnote 17: RF, NRF and FR correspond to RF1, NRF1 and FR in Wright (2003, p326).) Wright (2003) explains that they correspond to being set, weakly, and strongly identified. NRF is calibrated to match annual U.S. data (Kocherlakota, 1990, Sec3).
Table 2 reports rejection rates for the method in Section 2.1 (Proj1), full projection inference using (Proj2) and (Proj3) critical values, as well as a t-test with standard normal critical value (). The empirically relevant sample sizes are . illustrate large sample properties. The parameter space is . The t-test does not control size in RF and NRF. It is closer to nominal size for FR. However, as Figure H5 in Appendix H.2 shows, another global solution is estimated in about 1% and 0.05% of the replications for . Here, the parameters are locally strongly identified, but not globally. The sample Jacobian would not detect this issue, which leads to some over-rejection for the t-test. In comparison, the proposed procedure (Proj1) has null rejection rates below nominal size across sample sizes and DGPs.
Wright (2003, Sec3), Antoine and Renault (2009, Sec5) explain that one coefficient is always strongly identified. Table 2 and Figure 2 confirm this. The procedure finds to be weakly identified in nearly all replications for RF, NRF, and strongly identified. For FR, the procedure finds weakly identified in and of replications when .
Figure 3 compares the power of the proposed procedure (AR1) with full projection inference (AR3), and projection inference with the nuisance parameter concentrated out (AR2) as well as the t-test when appropriate (FR with ). The results show power improvement over full projection inference when the nuisance parameter is strongly identified, i.e. when testing hypotheses about . When the model is strongly identified (FR), the procedure is less powerful than the t-test because of over-identification. Results for a just-identified specification with and a larger are given in Appendix H.2. Another example in that Appendix compares the procedure with Andrews and Cheng (2012) for a non-linear regression.
6 Application to the Long-Run Risks Model
To illustrate the empirical content which can be gained from the quasi-Jacobian for inference, consider a simulated method of moments estimation of the long-run risks (LRR) model (Bansal and Yaron, 2004). There are two latent variables representing a persistent component to the level of consumption growth and stochastic volatility :
[TABLE]
where if and as in Calvet and Czellar (2015, p346). Consumption and dividend growth are then given by:
[TABLE]
where iid. Given an Epstein-Zin utility function, equilibrium conditions imply that financial variables, log-price dividend ratio , market return and the risk-free rate can be written as:
[TABLE]
where the coefficients are computed numerically as a solution of a non-linear system of equations involving the full vector of 12 parameters where is the discount factor, risk-aversion, and the inverse intertemporal elasticity of substitution (IES). See Bansal and Yaron (2004) for details. The variables above need to be further time-aggregated from the monthly decision interval to match the quarterly frequency of the data. There are a number of estimations of this model using one of SMM and Indirect Inference (Footnote 18: See Bansal et al. (2007); Hasseltoft (2012); Calvet and Czellar (2015); Grammig and Küchlin (2018).), GMM (Footnote 19: See Constantinides and Ghosh (2011); Bansal et al. (2012, 2016).), or Bayesian estimation (Footnote 20: See Schorfheide et al. (2018).). There are, however, several concerns for the identifiability of the parameters. Calvet and Czellar (2015) show that the latent variables cannot be recovered from the data for uncountably many values of , resulting in highly irregular GMM and likelihood objective functions. Grammig and Küchlin (2018) find that the stochastic volatility component is poorly identified and calibrate . However, stochastic volatility in long-term consumption growth has important implications for asset prices (Schorfheide et al., 2018). Several papers report estimates with very small standard errors (see Grammig and Küchlin, 2018, Table 7, p24), but estimates can vary a lot across estimations. This suggests that some parameters are likely not globally identified but might be locally identified.
The following considers joint inference for the two preference parameters . The remaining coefficients are . Amongst these nuisance parameters, it seems reasonable to think that several are (semi)-strongly identified. However, the asset pricing coefficients are highly non-linear functions of so it is arguably more difficult to pin down exactly how many and which ones are well identified. Nevertheless, the results in this paper imply that can determine how many nuisance parameters are weakly identified with high probability.
The moment conditions used for inference are based on matching the following sample with simulated moments: means of all variables, variances of , AR coefficients of , and autocorrelation of . (Footnote 21: A quasi-difference is applied beforehand because is very persistent, making nearly singular; the quasi-differencing solves this issue and makes the estimation below more stable.) These just-identified moments match quantities of interest that are commonly reported in calibrations or post-estimation, see e.g. Beeler and Campbell (2012). The estimation is conducted using U.S. data shared by Grammig and Küchlin (2018) for over 1947Q2-2014Q4, totalling observations. The simulated moments are computed over samples. The bounds for the optimization space are . Computations are conducted in R and C++ using Rcpp.
Table 3 compares the spectrum of the normalized Jacobian and quasi-Jacobian. (Footnote 22: The estimate used for the Jacobian is computed by using the calibration in Bansal and Yaron (2004) as starting value, and alternating between the Nelder-Mead and bobyqa optimizers until convergence. Note that different seeds for the simulated samples yield very different estimates but similar fitted moments. Also, the sample gradient is not available analytically; it is computed by finite differences which here is quite sensitive to the choice of step size.) Using the threshold implies that detects directions of identification failure, with an additional singular value just above the threshold. In comparison, the gradient is small in directions. After projecting out , there are singular values above the threshold, indicating (semi)-strongly identified parameters. Hence, inference for relies on a critical value. In comparison, full projection relies on , and standard inference .
Figure 4 reports 5000 draws of such that using the Population Monte Carlo algorithm in Appendix F, plus their convex hull in blue. Values for are contained in and . This excludes several regions of interest. First, we can reject at the 95% confidence level, i.e. the IES is strictly greater than unity. Second, we can reject and conclude that the utility function is not CRRA. Finally, the confidence set favours over . Under , households prefer an early resolution of uncertainty; their preference for consumption smoothing is less than their relative risk aversion. Although not reported here, note that full projection inference cannot reject some of these null hypotheses. As a robustness check with respect to tuning parameters, Appendix H.3 finds the same results using critical values (Figure H14) and using a larger value for (Table H6).
7 Conclusion
This paper introduces a quasi-Jacobian matrix which is asymptotically equivalent to the usual Jacobian matrix under strong and semi-strong identification but is asymptotically singular when global identification fails. This can be useful because the Jacobian is not always informative about global identification failures. While the inference procedure relies on the AR statistic, extending the results to the robust score test is straightforward, as discussed earlier. For overidentified models, it could be interesting to extend the theory to more powerful test statistics such as the CQLR/AR test in Andrews (2017). Another concern could be that a given choice of moments does not identify the parameters but another set of moments might. This is a moment selection problem. In that case, it could be interesting to extend the quasi-Jacobian to a continuum of moment conditions which can be used for conditional GMM estimation (Carrasco and Florens, 2000); allowing the use of all available information rather than selecting finite dimensional moments.
Appendix A Preliminary Results
A.1 Preliminary results for Section 2
Lemma A1 (Strong and Semi-Strong Sequences: Consistency).
Let . If or and Assumptions 1, 3 hold, then .
Lemma A2 (Strong and Semi-Strong Sequences: Asymptotic Normality).
Let . If or and Assumptions 1, 3, 4 hold, then
[TABLE]
where , , .
Appendix B Proofs for the main results
B.1 Proofs for Section 2
Proof of Theorem 1:
For simplicity, the derivations for this Theorem rely on ; see the proofs for Section 3 for derivations with other kernels. For , let . By construction, . To simplify notation, denote and . There are three cases to consider:
Case 1) Take non-singleton with . Take , then , by construction. Also by construction, . As a result, for and the triangle inequality implies . Take the limit as to find where . Hence, is singular.
Case 2) is singleton and is singular. Take any vector with . For the following, consider for some such that . Then for all . As in Case 1), for all with . Then . Take , fixed, then and as since . This implies ; is singular.
Case 3) is singleton and has full rank. Continuity and global identification imply for some and all . Consider so that implies . Let . For these values of , . We can further assume, without loss of generality, that and thus are sufficiently small that and . Re-write for some vector , then implies . Likewise, implies . Pick , then by construction: . Pick any , then and . Then . This implies since . Then using . Since this holds for any vector with , this implies that and, in addition, .
For the statements in the Theorem: Case 1) implies result (3), Case 2) implies result (4), and Case 3) implies result (2). For result (1), non-singleton or, singleton and singular implies singular. singleton and full rank imply full rank. ∎
Proof of Proposition 1:
Note that Assumptions 1, 3 and 4 hold for the moment function . Applying Lemma A2, we have:
[TABLE]
where . By construction of the test statistic, we have: . We also have:
[TABLE]
The leading term converges to , where which has rank . This limit is an orthogonal projection matrix with rank . Hence, by the continuous mapping theorem: . ∎
B.2 Proofs for Section 3
B.2.1 Strong and semi-strong sequences.
Proof of Theorem 2 for :
Pick a such that Assumption 4 iv. holds, then using :
[TABLE]
which implies wpa 1. Take , using Assumption 4 iv. and using the change of variable with we have:
[TABLE]
The term on the right-hand-side is a by assumption. The squared norm by construction of . Hence, wpa 1 uniformly in so that wpa 1.
For any such that , so that Assumption 4 iv. applies with . For any two candidates we have wpa 1:
[TABLE]
for by Assumption 1 ii., using wpa 1 for by similar derivations as above.
Pick and then . By contradiction, suppose and/or , in probability. Then for any with , we have wpa 1:
[TABLE]
in probability for at least one while the same quantity converges in probability to zero when evaluated at . For instance if , pick . This contradicts the approximate minimizer property of . We conclude that and .
∎
B.2.2 Weak sequences.
Definition B2.
Define the span of the identification failure in the full space and the constrained space respectively as:
[TABLE]
Proof of Theorem 3:
Let . After applying the reparameterization, we have:
[TABLE]
For any such that , we have: and then . By continuity of on we have for some constant so that:
[TABLE]
for any . Then, using the reverse triangular inequality:
[TABLE]
By definition of , we can find pairs with , for two such that the vectors , are linearly independent. By assumption, we have:
[TABLE]
which is a . This implies that with wpa 1 uniformly in so that with wpa 1 uniformly on the same set. In turn, we have wpa 1 for all :
[TABLE]
Using the triangular inequality, we have wpa 1 and uniformly in :
[TABLE]
Let . By linear independence, is well defined and:
[TABLE]
wpa 1. For any , hence wpa 1. To find the other two results note that is Hermitian, and is an orthogonal projection matrix by construction. Hence admits an eigen decomposition of the form with ; is the conjugate transpose of and bckdiag builds a block-diagonal matrix. Using this decomposition we have:
[TABLE]
where are the first columns/rows of and , respectively, which satisfy . As an implication of the minimax principle (Bhatia, 1997, Problem III.6.11, p77) and the equality above, we have the following inequality:
[TABLE]
wpa 1. This concludes the proof. ∎
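A familiar consequence of the minimax principle is that singular values are stable under perturbation: each singular value moves by at most the spectral norm of the perturbation. The R sketch below checks this bound on arbitrary matrices. It illustrates the type of argument only and is not necessarily the exact inequality used in the proof; the matrices `A` and `E` are hypothetical.

```r
set.seed(2)
A <- matrix(rnorm(9), 3, 3)          # reference matrix (illustrative)
E <- 1e-2 * matrix(rnorm(9), 3, 3)   # small perturbation (illustrative)

sv_A  <- svd(A)$d        # singular values in decreasing order
sv_AE <- svd(A + E)$d
op_E  <- svd(E)$d[1]     # spectral norm of the perturbation

# Bound implied by the minimax principle:
# |sigma_j(A + E) - sigma_j(A)| <= ||E||_2 for every j
max_shift <- max(abs(sv_AE - sv_A))
print(c(max_shift = max_shift, bound = op_E))
```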
Proof of Proposition 3:
Following the steps in the proof of Theorem 3, we can construct a basis for using with pairs . Since is fixed, we have and for the basis . Hence, and . By the minimax principle, these imply the desired inequality : ∎
B.3 Proofs for Section 4
Proof of Theorem 4:
First, we show that normalizations do not affect the results of Proposition 3 for weak sequences. This amounts to showing that has singular values that are . From Proposition 3, there exists a linearly independent family such that (from Definition B2) and where . Similarly from Lemma D4. Also because where by design, we have: . Then using the minimax characterization of singular values (Bhatia, 1997, Problem III.6.5), we have:
[TABLE]
Then , and for from Proposition 3. Since has rank and is orthogonal to the rank matrix for which , we have that for , in increasing order. (Footnote 23: Pick and and notice that the matrix is bounded above on a subspace of dimension . Also is strictly positive and bounded below, because is invertible, so that: as well.) Likewise, Assumption 6 implies that which then also implies that for as desired.
Now we are interested in establishing the asymptotic size of the test. Let be a sequence in such that
[TABLE]
as noted in Andrews et al. (2020, p501), such a sequence always exists. There always exists at least one subsequence of which achieves the above, i.e. for some strictly increasing: . Assumption 1 i. implies that is sequentially compact so that this subsequence admits a convergent sub-subsequence in , i.e. for some strictly increasing: and has the same limit.
Now, if we can find a converging sequence , , in one of , for some , , or converging in such that when then the limiting rejection probability for the subsequence can be derived from the limiting rejection probability of the full sequence . Suppose . Pick when and otherwise. is a converging sequence with . If , then is a sequence taking values in , the positive part of the extended real line which is a compact space. This implies that admits at least one subsequence which converges in . Let index the resulting subsequence of . There are now two possibilities: either or .
Suppose . Pick when . For , pick as well. By construction so that . By construction , hence Proposition 1 implies that
[TABLE]
for any . This in turn implies that:
[TABLE]
Suppose . Pick for any . For , define ; note that . Suppose, without loss of generality, that . Take . If , then and . If , pick . If and , Assumption 1 i. implies that the closure of is connected. Hence, there exists a continuous map: such that and and for any . By continuity of , the image of is a closed interval which contains and . For each , the values [math] and are both contained in the image , so that there exists a such that . Pick . If is attained at , repeat the above with instead of . By construction and . This implies that with . As shown above, for this converging sequence we have wpa 1. Using Proposition 2:
[TABLE]
Then, we have: . Putting everything together, we have: for the original sequence .
For the second part of the Theorem, note that so Theorem 2 applies. Now, from the proof of Lemma D3: which is the arg-minimizer of the limiting sup-norm minimization and is non-singular because of the log-determinant. Hence, . Now, this implies:
[TABLE]
where the last inequality follows from Assumption 6 and the discussion after Theorem 2. Since is bounded below, we have wpa 1. This implies wpa 1 and:
[TABLE]
which concludes the proof. ∎
Appendix C Proofs for the preliminary results
C.1 Preliminary results for Section 2
Proof of Lemma A1:
First, using for any we have:
[TABLE]
uniformly in . The second inequality is:
[TABLE]
since . Pick any . For any approximate minimizer such that , using the two inequalities above:
[TABLE]
since for sequences converging in or . ∎
Proof of Lemma A2:
For any approximate minimizer such that , we have by Lemma A1 and:
[TABLE]
By assumption, positive definite so the above implies:
[TABLE]
As in Newey and McFadden (1994), completing the square above implies . Taking the square root on both sides yields:
[TABLE]
Define $\tilde{\theta}_{n}=\theta_{n}-\big(\partial_{\theta}g(\theta_{n},\gamma_{n})^{\prime}W(\theta_{n})\partial_{\theta}g(\theta_{n},\gamma_{n})\big)^{-1}\partial_{\theta}g(\theta_{n},\gamma_{n})^{\prime}W(\theta_{n})\bar{g}_{n}(\theta_{n})$. By continuity of and , we have: . To conclude the proof we need to prove that . Using similar calculations as above, we have:
[TABLE]
By construction of , This implies the following equalities:
[TABLE]
Since is an approximate minimizer, we have:
[TABLE]
which implies and concludes the proof. ∎
Appendix D Supplemental Results
The following results concern the matrix used for re-scaling in the procedure. The derivations follow very closely those in Theorems 2 and 3.
Lemma D3.
Suppose is the uniform kernel and the Assumptions for Theorem 2 hold, then .
Proof of Lemma D3.
As in the proof of Theorem 2, let . Take for , . Then wpa 1:
[TABLE]
using the argmax Theorem and taking the p-limit on the right-hand-side. The term can be removed because for the uniform kernel so it does not alter the supremum and the infimum. This implies that as desired. ∎
Lemma D4.
Suppose is the uniform kernel and the Assumptions for Theorem 3 hold, then there exists such that wpa 1 for any with . This implies that , wpa 1 for some finite constant .
Proof of Lemma D4.
Pick as stated in the Lemma. Let , so that . Because is the uniform kernel, after a change of variable, are the minimizers of:
[TABLE]
wpa 1, because wpa 1 under the Assumptions, and the infimum is less than for . This implies that . Using , we have . Apply the triangle inequality to find wpa 1: To get the first inequality, note that . The second inequality can be derived using the same steps used in the proof of Theorem 3 and the minimax principle. ∎
Appendix E Linear Reparameterization, Continued
The following gives additional details about the linear reparameterization in Section 2, and describes the additional steps to use when there are multiple sources of identification. To simplify the discussion, it will focus on two specific examples.
The main idea is that if there are multiple but finitely many sources of identification failure, we can construct a finite partition of where each subset is associated with a common rate (semi-strong, weak). Then, refine the reparameterization by using only the subset(s) corresponding to weak identification. When there is a single (scalar) source of identification failure, the partition presented in the main text systematically has weakly identified for weak sequences because the objective function becomes flat at the same rate on the entire set . The partition only has one element which is itself.
Example 1: Linear IV regression
First, consider the linear IV regression, now with multiple instruments. Let with moment condition . It can be re-written as , if . As seen from the discussion of Assumption 3, identification fails for any such that . Because the moment condition is linear in for this example, the linear reparameterization described in Section 2 is such that , where kern is the kernel, or null space, of the matrix. When the matrix has full rank and the solution is unique.
Consider sequences such that where is diagonal, and are semi-unitary: . The span covers directions associated with the singularity, i.e. all columns , , of where . Consider only sequences such that the limit exists in . (Footnote 1: Note that takes values in the extended real line which is compact so we can always find a converging subsequence in the extended real line. This step appears in the proof of Theorem 4.) Split the indices into two sets: and . Clearly and . Take to be the span associated with the columns , . Then complete the reparameterization by taking in the orthogonal complement of . Since the reparameterization is defined up to a rotation, suppose for simplicity that is ordered such that and note that: , where . Assumption 5 can now be verified from this representation. Here the sources of identification failure are indexed by the singular values , , and the parameter space is partitioned into different directions: , , associated with the .
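The split of directions by singular value in Example 1 can be sketched in R as follows. This is an illustration under stated assumptions rather than the paper's procedure: the first-stage matrix `Pi`, its singular values `d_true`, and the finite-sample cutoff `sqrt(2*log(n))` on the rescaled singular values (borrowed from the ICS cutoff in the sample code of Appendix G) are all hypothetical choices.

```r
set.seed(5)
n <- 1000   # sample size (illustrative)

# Hypothetical first-stage matrix Pi = U D V' with known singular values
U <- qr.Q(qr(matrix(rnorm(12), 4, 3)))   # 4 x 3, orthonormal columns
V <- qr.Q(qr(matrix(rnorm(9), 3, 3)))    # 3 x 3, orthogonal
d_true <- c(1, n^(-1/4), 2 / sqrt(n))    # strong, semi-strong, weak
Pi <- U %*% diag(d_true) %*% t(V)

# Recover singular values and right singular vectors
s <- svd(Pi)
scaled <- sqrt(n) * s$d                  # stays bounded in n only in weak directions

# Assumed cutoff used to split the indices into two sets
cutoff     <- sqrt(2 * log(n))
weak_idx   <- which(scaled <= cutoff)
strong_idx <- which(scaled >  cutoff)

# Directions in the parameter space associated with weak identification
V_weak <- s$v[, weak_idx, drop = FALSE]
print(list(scaled = round(scaled, 2), weak = weak_idx, strong = strong_idx))
```

In this toy design only the third direction is classified as weak, and `Pi %*% V_weak` is of order `1/sqrt(n)`.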
Example 2: Non-Linear regression
Consider the regression setup in Cheng (2015): , , each here is scalar. The coefficient is unidentified if the corresponding . This is related to the example used in Section 5. For a vector of instruments , take the moment condition which can be re-written as:
[TABLE]
Take such that for at least one . Then is non-singleton and includes all possible values of for which . Suppose , , and the functions are such that only the coefficients are potentially unidentified. The linear reparameterization based on is such that include all coefficients for which , while includes and the remaining , for which .
Take a converging sequence . Following the same steps as in the previous example, let , define and in the same way as above. As before, apply the reparameterization but now includes the with and collects all remaining coefficients. Here the sources of identification failure are indexed by , . This time, the partition separates the directions , associated with the different .
Linear Reparameterization with Mixed Identification Strength
The goal of the following is to refine the linear reparameterization given in the main text when there is mixed identification strength, so as to have semi-strongly and weakly identified. The procedure relies on having finitely many sources of identification failure as in the above examples.
In the previous two examples, there were sources of identification failure. There, for a sequence associated with weak identification, there are possibilities for identification strength. For instance, in Example 2 we have with at least one . For each there are two possibilities (, ) leading to outcomes, minus where for all , which precludes weak identification, in which case all parameters are (semi)-strongly identified.
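The counting argument can be made concrete with a short enumeration in R. The sketch assumes `J` binary sources of identification failure, each either weak or (semi-)strong, and drops the single configuration in which no source is weak, which gives 2^J - 1 cases. The value `J = 3` and the column names are hypothetical.

```r
J <- 3   # number of sources of identification failure (illustrative)

# Each row flags which sources are weak (TRUE) or (semi-)strong (FALSE)
combos <- expand.grid(rep(list(c(FALSE, TRUE)), J))
names(combos) <- paste0("weak_", seq_len(J))

# Drop the all-FALSE row: with no weak source there is no weak identification
keep <- apply(combos, 1, any)
weak_cases <- combos[keep, , drop = FALSE]

n_cases <- nrow(weak_cases)
print(n_cases)   # equals 2^J - 1
```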
With these possible combinations, there are possible subsets on which the parameters can be weakly identified. In Example 2, one possible subset is associated with and for ; here . Then there are continuous and such that:
[TABLE]
Now take such that and .
If , all parameters in are weakly identified and Assumption 5 i. follows from the properties of linear reparameterization and the Maximum Theorem, as explained in the main text. Otherwise, , strictly; only some parameters in are weakly identified.
Let and . Let and . By construction, is at most a singleton on and a set of dimension on , denoted .
If then all directions of are weakly identified and is unchanged. By construction, . To reduce notation, suppose , then , using . This implies that with which yields Assumption 5 i., i.e. is semi-strongly identified and is weakly identified on the set .
Appendix F Uniform Sampling on Level Sets
As shown in Section 2.1, the computation of the quasi-Jacobian requires uniform draws over the level set and, similarly, test inversion amounts to finding the level set and projecting it onto .
Direct approach:
the approach used in Section 5 amounts to importance sampling. Draw uniformly distributed on and assign weights proportional to . The weighted sample is uniformly distributed on the level set. The draws can be random or pseudo-random using quasi-Monte Carlo sequences such as the Sobol or Halton sequence (see Lemieux, 2009, Section 5). The main drawback of this approach is that the effective sample size can be very small, i.e. few draws have non-zero weight, when the level set is small relative to the parameter space. In particular, the effective sample size is approximately which tends to be small when the dimension of is moderately large.
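A minimal R sketch of the direct approach is given below. The objective `Q`, the box bounds, and the cutoff `kappa2` are hypothetical stand-ins chosen so that the level set is small relative to the parameter space; they are not the paper's design. With a uniform kernel the weights are indicators, so the effective sample size is simply the number of draws that land in the level set.

```r
set.seed(3)
B <- 1e4   # number of draws (illustrative)

# Hypothetical objective: flat in theta2 when theta1 = 0
Q <- function(th) th[1]^2 + (th[1] * th[2])^2
kappa2 <- 0.05   # hypothetical level-set cutoff

# Uniform draws on the box [-1, 1] x [-5, 5]
draws <- cbind(runif(B, -1, 1), runif(B, -5, 5))
Qv <- apply(draws, 1, Q)

# Uniform kernel: weight 1 on the level set, 0 elsewhere, then normalize
on_set <- (Qv - min(Qv)) <= kappa2
w <- as.numeric(on_set)
w <- w / sum(w)

# Effective sample size of the weighted sample
ess <- 1 / sum(w^2)
print(c(ess = ess, share = ess / B))
```

The share of useful draws shrinks as the level set becomes smaller or the dimension grows, which motivates the adaptive scheme discussed next.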
Adaptive Sampling by Population Monte Carlo:
the main idea here is to preserve the simplicity of importance sampling while constructing a sequence of proposal distributions with a higher acceptance rate. Algorithm 1 below is adapted from the Population Monte Carlo principle laid out in Cappé et al. (2004). Consider a sequence of level sets: with for some . By construction and . This implies that it is easier to generate uniform draws on than on .
The following summarizes the algorithm in plain terms. The initialization step is a simple accept-reject algorithm to generate iid draws on . Then given a set of draws , draw uniformly from the weighted sample and generate using a transition kernel , for instance a random-walk step . Re-draw both and until the criterion is met and then set the weight according to the sampling probability . Repeat this process for each and each . The final weighted sample targets the desired distribution.
There are several choices of tuning parameters in the steps above. First, can be chosen adaptively to avoid decreasing it too quickly or too slowly, which would result in poor computational performance. In the empirical application, is set according to the median value of from uniform draws on ; this yields . Then is set according to where is the quantile of . This guarantees that is strictly decreasing but declines slowly enough to maintain a reasonable acceptance rate. To adapt to the shape of each , the proposal is also constructed adaptively. For each , a clustering algorithm is applied to the draws to split the draws into clusters. Then is times the variance of the draws from the cluster to which belongs. This accommodates multimodality in the objective function. The inner loop, over , is run in parallel, which speeds up the computation significantly. In the application, the final is attained from the initial after iterations.
The output of Algorithm 1 is used both to compute and later for test inversion by picking such that and running one more iteration with . This yields the 5000 draws shown in Figure 4.
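The adaptive scheme can be sketched in a few lines of R. This is a simplified illustration, not Algorithm 1 itself: the objective `Q`, the box, the fixed threshold sequence `kappas`, and the fixed random-walk scale `sd_rw` are hypothetical, and the adaptive thresholds, clustering, and parallel inner loop described above are omitted. Weights are set to the inverse of the mixture proposal density, which targets the uniform distribution on the current level set.

```r
set.seed(4)
Q <- function(th) th[1]^2 + (th[1] * th[2])^2           # hypothetical objective
in_box <- function(th) abs(th[1]) <= 1 && abs(th[2]) <= 5
B <- 300                       # draws per iteration (illustrative)
kappas <- c(0.5, 0.2, 0.05)    # decreasing level-set thresholds (illustrative)
sd_rw  <- c(0.1, 0.5)          # random-walk scales (illustrative)

# Initialization: accept-reject uniform draws on the largest level set
theta <- matrix(NA_real_, B, 2)
for (b in 1:B) {
  repeat {
    cand <- c(runif(1, -1, 1), runif(1, -5, 5))
    if (Q(cand) <= kappas[1]) break
  }
  theta[b, ] <- cand
}
w <- rep(1 / B, B)

# Mixture proposal density at x, given ancestors and their weights
q_dens <- function(x, anc, w) {
  sum(w * dnorm(x[1], anc[, 1], sd_rw[1]) * dnorm(x[2], anc[, 2], sd_rw[2]))
}

for (t in 2:length(kappas)) {
  new_theta <- theta
  new_w <- w
  for (b in 1:B) {
    repeat {   # redraw ancestor and proposal until the next level set is hit
      j <- sample.int(B, 1, prob = w)
      cand <- theta[j, ] + rnorm(2, 0, sd_rw)
      if (in_box(cand) && Q(cand) <= kappas[t]) break
    }
    new_theta[b, ] <- cand
    new_w[b] <- 1 / q_dens(cand, theta, w)   # importance weight, uniform target
  }
  theta <- new_theta
  w <- new_w / sum(new_w)
}
print(summary(apply(theta, 1, Q)))
```

All final draws lie on the smallest level set, and the normalized weights correct for the non-uniform proposal.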
Appendix G Sample Code to Implement the Procedure
The following provides some sample R code to perform the steps outlined in Section 2.1 for the Monte Carlo example in Appendix H.2.
```r
require(randtoolbox) # Used to generate the integration grid
library(pracma)      # Used to compute matrix square root
library(CVXR)        # CVX for R
library(Rmosek)      # To use the MOSEK solver in CVX
set.seed(123)
n = 1e3 # Sample size
B = 1e4 # Number of draws
# Robust and standard critical values:
critical_R = qchisq(0.95,2)
critical_S = qchisq(0.95,1)
# **************************************************************
# Simulate Data, Define Moment Conditions
# **************************************************************
c = 1          # c determines identification strength
b1 = c/sqrt(n) # theta1 = c/sqrt(n)
b2 = 5         # theta2 is fixed
# Simulate data: x1, x2, e, and y = b1*x1 + b1*b2*x2 + e
x1 = rnorm(n)
x2 = rnorm(n)
e = rnorm(n)
y = b1*x1 + b1*b2*x2 + e
moments <- function(b,y,x1,x2) {
  # computes the sample moments and the variance of the moments
  e_hat = y - b[1]*x1 - b[1]*b[2]*x2 # residuals
  mom = cbind(e_hat,e_hat)*cbind(x1,x2)
  mom_m = apply(mom,2,mean) # g_bar
  V = var(mom) # V_hat
  return( list( mom = mom_m, V = V ) )
}
objective <- function(b,y,x1,x2) {
  # computes the GMM objective function, returned as a scalar
  mm = moments(b,y,x1,x2)
  return( as.numeric(t(mm$mom)%*%solve(mm$V,mm$mom)) )
}
# **************************************************************
# Compute the quasi-Jacobian Matrix
# **************************************************************
# Set the integration grid:
s = sobol(B,2,scrambling=1)
p = cbind(rep(b1,B),rep(b2,B)) + 2*(s-1/2)
objs = rep(NA,B) # Store GMM objective values
moms = matrix(NA,B,2) # Store sample moments mom
Vs = array(NA,dim=c(2,2,B)) # Store variances V
for (b in 1:B) { # Evaluate the moments on the grid
  mm = moments(p[b,],y,x1,x2)
  objs[b] = t(mm$mom)%*%solve(mm$V,mm$mom)
  moms[b,] = mm$mom
  Vs[,,b] = mm$V
}
# Select draws on the level set
ind = which(objs - min(objs) <= 2*log(log(n))/n)
grid_sub = p[ind,]
moms_sub = moms[ind,]
Vs_sub = Vs[,,ind]
X = cbind(1,grid_sub) # regressors: intercept and theta_b
# write the optimization problem for CVX
beta = Variable(dim(X)[2],dim(moms_sub)[2]) # matrix of coefficients (A,B)
objc <- Minimize(norm( moms_sub - X %*% beta,"I")) # l-infinity loss
prob <- Problem(objc) # compile the problem
result <- solve(prob,solver="ECOS_BB") # compute the solution
coef = result$getValue(beta) # extract solution
Bn = t(coef[2:3,]) # quasi-Jacobian matrix
# Now compute the normalization matrix for the left-hand-side
V = matrix(0,2,2) # Compute V_bar the average variance matrix
for (b in seq_along(ind)) {
  V = V + Vs_sub[,,b]/length(ind)
}
# Now compute the normalization matrix for the right-hand-side
mu <- Variable(1,2) # vector of means
one = matrix(1,length(ind),1)
VV = Variable(2,2) # matrix of variances
objc <- Minimize( - log_det(VV) + 0.5*norm( (grid_sub%*%VV - kronecker(one,mu))^2,"I")) # setup the minimization problem in CVX
prob <- Problem(objc) # compile
result2 <- solve(prob,solver="MOSEK") # solve using MOSEK solver
phi = result2$getValue(VV) # extract solution
# Note that phi = Sigma^(-1/2), the problem was reparameterized
# **************************************************************
# Identification Category Selection
# **************************************************************
v = c(1,0) # vector which spans theta1
M = diag(2)-v%*%t(v) # Projection matrix onto the span of theta2
# Normalized quasi-Jacobian matrix
# sqrtm computes the matrix square root and Binv its inverse
Bnorm = ( sqrtm(V)$Binv )%*%( Bn%*%M )%*%phi
# singular values in decreasing order
sing = svd(Bnorm)$d
cutoff = sqrt(2*log(n)/n) # cutoff lambda_n for ICS
print('Singular values without projecting out theta1:')
print( round(svd( ( sqrtm(V)$Binv )%*%Bn%*%phi )$d,3) )
print('Singular values after projecting out theta1:')
print(round(sing,3))
print('Cutoff:')
print(cutoff)
# Set critical value depending on the singular value and cutoff
cr = if (sing[1] > cutoff) critical_S else critical_R
if (sing[1]>cutoff) {
  print('Nuisance parameter is semi-strongly identified')
} else {
  print('Nuisance parameter is weakly identified')
}
# **************************************************************
# Subvector Inference
# **************************************************************
# Test H0: b1 = b10 at the 5% significance level
b10 = 0
obj <- function(b2,b10,y,x1,x2) {
  return( objective(c(b10,b2),y,x1,x2) )
}
# Anderson-Rubin test statistic
AR = n*optimize(obj,c(-20,20),b10=b10,y=y,x1=x1,x2=x2)$objective
if (AR > cr) {
  print('Reject H0')
} else {
  print('Cannot reject H0')
}
# Compute a 95% confidence set (objs is scaled by n to match the AR statistic):
ind = which(n*objs <= cr)
print('Confidence Interval for theta1:')
print(c(min(p[ind,1]),max(p[ind,1])))
print('True value:')
print(b1)
```
Appendix H Additional Results for Section 5
H.1 Verification of the Main Assumptions
We now verify the main assumptions for the NLS example in Appendix H.2:
[TABLE]
where iid. The optimization space is , where . We can then set for any . The parameter space is then , where indexes the distribution of which here is very simple since , the normal distribution above. More general choices of distribution spaces one could consider could take the form: . See Andrews and Cheng (2012) for more examples. Assumption 1 i., ii. hold for this choice of , and
The sample moments are and their population counterpart is They can be re-written as:
[TABLE]
The lower triangular matrix has two eigenvalues: and . Hence, we have the following inequality: . This implies that Assumption 3 i. holds with which is continuous in , and . To verify Assumption 3 ii., take , then . We have .
Assumption 4 i. holds for any . Condition ii. holds if . Condition iii. is a stochastic equicontinuity condition which can be verified by Lipschitz continuity and conditions on the parameter space and the distribution of the covariates and the errors. Condition iv holds because the quadratic term vanishes at the same rate as the first-order term in the Taylor expansion ( is a polynomial of order which becomes flat wrt when ). Condition v. can be verified numerically.
For Assumption 5 i., note that . Here we can use , . For for weak sequences. Hence, Assumption 5 i. and ii. hold with .
H.2 Additional Simulation Results
Consumption Capital Asset Pricing Model (CAPM)
Figure H5 shows the sampling distribution of the CAPM estimates .
Table LABEL:t2xCAPM_JI and Figure H10 replicate the results in the main text for a just-identified specification where .
Non-Linear Regression Model.
To illustrate the finite-sample properties of the quasi-Jacobian matrix and the test procedure, consider the following nonlinear regression model:
[TABLE]
where iid. The sample moment conditions are with population counterpart . For , is unidentified and for , , is weakly identified, even if is known and fixed. The reparameterization here is , , , and where . The assumptions used for the main results are verified for this model in Appendix H.1.
In this simple example, the source of the identification failure is known so that the type I test procedure in Andrews and Cheng (2012, AC12) will be used as a benchmark. Let , where is the sample minimizer of and estimates the asymptotic variance of using the sandwich formula. The test statistic is since the model is just-identified. Let be as in Section 2.3. When , the test rejects if . When , the test rejects if where is the least-favorable quantile of over . Note that under , , regardless of . Hence, the projection-based critical value in Section 2.3 is the least-favorable critical value, . (Footnote 2: A null-imposed least-favorable critical value can also be computed by simulating the distribution of for each and all possible . This will not be used here to keep computation manageable.) To summarize, this implementation of the Andrews and Cheng (2012) procedure relies on the same test statistic and critical values as in Section 2.3; the only difference is the choice of ICS statistic.
Figure H11 reports the finite sample properties of several tests and ICS procedures. The top panel shows coverage for , , , using a Wald statistic, full projection inference, AC12, and the test procedure from Section 2.3 using the normalized and unnormalized quasi-Jacobian . The Wald test suffers from severe size distortion for but is accurate for larger values of . Full projection inference is robust regardless of but conservative for . AC12 and the present procedures have coverage above the 95% nominal level; the unnormalized procedure is more conservative, and AC12 is non-monotonic. To better understand these patterns, the bottom two panels provide further information on the ICS procedures. The left panel shows how often . The normalized statistic sees a large decline around when size distortion is less severe. AC12 is non-monotonic around where the Wald statistic, on which it is based, has large size distortion. The unnormalized statistic declines sharply but later than the normalized one. To further understand these differences, the right panel plots the distribution of . The solid horizontal line indicates the cutoff . The normalized statistic diverges quickly with , as identification becomes stronger. This matches the above discussion on the role of post-multiplying the quasi-Jacobian by . AC12 is more dispersed, resulting in more variable outcomes for the ICS procedure as seen in the slow decline in the left panel. AC12 increases with at a similar rate as the unnormalized statistic. Finite sample power properties of these test procedures are reported in Appendix H.2 as well as results using a larger .
Figure H12 below presents the finite-sample power properties of the test procedures used in Section 5. It shows rejection rates against local alternatives where the true . The nuisance parameter is unidentified for and weakly identified for . Each panel summarizes the finite-sample power properties for a specific level of identification strength .
For the Projection, AC12, and (un)normalized procedures have identical properties. For , AC12 does not detect identification all the time (see Figure H11, bottom left panel), which leads to a smaller critical value and higher rejection rates than the other methods. For , the normalized test procedure relies on critical values and has comparable power to the Wald test except for . Recall that for just-identified models, the test procedure in Section 2.3 is equivalent to a standard QLR test when which can be more powerful than the Wald test in finite samples. The normalized ICS procedure is thus more powerful since it almost always picks when (see Figure H11, bottom left panel). The Wald test is not reported for where it suffers from substantial size distortion. AC12 has lower power for and similar power properties for . The unnormalized procedure is comparable to AC12 for and is more powerful for .
H.3 Additional Empirical Results
Confidence sets for and with a critical value: and , respectively. Using a critical value amounts to using in the baseline results (Table 3) and with the larger value for (Table H6 below).
Appendix I Asymptotic Properties of the quasi-Jacobian under Higher-Order Identification
The following provides pointwise asymptotic results for the quasi-Jacobian matrix when the model is globally but not locally identified.
Assumption I7 (Higher-Order Identification).
Let be such that for some the moments satisfy:
[TABLE]
where . For some , there exist orthogonal projection matrices and constants where has rank and for any . These constants and projection matrices are such that for some and any :
[TABLE]
Assumption I7 implies that the model is globally identified but local identification fails so that around , the moment function is not linear but approximately polynomial of order . If then is approximately a polynomial of order in the directions spanned by . This contrasts with locally identified models where which is locally linear when is full rank and the non-linear remainder terms are negligible. Under this type of local identification failure, the parameters are consistently estimable but has non-standard limiting distribution. Full vector inference using the Anderson and Rubin (1949) statistic remains valid. As in weakly identified models, concentrating out locally identified nuisance parameters leads to more powerful and asymptotically valid inferences.
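A scalar toy example helps visualize this case. In the R sketch below the hypothetical moment `g` is a cubic with a unique root at `theta0`, so the model is globally identified while the derivative at `theta0` is zero. The slope of a linear fit of `g` over a shrinking neighborhood then vanishes with the bandwidth. The sketch uses an ordinary least-squares fit on a grid in place of the sup-norm fit used for the quasi-Jacobian; the function `g`, the grid, and the bandwidths `kappas` are assumptions made for illustration.

```r
theta0 <- 0
# Hypothetical moment: unique root at theta0, zero derivative at theta0
g <- function(th) (th - theta0)^3

# Slope of a least-squares linear fit of g on a symmetric grid of half-width kappa
ols_slope <- function(kappa, m = 2001) {
  th <- seq(theta0 - kappa, theta0 + kappa, length.out = m)
  unname(coef(lm(g(th) ~ th))[2])
}

kappas <- c(0.5, 0.1, 0.02)              # shrinking bandwidths (illustrative)
slopes <- sapply(kappas, ols_slope)

# For a cubic on a uniform grid the fitted slope is close to 3 * kappa^2 / 5
print(cbind(kappa = kappas, slope = slopes, approx = 3 * kappas^2 / 5))
```

The fitted slope shrinks at the rate of the squared bandwidth, in contrast with a locally identified model where it would converge to a non-zero derivative.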
Theorem I5.
Suppose Assumption 1 ii-iii, 2, and I7 hold for , then:
[TABLE]
For any such that and : .
Proof of Theorem I5 for :
Pick and with for some with . Let , by Assumption I7 we have:
[TABLE]
wpa 1 for all . This implies that , wpa 1 uniformly in . Using similar arguments as in the proof of Theorem 3, we have: , for all . Using the triangular inequality we have for any such that :
[TABLE]
wpa 1. Since , this implies that:
[TABLE]
wpa 1 for each such that . In particular, we have for that: so that . ∎
References
- Anderson, T. W. and H. Rubin (1949): “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations,” The Annals of Mathematical Statistics, 20, 46–63.
- Andrews, D. W. (2017): “Identification-robust subvector inference,” Cowles Foundation Discussion Paper.
- Andrews, D. W. and X. Cheng (2012): “Estimation and Inference With Weak, Semi-Strong, and Strong Identification,” Econometrica, 80, 2153–2211.
- Andrews, D. W. and X. Cheng (2013): “Maximum likelihood estimation and uniform inference with sporadic identification failure,” Journal of Econometrics, 173, 36–56.
- Andrews, D. W. and X. Cheng (2014): “GMM Estimation and Uniform Subvector Inference with Possible Identification Failure,” Econometric Theory, 30, 287–333.
- Andrews, D. W., X. Cheng, and P. Guggenberger (2020): “Generic results for establishing the asymptotic size of confidence sets and tests,” Journal of Econometrics, 218, 496–531.
- Andrews, I. and A. Mikusheva (2016): “Conditional Inference With a Functional Nuisance Parameter,” Econometrica, 84, 1571–1612.
- Antoine, B. and E. Renault (2009): “Efficient GMM with nearly-weak instruments,” Econometrics Journal, 12, S135–S171.
