A technical note on divergence of the Wald statistic
Jean-Marie Dufour, Eric Renault, Victoria Zinde-Walsh

Abstract
The Wald test statistic has been shown to diverge (Dufour et al, 2013, 2017) under some conditions. This note links the divergence to eigenvalues of a polynomial matrix and establishes the divergence rate.
Acknowledgements: This work was supported by the William Dow Chair in Political Economy (McGill University), the Bank of Canada Research Fellowship, the Toulouse School of Economics Pierre-de-Fermat Chair of Excellence, a Guggenheim Fellowship, a Conrad-Adenauer Fellowship from the Alexander-von-Humboldt Foundation, the Canadian Network of Centres of Excellence program on Mathematics of Information Technology and Complex Systems, the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada and the Fonds de recherche sur la société et la culture (Québec). The authors also thank the research centres CIREQ and CIRANO for providing support and meeting space for the joint work. We thank Purevdorj Tuvaandorj for very useful comments.
Jean-Marie Dufour William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des organisations (CIRANO) and Centre interuniversitaire de recherche en économie quantitative (CIREQ).
Eric Renault Brown University
Victoria Zinde-Walsh McGill University and CIREQ
1 The set-up and an example of divergence
Suppose that a parameter of interest satisfies
[TABLE]
where is a vector of differentiable functions;
Let be a symmetric positive definite matrix.
Assumption 1. In some open set there is a random sequence and random matrix sequence, such that as
[TABLE]
Define the usual Wald test statistic:
[TABLE]
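The displayed definition is not recoverable in this copy, so the following is only a minimal sketch assuming the usual form $W_T = a_T^2\, g(\hat\theta_T)'\,[G(\hat\theta_T)\,\hat\Omega_T\, G(\hat\theta_T)']^{-1} g(\hat\theta_T)$ with $G = \partial g/\partial\theta'$. The helpers `solve` and `wald` are hypothetical names; exact `Fraction` arithmetic is used so that near-singular cases are not blurred by rounding.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wald(a_T, g_hat, G_hat, Omega_hat):
    """W = a_T^2 * g' (G Omega G')^{-1} g for q restrictions and p parameters."""
    q, p = len(G_hat), len(G_hat[0])
    # G Omega: a q x p matrix
    GO = [[sum(G_hat[i][k] * Omega_hat[k][j] for k in range(p)) for j in range(p)]
          for i in range(q)]
    # G Omega G': the q x q matrix that is inverted in the statistic
    V = [[sum(GO[i][k] * G_hat[j][k] for k in range(p)) for j in range(q)]
         for i in range(q)]
    v = solve(V, g_hat)
    return Fraction(a_T) ** 2 * sum(Fraction(gi) * vi for gi, vi in zip(g_hat, v))


# Linear restriction theta_1 = 0 with p = 2, identity covariance, a_T = 10.
print(wald(10, [Fraction(1, 10)], [[1, 0]], [[1, 0], [0, 1]]))  # 1
```

For linear restrictions $G$ is constant, so the inverted matrix stays well conditioned; the divergence studied below arises when $G(\hat\theta_T)\,\hat\Omega_T\, G(\hat\theta_T)'$ approaches a singular limit.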
For linear restrictions the statistic converges to a chi-square distribution; for other restrictions, e.g. polynomial ones, the limit distribution may not be chi-square. Limit results for the statistic when testing general polynomial restrictions can be found in Dufour et al. (2013, 2017), where it is also established that under some conditions the statistic may diverge when . Below is an example of divergence.
Example 1. Restrictions for which the Wald statistic diverges.
Consider for the set of restrictions,
[TABLE]
Then the Wald statistic for assuming that the covariance matrix is the identity is
[TABLE]
Suppose that the true parameter value is then holds. Suppose that the estimated parameter as is consistent and satisfies
[TABLE]
where are independent standard normals. Then the Wald statistic can be expressed as
[TABLE]
This is
[TABLE]
As the statistic diverges under
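The mechanism can be checked numerically for the restrictions $g(\theta) = (xy,\ xw,\ yz)'$ written out in Example 2, with identity covariance and $a_T = \sqrt{T}$. The null value and the estimation error of Example 1 are not recoverable in this copy, so the sketch assumes a hypothetical null value `THETA0 = (0, 0, 1, 1)` and follows the single deterministic path $\hat\theta_T = \theta_0 + u/\sqrt{T}$ with hypothetical direction `U = (1, 1, 0, 0)`. It illustrates the blow-up along one path only, not the random limit; `restrictions`, `jacobian`, `wald_identity` and `theta_hat` are hypothetical helper names.

```python
from fractions import Fraction
from math import isqrt


def solve(a, b):
    """Solve the linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def restrictions(x, y, z, w):
    # g(theta) = (xy, xw, yz)'
    return [x * y, x * w, y * z]


def jacobian(x, y, z, w):
    # rows are the gradients of xy, xw, yz with respect to (x, y, z, w)
    return [[y, x, 0, 0], [w, 0, 0, x], [0, z, y, 0]]


def wald_identity(T, theta):
    """W_T = T * g' (G G')^{-1} g, i.e. a_T = sqrt(T) and identity covariance."""
    g, G = restrictions(*theta), jacobian(*theta)
    V = [[sum(G[i][k] * G[j][k] for k in range(4)) for j in range(3)] for i in range(3)]
    v = solve(V, g)
    return T * sum(gi * vi for gi, vi in zip(g, v))


THETA0 = (0, 0, 1, 1)  # hypothetical null value: g(THETA0) = 0
U = (1, 1, 0, 0)       # hypothetical fixed direction of the estimation error


def theta_hat(T):
    """THETA0 + U / sqrt(T); T is taken to be a perfect square to stay rational."""
    delta = Fraction(1, isqrt(T))
    return tuple(t + delta * u for t, u in zip(THETA0, U))


for T in (4, 16, 64, 256):
    print(T, wald_identity(T, theta_hat(T)))  # equals (T + 1) / 2 on this path
```

On this path the statistic equals $(T+1)/2$ exactly, so it grows linearly in $T$ instead of settling to a limit distribution.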
We shall assume that each is a polynomial of order in the components of Then for any each polynomial component can be written around as
[TABLE]
with some coefficients
If the value satisfies the null hypothesis, then
[TABLE]
for each
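The re-expansion around a point can be sketched for polynomials stored as dictionaries from exponent tuples to coefficients, a representation chosen here only for illustration. The hypothetical helper `shift` rewrites a polynomial in powers of $h = \theta - \theta_0$ with the binomial theorem; when $\theta_0$ satisfies the null, no constant term survives.

```python
from fractions import Fraction
from itertools import product
from math import comb


def shift(poly, theta0):
    """Rewrite poly(theta) as a polynomial in h = theta - theta0.

    poly maps exponent tuples (e_1, ..., e_p) to coefficients.
    """
    out = {}
    for exps, c in poly.items():
        # expand prod_i (theta0_i + h_i)^e_i term by term (binomial theorem)
        for ks in product(*(range(e + 1) for e in exps)):
            coef = Fraction(c)
            for e, k, t0 in zip(exps, ks, theta0):
                coef *= comb(e, k) * Fraction(t0) ** (e - k)
            if coef != 0:
                out[ks] = out.get(ks, 0) + coef
    return {k: v for k, v in out.items() if v != 0}


# x * y around theta0 = (0, 1) becomes h_x + h_x * h_y: no constant term.
print(shift({(1, 1): 1}, (0, 1)))
```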
A polynomial function is either identically zero or non-zero a.e. with respect to the Lebesgue measure. Consider a square matrix of polynomials of variable We say that the polynomial matrix is non-singular if its determinant is a non-zero polynomial.
The rank of the matrix is the largest dimension of a square non-singular submatrix.
Unlike for matrices of constants, for polynomial matrices the rows may be linearly independent vectors of polynomial functions while the matrix has deficient rank. For example, in the matrix
[TABLE]
the two rows are given by independent vectors of polynomials, but the rank of this matrix of polynomials is one.
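The displayed matrix is lost in this copy, so the sketch below uses a hypothetical one, $\begin{pmatrix}1 & x\\ x & x^2\end{pmatrix}$, to show the same phenomenon: the second row is $x$ times the first, so the determinant vanishes identically (rank one), yet no pair of constants, not both zero, annihilates a combination of the rows. Polynomials in one variable are stored as coefficient lists, constant term first; `pmul`, `psub` and `flatten` are hypothetical helpers.

```python
def pmul(p, q):
    """Product of univariate polynomials stored as coefficient lists (constant first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def psub(p, q):
    """Difference p - q of coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


# Hypothetical matrix [[1, x], [x, x^2]] with entries as coefficient lists.
M = [[[1], [0, 1]],
     [[0, 1], [0, 0, 1]]]

# Determinant as a polynomial: 1 * x^2 - x * x.
det = psub(pmul(M[0][0], M[1][1]), pmul(M[0][1], M[1][0]))
det_is_zero = all(c == 0 for c in det)


def flatten(row, deg=2):
    """Stack the coefficients of all entries of a row into one constant vector."""
    return [c for entry in row for c in entry + [0] * (deg + 1 - len(entry))]


# Two constant vectors are linearly independent iff some 2 x 2 minor is non-zero.
r1, r2 = flatten(M[0]), flatten(M[1])
rows_independent = any(r1[i] * r2[j] - r1[j] * r2[i] != 0
                       for i in range(len(r1)) for j in range(i + 1, len(r1)))

print(det_is_zero, rows_independent)  # True True
```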
Assumption 2. The function is a vector of polynomial functions; the matrix of polynomials is of rank
This does not exclude reduced rank at particular points or on a lower-dimensional subset.
Under the stated assumptions for the standard asymptotic distribution holds for as long as is a (numerical) matrix of rank
Each restriction can be represented as a sum
[TABLE]
where denotes the lowest-degree non-zero homogeneous polynomial, of degree $\bar{\gamma}_{l}$, and the degrees of all non-zero monomials in are at least $\bar{\gamma}_{l}+1$. We ascribe the degree of homogeneity to a function that is identically zero.
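Using the same dictionary representation (exponent tuple to coefficient), the split into the lowest-degree homogeneous part and a remainder can be sketched as follows. The helper name is hypothetical, and since the degree assigned to the zero function is elided in this copy, `math.inf` is used purely as a placeholder convention in the code.

```python
import math


def lowest_degree_split(poly):
    """Split a polynomial into its lowest-degree homogeneous part and a remainder.

    poly maps exponent tuples to coefficients. Returns (gamma_bar, lowest, rest),
    where every monomial in rest has total degree >= gamma_bar + 1.
    """
    nonzero = {e: c for e, c in poly.items() if c != 0}
    if not nonzero:
        # Placeholder convention for the identically zero polynomial (assumption).
        return math.inf, {}, {}
    gamma_bar = min(sum(e) for e in nonzero)
    lowest = {e: c for e, c in nonzero.items() if sum(e) == gamma_bar}
    rest = {e: c for e, c in nonzero.items() if sum(e) > gamma_bar}
    return gamma_bar, lowest, rest


# h_x + h_x * h_y: degree-1 part h_x, remainder h_x * h_y
print(lowest_degree_split({(1, 0): 1, (1, 1): 1}))  # (1, {(1, 0): 1}, {(1, 1): 1})
```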
Correspondingly to , in the matrix
[TABLE]
for each row write
[TABLE]
the degree of any non-zero homogeneous polynomial in the row vector is ; any non-zero monomial in has degree higher than Then collecting the lowest degree homogeneous polynomials in each row we have
[TABLE]
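Row by row, the collection of lowest-degree parts can be sketched as below: the row's degree is the smallest degree among its non-zero entries, and each entry keeps only its homogeneous part of exactly that degree (entries of higher degree become zero). The helpers are hypothetical, `math.inf` again stands in for the degree of the zero polynomial, and the example row $(1 + h_w,\ 0,\ 0,\ h_x)$ in local coordinates $(h_x, h_y, h_z, h_w)$ is hypothetical.

```python
import math


def degree(poly):
    """Lowest total degree of a non-zero monomial; math.inf for the zero polynomial."""
    degs = [sum(e) for e, c in poly.items() if c != 0]
    return min(degs) if degs else math.inf


def row_lowest_part(row):
    """Return the row's lowest degree and the entries' homogeneous parts of that degree."""
    gamma = min(degree(p) for p in row)
    parts = [{e: c for e, c in p.items() if c != 0 and sum(e) == gamma} for p in row]
    return gamma, parts


# Hypothetical row (1 + h_w, 0, 0, h_x): its lowest-degree part is (1, 0, 0, 0).
row = [{(0, 0, 0, 0): 1, (0, 0, 0, 1): 1}, {}, {}, {(1, 0, 0, 0): 1}]
print(row_lowest_part(row))
```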
2 The property of full rank reached at lowest degrees (FRALD) and FRALD-T
**Definition (FRALD).** If the matrix of lowest-degree polynomials for is of full rank, we say that the Full Rank At Lowest Degrees (FRALD) property is satisfied for and
Examples in Dufour et al (2017) illustrate the possibility that the FRALD property holds at some points but not at others, and that even if the FRALD property does not hold for at it may hold for where is a non-degenerate numerical matrix.
Recall that the distribution of the Wald statistic is invariant with respect to non-degenerate linear transformation of the restrictions.
**Definition (FRALD-T).** There exists some numerical non-degenerate matrix such that FRALD holds for at
If FRALD-T holds for then for some FRALD holds for meaning that is a full rank matrix of polynomials.
It is shown in Dufour et al (2017) that for polynomial with a full rank matrix there always exists a non-degenerate numerical matrix such that has the property that has all the rows represented by linearly independent vectors of polynomials (each row contains non-zero homogeneous polynomials); these rows can be stacked by a permutation in an "echelon form", with the degrees of the non-zero homogeneous polynomials in non-decreasing order.
The echelon form is given by
[TABLE]
where has dimension all non-zero polynomials in have degree for , with and all the rows of are linearly independent functions. Once any that provides such a structure is found, the rank of is either and FRALD-T holds, or is less than in which case this property is violated. An algorithm to find is provided in Dufour et al (2017).
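Once a candidate lowest-degree matrix is in hand, its rank over rational functions can be estimated by evaluating it at random integer points and taking the largest numerical rank found; by the Schwartz-Zippel lemma this recovers the rank with high probability and never overestimates it. The sketch only checks FRALD for a given matrix and does not search for the transformation (for that the text refers to the algorithm in Dufour et al (2017)). Both example matrices, in local coordinates $(h_x, h_y, h_z, h_w)$, and all helper names are hypothetical.

```python
import random
from fractions import Fraction


def evaluate(poly, point):
    """Value of a polynomial (dict: exponent tuple -> coefficient) at a point."""
    total = Fraction(0)
    for exps, c in poly.items():
        term = Fraction(c)
        for e, v in zip(exps, point):
            term *= Fraction(v) ** e
        total += term
    return total


def numeric_rank(mat):
    """Rank of a matrix of Fractions by exact Gaussian elimination."""
    m = [list(row) for row in mat]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def generic_rank(pmat, nvars, trials=5, seed=0):
    """Rank over rational functions, estimated from random integer evaluations."""
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        point = [rng.randint(-10**6, 10**6) for _ in range(nvars)]
        best = max(best, numeric_rank([[evaluate(p, point) for p in row] for row in pmat]))
    return best


ONE = {(0, 0, 0, 0): 1}
HX, HY = {(1, 0, 0, 0): 1}, {(0, 1, 0, 0): 1}

# Hypothetical lowest-degree matrices (q = 3) with row degrees 0, 0, 1.
deficient = [[ONE, {}, {}, {}], [{}, ONE, {}, {}], [HY, HX, {}, {}]]  # row 3 = h_y*row 1 + h_x*row 2
full = [[ONE, {}, {}, {}], [{}, ONE, {}, {}], [{}, {}, HY, {}]]

print(generic_rank(deficient, 4), generic_rank(full, 4))  # 2 3
```

FRALD holds for the second matrix (rank $3 = q$) and fails for the first.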
Example 2 (Example 1 continued). FRALD-T does not hold.
Take for the function $g(\theta)=\begin{pmatrix}xy\\ xw\\ yz\end{pmatrix}$. With denote with converging to Then
[TABLE]
by applying a transformation (here permutation), to the rows of this matrix we get
[TABLE]
The matrix has independent polynomial row vectors, and the rows are stacked so that the degrees of the "leading" polynomials do not decline from row to row (echelon form). The matrix in echelon form does not have full rank, so no linear transformation applied to can remedy this rank deficiency. Hence FRALD-T does not hold for this example.
In Dufour et al (2014, 2017) the limit distribution for the Wald statistic was established for when the FRALD-T property holds.
Example 1, for which the statistic was shown to diverge, does not satisfy FRALD-T. The next section demonstrates the mechanism whereby violation of the FRALD-T property leads to divergence of the statistic.
3 Divergence of the Wald statistic where FRALD-T does not hold
Assume that for for which the null is satisfied, the FRALD-T property does not hold. Without loss of generality we may assume that is such that the echelon form applies to (so that in FRALD-T and in is the identity).
Denote by the Gaussian limit Define With the scaling we get
[TABLE]
[TABLE]
where has rank when FRALD-T does not hold. Consequently, inverting the consistent estimator for
[TABLE]
will lead to an explosion as
We next examine the matrix and its limit eigenvalues which provide the key ingredient to prove the divergence of the Wald statistic.
Denote by the eigenvalues of the matrix arranged in decreasing order: and denote by
[TABLE]
the diagonal matrix of these eigenvalues.
We prove several auxiliary results about eigenvalues of non-random polynomial matrices (proofs are in the next section).
Start with where is a non-zero matrix of polynomial functions and define the characteristic polynomial, .
The next proposition describes a polynomial representation for the coefficients of as a polynomial in . Denote by the set of all real symmetric positive-definite matrices.
Proposition 1. Let and be a non-zero matrix of polynomial functions in such that rank a.e., , and is the characteristic polynomial. Then can be written as
[TABLE]
where the coefficients have the following polynomial expansions
[TABLE]
where is a homogeneous in polynomial of degree and is a sum of polynomials with any non-zero monomials of degree strictly greater than Further, if then for almost all
Example 3. Restrictions of Example 1 but with a covariance matrix for which
In Example 1 we had divergence at the rate when the matrix was the identity. For the same restrictions of Example 1, with a covariance matrix possibly different from the identity, the characteristic polynomial has as the (here for coefficient the determinant of With the determinant of provides
[TABLE]
and we note that the lowest degree monomial is It can be verified that for these restrictions and we get (so that there can be no for which the degree could be smaller) and by Proposition 1 for almost every However, below we provide for which is such that Consider
[TABLE]
For this we get
[TABLE]
with the lowest degree of monomial
Since the coefficients of a polynomial represent symmetric polynomials in the roots (e.g., Horn and Johnson, 1985, Section 1.2), elementary symmetric polynomials in the eigenvalues can be expressed as polynomial functions in Denote by the set of all combinations of integers out of Denote by the elementary symmetric polynomial in
[TABLE]
**Corollary to Proposition 1.** For the eigenvalues that are the solutions of the characteristic polynomial, we have that
[TABLE]
*and thus the representation applies.*
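The identity behind the corollary is Vieta's formulas: the coefficients of $\prod_i(\lambda-\lambda_i)$ are, up to alternating signs, the elementary symmetric polynomials $e_k$ of the roots. A small sketch with hypothetical eigenvalues (the helper names are also hypothetical):

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """e_k: sum over all k-subsets of the product of their elements (e_0 = 1)."""
    return sum(prod(c) for c in combinations(values, k))


def poly_from_roots(roots):
    """Coefficients, leading first, of prod_i (lam - r_i)."""
    coeffs = [1]
    for r in roots:
        # multiply the current polynomial by (lam - r)
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


eigenvalues = [2, 3, 5]
coeffs = poly_from_roots(eigenvalues)
print(coeffs)  # [1, -10, 31, -30]
print([(-1) ** k * elementary_symmetric(eigenvalues, k) for k in range(4)])  # [1, -10, 31, -30]
```

Because each coefficient is a polynomial in the matrix entries, each $e_k$ of the eigenvalues is one as well, which is what lets the degree bookkeeping of Proposition 1 be transferred to the eigenvalues.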
In the next proposition we apply scaling to the argument by considering and exploit the polynomial terms from with lowest degree of homogeneity in to establish the rates for the eigenvalues of a scaled polynomial matrix. Recall that from the convergence result the matrix scaling is associated with We show that when rank of is less than (in violation of the FRALD-T condition) some eigenvalues will be converging to zero and additional scaling can be applied to have the eigenvalues converge to continuous limit functions. This additional scaling will provide the divergence rate.
Proposition 2. Under the conditions of Proposition 1 consider the scaled matrix for
[TABLE]
and its eigenvalues in descending order. Then for some non-negative integers that satisfy
[TABLE]
we have that
[TABLE]
where are continuous a.e. non-zero functions.
Thus we see that for eigenvalues beyond additional non-trivial scaling provides convergence to a continuous a.e. non-zero function.
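For one hypothetical matrix the effect of the lowest degree can be seen directly. Take the Jacobian of $g(\theta) = (xy,\ xw,\ yz)'$ and the hypothetical point $\theta_0 = (0, 0, 1, 1)$ approached along the hypothetical direction $u = (1, 1, 0, 0)$, i.e. $\theta = \theta_0 + d\,u$. By the Cauchy-Binet formula $\det(GG') = x^2y^2(x^2+y^2+z^2+w^2)$, whose lowest-degree term in $d$ has degree 4. At $d = 0$ the eigenvalues of $GG'$ are $0, 1, 1$, so two eigenvalues stay near 1 and the smallest one behaves like $2d^4$. This is only an illustration of the degree argument, not the scaling matrix of Proposition 2.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def gram(G):
    """G G' for a matrix given as a list of rows."""
    return [[sum(x * y for x, y in zip(r, s)) for s in G] for r in G]


def jacobian(x, y, z, w):
    # gradients of (xy, xw, yz) with respect to (x, y, z, w)
    return [[y, x, 0, 0], [w, 0, 0, x], [0, z, y, 0]]


THETA0 = (0, 0, 1, 1)  # hypothetical point where g vanishes
U = (1, 1, 0, 0)       # hypothetical direction of approach

for n in (10, 100, 1000):
    d = Fraction(1, n)
    theta = [t + d * u for t, u in zip(THETA0, U)]
    ratio = det3(gram(jacobian(*theta))) / d ** 4
    print(n, float(ratio))  # approaches 2 (here exactly 2 + 2 d^2)
```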
The next proposition shows that convergence with these rates to a continuous (but now in some exceptional cases possibly zero) function is preserved when is replaced with a sequence of matrices from such that
Proposition 3. Under the conditions of Proposition 2 consider a sequence such that . Then
(a) if for we have that
[TABLE]
(b) if for some it holds that for and then
[TABLE]
Recall that case (a) will hold for almost all by Proposition 1. Example 3 illustrates part (b) of Proposition 3: there so but and thus scaled up by goes to zero. It is possible that for itself in that case converges to a non-zero limit. Alternatively, if has for almost all but converges to sufficiently fast, the rate could still be as high as However, in case (b) to get a precise rate we also need to consider the convergence rate for
The next proposition applies the deterministic properties to provide limits for eigenvalues of the random matrix in the diagonal eigenvalue matrix of . Without loss of generality consider
Proposition 4. Suppose that assumptions 1,2 hold at Then there is a sequence of integers which depends on and such that
[TABLE]
with all continuous non-negative functions a.e.
We see that if FRALD-T were not violated, no additional scaling would be required, but once it is violated the extra scaling is captured by for that determines the rate of explosion of the Wald statistic. The Theorem below shows this.
Theorem. Under the conditions of Proposition 4, if the FRALD-T property does not hold, i.e. then we have for that
[TABLE]
where a continuous positive a.e. function.
We thus see that if FRALD-T is violated the rate of explosion is at least (as in Example 1 here), but could be faster even with the same restrictions (as may happen in Example 3).
4 Proofs
Proof of Proposition 1.
First, consider the polynomial expansion for , given e.g. in Harville (2008, Corollary 13.7.4). Denote by the set of all combinations of integers out of denote for by a minor of the matrix obtained by striking out all the rows and columns numbered Then
[TABLE]
Since all the components of the matrix are polynomials in it follows that the determinants of the minors are also polynomials in Then, given denote by the homogeneous polynomial in of the lowest degree, denoted to obtain the polynomial expansion of the Proposition.
Next, note that is also a polynomial function in the components of the matrix By varying over we can find the minimum possible denoted Thus there is some matrix, such that for we have that This implies that in there is a homogeneous polynomial of degree in that is non-zero, thus has at least one non-zero coefficient on a monomial term of degree Since this coefficient is a polynomial function of the components of considering this polynomial over the corresponding components of all we note that it is non-zero a.e. This implies that for almost all
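The expansion used in this proof, with each coefficient of the characteristic polynomial given by a signed sum of principal minors, can be checked numerically for a fixed matrix. The sketch assumes the convention $\det(\lambda I - A) = \sum_k c_k \lambda^{n-k}$ (the displayed definition is lost in this copy) and uses a hypothetical $3\times 3$ matrix; for a polynomial matrix each principal minor is itself a polynomial in $\theta$, which is the point exploited above. `det` and `char_poly_coeffs` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by exact Gaussian elimination (det of a 0 x 0 matrix is 1)."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return result


def char_poly_coeffs(a):
    """c_0, ..., c_n with det(lam*I - A) = sum_k c_k * lam**(n - k),
    where c_k = (-1)**k * (sum of all k x k principal minors of A)."""
    n = len(a)
    return [(-1) ** k * sum((det([[a[i][j] for j in idx] for i in idx])
                             for idx in combinations(range(n), k)), Fraction(0))
            for k in range(n + 1)]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]  # hypothetical symmetric matrix
c = char_poly_coeffs(A)
print([int(x) for x in c])  # [1, -9, 24, -18]
```

Here $c_1$ is minus the trace and $c_3$ is minus the determinant, as expected for a $3\times 3$ matrix.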
Proof of Proposition 2.
For every consider in place of and in place of in Proposition 1. Then for the corresponding characteristic polynomial, the expansion similar to will provide coefficients
[TABLE]
Note that we have that that is a non-zero constant for and zero for Therefore the coefficients can be represented as
[TABLE]
where is zero for But for there is some such that where and is a (positive a.e.) homogeneous polynomial in of degree So altogether we can write
[TABLE]
where is a polynomial that can contain non-zero monomials only of degree strictly higher than Then apply the representation to the corresponding coefficients to write for every
[TABLE]
The proof is by induction on
For consider the largest eigenvalue Note that so that is always zero. Since is the sum of all eigenvalues we have by replacing all the eigenvalues by the largest, that
[TABLE]
then since a.e. the limit of (with is positive a.e..
Suppose that for all for converge to continuous positive a.e. functions.
Then by replacing in the symmetric polynomial all the terms by the largest, and multiplying by the rate, we can write that
[TABLE]
Since the expression in the last line has a limit that is non-zero a.e., so does the expression in the second line; by the induction hypothesis converges to a continuous positive a.e. function. Thus for the function converges to a continuous positive a.e. function.
From the derivation it follows that for and generally for
Proof of Proposition 3.
Under the condition in (a) the degrees of homogeneity for the coefficients of the characteristic polynomial of and for the corresponding coefficient in have to be the same for large enough and thus by the proof of Proposition 2 we conclude that have the same positive a.e. limit as
Under the condition in (b) we can write
[TABLE]
where converges to a function that is positive a.e., but by the condition for the left-hand side converges to zero. Thus converges to zero and so does for any
Proof of Proposition 4.
Consider the scaling matrix and the scaled matrix By Assumption 1 since that is absolutely continuous by Proposition 2 the eigenvalues of converge in distribution to continuous functions in some of which are non-zero a.e. Additionally, for any sequence of we can select a subsequence that converges a.s. to and by Proposition 3 we have the same result for the limits of eigenvalues of
Proof of the Theorem.
Consider now the matrix as defined in Proposition 4. The eigenvalues of the scaled matrix, denoted by Proposition 4 converge in distribution
[TABLE]
Then rewrite the Wald statistic as
[TABLE]
Since are the eigenvalues of the non-negative definite matrix for any vector we have
[TABLE]
thus
[TABLE]
We have that
[TABLE]
with all components of the vector function non-zero a.e. for absolutely continuous Then
[TABLE]
Define
[TABLE]
then
[TABLE]
By Proposition 4, continuity of the eigenvalue function, and of the maximum of continuous functions
[TABLE]
with the limit functions non-zero a.e. Also,
[TABLE]
which is a piece-wise polynomial continuous function. The ratio
[TABLE]
exists and is non-zero a.e. and
[TABLE]
When the FRALD-T condition is violated and the Wald statistic diverges to
References
- [1] Dufour, J.-M., Renault, E. and V. Zinde-Walsh, 2013, Wald tests when restrictions are locally singular, working paper, arXiv.
- [2] Dufour, J.-M., Renault, E. and V. Zinde-Walsh, 2017, Wald tests when restrictions are locally singular, working paper, https://monde.cirano.qc.ca/~dufourj/Web_Site/Dufour_Renault_Zinde Walsh_2012_Wald Tests Locally Singular Restrictions_W.pdf
- [3] Harville, D. A., 2008, Matrix Algebra from a Statistician's Perspective, Springer-Verlag, New York.
- [4] Horn, R. A. and C. R. Johnson, 1985, Matrix Analysis, Cambridge University Press, Cambridge, U.K.
