New goodness-of-fit diagnostics for conditional discrete response models
Igor Kheifets, Carlos Velasco

TL;DR
This paper introduces new goodness-of-fit tests for discrete response models that improve power by avoiding randomization, applicable to static and dynamic ordered choice models, with theoretical analysis and empirical validation.
Contribution
It develops an alternative transformation for discrete data that enhances test power without randomization, extending specification testing to a broader class of models.
Findings
New transformation improves test power over traditional jittered methods.
Asymptotic properties of tests are analytically derived.
Bootstrap method effectively approximates critical values.
| Scenario | Null and Alternative |
|---|---|
| Size 1 | static probit |
| Size 2 | static logit |
| Power 1 | static probit vs static logit |
| Power 2 | static probit vs dynamic probit |
| Power 3 | static probit vs dynamic logit |
[Simulation tables omitted: the numerical entries were lost in extraction. The surviving labels show results reported for Models I-IV, in static and dynamic versions, under static probit and static logit nulls, and for each of the scenarios Size 1, Size 2, Power 1, Power 2 and Power 3.]
Igor Kheifets: ITAM, Mexico. Email: [email protected]
Carlos Velasco: Department of Economics, Universidad Carlos III de Madrid. Email: [email protected]
Abstract
This paper proposes new specification tests for conditional models with discrete responses. Correct specification is key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects and producing appropriate predictions of the probability of future events. In particular, we test static and dynamic ordered choice model specifications and can cover distributions with infinite support, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which lead to continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. We investigate an alternative transformation based only on the original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the finite-sample properties of a bootstrap algorithm that approximates the critical values of the test statistics, which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.
Keywords: Specification tests, count data, dynamic discrete choice models, conditional probability integral transform.
JEL classification: C12, C22, C52.
1 INTRODUCTION
Many statistical models specify the conditional distribution of a discrete response variable given some explanatory variables, including models for binary, multinomial, ordered choice and count data. In this paper we analyze goodness-of-fit tests for static models with covariates as well as for dynamic ordered choice and count data models, where the conditioning information set may also include past information on the discrete variable and a set of (contemporaneous) explanatory variables. Such models appear frequently in the social sciences, see Kedem and Fokianos (2002) and Greene and Hensher (2010). For example, dynamic models are popular in macroeconomic applications, see for instance Hamilton and Jordà (2002), Dolado and Maria-Dolores (2002) and Basu and de Jong (2007) for modeling central banks' decisions or Kauppi and Saikkonen (2008) and Startz (2008) for predicting US recessions. In finance, see e.g. Rydberg and Shephard (2003) for modeling the size of asset price movements and Fokianos et al. (2009) for the number of transactions per minute of a particular stock.
Suppose we observe the random variables and consider the information sets for each period . We are interested in testing the null hypothesis that the distribution of conditional on is in the parametric family , i.e.
[TABLE]
where is the parameter space, while the alternative hypothesis () for the omnibus test would be the negation of .
We consider a class of discrete conditional distributions defined on , for integer or on such that for all it holds that , for all and . This setup includes numerous models that have been used extensively in applied work both for dynamic and for iid data, here we describe briefly two of them.
Example 1 (Dynamic multinomial ordered choice model).
The discrete responses are assumed to be generated by the rule
[TABLE]
where is a continuous latent variable and are threshold parameters that define intervals in . In a simple model, e.g. Basu and de Jong (2007), the latent variable is determined through the linear equation
[TABLE]
where is a vector of stationary exogenous regressors, a vector of regression parameters, is the shock in each period, and could be replaced by any function of the past for some finite The cdf of is going to determine the class of multinomial model, i.e. ordered multinomial probit (if is standard normal) or logit (if is logistic), since is defined at once from
[TABLE]
with $\tau_{0}=-\infty$ and $\tau_{K}=\infty$.
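As an illustration of how the conditional probabilities in Example 1 follow from the thresholds, here is a minimal Python sketch for the ordered probit case. It assumes the latent variable equals a conditional index plus a standard normal shock. The names `ordered_probit_probs`, `index` and `thresholds` are hypothetical and not taken from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal cdf (math.erf handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(index, thresholds):
    """P(Y = k | index) for k = 1..K, given interior thresholds tau_1 < ... < tau_{K-1}.

    `index` is the conditional mean of the latent variable (hypothetical name).
    """
    taus = [-math.inf] + list(thresholds) + [math.inf]   # tau_0 and tau_K
    cdf = [norm_cdf(t - index) for t in taus]            # P(Y <= k | index)
    return [cdf[k] - cdf[k - 1] for k in range(1, len(taus))]

# Three categories with hypothetical thresholds -0.5 and 0.5.
probs = ordered_probit_probs(index=0.3, thresholds=[-0.5, 0.5])
```

Replacing `norm_cdf` by the logistic cdf would give the ordered logit counterpart under the same assumptions.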
Example 2 (Poisson Model).
The variate is defined on the counts which are assumed to follow a conditional Poisson distribution
[TABLE]
where the conditional mean can depend on covariates through an exponential link as or on previous observations through an identity link as e.g. Fokianos et al. (2009), or through the logarithmic canonical link as log where are scaled and centered errors, e.g. Davis et al. (2003).
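A small sketch may help fix ideas for Example 2. It assumes the identity-link recursion takes the common first-order form lam_t = d + a*lam_{t-1} + b*y_{t-1}; this is an assumption here, since the exact equation is not reproduced above. The names `identity_link_intensities`, `d`, `a`, `b` and `lam0` are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def identity_link_intensities(ys, d, a, b, lam0):
    """Conditional means lam_1..lam_T from lam_t = d + a*lam_{t-1} + b*y_{t-1}.

    lam0 is an assumed starting value used as lam_1 (hypothetical initialization).
    """
    lams = [lam0]
    for y_prev in ys[:-1]:
        lams.append(d + a * lams[-1] + b * y_prev)
    return lams

ys = [1, 0, 2]
lams = identity_link_intensities(ys, d=0.5, a=0.3, b=0.4, lam0=1.0)
# Conditional probability of each observed count given the past.
cond_probs = [poisson_pmf(y, lam) for y, lam in zip(ys, lams)]
```

The conditional probabilities computed this way are the building blocks that any specification test of the conditional distribution has to evaluate.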
Although correct specification is key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects and producing appropriate predictions of the probability of future events, empirical researchers typically do not perform goodness-of-fit testing of such models as they would in the continuous case. In general, only a few specification tests are available for discrete data, see Mora and Moro-Egido (2007). Two of them, the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the conditional Kolmogorov test of Andrews (1997), which are based on the specification of the conditional mean for binary data, can be adapted for this purpose; we discuss this possibility and compare it to our approach in Section 6. A test related to that of Andrews, derived for time series by Corradi and Swanson (2006), could also be adapted for discrete data, but it tests a different null hypothesis, concerning a distribution given a finite conditioning set that does not characterize the complete dynamics of the process. There are also tests designed specifically for Poisson models (see e.g. Neumann, 2011; Fokianos and Neumann, 2013).
In what follows we propose conditional, dynamic discrete analogs of the Kolmogorov-Smirnov goodness of fit measure that can exploit different restrictions derived from the martingale difference property of a particular transformation of the data under the null hypothesis. This property is derived from the specification of a complete dynamic model given the information set generated by all the past observations of the discrete response and other explanatory variables and is used to build the asymptotic theory for our tests. Under i.i.d. assumptions this martingale difference property leads to an exact independence of the transformation sequence under the null and a much simpler parallel asymptotic theory.
When the fitted distribution is continuous, the relative distribution of compared to defined as the cdf of the Rosenblatt’s (1952) transforms, also called conditional Probability Integral Transforms (PIT),
[TABLE]
is standard uniform, and the transforms are distributed as independent uniform random variables under the null hypothesis. This serves as a basis for several specification tests of the null, see e.g. Bai (2003) and Kheifets (2015) for dynamic models and Delgado and Stute (2008) for independent and identically distributed (iid) data. However, the Rosenblatt transformation is not appropriate for random variables with discrete support, since it produces non-iid pseudo-residuals even under the null of correct specification. To overcome the limitations of PIT-based testing techniques for discrete data, several alternative transforms have been proposed, see Jung, Kukuk and Liesenfeld (2006), Czado, Gneiting and Held (2009) and references therein. An easy and popular approach is to randomize, i.e. to interpolate the discrete values of the response with independent noise on the unit interval; recent references include Kheifets and Velasco (2013) and Lee (2014). Unfortunately, the additional simulated noise affects the power of the tests and may lead to different conclusions depending on the simulation outcome.
In this paper instead, we consider a nonrandomized transform for ,
[TABLE]
where . This transform, conditional on data, is nonrandomized in the sense that it does not depend on extra sources of randomness, as opposed to the interpolation transforms discussed in the next section. The unconditional version of this transform appears in Handcock and Morris (1999) and more recently in Czado, Gneiting and Held (2009) where it is used for calibration, but no formal tests are proposed there. This transformation can also be seen as a particular case of the multilinear extension as defined in Genest, Nešlehová and Rémillard (2014). As we show below, for every , constitute a martingale difference sequence (MDS) with respect to under and can be used for testing as loses this property when the model is misspecified. For instance, we can compute the pseudo empirical relative distribution of compared to
[TABLE]
which can be contrasted with the uniform cdf using the following empirical process
[TABLE]
which converges weakly to a Gaussian process. In addition, in order to control dynamics in , we can compare the joint pseudo empirical cdf with the uniform on a square using the biparameter process
[TABLE]
where . To obtain feasible tests we need to consider norms of for . We use the Cramér-von Mises for some absolutely continuous measure in , or Kolmogorov-Smirnov norms.
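The nonrandomized transform and the resulting test statistics can be sketched in a few lines of Python. The sketch assumes the transform has the piecewise-linear form used for calibration in Czado, Gneiting and Held (2009): zero below F(y-1), one above F(y), and linear in between. It also assumes a known parameter, a uniform grid approximation of the supremum and of the integral, and Lebesgue measure for the Cramér-von Mises norm. All function names are hypothetical.

```python
import math

def nonrandomized_pit(u, cdf_below, pmf):
    """Transform of one observation y, with cdf_below = F(y-1) and pmf = P(y) > 0."""
    return min(1.0, max(0.0, (u - cdf_below) / pmf))

def empirical_process(obs, grid):
    """sqrt(T) * (average transform - u) evaluated on a grid of u values."""
    T = len(obs)
    return [
        math.sqrt(T) * (sum(nonrandomized_pit(u, lo, p) for lo, p in obs) / T - u)
        for u in grid
    ]

def ks_and_cvm(obs, n_grid=200):
    """Grid approximations of the Kolmogorov-Smirnov and Cramer-von Mises norms."""
    grid = [(j + 0.5) / n_grid for j in range(n_grid)]
    s = empirical_process(obs, grid)
    ks = max(abs(v) for v in s)
    cvm = sum(v * v for v in s) / n_grid
    return ks, cvm

# Each observation is summarized by (F(y-1), P(y)) under the hypothesized model.
# Hypothetical three-category model with probabilities 0.2, 0.5, 0.3:
obs = [(0.0, 0.2), (0.2, 0.5), (0.7, 0.3), (0.2, 0.5)]
ks, cvm = ks_and_cvm(obs)
```

Summarizing each observation by the pair (F(y-1), P(y)) keeps the sketch model-agnostic: any conditional model that can evaluate its own cdf and probability function can feed it.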
When the parameter is unknown under the null, we use an estimate and account for the parameter estimation effect in the p-value computation with a parametric bootstrap method. It might also be possible to derive distribution-free transforms, e.g. martingale transforms, but these typically need to be programmed case by case for each model, which can be impractical, so they are beyond the scope of this paper. As far as we know, our proposal is the first formal specification test of ordered discrete choice models which accounts properly for parameter uncertainty and is based on a nonrandomized transform, which makes it attractive in terms of power against a wide set of alternative hypotheses.
The rest of the paper is organized as follows. In the next section, we describe different alternatives to the PIT. In Sections 3 and 4, we provide the main asymptotic properties of the nonrandomized transforms and of the resulting univariate and bivariate empirical processes using martingale theory. In particular, we establish weak limits under fixed and local alternatives, accounting for the parameter estimation effect. Section 5 discusses the implementation of the new tests with a simple bootstrap algorithm. Section 6 provides a small simulation exercise and an application exploring the properties of specification tests based on both randomized and nonrandomized transformations. Then we conclude. All proofs are contained in the Appendix.
2 ALTERNATIVES TO PIT FOR DISCRETE DATA
In order to further motivate the nonrandomized transform defined in (1), we introduce the randomized PIT,
[TABLE]
where are independent standard uniform random variables, and independent of . Alternatively, can be obtained by applying the standard continuous PIT to the continuous random variable , where are iid with any continuous cdf on . Indeed, we can construct the cdf of ,
[TABLE]
where is the floor function, i.e. the maximum integer not exceeding , and find that
[TABLE]
for and any choice of , see Kheifets and Velasco (2013). Note that the cdf of conditional on and coincide. Under , are iid variables as under any continuous distribution specification, while and are not independent nor . Using the typical discrepancy measures, the empirical cdf of , estimated using the randomized transform ,
[TABLE]
can be compared to the uniform cdf. Kheifets and Velasco (2013) then test using empirical process based on the randomized transform
[TABLE]
We can also consider reducing the dependence on a particular outcome of the noise in (3) and in the randomized transform by taking averages over replications of , conditional on the original data, similar to “average-jittering” of Machado and Santos Silva (2005). Suppose that for each we have independent sequences of uniform noises , , which generate according to (3). Define the M-random transform ,
[TABLE]
which takes values on the set and has mean under . Then the cdf of is estimated by
[TABLE]
Note that with we are back to , and therefore, we can generalize to
[TABLE]
In order to propose specification tests, following Handcock and Morris (1999), we define the discrete relative distribution of compared to as the cdf of . Under , the discrete relative distribution is the uniform . As we show in the next section, three consistent estimators of the discrete relative distribution of compared to can be ordered in terms of efficiency in the following way: (the most efficient), and . This ordering is determined by the amount of noise introduced in the definitions of the transforms: i.e. in nonrandomized, -randomized and (-)randomized transforms. The nonrandomized transform can be equivalently obtained by integrating out the extra noise in the randomized transform or by taking the number of replications to infinity, thus completely removing the noise from the estimate of the discrete relative distribution and other functionals of the transforms. The efficiency of the nonrandomized transform translates into the increased power of the specification tests based on this transform, whose properties we study next.
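To illustrate how averaging over replications removes the jitter, the following sketch assumes the randomized PIT takes the usual form F(y-1) + Z*P(y) with Z standard uniform, and compares the average of M randomized indicators with the piecewise-linear nonrandomized transform. The function names and the numerical values are hypothetical.

```python
import random

def randomized_pit(cdf_below, pmf, z):
    """Randomized PIT: a uniform draw on the interval [F(y-1), F(y)]."""
    return cdf_below + z * pmf

def m_randomized_indicator(u, cdf_below, pmf, M, rng):
    """Average of M indicators 1{randomized PIT <= u}; values in {0, 1/M, ..., 1}."""
    hits = sum(randomized_pit(cdf_below, pmf, rng.random()) <= u for _ in range(M))
    return hits / M

def nonrandomized_pit(u, cdf_below, pmf):
    """Piecewise-linear transform, i.e. the indicator with the noise integrated out."""
    return min(1.0, max(0.0, (u - cdf_below) / pmf))

rng = random.Random(12345)
u, cdf_below, pmf = 0.45, 0.2, 0.5   # hypothetical F(y-1) and P(y)
for M in (1, 10, 1000, 100000):
    averaged = m_randomized_indicator(u, cdf_below, pmf, M, rng)
    print(M, averaged, nonrandomized_pit(u, cdf_below, pmf))
```

With M = 1 the averaged indicator is either 0 or 1, while for large M it settles near the nonrandomized value, which is the sense in which the nonrandomized transform removes the simulation noise.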
3 PROPERTIES OF EMPIRICAL PROCESSES BASED ON THE NONRANDOMIZED TRANSFORM
As shown in the next lemma, the building blocks of , constitute a martingale difference sequence (MDS) with respect to , and therefore is an unbiased and consistent estimate of the uniform cdf under the null, a reasonable basis for developing tests of . Moreover, the MDS property will allow us to establish the asymptotic properties of our test without imposing any additional restrictions. Let for
[TABLE]
where , with being the conditional quantile function and .
Lemma 1.
Under , is a martingale difference sequence with respect to , i.e.
[TABLE]
with conditional covariance
[TABLE]
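The centering part of Lemma 1 can be checked directly. The derivation below is a sketch that assumes the piecewise-linear form of the nonrandomized transform and uses shorthand introduced here for illustration: $F(k)$ and $P(k)$ denote the conditional cdf and probability function under the null given the information set $\Omega_t$, and $I_t(u)$ is the transform of $Y_t$.

```latex
% Assumed notation: F(k) = F_{t,\theta_0}(k \mid \Omega_t), P(k) = F(k) - F(k-1),
% I_t(u) = \min\{1, \max\{0, (u - F(Y_t - 1)) / P(Y_t)\}\}.
\begin{align*}
\text{For } u \in [F(j-1), F(j)]:\quad
E\left[ I_t(u) \mid \Omega_t \right]
  &= \sum_{k<j} P(k)\cdot 1 \;+\; P(j)\,\frac{u - F(j-1)}{P(j)} \;+\; \sum_{k>j} P(k)\cdot 0 \\
  &= F(j-1) + u - F(j-1) \\
  &= u .
\end{align*}
```

Hence $I_t(u) - u$ has zero conditional mean for every $u$, which is the martingale difference property used throughout.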
Note that are not necessarily independent across despite the fact that by the martingale difference property, and are serially uncorrelated for all and all see the Appendix. On the other hand, the are (conditionally) heteroskedastic, therefore the variance of is model and parameter dependent, but its distribution can be simulated conditional on exogenous information in
Let , then since ,
[TABLE]
i.e. the covariance and variance of are not larger than those of the randomized transformation-based process or its weak limit, the Brownian sheet, see Corollary 4 in Kheifets and Velasco (2013).
Due to Lemma 1, under and the natural empirical process for performing tests on is then . This process, being based on a nonrandomized transform, does not involve the extra noise that appears in the randomized transform based empirical process for testing , proposed by Kheifets and Velasco (2013), or in its modification based on the -randomized transform. The next lemma is the key to understand the improvement of the -randomized over the randomized and of the nonrandomized, advocated in this paper, over the -randomized transform approaches.
Lemma 2.
Suppose that the uniform law of large numbers holds for and . Independently of whether holds or not, and consistently and uniformly in estimate the relative distribution, i.e. the cdf of . is more efficient, but the difference in efficiency goes to [math] as . In particular, under ,
[TABLE]
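The variance ordering can be seen from the law of total variance. The following is a sketch for a single observation under the null, assuming the $M$ jitters are iid uniform and independent of the data, so that conditional on the data each randomized indicator is Bernoulli with success probability equal to the nonrandomized transform. The symbols $I^{r}_{m}(u)$, $\bar I^{M}(u)$ and $I(u)$ are shorthand introduced here.

```latex
\begin{align*}
\bar I^{M}(u) &= \frac{1}{M}\sum_{m=1}^{M} I^{r}_{m}(u), \qquad
E\left[ I^{r}_{m}(u) \mid \text{data} \right] = I(u), \\
\operatorname{Var}\left( \bar I^{M}(u) \right)
  &= \operatorname{Var}\left( I(u) \right) + \frac{1}{M}\, E\left[ I(u)\,(1 - I(u)) \right] \\
  &= \left(1 - \frac{1}{M}\right) \operatorname{Var}\left( I(u) \right)
     + \frac{1}{M}\, \operatorname{Var}\left( I^{r}_{1}(u) \right).
\end{align*}
```

Setting $M=1$ recovers the variance of the randomized indicator, and letting $M\to\infty$ leaves only the variance of the nonrandomized transform.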
From Lemma 2, it follows that has the smallest variance, the variance of is a weighted sum of those of and , see also Equation (5) in Machado and Santos Silva (2005). Other advantages of over , are 1) computational, as there is no need to simulate paths of transformations and 2) theoretical, since the weak convergence is easier to prove for processes which are piece-wise linear in parameters. Therefore we concentrate on studying the properties of tests based on the nonrandomized transform, for which we introduce the following assumption.
Assumption 1.
for all . Moreover, there exists a finite function , such that uniformly in .
This assumption implicitly restricts the dynamics so that a uniform law of large numbers (LLN) holds for the averaged conditional covariance function. In the case of stationary and ergodic data, . Sufficient conditions for the stationarity and ergodicity of dynamic multinomial ordered choice models are given in Basu and de Jong (2007), and for autoregressive Poisson models in Davis et al. (2003), Fokianos et al. (2009) and Doukhan et al. (2012). It is then possible to show the uniformity of the convergence from a pointwise result, since the summands are continuous, piecewise polynomials in and . As an illustration, we discuss the assumptions for the Poisson model in Section 8.5 of the Appendix.
The next result describes the asymptotic distribution of under the null hypothesis. Let denote weak convergence in , see e.g. van der Vaart and Wellner (1996). In fact, our empirical processes are continuous, which simplifies tightness verification. Let .
Lemma 3.
Suppose Assumption 1 holds. Under ,
[TABLE]
where is a Gaussian process in with zero mean and covariance function .
The asymptotic distribution of is model and parameter dependent, and the practical implementation of tests when is unknown is discussed in Section 3.2 after presenting a general class of local alternatives to the null of correct specification of the conditional distribution.
3.1 Local Alternatives
We next discuss the asymptotic properties of the empirical process under a class of alternative hypotheses, which will lead to consistency of the specification tests based on for a wide class of alternatives. We consider the following class of local alternatives to
[TABLE]
where
[TABLE]
for some and for all , . When then nests
Following Kheifets and Velasco (2013), for any discrete distributions and in , with probability functions and , define
[TABLE]
Note, that and if and only if . Under any ,
[TABLE]
The next assumption guarantees that a LLN can be applied to the empirical discrepancy between and
Assumption 2.
Under , there exists a finite function , such that uniformly in .
Then the following lemma shows that the departure of in the direction of introduces a drift in the asymptotic distribution of that will render consistency of hypothesis tests based on functionals of .
Lemma 4.
Suppose Assumptions 1-2 hold. Under ,
[TABLE]
where is as in Lemma 3.
3.2 Parameter Estimation Effect
In practice, tests based on are unfeasible since is unknown, and has to be estimated by say. We assume that we have available an estimate so that under
[TABLE]
and define the process with estimated parameters
[TABLE]
We next analyze the consequences of replacing $\theta_{0}$ by its estimate in the empirical process.
Let be Euclidean norm, i.e. for matrix , , where is a transpose of . For is an open ball in with the center at point and radius . For a cdf in define
[TABLE]
where and . We need the following assumptions to analyze the asymptotic properties of .
Assumption 3 (Parametric family).
(A)
The parameter space is a compact set in a finite-dimensional Euclidean space, .
(B)
There exists , such that , for all , , and .
(C)
is differentiable with respect to and under
(D)
Under , there exists a finite .
Conditions (A)-(C) on the parametric family of distributions are standard, see e.g. Bai (2003, Assumptions A1-A2). For dynamic ordered choice and Poisson models, the differentiability of the conditional distribution with respect to the parameter is equivalent to the differentiability of the link function. Part (D) guarantees a well-behaved limit of the average generalized derivative of . Conditions for no effect of information truncation can be provided in a similar way to Bai (2003, Assumption A4).
The following lemma provides an expansion of the empirical process with estimated parameters as the sum of the process with known parameters and a random drift describing parameter estimation.
Lemma 5.
Suppose Assumptions 1-3 hold and . Under ,
[TABLE]
uniformly in .
Then, continuous functionals of no longer converge to those of under , but the estimation effect also has to be taken into account using the following assumption. Let be a normal vector with zero mean and covariance matrix
Assumption 4 (Parameter estimation).
Under , the estimator admits the asymptotic linear expansion
[TABLE]
where is a vector and the summands constitute a martingale difference sequence with respect to , such that
(A)
and
(B)
Lindeberg condition holds.
(C)
There exists a finite function , such that uniformly in .
In particular, under , , the estimate is centered and converges in distribution to .
Assumption 4(A) and 4(B) hold for the MLE of many popular discrete models, including dynamic probit and logit and general discrete choice models. As an example consider estimates , which are asymptotically equivalent to the (conditional) maximum likelihood estimates, i.e.,
[TABLE]
where is the score function and is a symmetric positive definite matrix given by the limit of the Hessian,
[TABLE]
Under , . Then equation (5) holds with and
.
We can derive the covariance matrix between the process and and obtain joint convergence results, so under
[TABLE]
where the covariance function between and is .
We can state now the result on the asymptotic distribution of the empirical process under local alternatives, whose drift is different with respect to the case without estimated parameters.
Theorem 1.
Suppose Assumptions 1-4 hold. Under
[TABLE]
where is a Gaussian process with zero mean and variance function .
4 EMPIRICAL PROCESSES FOR DYNAMIC SPECIFICATION
Test statistics based on , and verify that the conditional distribution of is right on average across all possible , so these tests might not capture all sources of misspecification. This issue is raised in Corradi and Swanson (2006), Delgado and Stute (2008) and Kheifets (2015) in relation to testing continuous distributions. However, it is not possible to develop specification tests conditioned on infinite dimensional values of . Instead of truncating or restricting the class of models, we consider , a biparameter analog of to control the possible dynamic misspecification. From Lemma 1, since under , is a MDS, is centered around zero, and moreover
[TABLE]
This motivates us to develop tests based on defined in (2). This process also has zero mean under the null and identifies not only departures from the null derived from deviations of the unconditional expectation of from but also from a possible failure of the martingale property, so that and would become correlated. This idea is similar to that exploited in Kheifets (2015) in the context of conditional distribution testing for continuous distributions, where different methods of checking the independence property of the PIT are proposed. Alternative statistics exploiting the lack of correlation with any other lag could be proposed, but we expect that low lags are typically more useful for detecting general forms of misspecification.
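A sketch of the biparameter statistic may clarify how lagged dependence enters. It assumes the process is the normalized sum of products of the current and the once-lagged nonrandomized transforms, centered at u1*u2, and it takes the parameter as known. The exact normalization in equation (2) is not reproduced above, so the (T-1)^(-1/2) factor, the piecewise-linear transform and all names are assumptions.

```python
import math

def nonrandomized_pit(u, cdf_below, pmf):
    """Transform of one observation y, with cdf_below = F(y-1) and pmf = P(y) > 0."""
    return min(1.0, max(0.0, (u - cdf_below) / pmf))

def biparameter_process(obs, u1, u2):
    """(T-1)^(-1/2) * sum over t of [I_t(u1) * I_{t-1}(u2) - u1*u2]."""
    T = len(obs)
    total = 0.0
    for t in range(1, T):
        lo, p = obs[t]
        lo_lag, p_lag = obs[t - 1]
        total += nonrandomized_pit(u1, lo, p) * nonrandomized_pit(u2, lo_lag, p_lag) - u1 * u2
    return total / math.sqrt(T - 1)

def ks_biparameter(obs, n_grid=20):
    """Grid approximation of the supremum over the unit square."""
    grid = [(j + 0.5) / n_grid for j in range(n_grid)]
    return max(abs(biparameter_process(obs, a, b)) for a in grid for b in grid)

# Observations summarized by (F(y-1), P(y)) in time order (hypothetical values).
obs = [(0.0, 0.2), (0.2, 0.5), (0.7, 0.3), (0.2, 0.5), (0.0, 0.2)]
stat = ks_biparameter(obs)
```

Because each summand multiplies the current transform by the lagged one, serial correlation between them shifts the statistic away from zero even when each transform is individually well centered.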
One could also consider a biparameter analog of , i.e. for some
[TABLE]
where . In particular, a bivariate analog of , , is introduced in Kheifets and Velasco (2013). Tests based on and involve randomized transforms and therefore suffer from power loss compared to tests based on the nonrandomized transform.
Note, that is a martingale. This observation will allow us to derive weak convergence of by employing limiting theorems for MDS. Properties of were established in Kheifets and Velasco (2013) and could be extended to . Here we discuss the properties of when we estimate
In practice we use the process
[TABLE]
where we can write under
[TABLE]
uniformly in , where
and the asymptotic covariance function is . To study the asymptotic properties of the biparameter process we introduce the next assumption, which extends Assumption 2.
Assumption 5.
Under , there exist finite functions and , such that uniformly in
(A)
.
(B)
.
Note that the second terms in the definitions of and correspond to and respectively, the equivalent for the single parameter process , but the first ones are new. To state the next result, we need to assume existence of probabilistic limits of several random functions. For the sake of presentation, we defer precise statements to the Appendix, see Assumption A.
Theorem 2.
Suppose that in addition to the conditions of Theorem 1, Assumption 5 and Assumption A from the Appendix hold. Under ,
[TABLE]
where is a Gaussian process in with mean zero and covariance function defined in the Appendix. Under , if parameters are estimated,
[TABLE]
where is a Gaussian process with zero mean and variance function .
When is different from such that is non-zero, the test based on has nontrivial power in the direction of . In contrast to the univariate case with , the first term in the definition of contains correlation with the past information and can therefore capture dynamic misspecification when this induces in such a correlation, even if the unconditional expectation of , which appears in the second term , is zero. This fact is crucial if misspecification occurs in the dynamics and not only in the link function or other static aspects of the model.
5 BOOTSTRAP TESTS
To test we consider Cramér-von Mises, Kolmogorov-Smirnov or any other continuous functionals of , , . Then consistency properties of specification tests based on can be derived using the discussion in the previous sections by applying the continuous mapping theorem, so we omit the proof of the following result.
Theorem 3.
Suppose that conditions of Theorem 2 hold. Under ,
[TABLE]
Since the asymptotic distributions of are model dependent, and those of further depend on the estimation effect, we need to resort to bootstrap methods to implement our tests in practice. In the literature, there are several resampling methods suitable for dependent data, but since the parametric conditional distribution is fully specified under the null, we apply a conditional parametric bootstrap algorithm that only requires draws from the fitted conditional distribution to mimic the null distribution of the test statistics. For a discussion of the parametric bootstrap see Stute et al. (1993) and Andrews (1997), which can be adapted to the complications with information truncation and initialization arising in the dynamic case using the discussion in Bai (2003).
To estimate the true quantiles of the null asymptotic distribution of the test statistics, given by some continuous functional applied to with , we implement the following steps.
1. Estimate the model with data , , get parameter estimator and compute test statistics .
2. Simulate with recursively for , where the bootstrap information set is .
3. Estimate the model with simulated data , get using the same method as for get bootstrapped test statistics .
4. Repeat steps 2 and 3 a fixed number of times, and compute the percentiles of the empirical distribution of the bootstrapped test statistics.
5. Reject if is greater than the th percentile of the empirical distribution of the bootstrapped test statistics denoted by .
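The bootstrap steps can be sketched in code. The following is a minimal illustration under strong simplifying assumptions, not the procedure used in the paper: the data are i.i.d. Poisson counts without covariates or dynamics, so the recursive simulation of the second step reduces to independent draws, and the statistic `ks_distance` is a classical discrete Kolmogorov-Smirnov distance that merely stands in for the functionals of the nonrandomized-transform processes. The names `poisson_cdf`, `ks_distance` and `bootstrap_pvalue`, and the choice of 199 replications, are hypothetical.

```python
import math
import numpy as np


def poisson_cdf(lam, k_max):
    """Poisson cdf evaluated at 0, 1, ..., k_max."""
    k = np.arange(k_max + 1)
    log_fact = np.array([math.lgamma(float(i) + 1.0) for i in k])
    log_pmf = -lam + k * math.log(lam) - log_fact
    return np.minimum(np.cumsum(np.exp(log_pmf)), 1.0)


def ks_distance(y, lam):
    """Stand-in statistic (assumption): sqrt(T) * sup_k |ecdf(k) - fitted cdf(k)|."""
    k_max = int(y.max())
    ecdf = np.cumsum(np.bincount(y, minlength=k_max + 1)) / y.size
    return math.sqrt(y.size) * float(np.max(np.abs(ecdf - poisson_cdf(lam, k_max))))


def bootstrap_pvalue(y, fit, simulate, statistic, n_boot=199, seed=0):
    """Parametric bootstrap p-value for a statistic with estimated parameters."""
    rng = np.random.default_rng(seed)
    theta_hat = fit(y)                                # estimate on the original data
    stat = statistic(y, theta_hat)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        y_star = simulate(theta_hat, y.size, rng)     # draw from the fitted null model
        boot[b] = statistic(y_star, fit(y_star))      # re-estimate on each resample
    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, boot, p_value


# Toy usage: Poisson data (null holds) versus a two-point distribution (null fails).
fit = lambda y: max(float(np.mean(y)), 1e-12)          # Poisson ML estimator
simulate = lambda lam, n, rng: rng.poisson(lam, size=n)

data_rng = np.random.default_rng(1)
y_null = data_rng.poisson(3.0, size=200)
y_alt = data_rng.choice(np.array([0, 10]), size=200)

stat_null, boot_null, p_null = bootstrap_pvalue(y_null, fit, simulate, ks_distance)
stat_alt, boot_alt, p_alt = bootstrap_pvalue(y_alt, fit, simulate, ks_distance)
print(f"null data: stat={stat_null:.3f}, p={p_null:.3f}")
print(f"alt  data: stat={stat_alt:.3f}, p={p_alt:.3f}")
```

Rejecting when the p-value falls below the nominal level is closely related to comparing the statistic with the corresponding bootstrap percentile; the two rules can differ slightly in how ties and the finite number of replications are handled.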
To analyze the properties of our parametric bootstrap, we need to assume that the same conditions on the estimation method hold for both the original and the resampled data. More formally, we have
Assumption 6**.**
(A)
The conditional distribution of conditional on coincides with the conditional distribution of conditional on .
(B)
Suppose that the sample is generated by , for some nonrandom sequence converging to , i.e. we have a triangular array of random variables with element generated by , where
. Then the estimator of admits an asymptotic linear expansion as in Assumption 4. Moreover, assume that under the alternative there exists some so that
This assumption ensures that by simulating from the conditional distribution we obtain the correct joint distribution of and in parallel to those required in Theorems 1-2. Assumption 6 (A) says that and future are independent conditionally on past information, i.e. that there is no direct feedback effect. For example, in a latent variable form of the ordered probit model, this assumption translates to strict exogeneity, i.e. that innovations are independent of future . Dependence between and future is still allowed through serial dependence in and . Assumption 6 (B) is similar to Condition (5.5) in Burke et al. (1979), Assumption (A1) in Stute et al. (1993) and Assumption E2 in Andrews (1997), and introduces a triangular array version of the expansion and central limit theorem for parameter estimates; see also the discussion in Section 4.1 in Andrews (1997).
We obtain the following result.
Theorem 4**.**
Suppose that in addition to conditions of Theorem 2, Assumption 6 holds. Under as
[TABLE]
in probability, so , and therefore, under . Suppose also that the conditions of Theorem 2 hold for any . Under as .
This theorem shows that the bootstrap test statistic has the same limit distribution as the original one under local alternatives, so that under the null we get the right asymptotic size using bootstrap estimated critical values and under local alternatives we get nontrivial power when the drifts of the stochastic processes and are non-negligible. Similarly, under fixed alternatives the bootstrap test is consistent whenever the asymptotic test is itself consistent, i.e. if diverges asymptotically.
6 APPLICATION AND SIMULATIONS
In this section we use a Monte Carlo simulation exercise to investigate the finite sample properties of the tests proposed in this paper. We take as reference the dynamic ordered discrete choice models investigated in Basu and de Jong (2007) for the modeling of the monetary policy conducted by the Federal Reserve (FED). The dependent variable uses the following codification of the changes in the reference interest rate in the US, the federal funds rate ,
[TABLE]
The data are monthly and span January 1990 to December 2006, leading to complete observations. The explanatory variables that Basu and de Jong (2007) used to explain the decisions of the FED on are the current value and 4 lags of inflation , the current value and a lag of four different measures of output gap and a series of dummies that describe the decision of the FED in the previous period. Instead of these four dummies, we implement an AR, 'dynamic' version with one lag of the discrete as explanatory variable, and a version without lags that we refer to as 'static', which serves as a benchmark for the inclusion of lagged endogenous variables in . We consider both the Logit and Probit versions of the models. We fit four versions of the basic model based on different definitions of the output gap. Conditional on the series of inflation and output gap and on the parameter estimates obtained, we simulate series and conduct our tests on these (see the Monte Carlo scenarios in Table 1).
The four choices of output gap lead to Models I-IV. The output gap is the percentage deviation of the actual from the potential output, which is interpolated to obtain a series of monthly frequency by replicating the GDP observation for any quarter to all the months in that quarter. Then two different measures of potential output are used: the potential output series provided by the Congressional Budget Office and a potential output series constructed in a real-time setting using the HP filter, leading to Models I and II. Apart from output gap, other measures of economic activity are used, such as unemployment rate and capacity utilization, leading to Models III and IV. Data sources are described in Basu and de Jong (2007).
We compare the performance of our tests with an alternative test which is also omnibus and does not require smoothing (and choice of smoothing parameters). Two general approaches can be adapted to our setup: the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the Conditional Kolmogorov test of Andrews (1997), as discussed in Mora and Moro-Egido (2007). The first one is a test based on a marked empirical process for testing the null , where is a parametric link function and are finite dimensional parameters. In the cases where takes only two values , the conditional mean coincides with the conditional probability of and the null is similar to our if we were considering an i.i.d setup. To test define the process
[TABLE]
The second test by Andrews is obtained by substituting with (where is a real vector of dimension of ) in , but since it always underperforms according to simulations of Mora and Moro-Egido (2007), it is not considered here. If takes values , Mora and Moro-Egido (2007) substitute testing by tests of the hypotheses , with corresponding processes , where and , then the resulting pooled test statistics are
[TABLE]
and
[TABLE]
which we call the CvM and KS tests respectively. To apply these tests to our model, let and and take the corresponding link functions.
We analyze tests based on , , and , , and . In all cases we use Kolmogorov-Smirnov (KS) and Cramer-von Mises (CvM) measures. We only consider feasible bootstrap versions of tests based on , , etc., where we replace by root-T-consistent estimates , the ML estimator in our case. We are not aware of any theoretical results for bootstrap-assisted tests based on in our setup, although Mora and Moro-Egido (2007) provide some simulations.
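As an illustration of how the KS and CvM measures are obtained from an empirical process in practice, the following minimal sketch evaluates both functionals for a one-parameter process tabulated on a grid in [0, 1]. It assumes that the CvM measure integrates the squared process with respect to Lebesgue measure and approximates the integral by the trapezoidal rule; the function names and the grid are hypothetical, and for a biparameter process the supremum and the integral would be taken over a two-dimensional grid instead.

```python
import numpy as np


def ks_functional(values):
    """Kolmogorov-Smirnov measure: sup over the grid of |S(u)|."""
    return float(np.max(np.abs(values)))


def cvm_functional(values, grid):
    """Cramer-von Mises measure: trapezoidal approximation of the integral of S(u)^2."""
    sq = np.asarray(values, dtype=float) ** 2
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))


# Sanity check on a deterministic path S(u) = u, for which the exact values
# are sup|S| = 1 and the integral of u^2 over [0, 1] equals 1/3.
grid = np.linspace(0.0, 1.0, 1001)
path = grid.copy()
ks_value = ks_functional(path)
cvm_value = cvm_functional(path, grid)
print(ks_value, cvm_value)
```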
Parameter estimates for real data are reported in Tables 2 and 3. The main question is whether the static Probit or Logit models are appropriate for changes in the interest rates, and we check this with our tests. The p-values in Tables 4 and 5 indicate that all these models are rejected even at the 1% significance level by biparameter nonrandomized transform based tests. Note that single parameter static tests (e.g. , ) cannot reject any proposed model, with the sole exception of , which rejects Model II at 5% with the Cramer-von Mises test statistic.
To study the reliability of these results we conduct a Monte Carlo experiment using the estimated models with the real data as data generating processes and obtain the simulations for the discrete response conditional on the covariates time series. In Tables 6 and 7 we provide the empirical size and power results of our tests across simulations for sample size and static Probit and Logit and output gap choices (Models I to IV). To speed up the simulation procedure, we use the warp bootstrap algorithm of Giacomini, Politis and White (2013). We see that all bootstrap tests provide reasonable size accuracy, tests based on single parameter empirical processes underrejecting slightly, while ones based on bivariate processes tend to overreject moderately. Kolmogorov-Smirnov and Cramer-von Mises tests perform similarly in all cases, and the choice of the output gap series does not make large differences either, nor does the introduction of lagged endogenous (discrete) variables in the information set.
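The idea behind the warp-speed approach can be sketched as follows: each Monte Carlo replication is paired with a single bootstrap draw, and the critical value is taken from the pooled bootstrap statistics across replications. The code below is a simplified illustration of that idea under assumptions of our own, not the simulation design of this section: the null model is i.i.d. Poisson, the statistic is a simple dispersion measure (which re-estimates the mean internally), and all function names are hypothetical.

```python
import numpy as np


def dispersion_stat(y):
    """Toy statistic (assumption): sqrt(n) * |sample variance / sample mean - 1|."""
    mean = y.mean()
    return np.sqrt(y.size) * abs(y.var(ddof=1) / mean - 1.0)


def warp_speed_rejection_rate(draw_data, fit, simulate, statistic,
                              n_mc=500, alpha=0.05, seed=0):
    """Monte Carlo rejection rate with one bootstrap draw per replication."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_mc)
    boot_stats = np.empty(n_mc)
    for m in range(n_mc):
        y = draw_data(rng)                          # Monte Carlo sample
        stats[m] = statistic(y)
        y_star = simulate(fit(y), y.size, rng)      # single draw from the fitted null
        boot_stats[m] = statistic(y_star)
    crit = np.quantile(boot_stats, 1.0 - alpha)     # pooled bootstrap critical value
    return float(np.mean(stats > crit))


fit = lambda y: float(y.mean())
simulate = lambda lam, n, rng: rng.poisson(lam, size=n)

# Size: data generated under the Poisson null.
size = warp_speed_rejection_rate(lambda rng: rng.poisson(3.0, size=100),
                                 fit, simulate, dispersion_stat)
# Power: overdispersed negative binomial data with the same mean (3) and variance 6.
power = warp_speed_rejection_rate(lambda rng: rng.negative_binomial(3, 0.5, size=100),
                                  fit, simulate, dispersion_stat)
print(f"empirical size = {size:.3f}, empirical power = {power:.3f}")
```

The computational saving comes from replacing the inner bootstrap loop of each replication by a single draw, so the cost grows linearly in the number of Monte Carlo replications.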
The power of the tests for the static Probit model is analyzed against three different alternatives: static Logit, dynamic Probit and dynamic Logit. We see that the tests without randomization, and always perform better than random continuous processes and , which in turn dominate and , thus confirming our theoretical findings. When we compare Probit and Logit specifications while letting the dynamic aspect of the model be well specified, static in both cases, we observe that with this sample size and these specifications, it is almost impossible to distinguish Probit from Logit models. The power against the dynamic Probit and Logit alternatives is very high. Since the nature of misspecification is dynamic, once again bivariate processes should have more power compared to single parameter counterparts, as is confirmed by our simulation results. It can also be observed that for these alternatives, the Cramer-von Mises criterion provides more power than Kolmogorov-Smirnov tests. As for alternative tests based on , they have power comparable to , sometimes slightly better, and are always outperformed by any bivariate test. This is not surprising, since has more structure, i.e. it assumes a single-index model for covariates, but averages across points, thus suffering from the same problems as other single parameter tests considered here.
In Tables 8 and 9 we provide the empirical size and power results of our tests for the larger sample size . Here the size properties are similar, while rejection rates are noticeably higher against the dynamic alternatives.
7 CONCLUSIONS
In this paper we have proposed new specification tests for the conditional distribution of discrete data with possibly infinite support. The new tests are functionals of empirical processes based on a nonrandomized transform that solves the implementation problem of the usual PIT for discrete distributions and achieves consistency against a wide class of alternatives. We show the validity of a bootstrap algorithm for approximating the null distribution of the test statistics, which are model and parameter dependent. In our simulation study, we show that our method compares favorably in many relevant situations with other methods available in the literature, and we illustrate the new method in a small application.
8 APPENDIX
8.1 Properties of the nonrandomized transform
In this section we derive the basic properties of the nonrandomized transform, which are required prior to proving the weak convergence results for our empirical process. Without loss of generality and in order to make the exposition more transparent, we omit subscripts and conditioning set , and use shortcuts and .
For , and equality holds iff for some integer . For a random variable we find and . For , we have that , i.e. is not uniform and the expectation of the indicator function is never as it is for continuous .
The nonrandomized transform can be written as
[TABLE]
where
[TABLE]
Note that . We see that is a piecewise linear (continuous) function increasing in .
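To make the piecewise linear shape concrete, the following minimal sketch assumes that the transform has the form of the nonrandomized PIT of Czado, Gneiting and Held (2009): it equals 0 for u at or below F(y-1), 1 for u at or above F(y), and (u - F(y-1))/P(Y = y) in between. This functional form, the Poisson example with intensity 3, the truncation of the support at 30 and the function names are all assumptions made for illustration. The sketch also checks numerically that, under this form, the expectation of the transform under the true distribution equals u, in contrast with the indicator of the usual PIT.

```python
import math
import numpy as np


def poisson_pmf(lam, k_max):
    """Poisson probabilities P(Y = k) for k = 0, ..., k_max."""
    k = np.arange(k_max + 1)
    log_fact = np.array([math.lgamma(float(i) + 1.0) for i in k])
    return np.exp(-lam + k * math.log(lam) - log_fact)


def nonrandomized_transform(y, u, pmf):
    """Assumed form: clip((u - F(y-1)) / P(y), 0, 1), piecewise linear in u."""
    y = np.asarray(y)
    cdf = np.cumsum(pmf)
    lower = cdf[y] - pmf[y]                 # F(y - 1), equal to 0 at y = 0
    return np.clip((u - lower) / pmf[y], 0.0, 1.0)


pmf = poisson_pmf(3.0, 30)                  # truncated support; tail mass is negligible
support = np.arange(pmf.size)

# Expectation of the transform under the true distribution, for several u.
u_values = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
expectations = np.array(
    [np.sum(pmf * nonrandomized_transform(support, u, pmf)) for u in u_values]
)
print(np.max(np.abs(expectations - u_values)))

# Monotonicity in u for a fixed outcome y = 2.
u_grid = np.linspace(0.0, 1.0, 201)
path_y2 = nonrandomized_transform(np.full(u_grid.size, 2), u_grid, pmf)
```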
Let
[TABLE]
[TABLE]
In Table 10 and Lemma A we list the properties of this transform.
Lemma A**.**
For and
- (i) , where and . When , the expectation is .
- (ii)
- (iii) .
- (iv) Moreover, .
- (v) Moreover, , for any interval of length .
- (vi) .
- (vii) .
8.2 Functional weak convergence of discrete martingales
In this section we present Lindeberg-Feller-type sufficient conditions for functional weak convergence of discrete martingales. In general, to establish weak convergence one needs to check tightness and finite-dimensional convergence. In the case of martingales, both parts can be verified without imposing restrictive conditions. Here we state a result of Nishiyama (2000), which extends Theorem 2.11.9 of van der Vaart and Wellner (1996) to martingales; see also Theorem A.1 in Delgado and Escanciano (2007). Further details on notation and definitions can be found in van der Vaart and Wellner (1996) for empirical processes and row-independent triangular arrays and in Jacod and Shiryaev (2003) for finite-dimensional semimartingales.
For every , let be a discrete stochastic basis, where is a probability space equipped with a filtration . For nonempty set , let be a -valued martingale difference array with respect to filtration , i.e. for every , maps to , the space of bounded, -valued functions on with -norm and for each , is a -valued martingale difference array: is -measurable and . We are interested in studying the weak convergence of discrete martingales . Denote a decreasing series of finite partitions (DFP) of as , where such that , and monotonically in . The -entropy of the DFP is . The quadratic -modulus of is -valued process
[TABLE]
Theorem A**.**
*Let be a -valued martingale difference array and
N1) (conditional variance convergence) for every
N2) (Lindeberg condition) for every ;
N3) (partitioning entropy condition) there exists a DFP of such that and .
Then , where has normal marginals with covariance .*
8.3 Additional technical assumptions
To establish the asymptotic properties of the biparameter process we need the following assumption for uniform convergence of different empirical quantities.
Assumption A**.**
Under , the following uniform limits to continuous functions exist
1. ,
2. ,
3. ,
4. ,
5. .
As discussed in the text, these conditions restrict the dynamics of the data process such that some LLN holds, which is the case, e.g., for stationary and ergodic processes.
8.4 Proofs
Proof of Lemma A.
(i) By definition of , . Similarly, by direct calculation we obtain (ii), (iii), (vi) and (vii). We now provide a detailed proof of (iv) and (v).
(iv) We prove a stronger result that for , such that the expectation with respect to is bounded: . Then, the required bound is obtained by setting .
Since never exceeds , we have that , therefore we bound the latter expectation.
Suppose that . Then for , i.e. with probability , and is zero for other . Therefore,
[TABLE]
since and .
Suppose that . Note that for . We separately bound each term in
[TABLE]
For , . Then
[TABLE]
since and for we have that .
For , .
Then
[TABLE]
since and for we have that .
For , . Then
[TABLE]
since .
Adding everything together, we get that for . This equation is symmetric with respect to and ; therefore, it holds also for .
(v) Let denote the interval of length , denote the supremum of over , where .
Note that ; moreover, if , then and if , then .
Suppose that , i.e. . Then .
Suppose that , i.e. contains at least one point or even intervals . On such intervals, goes up to , but the probability of taking all such is bounded by . More precisely,
[TABLE]
since the sum of the first and the last terms is below , the second and the fourth terms each is bounded by and the third term is .
Proof of Lemma 1.
Substitute in Lemma A(i) to demonstrate that , therefore is a martingale difference sequence for every . The conditional variance expression follows from Lemma A(iii) by taking .
However the are not independent in general. To show that, note that bivariate independence requires that
[TABLE]
for all and . Now we see that the lhs is
[TABLE]
and now, for and under
[TABLE]
which depends on and therefore with positive probability, and independence does not follow in general.
Proof of Lemma 2.
Because are continuous, is a (uniform) consistent estimate of cdf of . Then by Lemma A(vi) and A(vii) and ULLN we get the uniform consistency of and . The efficiency gain comes from Lemma A(ii).
Proof of Lemma 3.
We need to verify conditions N1-N3 of Theorem A. Fix and take with the usual norm and equidistant partition , i.e. partition of in equal intervals of length (the last interval may be even smaller), and , which is a square integrable martingale difference by Lemma 1. Then Condition N1 follows from Lemma 1 and Assumption 1. Condition N2 is satisfied because for , the indicator . Condition N3 follows from the bound in Lemma A(v). Indeed, and
[TABLE]
Proof of Lemma 4.
Apply the weak convergence result from Lemma 3 under with , which is a square integrable martingale difference because of Lemma A(i) with and . Then Condition N1 follows from Lemma A(iii) and the fact that are bounded in absolute value by a.s. Condition N2 is satisfied because for , the indicator is [math]. Condition N3 follows from the bound in Lemma A(v) and the fact that applied to a.s. bounded r.v. are bounded in absolute value by a.s. We obtain that , the same limit as in Lemma 3. Finally, use additivity of in the first argument and apply ULLN to .
Proof of Lemma 5.
Under , i.e. under , Equation (4) can be established using standard methods, applying the Doob and Rosenthal inequalities for MDS (Hall and Heyde, 1980). Define . When necessary, we write the arguments explicitly: . We show that . Since , it is sufficient to establish that for some
[TABLE]
Note that for , by Assumption 3C,
[TABLE]
First, we will show that . Since are bounded by 2 in absolute value and form a martingale difference sequence with respect to , by the Doob inequality and
[TABLE]
and by Rosenthal inequality,
[TABLE]
Take . The first term is small because of bounds in Lemma A(iv) and (9). Because , . Therefore we have a pointwise bound. Uniformity in can be established using monotonicity of and continuity of by employing bounds in Lemma A(iv) and (9).
Finally, use that uniformly in
[TABLE]
Proof of Theorem 1.
The joint weak convergence (6) follows from finite-dimensional convergence by CLT for MDS, while tightness was established in the proof of Lemma 4.
Proof of Theorem 2.
Note that
[TABLE]
where
[TABLE]
is a square integrable martingale difference by Lemma 1. The rest is similar to the proof of Theorem 1. To obtain under , verify conditions N1-N3 of Theorem A for as it is done in the proof of Lemma 3. The covariance function of is
[TABLE]
Under , apply the same weak convergence result under with
[TABLE]
which is a square integrable martingale difference because of Lemma A(i) with and . Then proceed as in the proof of Lemma 4.
In order to establish (7), repeat the steps of the proof of Lemma 5 for , where is with in place of .
Proof of Theorem 4.
Repeat the arguments of the proofs of Theorems 1 and 2 for the sample generated by , defined in Assumption 6, to obtain conditional convergence. Then proceed as in the proof of Corollary 1 in Andrews (1997).
8.5 Checking assumptions for the Poisson model
Here we write for . For the Poisson model the probability distribution is and the cumulative distribution function is
[TABLE]
where is the regularized gamma function, and , . If covariates are iid or stationary and ergodic, and omits lags of the dependent variable, then the LLN applies both under the null and local alternatives (such as the local alternative considered in Eq. (2.12) of Cameron and Trivedi, 1990) to justify Assumptions 2-6 and Assumption A, which involve functions of that are uniformly continuous in . However, it can also be of interest to allow the intensity to depend on lags of the dependent variable. For simplicity we consider dynamics. can be treated similarly but is lengthier. The parameters enter through , , and are gathered in . We assume that are positive, and are fixed and . Under these conditions, there exists a unique stationary and ergodic solution to this model (Fokianos et al., 2009). Such data generating processes allow the use of results on (generic, uniform) LLN, which facilitates checking the assumptions of the paper. Conditions for stationarity and ergodicity for nonlinear can be found in Neumann (2011) and are directly applicable to the analysis under the null hypothesis. However, we are not aware of LLN results for these models under local alternatives, although Fokianos and Neumann (2013, Proposition 2.3(ii)) use related arguments.
Let and the null hypothesis is for some . Then and , and the nonrandomized transform for is
[TABLE]
from which one obtains the empirical processes and the test statistics defined in Sections 1-2.
Now consider Assumption 1. For the Poisson model
[TABLE]
where . For the Poisson DGP described above, is stationary and ergodic and satisfies Assumption 1. By the same argument, Assumptions 2, 3D, 4C and 5 are fulfilled.
Assumptions 3A and 3B hold trivially. For Assumption 3C note that
[TABLE]
where
[TABLE]
The last expression can be iterated from to and because the arithmetic progression sum of mean squares is bounded, as in the proof of Lemma 3.2 of Fokianos et al. (2009).
Assumptions 4A, 4B and 6B are standard; see e.g. Andrews (1997), whose arguments adapt to the Poisson model using Theorem 3.1 of Fokianos et al. (2009).
Assumption 6A holds trivially, because there are no explanatory variables other than own past values.
9 Acknowledgements
We thank Juan Mora for useful comments. Support from the Ministerio Economia y Competitividad (Spain), grants ECO2012-31748, ECO2014-57007p, MDM 2014-0431, Comunidad de Madrid, MadEco-CM (S2015/HUM-3444), and Fundación Ramón Areces is gratefully acknowledged.
- [1] Andrews, D.W.K. (1997). A conditional Kolmogorov test. Econometrica, 65, 1097-1128.
- [2] Bai, J. (2003). Testing Parametric Conditional Distributions of Dynamic Models. Review of Economics and Statistics, 85, 531-549.
- [3] Basu, D. and R. de Jong (2007). Dynamic Multinomial Ordered Choice with an Application to the Estimation of Monetary Policy Rules. Studies in Nonlinear Dynamics and Econometrics, 11, 1507-1507.
- [4] Burke, M. D., Csorgo M., Csorgo S. and P. Revesz (1979). Approximation of the empirical process when parameters are estimated. Annals of Probability, 7, 790-810.
- [5] Cameron, A.C. and P.K. Trivedi (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics, 46, 347-364.
- [6] Corradi, V. and R. Swanson (2006). Bootstrap conditional distribution test in the presence of dynamic misspecification. Journal of Econometrics, 133, 779-806.
- [7] Czado, C., T. Gneiting and L. Held (2009). Predictive model assessment for count data. Biometrics, 65, 1254-1261.
- [8] Davis, R. A., W. T. M. Dunsmuir and S. B. Streett (2003). Observation-Driven Models for Poisson Counts. Biometrika, 90, 777-790.
