New goodness-of-fit diagnostics for conditional discrete response models

Igor Kheifets; Carlos Velasco

arXiv:1706.00378·math.ST·February 1, 2018

TL;DR

This paper introduces new goodness-of-fit tests for discrete response models that improve power by avoiding randomization, applicable to static and dynamic ordered choice models, with theoretical analysis and empirical validation.

Contribution

It develops an alternative transformation for discrete data that enhances test power without randomization, extending specification testing to a broader class of models.

Findings

01

New transformation improves test power over traditional jittered methods.

02

Asymptotic properties of tests are analytically derived.

03

Bootstrap method effectively approximates critical values.

Abstract

This paper proposes new specification tests for conditional models with discrete responses. Correct specification of such models is key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects, and producing appropriate predictions of the probability of future events. In particular, we test static and dynamic ordered choice model specifications and can cover infinite-support distributions, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which lead to a continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. In this paper we investigate an…

Tables (10)

Table 1: Scenarios for Monte Carlo simulations.

Scenario	Null and Alternative
Size 1	$H_{0} :$ static probit
Size 2	$H_{0} :$ static logit
Power 1	$H_{0} :$ static probit vs $H_{1} :$ static logit
Power 2	$H_{0} :$ static probit vs $H_{1} :$ dynamic probit
Power 3	$H_{0} :$ static probit vs $H_{1} :$ dynamic logit

Table 2: ML estimates and standard errors of Models I-IV with static and dynamic specifications and Probit link function applied to the real US data, $T=204$.

	I-static	I-dynamic	II-static	II-dynamic	III-static	III-dynamic	IV-static	IV-dynamic
$τ_{1}$	$- 4.81$	$- 2.07$	$- 3.31$	$- 1.05$	$- 3.15$	$- 1.17$	$- 3.41$	$- 1.48$
	$(0.51)$	$(0.66)$	$(0.35)$	$(0.47)$	$(0.36)$	$(0.48)$	$(0.37)$	$(0.50)$
$τ_{2}$	$- 4.05$	$- 1.14$	$- 2.64$	$- 0.19$	$- 2.34$	$- 0.20$	$- 2.57$	$- 0.50$
	$(0.47)$	$(0.64)$	$(0.31)$	$(0.46)$	$(0.32)$	$(0.47)$	$(0.32)$	$(0.48)$
$τ_{3}$	$- 1.72$	$1.66$	$- 0.39$	$2.60$	$0.09$	$2.62$	$- 0.11$	$2.29$
	$(0.40)$	$(0.63)$	$(0.26)$	$(0.48)$	$(0.28)$	$(0.48)$	$(0.27)$	$(0.49)$
$inf$	$- 1.39$	$- 1.36$	$- 1.51$	$- 1.60$	$- 1.83$	$- 1.82$	$- 1.70$	$- 1.70$
	$(0.68)$	$(0.72)$	$(0.67)$	$(0.71)$	$(0.69)$	$(0.73)$	$(0.69)$	$(0.73)$
$inf_{-1}$	$1.86$	$2.90$	$1.94$	$3.05$	$2.05$	$3.07$	$2.14$	$3.01$
	$(0.99)$	$(1.06)$	$(0.98)$	$(1.06)$	$(1.00)$	$(1.07)$	$(1.01)$	$(1.07)$
$inf_{-2}$	$- 1.30$	$- 2.81$	$- 1.27$	$- 2.80$	$- 1.60$	$- 2.92$	$- 2.12$	$- 3.11$
	$(0.98)$	$(1.07)$	$(0.97)$	$(1.06)$	$(0.99)$	$(1.07)$	$(1.02)$	$(1.09)$
$inf_{-3}$	$1.39$	$2.44$	$1.60$	$2.74$	$1.79$	$2.79$	$1.27$	$2.33$
	$(0.99)$	$(1.06)$	$(0.98)$	$(1.06)$	$(1.00)$	$(1.08)$	$(1.03)$	$(1.09)$
$inf_{-4}$	$0.43$	$- 0.53$	$- 0.23$	$- 1.05$	$- 0.00$	$- 0.85$	$0.88$	$- 0.20$
	$(0.68)$	$(0.73)$	$(0.66)$	$(0.71)$	$(0.67)$	$(0.73)$	$(0.71)$	$(0.76)$
$out$	$- 1.02$	$- 1.02$	$0.36$	$0.40$	$3.35$	$2.54$	$- 0.98$	$- 0.62$
	$(0.30)$	$(0.33)$	$(0.59)$	$(0.63)$	$(0.68)$	$(0.74)$	$(0.22)$	$(0.23)$
$out_{-1}$	$0.81$	$0.90$	$0.84$	$0.65$	$2.48$	$0.95$	$- 1.03$	$- 0.65$
	$(0.29)$	$(0.32)$	$(0.59)$	$(0.64)$	$(0.67)$	$(0.73)$	$(0.22)$	$(0.23)$
$Y_{- 1}$	—	$- 1.08$	—	$- 1.12$	—	$- 1.03$	—	$- 0.94$
		$(0.15)$		$(0.15)$		$(0.16)$		$(0.16)$

Table 3: ML estimates and standard errors of Models I-IV with static and dynamic specifications and Logit link function applied to the real US data, $T=204$.

	I-static	I-dynamic	II-static	II-dynamic	III-static	III-dynamic	IV-static	IV-dynamic
$τ_{1}$	$- 8.46$	$- 3.77$	$- 6.01$	$- 2.12$	$- 5.61$	$- 2.15$	$- 6.15$	$- 2.82$
	$(0.98)$	$(1.20)$	$(0.68)$	$(0.83)$	$(0.69)$	$(0.85)$	$(0.72)$	$(0.89)$
$τ_{2}$	$- 7.03$	$- 1.96$	$- 4.71$	$- 0.46$	$- 4.12$	$- 0.31$	$- 4.56$	$- 0.90$
	$(0.90)$	$(1.17)$	$(0.60)$	$(0.81)$	$(0.59)$	$(0.83)$	$(0.61)$	$(0.86)$
$τ_{3}$	$- 3.00$	$3.02$	$- 0.85$	$4.52$	$0.07$	$4.60$	$- 0.24$	$4.04$
	$(0.72)$	$(1.12)$	$(0.47)$	$(0.84)$	$(0.49)$	$(0.86)$	$(0.49)$	$(0.87)$
$inf$	$- 2.44$	$- 2.29$	$- 2.53$	$- 2.89$	$- 3.17$	$- 3.28$	$- 2.81$	$- 3.06$
	$(1.21)$	$(1.30)$	$(1.21)$	$(1.29)$	$(1.21)$	$(1.32)$	$(1.22)$	$(1.32)$
$inf_{-1}$	$3.28$	$4.95$	$3.22$	$5.46$	$3.59$	$5.43$	$3.41$	$5.31$
	$(1.78)$	$(1.92)$	$(1.77)$	$(1.92)$	$(1.76)$	$(1.93)$	$(1.82)$	$(1.95)$
$inf_{-2}$	$- 2.48$	$- 5.02$	$- 2.17$	$- 5.22$	$- 2.97$	$- 5.21$	$- 3.52$	$- 5.40$
	$(1.74)$	$(1.95)$	$(1.73)$	$(1.94)$	$(1.76)$	$(1.95)$	$(1.86)$	$(1.99)$
$inf_{-3}$	$2.42$	$4.36$	$2.61$	$5.20$	$2.94$	$5.11$	$1.65$	$4.02$
	$(1.75)$	$(1.92)$	$(1.75)$	$(1.93)$	$(1.77)$	$(1.95)$	$(1.86)$	$(1.99)$
$inf_{-4}$	$0.93$	$- 0.87$	$- 0.17$	$- 1.88$	$0.32$	$- 1.54$	$2.11$	$- 0.28$
	$(1.20)$	$(1.32)$	$(1.18)$	$(1.28)$	$(1.19)$	$(1.30)$	$(1.27)$	$(1.36)$
$out$	$- 1.78$	$- 1.79$	$0.43$	$0.63$	$5.87$	$4.12$	$- 1.83$	$- 1.15$
	$(0.54)$	$(0.60)$	$(1.04)$	$(1.14)$	$(1.24)$	$(1.34)$	$(0.40)$	$(0.42)$
$out_{-1}$	$1.43$	$1.59$	$1.61$	$1.29$	$4.21$	$1.50$	$- 1.88$	$- 1.14$
	$(0.52)$	$(0.59)$	$(1.04)$	$(1.15)$	$(1.20)$	$(1.33)$	$(0.40)$	$(0.42)$
$Y_{- 1}$	—	$- 1.98$	—	$- 2.04$	—	$- 1.86$	—	$- 1.71$
		$(0.28)$		$(0.27)$		$(0.28)$		$(0.28)$

Table 4: P-values of Cramér-von Mises tests for static Probit and Logit link function applied to the real US data, $T=204$.

		${\hat{S}}_{2 T}$	${\hat{R}}_{2 T, 50}$	${\hat{R}}_{2 T, 25}$	${\hat{R}}_{2 T}$	${\hat{S}}_{1 T}$	${\hat{R}}_{1 T, 50}$	${\hat{R}}_{1 T, 25}$	${\hat{R}}_{1 T}$	${\hat{Z}}_{T}$
$H_{0} :$ static probit
	Model I	$0.001$	$0.001$	$0.001$	$0.237$	$0.009$	$0.026$	$0.078$	$0.516$	$0.244$
	Model II	$0.001$	$0.001$	$0.001$	$0.166$	$0.077$	$0.057$	$0.229$	$0.167$	$0.022$
	Model III	$0.001$	$0.001$	$0.001$	$0.307$	$0.492$	$0.632$	$0.616$	$0.731$	$0.109$
	Model IV	$0.001$	$0.002$	$0.002$	$0.496$	$0.721$	$0.509$	$0.582$	$0.668$	$0.268$
$H_{0} :$ static logit
	Model I	$0.001$	$0.001$	$0.001$	$0.152$	$0.021$	$0.079$	$0.221$	$0.793$	$0.199$
	Model II	$0.001$	$0.001$	$0.001$	$0.112$	$0.113$	$0.155$	$0.459$	$0.240$	$0.032$
	Model III	$0.001$	$0.001$	$0.001$	$0.360$	$0.314$	$0.493$	$0.541$	$0.745$	$0.171$
	Model IV	$0.001$	$0.001$	$0.001$	$0.448$	$0.890$	$0.804$	$0.899$	$0.634$	$0.272$

Table 5: P-values of Kolmogorov-Smirnov tests for static Probit and Logit link function applied to the real US data, $T=204$.

		${\hat{S}}_{2 T}$	${\hat{R}}_{2 T, 50}$	${\hat{R}}_{2 T, 25}$	${\hat{R}}_{2 T}$	${\hat{S}}_{1 T}$	${\hat{R}}_{1 T, 50}$	${\hat{R}}_{1 T, 25}$	${\hat{R}}_{1 T}$	${\hat{Z}}_{T}$
$H_{0} :$ static probit
	Model I	$0.003$	$0.002$	$0.002$	$0.082$	$0.047$	$0.193$	$0.372$	$0.354$	$0.392$
	Model II	$0.001$	$0.001$	$0.002$	$0.586$	$0.351$	$0.426$	$0.626$	$0.450$	$0.107$
	Model III	$0.001$	$0.001$	$0.001$	$0.155$	$0.454$	$0.435$	$0.244$	$0.742$	$0.124$
	Model IV	$0.001$	$0.002$	$0.002$	$0.799$	$0.936$	$0.913$	$0.801$	$0.355$	$0.230$
$H_{0} :$ static logit
	Model I	$0.001$	$0.001$	$0.001$	$0.133$	$0.010$	$0.050$	$0.212$	$0.684$	$0.220$
	Model II	$0.001$	$0.001$	$0.001$	$0.354$	$0.114$	$0.201$	$0.319$	$0.416$	$0.058$
	Model III	$0.001$	$0.001$	$0.001$	$0.149$	$0.511$	$0.472$	$0.350$	$0.642$	$0.173$
	Model IV	$0.002$	$0.002$	$0.001$	$0.769$	$0.975$	$0.968$	$0.867$	$0.411$	$0.207$

Table 6: Simulated size/power rates (in %) for the nominal 5% level of Cramér-von Mises tests of Models I-IV with static and dynamic specifications applied to simulated data, $T=100$.

		${\hat{S}}_{2 T}$	${\hat{R}}_{2 T, 50}$	${\hat{R}}_{2 T, 25}$	${\hat{R}}_{2 T}$	${\hat{S}}_{1 T}$	${\hat{R}}_{1 T, 50}$	${\hat{R}}_{1 T, 25}$	${\hat{R}}_{1 T}$	${\hat{Z}}_{T}$
Size 1 $H_{0} :$ static probit
	Model I	$5.5$	$6.0$	$5.5$	$4.5$	$6.6$	$6.3$	$5.7$	$5.4$	$7.8$
	Model II	$5.3$	$6.7$	$5.0$	$5.5$	$6.3$	$5.2$	$4.2$	$3.3$	$6.5$
	Model III	$7.7$	$7.0$	$6.5$	$5.4$	$6.0$	$3.7$	$3.3$	$4.5$	$6.4$
	Model IV	$5.2$	$6.7$	$5.6$	$3.9$	$5.1$	$4.6$	$4.9$	$2.8$	$6.4$
Size 2 $H_{0} :$ static logit
	Model I	$6.5$	$6.5$	$4.9$	$4.1$	$7.2$	$5.6$	$6.0$	$4.4$	$7.2$
	Model II	$5.6$	$6.7$	$7.6$	$4.0$	$4.6$	$5.3$	$4.6$	$4.8$	$5.6$
	Model III	$7.3$	$9.0$	$6.4$	$3.3$	$6.4$	$7.8$	$5.2$	$3.3$	$8.5$
	Model IV	$6.6$	$6.3$	$5.0$	$4.5$	$6.5$	$4.6$	$4.7$	$4.7$	$9.1$
Power 1 $H_{0} :$ static probit vs $H_{1} :$ static logit
	Model I	$8.5$	$7.7$	$6.6$	$4.9$	$8.4$	$6.5$	$6.0$	$3.6$	$7.1$
	Model II	$5.1$	$5.0$	$4.4$	$4.0$	$6.4$	$6.9$	$5.3$	$4.0$	$8.7$
	Model III	$9.1$	$9.4$	$7.9$	$4.7$	$9.0$	$8.3$	$7.7$	$4.6$	$8.2$
	Model IV	$6.3$	$6.2$	$5.3$	$4.5$	$10.2$	$8.6$	$7.5$	$3.8$	$8.3$
Power 2 $H_{0} :$ static probit vs $H_{1} :$ dynamic probit
	Model I	$89.2$	$85.2$	$82.7$	$25.7$	$13.2$	$12.0$	$11.7$	$4.6$	$18.4$
	Model II	$92.8$	$92.3$	$91.1$	$34.2$	$10.5$	$8.1$	$8.8$	$3.0$	$17.2$
	Model III	$90.7$	$88.4$	$86.1$	$22.5$	$9.2$	$9.8$	$8.5$	$5.0$	$9.4$
	Model IV	$88.1$	$84.1$	$83.0$	$27.7$	$10.3$	$7.8$	$7.4$	$4.4$	$12.5$
Power 3 $H_{0} :$ static probit vs $H_{1} :$ dynamic logit
	Model I	$90.1$	$89.3$	$86.0$	$22.9$	$12.1$	$10.0$	$8.5$	$5.0$	$12.6$
	Model II	$94.2$	$93.0$	$90.6$	$29.8$	$9.6$	$9.1$	$7.1$	$3.9$	$14.6$
	Model III	$93.5$	$91.9$	$90.9$	$30.3$	$10.0$	$8.0$	$7.8$	$4.4$	$10.9$
	Model IV	$91.1$	$88.4$	$85.9$	$26.0$	$11.1$	$12.3$	$11.4$	$4.7$	$14.7$

Table 7: Simulated size/power rates (in %) for the nominal 5% level of Kolmogorov-Smirnov tests of Models I-IV with static and dynamic specifications applied to simulated data, $T=100$.

		${\hat{S}}_{2 T}$	${\hat{R}}_{2 T, 50}$	${\hat{R}}_{2 T, 25}$	${\hat{R}}_{2 T}$	${\hat{S}}_{1 T}$	${\hat{R}}_{1 T, 50}$	${\hat{R}}_{1 T, 25}$	${\hat{R}}_{1 T}$	${\hat{Z}}_{T}$
Size 1 $H_{0} :$ static probit
	Model I	$5.1$	$6.4$	$5.2$	$3.9$	$7.8$	$6.3$	$6.8$	$4.9$	$7.9$
	Model II	$5.5$	$6.5$	$3.9$	$4.9$	$5.9$	$5.1$	$4.1$	$4.8$	$6.2$
	Model III	$7.7$	$7.8$	$6.8$	$5.1$	$6.1$	$7.0$	$6.0$	$4.9$	$5.6$
	Model IV	$6.5$	$5.4$	$5.3$	$3.4$	$5.3$	$5.3$	$4.8$	$3.6$	$7.2$
Size 2 $H_{0} :$ static logit
	Model I	$7.0$	$6.4$	$6.1$	$5.4$	$9.1$	$6.4$	$6.3$	$3.7$	$6.7$
	Model II	$4.7$	$4.9$	$4.6$	$3.5$	$5.6$	$3.8$	$4.0$	$4.8$	$5.8$
	Model III	$8.3$	$8.3$	$6.7$	$3.2$	$6.2$	$5.7$	$3.5$	$4.0$	$10.0$
	Model IV	$6.2$	$6.5$	$5.1$	$4.7$	$6.6$	$5.8$	$5.3$	$4.0$	$8.1$
Power 1 $H_{0} :$ static probit vs $H_{1} :$ static logit
	Model I	$7.0$	$6.2$	$5.4$	$3.7$	$5.2$	$3.3$	$3.9$	$3.2$	$7.7$
	Model II	$4.3$	$3.8$	$4.5$	$3.7$	$4.1$	$3.9$	$3.6$	$3.9$	$8.9$
	Model III	$10.2$	$7.3$	$7.1$	$3.9$	$7.1$	$5.7$	$5.7$	$4.5$	$9.2$
	Model IV	$5.6$	$6.6$	$4.3$	$3.2$	$6.4$	$5.1$	$6.2$	$3.4$	$6.8$
Power 2 $H_{0} :$ static probit vs $H_{1} :$ dynamic probit
	Model I	$82.8$	$79.0$	$74.5$	$13.6$	$10.3$	$9.1$	$7.1$	$3.5$	$16.9$
	Model II	$87.9$	$85.5$	$83.3$	$17.7$	$12.1$	$11.2$	$9.3$	$3.3$	$14.0$
	Model III	$85.7$	$83.2$	$79.4$	$13.8$	$7.1$	$6.4$	$7.2$	$3.9$	$9.2$
	Model IV	$81.7$	$78.5$	$74.6$	$13.8$	$7.7$	$7.8$	$6.7$	$4.9$	$11.3$
Power 3 $H_{0} :$ static probit vs $H_{1} :$ dynamic logit
	Model I	$86.2$	$82.7$	$79.0$	$14.2$	$7.7$	$4.9$	$3.8$	$4.2$	$11.8$
	Model II	$90.0$	$86.2$	$82.2$	$15.9$	$9.3$	$7.9$	$8.1$	$4.1$	$14.2$
	Model III	$89.0$	$86.4$	$83.7$	$15.9$	$5.6$	$5.1$	$4.4$	$4.6$	$10.5$
	Model IV	$87.5$	$83.8$	$79.3$	$16.1$	$9.4$	$7.5$	$7.7$	$5.9$	$12.9$

Table 8: Simulated size/power rates (in %) for the nominal 5% level of Cramér-von Mises tests of Models I-IV with static and dynamic specifications applied to simulated data, $T=200$.

		${\hat{S}}_{2 T}$	${\hat{R}}_{2 T, 50}$	${\hat{R}}_{2 T, 25}$	${\hat{R}}_{2 T}$	${\hat{S}}_{1 T}$	${\hat{R}}_{1 T, 50}$	${\hat{R}}_{1 T, 25}$	${\hat{R}}_{1 T}$	${\hat{Z}}_{T}$
Size 1 $H_{0} :$ static probit
	Model I	$4.0$	$5.4$	$5.7$	$6.2$	$4.2$	$4.6$	$4.9$	$5.8$	$5.2$
	Model II	$4.5$	$4.4$	$3.5$	$2.4$	$6.3$	$4.7$	$5.9$	$4.4$	$7.0$
	Model III	$4.6$	$4.4$	$3.4$	$4.2$	$5.4$	$5.5$	$5.2$	$3.3$	$5.4$
	Model IV	$5.3$	$6.1$	$6.3$	$4.4$	$4.8$	$4.6$	$7.0$	$4.9$	$6.9$
Size 2 $H_{0} :$ static logit
	Model I	$7.2$	$8.2$	$6.7$	$5.8$	$5.8$	$6.7$	$6.4$	$3.8$	$5.2$
	Model II	$5.4$	$6.4$	$6.1$	$4.7$	$4.8$	$5.6$	$5.3$	$5.2$	$6.1$
	Model III	$5.3$	$5.2$	$3.9$	$4.0$	$5.6$	$5.8$	$6.7$	$4.4$	$6.9$
	Model IV	$5.4$	$6.8$	$5.0$	$4.0$	$5.6$	$5.2$	$5.0$	$4.1$	$8.3$
Power 1 $H_{0} :$ static probit vs $H_{1} :$ static logit
	Model I	$7.2$	$8.2$	$6.6$	$6.9$	$10.9$	$10.3$	$10.9$	$6.9$	$9.2$
	Model II	$4.5$	$4.9$	$5.6$	$6.3$	$7.5$	$6.4$	$7.3$	$6.7$	$6.5$
	Model III	$6.0$	$5.2$	$6.1$	$5.9$	$6.9$	$6.8$	$7.9$	$6.6$	$7.0$
	Model IV	$6.5$	$6.6$	$6.6$	$4.3$	$7.1$	$6.1$	$7.5$	$5.8$	$5.4$
Power 2 $H_{0} :$ static probit vs $H_{1} :$ dynamic probit
	Model I	$98.5$	$97.3$	$95.2$	$33.2$	$13.6$	$11.6$	$9.8$	$7.5$	$16.2$
	Model II	$99.5$	$99.3$	$98.5$	$41.5$	$16.0$	$14.8$	$12.6$	$7.1$	$18.2$
	Model III	$98.5$	$97.0$	$95.9$	$30.7$	$13.0$	$11.8$	$9.8$	$7.9$	$13.9$
	Model IV	$95.8$	$93.6$	$91.6$	$22.9$	$10.2$	$9.5$	$7.6$	$5.3$	$13.7$
Power 3 $H_{0} :$ static probit vs $H_{1} :$ dynamic logit
	Model I	$98.6$	$97.5$	$95.6$	$34.5$	$15.2$	$14.0$	$14.0$	$5.4$	$16.7$
	Model II	$99.5$	$98.9$	$98.6$	$39.1$	$16.9$	$16.1$	$13.7$	$7.4$	$20.8$
	Model III	$98.8$	$98.1$	$96.4$	$31.2$	$14.8$	$13.7$	$11.6$	$6.6$	$17.9$
	Model IV	$95.8$	$94.2$	$91.8$	$23.9$	$11.4$	$10.1$	$8.6$	$5.3$	$11.4$

Table 9: Simulated size/power rates (in %) for the nominal 5% level of Kolmogorov-Smirnov tests of Models I-IV with static and dynamic specifications applied to simulated data, $T=200$.

		${\hat{S}}_{2 T}$	${\hat{R}}_{2 T, 50}$	${\hat{R}}_{2 T, 25}$	${\hat{R}}_{2 T}$	${\hat{S}}_{1 T}$	${\hat{R}}_{1 T, 50}$	${\hat{R}}_{1 T, 25}$	${\hat{R}}_{1 T}$	${\hat{Z}}_{T}$
Size 1 $H_{0} :$ static probit
	Model I	$5.1$	$5.0$	$6.0$	$4.8$	$4.5$	$5.9$	$3.9$	$5.1$	$5.5$
	Model II	$3.7$	$3.9$	$3.9$	$2.9$	$6.3$	$5.6$	$6.4$	$4.6$	$5.7$
	Model III	$4.5$	$5.2$	$4.3$	$3.9$	$4.2$	$4.5$	$4.9$	$4.3$	$5.3$
	Model IV	$5.0$	$6.4$	$7.0$	$4.6$	$4.6$	$4.8$	$6.4$	$6.9$	$6.8$
Size 2 $H_{0} :$ static logit
	Model I	$5.7$	$5.7$	$6.3$	$4.9$	$6.3$	$5.9$	$6.3$	$3.5$	$4.8$
	Model II	$5.5$	$5.1$	$5.9$	$3.4$	$5.4$	$4.6$	$5.9$	$4.9$	$5.3$
	Model III	$3.6$	$5.4$	$4.6$	$3.6$	$6.4$	$4.3$	$5.2$	$5.3$	$7.8$
	Model IV	$6.4$	$7.3$	$5.6$	$4.7$	$6.6$	$6.4$	$4.7$	$4.5$	$8.5$
Power 1 $H_{0} :$ static probit vs $H_{1} :$ static logit
	Model I	$6.3$	$6.5$	$4.2$	$6.6$	$7.2$	$5.9$	$5.0$	$6.7$	$8.7$
	Model II	$4.6$	$5.0$	$6.4$	$6.3$	$5.2$	$4.9$	$5.7$	$6.5$	$6.1$
	Model III	$5.0$	$6.2$	$5.7$	$5.2$	$3.7$	$4.1$	$5.0$	$6.3$	$7.1$
	Model IV	$5.7$	$7.0$	$5.6$	$4.5$	$5.4$	$3.4$	$4.6$	$6.1$	$5.1$
Power 2 $H_{0} :$ static probit vs $H_{1} :$ dynamic probit
	Model I	$94.3$	$92.0$	$86.5$	$22.8$	$11.4$	$10.6$	$9.7$	$5.9$	$14.0$
	Model II	$98.1$	$96.5$	$94.4$	$26.3$	$15.5$	$13.1$	$13.5$	$7.3$	$13.1$
	Model III	$94.3$	$91.0$	$87.9$	$21.0$	$14.7$	$13.1$	$12.7$	$7.3$	$13.8$
	Model IV	$90.5$	$85.0$	$82.0$	$17.9$	$11.0$	$9.5$	$9.4$	$6.3$	$11.4$
Power 3 $H_{0} :$ static probit vs $H_{1} :$ dynamic logit
	Model I	$97.1$	$93.8$	$91.8$	$24.7$	$12.4$	$12.8$	$11.1$	$5.5$	$13.4$
	Model II	$98.9$	$97.6$	$96.5$	$29.5$	$16.9$	$17.1$	$14.6$	$7.4$	$16.9$
	Model III	$96.1$	$93.8$	$91.8$	$26.0$	$14.6$	$14.4$	$11.9$	$8.0$	$15.4$
	Model IV	$93.0$	$89.2$	$86.9$	$14.1$	$13.0$	$12.5$	$8.5$	$5.8$	$10.4$

Table 10: Values of functionals of the new nonrandomized transform $I(\cdot,\cdot)$ for all possible values of $Y$ relative to inverted cdfs at points $u$ and $v$. For instance, $I_{F}(Y,u)-I_{F}(Y,v)=0$ if $Y<F^{-1}(u)$ and $Y<F^{-1}(v)$, while $I_{F}(Y,u)-I_{F}(Y,v)=-\delta_{F}(u)$ if $Y=F^{-1}(u)<F^{-1}(v)$.

The value of $I_{F} (Y, u)$
	$Y < F^{- 1} (u)$	$Y = F^{- 1} (u)$	$Y > F^{- 1} (u)$
	$1$	$1 - δ_{F} (u)$	$0$
The value of $𝟙 {I_{F} (Y, u) \leq v}$
$v = 0$	$0$	$0$	$1$
$v \in (0, 1)$	$0$	$𝟙 {1 - δ_{F} (u) \leq v}$	$1$
$v = 1$	$1$	$1$	$1$
The value of $I_{F} (Y, u) - I_{F} (Y, v)$
$Y < F^{- 1} (v)$	$0$	$- δ_{F} (u)$	$- 1$
$Y = F^{- 1} (v)$	$δ_{F} (v)$	$δ_{F} (v) - δ_{F} (u)$	$- 1 + δ_{F} (v)$
$Y > F^{- 1} (v)$	$1$	$1 - δ_{F} (u)$	$0$
The value of $I_{F} (Y, u) I_{F} (Y, v)$
$Y < F^{- 1} (v)$	$1$	$1 - δ_{F} (u)$	$0$
$Y = F^{- 1} (v)$	$1 - δ_{F} (v)$	$(1 - δ_{F} (u)) (1 - δ_{F} (v))$	$0$
$Y > F^{- 1} (v)$	$0$	$0$	$0$
The value of $I_{F} (Y, u) - I_{H} (Y, u)$
$Y < H^{- 1} (u)$	$0$	$- δ_{F} (u)$	$- 1$
$Y = H^{- 1} (u)$	$δ_{H} (u)$	$δ_{H} (u) - δ_{F} (u)$	$- 1 + δ_{H} (u)$
$Y > H^{- 1} (u)$	$1$	$1 - δ_{F} (u)$	$0$
The value of $I_{F} (Y, u) I_{H} (Y, u)$
$Y < H^{- 1} (u)$	$1$	$1 - δ_{F} (u)$	$0$
$Y = H^{- 1} (u)$	$1 - δ_{H} (u)$	$(1 - δ_{F} (u)) (1 - δ_{H} (u))$	$0$
$Y > H^{- 1} (u)$	$0$	$0$	$0$

Equations (158)

$$H_{0}: Y_{t} \mid \Omega_{t} \sim F_{t,\theta_{0}}(\cdot \mid \Omega_{t}) \text{ for some } \theta_{0} \in \Theta, \quad t = 1, 2, \ldots, T,$$

$$Y_{t} = \begin{cases} 1 & \text{if } V_{t}^{\ast} \leq \tau_{1}, \\ 2 & \text{if } \tau_{1} < V_{t}^{\ast} \leq \tau_{2}, \\ \vdots & \\ K & \text{if } V_{t}^{\ast} > \tau_{K-1}, \end{cases}$$

$$V_{t}^{\ast} = X_{t}^{\prime}\beta + \rho Y_{t-1} + \varepsilon_{t},$$

$$\Pr(Y_{t} = k \mid \Omega_{t}) = \Pr(\tau_{k-1} < V_{t}^{\ast} \leq \tau_{k} \mid \Omega_{t}) = F_{\varepsilon}(\tau_{k} - X_{t}^{\prime}\beta - \rho Y_{t-1}) - F_{\varepsilon}(\tau_{k-1} - X_{t}^{\prime}\beta - \rho Y_{t-1}),$$

$$Y_{t}^{\ast} \mid \Omega_{t} \sim \text{Poisson}(\lambda_{t}),$$

$$U_{t}(\theta_{0}) := F_{t,\theta_{0}}(Y_{t} \mid \Omega_{t}), \quad t = 1, 2, \ldots, T$$

$$I_{t,\theta_{0}}(u) := \begin{cases} 0, & u \leq U_{t}^{-}(\theta_{0}); \\ \dfrac{u - U_{t}^{-}(\theta_{0})}{U_{t}(\theta_{0}) - U_{t}^{-}(\theta_{0})}, & U_{t}^{-}(\theta_{0}) \leq u \leq U_{t}(\theta_{0}); \\ 1, & U_{t}(\theta_{0}) \leq u, \end{cases}$$
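The piecewise definition of $I_{t,\theta_{0}}(u)$ can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the three-point cdf `CDF` and the function names are hypothetical, and the parameter is treated as known. The sketch also checks numerically that the transform averages to $u$ when $Y$ is drawn from the same cdf.

```python
# Hypothetical cdf on {1, 2, 3}: CDF[k] = F(k), with F(0) = 0 (illustration only).
CDF = [0.0, 0.2, 0.7, 1.0]


def nonrandomized_transform(u, pit, pit_minus):
    """I(u) for one observation, given U = F(Y) (pit) and U^- = F(Y - 1) (pit_minus)."""
    if u <= pit_minus:
        return 0.0
    if u >= pit:
        return 1.0
    # Linear interpolation inside the jump of the cdf at the observed Y.
    return (u - pit_minus) / (pit - pit_minus)


def expected_transform(u):
    """E[I(u)] when Y follows CDF: sum over k of Pr(Y = k) * I(u) evaluated at Y = k."""
    return sum(
        (CDF[k] - CDF[k - 1]) * nonrandomized_transform(u, CDF[k], CDF[k - 1])
        for k in range(1, len(CDF))
    )


grid = [i / 20 for i in range(21)]
max_gap = max(abs(expected_transform(u) - u) for u in grid)
print(max_gap)  # numerically zero: the transform is centred at u under the null
```

No random numbers are needed here, which is the point of the construction: the transform depends only on the observed response and the fitted cdf.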

$$F_{\theta_{0}}(u) := \frac{1}{T}\sum_{t=1}^{T} I_{t,\theta_{0}}(u), \quad u \in [0,1],$$

$$S_{1T}(u) := \frac{1}{T^{1/2}}\sum_{t=1}^{T}\{I_{t,\theta_{0}}(u) - u\} = T^{1/2}(F_{\theta_{0}}(u) - u),$$

$$S_{2T}(u) := \frac{1}{(T-1)^{1/2}}\sum_{t=2}^{T}\{I_{t,\theta_{0}}(u_{1}) I_{t-1,\theta_{0}}(u_{2}) - u_{1}u_{2}\},$$
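To make the empirical process $S_{1T}$ concrete, the following sketch evaluates it on a grid and computes Kolmogorov-Smirnov and Cramér-von Mises type functionals. It rests on several assumptions that are not from the paper: a hypothetical iid three-point model with known parameters, a midpoint grid of 100 points to approximate the supremum and the integral, and the names `s1`, `ks_stat` and `cvm_stat`. Critical values, which the paper obtains by bootstrap, are not computed here.

```python
import math
import random

CDF = [0.0, 0.2, 0.7, 1.0]  # hypothetical F(0), F(1), F(2), F(3)


def transform(u, pit, pit_minus):
    """Nonrandomized transform I(u) given U = F(Y) and U^- = F(Y - 1)."""
    if u <= pit_minus:
        return 0.0
    if u >= pit:
        return 1.0
    return (u - pit_minus) / (pit - pit_minus)


def s1(u, pits):
    """S_1T(u) = T^{-1/2} * sum_t {I_t(u) - u}."""
    total = sum(transform(u, hi, lo) - u for hi, lo in pits)
    return total / math.sqrt(len(pits))


def draw_y(rng):
    """Draw Y from CDF by inversion."""
    z = rng.random()
    return next(k for k in (1, 2, 3) if z < CDF[k])


rng = random.Random(1)
ys = [draw_y(rng) for _ in range(200)]
pits = [(CDF[y], CDF[y - 1]) for y in ys]

grid = [(i + 0.5) / 100 for i in range(100)]   # midpoints of 100 cells in [0, 1]
values = [s1(u, pits) for u in grid]
ks_stat = max(abs(v) for v in values)                # approximates sup_u |S_1T(u)|
cvm_stat = sum(v * v for v in values) / len(values)  # approximates the integral of S_1T(u)^2
print(ks_stat, cvm_stat)
```

The same pattern would extend to $S_{2T}$ by pairing each transform with its lag and evaluating on a two-dimensional grid.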

$$U_{t}^{r}(\theta_{0}) := U_{t}^{-}(\theta_{0}) + Z_{t}^{U}\,(U_{t}(\theta_{0}) - U_{t}^{-}(\theta_{0})),$$

$$F_{t,\theta_{0}}^{\dagger}(y \mid \Omega_{t}) = F_{t,\theta_{0}}(\lfloor y \rfloor \mid \Omega_{t}) + F_{Z}(y - \lfloor y \rfloor)\,\big(F_{t,\theta_{0}}(\lfloor y + 1 \rfloor \mid \Omega_{t}) - F_{t,\theta_{0}}(\lfloor y \rfloor \mid \Omega_{t})\big),$$

$$U_{t}^{r}(\theta_{0}) = F_{t,\theta_{0}}^{\dagger}(Y_{t}^{\dagger} \mid \Omega_{t}),$$

$$F_{\theta_{0}}^{r}(u) := \frac{1}{T}\sum_{t=1}^{T} \mathbb{1}\{U_{t}^{r}(\theta_{0}) \leq u\}, \quad u \in [0,1],$$

$$R_{1T}(u) := T^{1/2}\{F_{\theta_{0}}^{r}(u) - u\} = \frac{1}{T^{1/2}}\sum_{t=1}^{T}\big[\mathbb{1}\{U_{t}^{r}(\theta_{0}) \leq u\} - u\big], \quad u \in [0,1].$$

$$I_{t,\theta_{0},M}(Y_{t},u) := \frac{1}{M}\sum_{m=1}^{M} \mathbb{1}\{U_{t,m}^{r}(\theta_{0}) \leq u\},$$

$$F_{\theta_{0},M}^{r}(u) := \frac{1}{T}\sum_{t=1}^{T} I_{t,\theta_{0},M}(Y_{t},u), \quad u \in [0,1].$$

$$R_{1T,M}(u) := T^{1/2}\{F_{\theta_{0},M}^{r}(u) - u\}, \quad u \in [0,1].$$
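The averaged indicator $I_{t,\theta_{0},M}$ can be compared with the nonrandomized transform by simulation. The sketch below assumes the jitters $Z_{t}^{U}$ are uniform on $[0,1]$ and uses a single hypothetical observation with $U_{t}^{-}=0.2$ and $U_{t}=0.7$; function names are illustrative. It shows the average over $M$ jittered PITs settling near the nonrandomized value as $M$ grows.

```python
import random


def nonrandomized_transform(u, pit, pit_minus):
    """I(u) given U = F(Y) and U^- = F(Y - 1)."""
    if u <= pit_minus:
        return 0.0
    if u >= pit:
        return 1.0
    return (u - pit_minus) / (pit - pit_minus)


def averaged_indicator(u, pit, pit_minus, m, rng):
    """Share of M randomized PITs U^r = U^- + Z * (U - U^-) that fall at or below u."""
    hits = 0
    for _ in range(m):
        u_r = pit_minus + rng.random() * (pit - pit_minus)  # uniform jitter (assumed)
        hits += u_r <= u
    return hits / m


rng = random.Random(2)
pit, pit_minus, u = 0.7, 0.2, 0.5       # hypothetical observation and evaluation point
exact = nonrandomized_transform(u, pit, pit_minus)
approx = {m: averaged_indicator(u, pit, pit_minus, m, rng) for m in (1, 25, 50, 100000)}
print(exact, approx)
```

With $M=1$ the indicator is either 0 or 1, so all of the jitter noise is passed on to the statistic; larger $M$ averages it away.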

$$\gamma_{t,\theta_{0}}(u,v) := \frac{(F_{k} - u \lor v)(u \land v - F_{k-1})}{F_{k} - F_{k-1}}\, \mathbb{1}\{F_{t,\theta_{0}}^{-1}(u \mid \Omega_{t}) = F_{t,\theta_{0}}^{-1}(v \mid \Omega_{t})\},$$

$$E[I_{t,\theta_{0}}(u) \mid \Omega_{t}] = u, \quad \text{a.s.},$$

$$E[(I_{t,\theta_{0}}(u) - u)(I_{t,\theta_{0}}(v) - v) \mid \Omega_{t}] = u \land v - uv - \gamma_{t,\theta_{0}}(u,v), \quad \text{a.s.}$$

$$V_{1T}(u,v) = u \land v - uv - \frac{1}{T}\sum_{t=1}^{T} E[\gamma_{t,\theta_{0}}(u,v)] \leq u \land v - uv,$$

$$E[R_{1T,M}(u) R_{1T,M}(v)] = \frac{1}{M} E[R_{1T}(u) R_{1T}(v)] + \Big(1 - \frac{1}{M}\Big) E[S_{1T}(u) S_{1T}(v)].$$
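The moment identities for the nonrandomized transform follow from a short calculation. The sketch below is a reconstruction under $H_{0}$, not a quotation of the paper's proof; it writes $k = F_{t,\theta_{0}}^{-1}(u \mid \Omega_{t})$ and $F_{j} = F_{t,\theta_{0}}(j \mid \Omega_{t})$, and in the second display assumes $u \leq v$ with $F_{t,\theta_{0}}^{-1}(v \mid \Omega_{t}) = k$ as well.

```latex
\begin{aligned}
E[I_{t,\theta_0}(u) \mid \Omega_t]
  &= \Pr(Y_t < k \mid \Omega_t) \cdot 1
   + \Pr(Y_t = k \mid \Omega_t)\, \frac{u - F_{k-1}}{F_k - F_{k-1}}
   + \Pr(Y_t > k \mid \Omega_t) \cdot 0 \\
  &= F_{k-1} + (F_k - F_{k-1})\, \frac{u - F_{k-1}}{F_k - F_{k-1}} = u, \\[6pt]
E[I_{t,\theta_0}(u)\, I_{t,\theta_0}(v) \mid \Omega_t]
  &= F_{k-1} + \frac{(u - F_{k-1})(v - F_{k-1})}{F_k - F_{k-1}}
   = u - \frac{(F_k - v)(u - F_{k-1})}{F_k - F_{k-1}}
   = u \land v - \gamma_{t,\theta_0}(u,v).
\end{aligned}
```

Subtracting $uv$ gives the centred second moment $u \land v - uv - \gamma_{t,\theta_{0}}(u,v)$. When $u$ and $v$ fall in different jumps of the cdf, the same calculation gives $u \land v$ with $\gamma_{t,\theta_{0}}(u,v) = 0$.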

$$S_{1T} \Rightarrow S_{1\infty},$$

$$H_{1T}: Y_{t} \mid \Omega_{t} \sim G_{T,t,\theta_{0}}(\cdot \mid \Omega_{t}) \text{ for some } \theta_{0} \in \Theta,$$

$$G_{T,t,\theta_{0}}(y \mid \Omega_{t}) = \Big(1 - \frac{\delta}{T^{1/2}}\Big) F_{t,\theta_{0}}(y \mid \Omega_{t}) + \frac{\delta}{T^{1/2}} H_{t}(y \mid \Omega_{t}),$$

$$d(G,F,u) = G(F^{-1}(u)) - F(F^{-1}(u)) - \frac{F(F^{-1}(u)) - u}{f(F^{-1}(u))}\,\big[g(F^{-1}(u)) - f(F^{-1}(u))\big].$$

$$\frac{1}{T^{1/2}} E[S_{1T}(u)] = \frac{1}{T}\sum_{t=1}^{T} E\big[d(G_{t}(\cdot \mid \Omega_{t}), F_{t,\theta_{0}}(\cdot \mid \Omega_{t}), u)\big].$$
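The drift function $d(G,F,u)$ can be checked against a direct calculation: if $Y$ follows $G$ while the transform is built from $F$, the mean of $I_{F}(Y,u) - u$ should equal $d(G,F,u)$. The sketch below does this for two hypothetical three-point cdfs (`F_CDF` and `G_CDF` are illustrative values, not taken from the paper) and takes $F^{-1}(u)$ to be the smallest $k$ with $F(k) \geq u$.

```python
F_CDF = [0.0, 0.2, 0.7, 1.0]   # hypothetical null cdf F on {1, 2, 3}
G_CDF = [0.0, 0.3, 0.6, 1.0]   # hypothetical true cdf G on the same support


def inverse_cdf(cdf, u):
    """Smallest k >= 1 with F(k) >= u."""
    for k in range(1, len(cdf)):
        if cdf[k] >= u:
            return k
    return len(cdf) - 1


def transform(u, pit, pit_minus):
    """Nonrandomized transform I_F(Y, u) given F(Y) and F(Y - 1)."""
    if u <= pit_minus:
        return 0.0
    if u >= pit:
        return 1.0
    return (u - pit_minus) / (pit - pit_minus)


def drift(g_cdf, f_cdf, u):
    """d(G, F, u) as in the displayed formula, with pmfs f and g at k = F^{-1}(u)."""
    k = inverse_cdf(f_cdf, u)
    f_k = f_cdf[k] - f_cdf[k - 1]
    g_k = g_cdf[k] - g_cdf[k - 1]
    return g_cdf[k] - f_cdf[k] - (f_cdf[k] - u) / f_k * (g_k - f_k)


def mean_transform_under(g_cdf, f_cdf, u):
    """E_G[I_F(Y, u)]: Y is drawn from G, the transform uses F."""
    return sum(
        (g_cdf[k] - g_cdf[k - 1]) * transform(u, f_cdf[k], f_cdf[k - 1])
        for k in range(1, len(f_cdf))
    )


grid = [i / 20 for i in range(1, 20)]
max_gap = max(
    abs(mean_transform_under(G_CDF, F_CDF, u) - u - drift(G_CDF, F_CDF, u))
    for u in grid
)
null_drift = drift(F_CDF, F_CDF, 0.5)
print(max_gap, null_drift)
```

When $G = F$ the drift vanishes, which matches the centring of the transform under the null.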

$$S_{1T} \Rightarrow S_{1\infty} + \delta D_{1},$$

$$T^{1/2}(\theta_{T} - \theta_{0}) = O_{p}(1),$$

Full text

New goodness-of-fit diagnostics for conditional discrete response models

Igor Kheifets (ITAM, Mexico) and Carlos Velasco (Department of Economics, Universidad Carlos III de Madrid)

Abstract

This paper proposes new specification tests for conditional models with discrete responses. Correct specification of such models is key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects, and producing appropriate predictions of the probability of future events. In particular, we test static and dynamic ordered choice model specifications and can cover infinite-support distributions, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which lead to a continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. In this paper we investigate an alternative transformation based only on the original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the finite-sample properties of a bootstrap algorithm that approximates the critical values of the test statistics, which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.

Keywords: Specification tests, count data, dynamic discrete choice models, conditional probability integral transform.

JEL classification: C12, C22, C52.

1 INTRODUCTION

Many statistical models specify the conditional distribution of a discrete response variable given some explanatory variables, including models for binary, multinomial, ordered choice and count data. In this paper we analyze goodness-of-fit tests for static models with covariates as well as for dynamic ordered choice and count data models, where the conditioning information set may also include past information on the discrete variable and a set of (contemporaneous) explanatory variables. Such models appear frequently in the social sciences, see Kedem and Fokianos (2002) and Greene and Hensher (2010). For example, dynamic models are popular in macroeconomic applications, see for instance Hamilton and Jordà (2002), Dolado and Maria-Dolores (2002) and Basu and de Jong (2007) for modeling central banks' decisions, or Kauppi and Saikkonen (2008) and Startz (2008) for predicting US recessions. In finance, see e.g. Rydberg and Shephard (2003) for modeling the size of asset price movements and Fokianos et al. (2009) for the number of transactions per minute of a particular stock.

Suppose we observe the random variables $\{Y_{t},X^{\prime}_{t}\}_{t=1}^{T}$ and consider the information sets $\Omega_{t}=\left\{X_{t},Y_{t-1},X_{t-1},Y_{t-2},X_{t-2},\ldots\right\}$ for each period $t=1,2,\ldots,T$. We are interested in testing the null hypothesis that the distribution of $Y_{t}$ conditional on $\Omega_{t}$ is in the parametric family $F_{t,\theta}(\cdot\mid\Omega_{t})$, i.e.

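$$H_{0}: Y_{t} \mid \Omega_{t} \sim F_{t,\theta_{0}}(\cdot \mid \Omega_{t}) \text{ for some } \theta_{0} \in \Theta, \quad t = 1, 2, \ldots, T,$$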

where $\Theta\subset\mathbb{R}^{m}$ is the parameter space, while the alternative hypothesis ($H_{1}$) for the omnibus test is the negation of $H_{0}$.

We consider a class $\mathcal{M}$ of discrete conditional distributions defined on $\mathcal{K}=\{1,2,\ldots,K\}$ for an integer $K>1$, or on $\mathcal{K}=\{1,2,\ldots,\infty\}$, such that for all $F\in\mathcal{M}$ it holds that $F\left(0\right)=0$, $f\left(k\right):=F\left(k\right)-F\left(k-1\right)>0$ for all $k=1,2,\ldots$ and $\sum_{k\in\mathcal{K}}f(k)=1$. This setup includes numerous models that have been used extensively in applied work, both for dynamic and for iid data. Here we briefly describe two of them.

Example 1 (Dynamic multinomial ordered choice model).

The discrete responses $Y_{t}$ are assumed to be generated by the rule

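$$Y_{t} = \begin{cases} 1 & \text{if } V_{t}^{\ast} \leq \tau_{1}, \\ 2 & \text{if } \tau_{1} < V_{t}^{\ast} \leq \tau_{2}, \\ \vdots & \\ K & \text{if } V_{t}^{\ast} > \tau_{K-1}, \end{cases}$$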

where $V_{t}^{\ast}$ is a continuous latent variable and $\tau_{1},\ldots,\tau_{K-1}$ are threshold parameters that define $K$ intervals in $\mathbb{R}$ . In a simple model, e.g. Basu and de Jong (2007), the latent variable is determined through the linear equation

$$V_{t}^{\ast} = X_{t}^{\prime}\beta + \rho Y_{t-1} + \varepsilon_{t},$$

where $X_{t}$ is a vector of stationary exogenous regressors, $\beta$ is a vector of regression parameters, $\varepsilon_{t}$ is the shock in each period, and $Y_{t-1}$ could be replaced by any function of the past $\left\{Y_{t-1},\ldots,Y_{t-n}\right\}$ for some finite $n$. The cdf of $\varepsilon_{t}$, $F_{\varepsilon}$, determines the class of multinomial model, i.e. ordered multinomial probit (if $\varepsilon_{t}$ is standard normal) or logit (if $\varepsilon_{t}$ is logistic), since $F_{t,\theta_{0}}$ follows directly from

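$$\Pr(Y_{t} = k \mid \Omega_{t}) = \Pr(\tau_{k-1} < V_{t}^{\ast} \leq \tau_{k} \mid \Omega_{t}) = F_{\varepsilon}(\tau_{k} - X_{t}^{\prime}\beta - \rho Y_{t-1}) - F_{\varepsilon}(\tau_{k-1} - X_{t}^{\prime}\beta - \rho Y_{t-1}),$$

with $\tau_{0}=-\infty$, $\tau_{K}=\infty$ and $\theta_{0}=\left(\beta^{\prime},\rho,\tau_{1},\ldots,\tau_{K-1}\right)^{\prime}$.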
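As a concrete illustration of Example 1, the cell probabilities of an ordered choice model are differences of the shock cdf evaluated at shifted thresholds. The sketch below is a minimal Python version using only the standard library; the numerical values of the index, $\rho$, the lagged response and the thresholds are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cdf via the error function (probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x):
    """Standard logistic cdf (logit link), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ordered_choice_probs(index, taus, cdf=normal_cdf):
    """Return [Pr(Y=1), ..., Pr(Y=K)] for the latent index X'beta + rho * Y_{t-1}.

    taus holds the K-1 increasing thresholds; tau_0 = -inf and tau_K = +inf
    are handled by fixing the boundary cdf values at 0 and 1.
    """
    cum = [0.0] + [cdf(tau - index) for tau in taus] + [1.0]
    return [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical values, for illustration only.
x_beta, rho, y_lag = 0.3, -1.0, 2
taus = [-2.0, -1.0, 1.5]
index = x_beta + rho * y_lag
probit_probs = ordered_choice_probs(index, taus, normal_cdf)
logit_probs = ordered_choice_probs(index, taus, logistic_cdf)
print(probit_probs, logit_probs)
```

Setting `rho = 0` gives the static specification, so the same function covers both the static and the dynamic versions of the model.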

Example 2 (Poisson Model).

The variate $Y_{t}=Y_{t}^{\ast}+1$ is defined on the counts $Y_{t}^{\ast}=0,1,2,\ldots$ which are assumed to follow a conditional Poisson distribution

$$Y_{t}^{\ast} \mid \Omega_{t} \sim \text{Poisson}(\lambda_{t}),$$

where the conditional mean can depend on covariates through an exponential link, $\lambda_{t}=\exp(X_{t}^{\prime}\beta)$, on previous observations through an identity link, $\lambda_{t}=\alpha_{0}+\alpha_{1}\lambda_{t-1}+\rho Y_{t-1}^{\ast}$, e.g. Fokianos et al. (2009), or through the logarithmic canonical link, $\log\left(\lambda_{t}\right)=X_{t}^{\prime}\beta+\rho e_{t-1}$, where $e_{t}=\left(Y_{t}^{\ast}-\lambda_{t}\right)/\lambda_{t}$ are scaled and centered errors, e.g. Davis et al. (2003).
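For Example 2, the shift $Y_{t}=Y_{t}^{\ast}+1$ puts the Poisson counts on the support $\{1,2,\ldots\}$, so that $F(0)=0$ as the class $\mathcal{M}$ requires. A minimal sketch of the resulting cdf with the exponential link follows; the covariate and coefficient values are hypothetical.

```python
import math


def exp_link(x, beta):
    """lambda_t = exp(X_t' beta) for covariate and coefficient lists of equal length."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))


def shifted_poisson_cdf(k, lam):
    """F(k) = Pr(Y <= k) for Y = Y* + 1 with Y* ~ Poisson(lam), so that F(0) = 0."""
    if k < 1:
        return 0.0
    term = math.exp(-lam)          # Pr(Y* = 0)
    total = term
    for j in range(1, k):          # add Pr(Y* = j) for j = 1, ..., k - 1
        term *= lam / j
        total += term
    return min(total, 1.0)


# Hypothetical covariates and coefficients, for illustration only.
lam = exp_link([1.0, 0.5], [0.2, 0.8])
cdf_values = [shifted_poisson_cdf(k, lam) for k in range(0, 8)]
print(lam, cdf_values)
```

The pair $F(Y_{t})$ and $F(Y_{t}-1)$ computed this way is all that a PIT-based transform needs from the fitted model.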

Although correct specification is key to applying efficient maximum likelihood methods, obtaining consistent estimates of partial effects and producing appropriate predictions of the probability of future events, empirical researchers typically do not test the goodness of fit of such models as they would in the continuous case. In general, only a few specification tests are available for discrete data, see Mora and Moro-Egido (2007). Two of them, the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the conditional Kolmogorov test of Andrews (1997), both based on the specification of the conditional mean for binary data, can be adapted for this purpose; we discuss this possibility and compare it to our approach in Section 6. A test related to that of Andrews, derived for time series by Corradi and Swanson (2006), could also be adapted for discrete data, but it tests a different null hypothesis, concerning a distribution given a finite conditioning set that does not characterize the complete dynamics of the process. There are also tests designed specifically for Poisson models (see e.g. Neumann, 2011; Fokianos and Neumann, 2013).

In what follows we propose conditional, dynamic discrete analogs of the Kolmogorov-Smirnov goodness of fit measure that can exploit different restrictions derived from the martingale difference property of a particular transformation of the data under the null hypothesis. This property is derived from the specification of a complete dynamic model given the information set generated by all the past observations of the discrete response and other explanatory variables and is used to build the asymptotic theory for our tests. Under i.i.d. assumptions this martingale difference property leads to an exact independence of the transformation sequence under the null and a much simpler parallel asymptotic theory.

When the fitted distribution is continuous, the relative distribution of $Y_{t}$ compared to $F_{t,\theta_{0}}$, defined as the cdf of Rosenblatt’s (1952) transforms, also called conditional Probability Integral Transforms (PIT),

$$U_{t}\left(\theta_{0}\right):=F_{t,\theta_{0}}\left(Y_{t}\mid\Omega_{t}\right),$$

is standard uniform and $U_{t}\left(\theta_{0}\right)$ are distributed as independent $[0,1]$ uniform random variables under $H_{0}$. This serves as a basis for several specification tests of $H_{0}$; see e.g. Bai (2003) and Kheifets (2015) for dynamic models and Delgado and Stute (2008) for independent and identically distributed (iid) data. However, the Rosenblatt transformation is not appropriate for random variables with discrete support, since it produces non-iid pseudo-residuals even under the null of correct specification. To overcome the limitations of PIT-based testing techniques for discrete data, several alternative transforms have been proposed; see Jung, Kukuk and Liesenfeld (2006), Czado, Gneiting and Held (2009) and references therein. An easy and popular approach is to randomize, i.e. to interpolate the discrete values of $Y_{t}$ with independent noise in $[0,1]$; recent references include Kheifets and Velasco (2013) and Lee (2014). Unfortunately, the additional simulated noise affects the power of the tests and may lead to different conclusions depending on the simulation outcome.

In this paper instead, we consider a nonrandomized transform $Y_{t}\mapsto I_{t,\theta_{0}}\left(u\right)$ for $u\in[0,1]$ ,

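$$I_{t,\theta_{0}}\left(u\right):=\begin{cases}0, & u\leq U_{t}^{-}\left(\theta_{0}\right),\\[4pt] \dfrac{u-U_{t}^{-}\left(\theta_{0}\right)}{U_{t}\left(\theta_{0}\right)-U_{t}^{-}\left(\theta_{0}\right)}, & U_{t}^{-}\left(\theta_{0}\right)<u<U_{t}\left(\theta_{0}\right),\\[4pt] 1, & u\geq U_{t}\left(\theta_{0}\right),\end{cases}\qquad(1)$$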

where $U_{t}^{-}\left(\theta_{0}\right):=F_{t,\theta_{0}}\left(Y_{t}-1\mid\Omega_{t}\right)$. This transform, conditional on data, is nonrandomized in the sense that it does not depend on extra sources of randomness, as opposed to the interpolation transforms discussed in the next section. The unconditional version of this transform appears in Handcock and Morris (1999) and more recently in Czado, Gneiting and Held (2009), where it is used for calibration, but no formal tests are proposed there. This transformation can also be seen as a particular case of the multilinear extension as defined in Genest, Nešlehová and Rémillard (2014). As we show below, for every $u\in[0,1]$, $I_{t,\theta_{0}}\left(u\right)-u$ constitutes a martingale difference sequence (MDS) with respect to $\Omega_{t}$ under $H_{0}$ and can be used for testing $H_{0}$, since $I_{t,\theta_{0}}\left(u\right)$ loses this property when the model is misspecified. For instance, we can compute the pseudo empirical relative distribution of $Y_{t}$ compared to $F_{t,\theta_{0}}$

$$\widetilde{F}_{\theta_{0}}\left(u\right):=\frac{1}{T}\sum_{t=1}^{T}I_{t,\theta_{0}}\left(u\right),$$

which can be contrasted with the uniform cdf using the following empirical process

$$S_{1T}\left(u\right):=\frac{1}{T^{1/2}}\sum_{t=1}^{T}\left\{I_{t,\theta_{0}}\left(u\right)-u\right\},$$

which converges weakly to a Gaussian process. In addition, in order to control dynamics in $I_{t,\theta_{0}}\left(u\right)$ , we can compare the joint pseudo empirical cdf with the uniform on a square using the biparameter process

[TABLE]

where $u=\left(u_{1},u_{2}\right)$. To obtain feasible tests we need to consider norms of $S_{jT}$ for $j=1,2$. We use the Cramér-von Mises norm $\int S_{jT}\left(u\right)^{2}d\varphi\left(u\right)$, for some absolutely continuous measure $\varphi$ on $\left[0,1\right]^{j}$, or the Kolmogorov-Smirnov norm $\sup_{u\in[0,1]^{j}}\left|S_{jT}\left(u\right)\right|$.
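To make the construction concrete, the following Python sketch evaluates the nonrandomized transform and the univariate process on a grid and computes both norms. It is an illustration of ours under stated assumptions: the transform is taken to be $0$ for $u\leq U_{t}^{-}$, $1$ for $u\geq U_{t}$ and linear in between, $S_{1T}\left(u\right)$ is taken to be $T^{-1/2}\sum_{t}\{I_{t,\theta_{0}}\left(u\right)-u\}$, $\varphi$ is the Lebesgue measure approximated by a grid average, and the function names and the Bernoulli toy data are hypothetical.

```python
import numpy as np

def nonrandomized_transform(u_lo, u_hi, grid):
    """I_t(u) on a grid: 0 below U_t^-, 1 above U_t, linear in between.

    u_lo[t] = F_t(Y_t - 1 | Omega_t), u_hi[t] = F_t(Y_t | Omega_t).
    Returns an array of shape (T, len(grid)).
    """
    u_lo = np.asarray(u_lo, dtype=float)[:, None]
    u_hi = np.asarray(u_hi, dtype=float)[:, None]
    grid = np.asarray(grid, dtype=float)[None, :]
    width = np.maximum(u_hi - u_lo, 1e-300)  # guard against zero-probability cells
    return np.clip((grid - u_lo) / width, 0.0, 1.0)

def s1_process(u_lo, u_hi, grid):
    """Assumed form: S_1T(u) = T^{-1/2} * sum_t (I_t(u) - u), on the grid."""
    grid = np.asarray(grid, dtype=float)
    I = nonrandomized_transform(u_lo, u_hi, grid)
    return (I - grid[None, :]).sum(axis=0) / np.sqrt(I.shape[0])

def cvm_ks(process_on_grid):
    """Cramer-von Mises (grid average of squares) and Kolmogorov-Smirnov norms."""
    return float(np.mean(process_on_grid ** 2)), float(np.max(np.abs(process_on_grid)))

# Toy data: iid Bernoulli(0.3) responses coded as Y in {0, 1}, known parameter.
rng = np.random.default_rng(0)
p = 0.3
y = (rng.random(500) < p).astype(int)
cdf = np.array([0.0, 1.0 - p, 1.0])      # F(-1), F(0), F(1)
u_lo, u_hi = cdf[y], cdf[y + 1]
grid = np.linspace(0.0, 1.0, 201)
s1 = s1_process(u_lo, u_hi, grid)
cvm, ks = cvm_ks(s1)
print(cvm, ks)
```

No random numbers enter after the data are drawn, so repeated evaluation on the same sample returns the same statistics.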

When the parameter $\theta_{0}$ is unknown under the null, we use an estimate $\widehat{\theta}_{T}$ and account for the parameter estimation effect in the $p$-value computation with a parametric bootstrap method. It might also be possible to derive distribution-free transforms, e.g. martingale transforms, but these typically need to be programmed on a case-by-case basis for each model, which can be impractical, and they are beyond the scope of this paper. As far as we know, our proposal is the first formal specification test of ordered discrete choice models which accounts properly for parameter uncertainty and is based on a nonrandomized transform, which makes it attractive in terms of power against a wide set of alternative hypotheses.

The rest of the paper is organized as follows. In the next section, we describe different alternatives to the PIT. In Sections 3 and 4, we provide the main asymptotic properties of the nonrandomized transforms and of the resulting univariate and bivariate empirical processes using martingale theory. In particular, we establish weak limits under fixed and local alternatives accounting for the parameter estimation effect. Section 5 discusses the implementation of the new tests with a simple bootstrap algorithm. Section 6 provides a small simulation exercise and an application exploring the properties of specification tests based on both randomized and nonrandomized transformations. Then we conclude. All proofs are contained in the Appendix.

2 ALTERNATIVES TO PIT FOR DISCRETE DATA

In order to further motivate the nonrandomized transform $I_{t,\theta_{0}}$ defined in (1), we introduce the randomized PIT,

[TABLE]

where $\{Z_{t}^{U}\}_{t=1}^{T}$ are independent standard uniform random variables, and independent of $Y_{t}$ . Alternatively, $U_{t}^{r}$ can be obtained by applying the standard continuous PIT to the continuous random variable $Y_{t}^{{\dagger}}:=Y_{t}-1+Z_{t}$ , where $\{Z_{t}\}_{t=1}^{T}$ are iid with any continuous cdf $F_{Z}$ on $[0,1]$ . Indeed, we can construct the cdf of $Y_{t}^{{\dagger}}$ ,

[TABLE]

where $\left\lfloor y\right\rfloor$ is the floor function, i.e. the maximum integer not exceeding $y$ , and find that

[TABLE]

for $Z_{t}^{U}=F_{Z}\left(Z_{t}\right)$ and any choice of $F_{Z}$, see Kheifets and Velasco (2013). Note that the cdfs of $Y_{t}^{{\dagger}}$ conditional on $\Omega_{t}$ and on $\left\{\Omega_{t},Z_{t-1},Z_{t-2},\ldots,Z_{1}\right\}$ coincide. Under $H_{0}$, $U_{t}^{r}\left(\theta_{0}\right)$ are iid $U\left[0,1\right]$ variables as under any continuous distribution specification, while $U_{t}\left(\theta_{0}\right)$ and $U_{t}^{-}\left(\theta_{0}\right)$ are neither independent nor $U\left[0,1\right]$. Using the typical discrepancy measures, the empirical cdf of $U_{t}^{r}\left({\theta}_{0}\right)$, estimated using the randomized transform $Y_{t}\mapsto\mathbbm{1}\!\left\{U_{t}^{r}\left(\theta_{0}\right)\leq u\right\}$,

[TABLE]

can be compared to the uniform cdf. Kheifets and Velasco (2013) then test $H_{0}$ using the empirical process based on the randomized transform

[TABLE]
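The dependence of the randomized approach on the simulated noise can be seen in a small Python sketch of ours. It assumes that the randomized PIT takes the interpolation form $U_{t}^{r}=U_{t}^{-}+Z_{t}^{U}\left(U_{t}-U_{t}^{-}\right)$ and that the associated process is $T^{-1/2}\sum_{t}\{\mathbbm{1}\!\left\{U_{t}^{r}\leq u\right\}-u\}$; the function names and the Bernoulli toy data are hypothetical.

```python
import numpy as np

def randomized_pit(u_lo, u_hi, rng):
    """Assumed form: U_t^r = U_t^- + Z_t^U * (U_t - U_t^-), Z_t^U iid U[0,1]."""
    z = rng.random(len(u_lo))
    return u_lo + z * (u_hi - u_lo)

def r1_process(u_r, grid):
    """Assumed form: R_1T(u) = T^{-1/2} * sum_t (1{U_t^r <= u} - u), on the grid."""
    ind = (u_r[:, None] <= grid[None, :]).astype(float)
    return (ind - grid[None, :]).sum(axis=0) / np.sqrt(len(u_r))

# Toy data: iid Bernoulli(0.3) responses with known parameter.
p = 0.3
y = (np.random.default_rng(1).random(400) < p).astype(int)
cdf = np.array([0.0, 1.0 - p, 1.0])      # F(-1), F(0), F(1)
u_lo, u_hi = cdf[y], cdf[y + 1]
grid = np.linspace(0.0, 1.0, 101)

# Same data, two different jitter draws: the statistic depends on the noise.
u_a = randomized_pit(u_lo, u_hi, np.random.default_rng(10))
u_b = randomized_pit(u_lo, u_hi, np.random.default_rng(11))
ks_a = float(np.max(np.abs(r1_process(u_a, grid))))
ks_b = float(np.max(np.abs(r1_process(u_b, grid))))
print(ks_a, ks_b)
```

Only the jitter seed differs between `ks_a` and `ks_b`, so any gap between them is produced by the auxiliary randomization rather than by the data.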

We can also consider reducing the dependence on a particular outcome of the noise $Z_{t}^{U}$ in (3) and in the randomized transform by taking averages over $M$ replications of $\{Z_{t}^{U}\}_{t=1}^{T}$, conditional on the original data, similar to the “average-jittering” of Machado and Santos Silva (2005). Suppose that for each $t$ we have $M$ independent sequences of uniform $U[0,1]$ noises $Z_{t,m}^{U}$, $m=1,2,\ldots,M$, which generate $U_{t,m}^{r}\left(\theta_{0}\right)$ according to (3). Define the $M$-randomized transform $Y_{t}\mapsto I_{t,\theta_{0},M}\left(Y_{t},u\right)$,

[TABLE]

which takes values on the set $\{0,1/M,2/M,\ldots,1\}$ and has mean $u$ under $H_{0}$ . Then the cdf of $U_{t}^{r}\left({\theta}_{0}\right)$ is estimated by

[TABLE]

Note that with $M=1$ we are back to $\widehat{F}_{\theta_{0}}^{r}\left(u\right)$ , and therefore, we can generalize ${R}_{1T}$ to

[TABLE]

In order to propose specification tests, following Handcock and Morris (1999), we define the discrete relative distribution of $Y_{t}$ compared to $F_{t,\theta_{0}}$ as the cdf of $U_{t}^{r}\left({\theta}_{0}\right)$ . Under $H_{0}$ , the discrete relative distribution is the uniform $U\left[0,1\right]$ . As we show in the next section, three consistent estimators of the discrete relative distribution of $Y_{t}$ compared to $F_{t,\theta_{0}}$ can be ordered in terms of efficiency in the following way: $\widetilde{F}_{\theta_{0}}\left(u\right)$ (the most efficient), $\widehat{F}_{\theta_{0},M}^{r}\left(u\right)$ and $\widehat{F}_{\theta_{0}}^{r}\left(u\right)$ . This ordering is determined by the amount of noise introduced in the definitions of the transforms: i.e. in nonrandomized, $M$ -randomized and ( $1$ -)randomized transforms. The nonrandomized transform can be equivalently obtained by integrating out the extra noise in the randomized transform $I_{t,\theta_{0}}\left(Y_{t},u\right)=\int\mathbbm{1}\!\left\{U_{t}^{r}\left(\theta_{0}\right)\leq u\right\}dF_{Z}$ or by taking the number of replications $M$ to infinity, thus completely removing the noise from the estimate of the discrete relative distribution and other functionals of the transforms. The efficiency of the nonrandomized transform translates into the increased power of the specification tests based on this transform, whose properties we study next.
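The limit $M\to\infty$ can be checked numerically. The following Python sketch is ours and rests on assumptions: the $M$-randomized transform is taken to be the average of $M$ jittered indicators $\mathbbm{1}\!\left\{U_{t,m}^{r}\leq u\right\}$, the nonrandomized transform is taken to be the piecewise-linear clip of $(u-U_{t}^{-})/(U_{t}-U_{t}^{-})$ to $[0,1]$, and the three probability cells are made up for the example.

```python
import numpy as np

def nonrandomized_transform(u_lo, u_hi, u):
    """Assumed closed form of the transform: jitter integrated out analytically."""
    return np.clip((u - u_lo) / (u_hi - u_lo), 0.0, 1.0)

def m_randomized_transform(u_lo, u_hi, u, M, rng):
    """Average over M jitter draws of the indicators 1{U_{t,m}^r <= u}."""
    z = rng.random((M, len(u_lo)))
    u_r = u_lo[None, :] + z * (u_hi - u_lo)[None, :]
    return (u_r <= u).mean(axis=0)

# Three hypothetical observations whose cells [U_t^-, U_t] tile [0, 1].
u_lo = np.array([0.0, 0.2, 0.5])
u_hi = np.array([0.2, 0.5, 1.0])
u = 0.4
exact = nonrandomized_transform(u_lo, u_hi, u)   # values 1, 2/3 and 0

rng = np.random.default_rng(0)
for M in (1, 10, 1000, 100000):
    approx = m_randomized_transform(u_lo, u_hi, u, M, rng)
    print(M, float(np.abs(approx - exact).max()))
```

Only the middle observation, whose cell contains $u$, is affected by the jitter; for the other two the indicator is constant across draws.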

3 PROPERTIES OF EMPIRICAL PROCESSES BASED ON THE NONRANDOMIZED TRANSFORM

As shown in the next lemma, the building blocks of $\widetilde{F}_{\theta_{0}}\left(u\right),$ $I_{t,\theta_{0}}\left(u\right)-u$, constitute a martingale difference sequence (MDS) with respect to $\Omega_{t}$, and therefore $\widetilde{F}_{\theta_{0}}\left(u\right)$ is an unbiased and consistent estimate of the uniform cdf under the null, a reasonable basis for developing tests of $H_{0}$. Moreover, the MDS property will allow us to establish the asymptotic properties of our test without imposing any additional restrictions. For $u,v\in[0,1]$, let

[TABLE]

where $k=k\left(u\right)=F_{t,\theta_{0}}^{-1}\left(u\mid\Omega_{t}\right)$ , with $F_{t,\theta_{0}}^{-1}\left(u\mid\Omega_{t}\right):=\min\{y:F_{t,\theta_{0}}\left(y\mid\Omega_{t}\right)\geq u\}$ being the conditional quantile function and $F_{k}:=F_{t,\theta_{0}}\left(k\mid\Omega_{t}\right)$ .

Lemma 1.

Under $H_{0}$ , $I_{t,\theta_{0}}\left(u\right)-u$ is a martingale difference sequence with respect to $\Omega_{t}$ , i.e.

[TABLE]

with conditional covariance

[TABLE]

Note that $I_{t,\theta_{0}}\left(u\right)$ are not necessarily independent across $t$ despite the fact that by the martingale difference property, $I_{t,\theta_{0}}\left(u\right)$ and $I_{t-j,\theta_{0}}\left(v\right)$ are serially uncorrelated for all $j\neq 0$ and all $u,v\in\left[0,1\right],$ see the Appendix. On the other hand, the $I_{t,\theta_{0}}\left(u\right)$ are (conditionally) heteroskedastic, therefore the variance of $S_{1T}$ is model and parameter dependent, but its distribution can be simulated conditional on exogenous information in $\Omega_{t}.$
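The centering property in Lemma 1 can be verified exactly for a given conditional distribution by summing over its support. The Python sketch below is ours; it assumes the piecewise-linear form of the transform, uses a Poisson law with a made-up mean of 2.5 as the conditional distribution, and truncates the support at 60, where the remaining mass is numerically negligible. It also reports the variance of the transform, which cannot exceed the Bernoulli variance $u(1-u)$ because the transform takes values in $[0,1]$.

```python
import math
import numpy as np

def poisson_cdf_table(lam, kmax):
    """Return F(-1), F(0), ..., F(kmax) for a Poisson(lam) law."""
    pmf = np.array([math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
                    for k in range(kmax + 1)])
    return np.concatenate(([0.0], np.cumsum(pmf)))

def transform_moments(cdf, u):
    """Mean and variance of I_F(Y, u) when Y ~ F, summing over the tabulated support."""
    lo, hi = cdf[:-1], cdf[1:]           # F(k-1) and F(k) for k = 0, ..., kmax
    f = hi - lo                          # probability function f(k)
    I = np.clip((u - lo) / np.maximum(f, 1e-300), 0.0, 1.0)
    mean = float(np.sum(f * I))
    var = float(np.sum(f * I ** 2) - mean ** 2)
    return mean, var

cdf = poisson_cdf_table(lam=2.5, kmax=60)
for u in (0.05, 0.3, 0.77, 0.99):
    mean, var = transform_moments(cdf, u)
    print(u, mean, var, u * (1.0 - u))
```

The computed mean agrees with $u$ up to floating-point error: all cells below the one containing $u$ contribute their full probability, and the cell containing $u$ contributes the remainder.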

Let $V_{1T}\left(u,v\right):=\mathop{\mathrm{C}ov}\nolimits\left[S_{1T}\left(u\right),S_{1T}\left(v\right)\right]$ , then since $0\leq\gamma_{t,\theta_{0}}\left(u,v\right)<1\ a.s.$ ,

[TABLE]

i.e. the covariance and variance of ${S}_{1T}$ are not larger than those of the randomized transformation-based process ${R}_{1T}$ or its weak limit, the Brownian sheet, see Corollary 4 in Kheifets and Velasco (2013).

Due to Lemma 1, $\mathop{\mathrm{E}}\nolimits\left[\widetilde{F}_{\theta_{0}}\left(u\right)\right]=u$ under $H_{0}$ and the natural empirical process for performing tests of $H_{0}$ is then $S_{1T}$. This process, being based on a nonrandomized transform, does not involve the extra noise that appears in the randomized transform based empirical process ${R}_{1T}$ for testing $U_{t}^{r}\sim U[0,1]$, proposed by Kheifets and Velasco (2013), or in its modification ${R}_{1T,M}$ based on the $M$-randomized transform. The next lemma is the key to understanding the improvement of the $M$-randomized over the randomized approach, and of the nonrandomized approach advocated in this paper over the $M$-randomized one.

Lemma 2.

Suppose that the uniform law of large numbers holds for $\widehat{F}_{\theta_{0},M}^{r}\left(u\right)$ and $\widetilde{F}_{\theta_{0}}\left(u\right)$. Independently of whether $H_{0}$ holds or not, $\widehat{F}_{\theta_{0},M}^{r}\left(u\right)$ and $\widetilde{F}_{\theta_{0}}\left(u\right)$ consistently and uniformly in $u$ estimate the relative distribution, i.e. the cdf of $U^{r}_{t}\left(\theta_{0}\right)$. $\widetilde{F}_{\theta_{0}}\left(u\right)$ is more efficient, but the difference in efficiency goes to $0$ as $M\to\infty$. In particular, under $H_{0}$,

[TABLE]
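An elementary calculation, which is ours and is offered only as a sketch for the pointwise variance of a single summand, illustrates the ordering. Conditional on $Y_{t}$ and $\Omega_{t}$, the $M$ jittered indicators averaged in $I_{t,\theta_{0},M}\left(Y_{t},u\right)$ are iid Bernoulli variables with success probability $I_{t,\theta_{0}}\left(u\right)$, so the law of total variance gives, under $H_{0}$,

$$\mathrm{Var}\left[I_{t,\theta_{0},M}\left(Y_{t},u\right)\mid\Omega_{t}\right]=\mathrm{Var}\left[I_{t,\theta_{0}}\left(u\right)\mid\Omega_{t}\right]+\frac{1}{M}\,\mathrm{E}\left[I_{t,\theta_{0}}\left(u\right)\left\{1-I_{t,\theta_{0}}\left(u\right)\right\}\mid\Omega_{t}\right].$$

For $M=1$ the left-hand side is the variance $u(1-u)$ of the indicator of a uniform variable, which identifies the last expectation as $u(1-u)-\mathrm{Var}\left[I_{t,\theta_{0}}\left(u\right)\mid\Omega_{t}\right]$. Substituting back,

$$\mathrm{Var}\left[I_{t,\theta_{0},M}\left(Y_{t},u\right)\mid\Omega_{t}\right]=\left(1-\frac{1}{M}\right)\mathrm{Var}\left[I_{t,\theta_{0}}\left(u\right)\mid\Omega_{t}\right]+\frac{1}{M}\,u\left(1-u\right),$$

a convex combination of the nonrandomized and the randomized variances that decreases to the former as $M\to\infty$.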

From Lemma 2, it follows that $S_{1T}$ has the smallest variance, and that the variance of $R_{1T,M}$ is a weighted sum of those of $S_{1T}$ and $R_{1T}$, see also Equation (5) in Machado and Santos Silva (2005). Other advantages of $S_{1T}$ over $R_{1T,M}$ are 1) computational, as there is no need to simulate $M$ paths of transformations, and 2) theoretical, since weak convergence is easier to prove for processes which are piece-wise linear in parameters. Therefore we concentrate on studying the properties of tests based on the nonrandomized transform, for which we introduce the following assumption.

Assumption 1.

${F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right)}\left(k\right)\in\mathop{\mathcal{M}}\nolimits$ $a.s.$ for all $t$ . Moreover, there exists a finite function $\gamma_{\infty}\left(u,v\right)$ , such that uniformly in $\left(u,v\right)\in\left[0,1\right]^{2},$ $T^{-1}\sum_{t=1}^{T}\gamma_{t,\theta_{0}}\left(u,v\right)\rightarrow_{p}\gamma_{\infty}\left(u,v\right)$ .

This assumption implicitly restricts the dynamics so that a uniform law of large numbers (LLN) holds for the averaged conditional covariance function. In the case of stationary and ergodic data, $\gamma_{\infty}\left(u,v\right)=\mathop{\mathrm{E}}\nolimits\left[\gamma_{1,\theta_{0}}\left(u,v\right)\right]$. Sufficient conditions for the stationarity and ergodicity of dynamic multinomial ordered choice models are given in Basu and de Jong (2007), and for autoregressive Poisson models in Davis et al. (2003), Fokianos et al. (2009) and Doukhan et al. (2012). Then it is possible to show the uniformity of the convergence from a point-wise result, since the summands are continuous, piece-wise polynomials in $u$ and $v$. As an illustration, in Section 8.5 of the Appendix we discuss the assumptions for the Poisson model.

The next result describes the asymptotic distribution of $S_{1T}$ under the null hypothesis. Let $\Rightarrow$ denote weak convergence in $\ell^{\infty}\left[0,1\right]$ , see e.g. van der Vaart and Wellner (1996). In fact, our empirical processes are continuous, which simplifies tightness verification. Let $V_{1\infty}\left(u,v\right):=u\wedge v-uv-\gamma_{\infty}\left(u,v\right)$ .

Lemma 3.

Suppose Assumption 1 holds. Under $H_{0}$ ,

[TABLE]

where $S_{1\infty}$ is a Gaussian process in $\left[0,1\right]$ with zero mean and covariance function $V_{1\infty}$ .

The asymptotic distribution of $S_{1T}$ is model and parameter dependent, and the practical implementation of tests when $\theta_{0}$ is unknown is discussed in Section 3.2 after presenting a general class of local alternatives to the null of correct specification of the conditional distribution.

3.1 Local Alternatives

We next discuss the asymptotic properties of the empirical process $S_{1T}$ under a class of alternative hypotheses, which will lead to consistency of the specification tests based on $S_{1T}$ for a wide class of alternatives. We consider the following class of local alternatives to $H_{0},$

[TABLE]

where

[TABLE]

for some $0<\delta<T^{1/2}$ and for all $t$ , $H_{t}\left(\cdot\mid\Omega_{t}\right)\in\mathop{\mathcal{M}}\nolimits$ . When $\delta=0$ then $H_{1T}$ nests $H_{0}.$

Following Kheifets and Velasco (2013), for any discrete distributions $G$ and $F$ in $\mathop{\mathcal{M}}\nolimits$ , with probability functions $g$ and $f$ , define

[TABLE]

Note that $d\left(G,F,u\right)=E_{G}[I_{F}(Y,u)]-E_{F}[I_{F}(Y,u)]=E_{G}[I_{F}(Y,u)]-u$ and $d\left(G,F,u\right)\equiv 0$ if and only if $G\equiv F$. Under any $G_{t}\left(\cdot\mid\Omega_{t}\right)\in\mathop{\mathcal{M}}\nolimits$,

[TABLE]
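The discrepancy can be computed exactly for any pair of tabulated distributions through the identity $d\left(G,F,u\right)=E_{G}[I_{F}(Y,u)]-u$. The Python sketch below is ours and uses assumptions for the illustration: the piecewise-linear form of the transform, two Poisson laws with made-up means 2 and 3, and a support truncated at 80, where the remaining mass is numerically negligible.

```python
import math
import numpy as np

def poisson_cdf_table(lam, kmax=80):
    """Return F(-1), F(0), ..., F(kmax) for a Poisson(lam) law."""
    pmf = np.array([math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
                    for k in range(kmax + 1)])
    return np.concatenate(([0.0], np.cumsum(pmf)))

def discrepancy(cdf_g, cdf_f, u):
    """d(G, F, u) = E_G[I_F(Y, u)] - u on a common tabulated support."""
    lo, hi = cdf_f[:-1], cdf_f[1:]
    I_f = np.clip((u - lo) / np.maximum(hi - lo, 1e-300), 0.0, 1.0)
    g = np.diff(cdf_g)                   # probability function of G
    return float(np.sum(g * I_f) - u)

F = poisson_cdf_table(2.0)               # fitted (null) distribution
G = poisson_cdf_table(3.0)               # data-generating distribution
for u in (0.1, 0.5, 0.9):
    print(u, discrepancy(F, F, u), discrepancy(G, F, u))
```

In this example $G$ places less mass on small counts than $F$, and $I_{F}(k,u)$ is nonincreasing in $k$, so the discrepancy is negative for interior values of $u$.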

The next assumption guarantees that an LLN can be applied to the empirical discrepancy between $H_{t}$ and $F_{t,\theta_{0}}.$

Assumption 2.

Under $H_{1T}$ , there exists a finite function $D_{1}\left(u\right)$ , such that uniformly in $u\in\left[0,1\right],\ \frac{1}{T}\sum_{t=1}^{T}d\left(H_{t}\left(\cdot\mid\Omega_{t}\right),F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u\right)\rightarrow_{p}D_{1}\left(u\right)$ .

Then the following lemma shows that the departure from $H_{0}$ in the direction of $H_{1T}$ introduces a drift in the asymptotic distribution of $S_{1T}$ that delivers consistency of hypothesis tests based on functionals of $S_{1T}$.

Lemma 4.

Suppose Assumptions 1-2 hold. Under $H_{1T}$ ,

[TABLE]

where $S_{1\infty}$ is as in Lemma 3.

3.2 Parameter Estimation Effect

In practice, tests based on $S_{1T}$ are unfeasible since $\theta_{0}$ is unknown, and has to be estimated by $\widehat{\theta}_{T},$ say. We assume that we have available an estimate $\widehat{\theta}_{T}$ so that under $H_{1T}$

[TABLE]

and define the process with estimated parameters

[TABLE]

We next analyze the consequences of replacing $\theta_{0}$ by $\widehat{\theta}_{T}$ in $\widehat{S}_{1T}$.

Let $\|\cdot\|$ be the Euclidean norm, i.e. for a matrix $A$, $\|A\|=\sqrt{\mathop{\mathrm{tr}}\nolimits\left(AA^{\prime}\right)}$, where $A^{\prime}$ is the transpose of $A$. For $\varepsilon>0,$ $B(a,\varepsilon)$ is an open ball in $\mathop{\mathbb{R}}\nolimits^{m}$ with center at the point $a$ and radius $\varepsilon$. For a cdf $F_{\theta}$ in $\mathop{\mathcal{M}}\nolimits$ define

[TABLE]

where $\dot{F}_{\theta}:=\left(\partial/\partial\theta\right)F_{\theta}$ and $\dot{f}_{\theta}:=\left(\partial/\partial\theta\right)f_{\theta}$ . We need the following assumptions to analyze the asymptotic properties of $\widehat{S}_{1T}$ .

Assumption 3 (Parametric family).

(A)

The parameter space $\Theta$ is a compact set in a finite-dimensional Euclidean space, $\theta\in\Theta\subset\mathop{\mathbb{R}}\nolimits^{m}$.

(B)

There exists $\delta>0$, such that ${F_{t,\theta}}\left(\cdot\mid\Omega_{t}\right)\in\mathop{\mathcal{M}}\nolimits$, for all $t$, $\Omega_{t}$, $T$ and $\theta\in B(\theta_{0},\delta)$.

(C)

$F_{t,\theta}\left(k\mid\Omega_{t}\right)$ is differentiable with respect to $\theta\in B(\theta_{0},\delta)$ and under $H_{1T}$ $\max_{t}\mathop{\mathrm{E}}\nolimits\left[\max_{k}\sup_{\theta\in B(\theta_{0},\delta)}\left\|\dot{F}_{t,\theta}\left(k\mid\Omega_{t}\right)\right\|\right]\leq M_{F}<\infty.$

(D)

Under $H_{1T}$, there exists a finite $L_{1}\left(u\right):=\mathop{\mathrm{plim}}_{T\rightarrow\infty}T^{-1}\sum_{t=1}^{T}\nabla\left(F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u\right)$.

Conditions (A)-(C) on the parametric family of distributions are standard, see e.g. Bai (2003, Assumptions A1-A2). For dynamic ordered choice and Poisson models the differentiability of the conditional distribution with respect to the parameter is equivalent to the differentiability of the link function. Part (D) guarantees a well-behaved limit of the average generalized derivative of $I_{t,\theta}$. Conditions for no effect of information truncation can be provided in a similar way to Bai (2003, Assumption A4).

The following lemma provides an expansion of the empirical process with estimated parameters as the sum of the process with known parameters and a random drift describing parameter estimation.

Lemma 5.

Suppose Assumptions 1-3 hold and $T^{1/2}\left(\widehat{\theta}_{T}-\theta_{0}\right)=O_{p}(1)$ . Under $H_{1T}$ ,

[TABLE]

uniformly in $u$ .

Then, continuous functionals of $\widehat{S}_{1T}$ no longer converge to those of $S_{1\infty}+\delta D_{1}$ under $H_{1T}$, since the estimation effect also has to be taken into account, for which we use the following assumption. Let $Z\left(\Psi\right)$ be a normal vector with zero mean and covariance matrix $\Psi.$

Assumption 4 (Parameter estimation).

Under $H_{1T}$ , the estimator $\widehat{\theta}_{T}$ admits the asymptotic linear expansion

[TABLE]

where $\xi_{0}$ is an $m\times 1$ vector and the summands $\ell_{t}$ constitute a martingale difference sequence with respect to $\Omega_{t}$, such that

(A)

$\mathop{\mathrm{E}}\nolimits\left[\ell_{t}\left(Y_{t},\Omega_{t}\right)\mid\Omega_{t}\right]=0$ and $T^{-1}\sum_{t=1}^{T}\mathop{\mathrm{E}}\nolimits\left[\ell_{t}\left(Y_{t},\Omega_{t}\right)\ell_{t}\left(Y_{t},\Omega_{t}\right)^{\prime}\mid\Omega_{t}\right]\overset{p}{\rightarrow}\Psi.$

(B)

Lindeberg condition $T^{-1}\sum_{t=1}^{T}\mathop{\mathrm{E}}\nolimits\left[\left\|\ell_{t}\left(Y_{t},\Omega_{t}\right)\right\|^{2}\mathbbm{1}\!\left\{{T^{-1/2}}\left\|\ell_{t}\left(Y_{t},\Omega_{t}\right)\right\|>\varepsilon\right\}\mid\Omega_{t}\right]\overset{p}{\rightarrow}0$ holds.

(C)

There exists a finite function $W_{1}\left(u\right)$ , such that $T^{-1}\sum_{t=1}^{T}\mathop{\mathrm{E}}\nolimits\left[I_{t,\theta_{0}}\left(u\right)\ell_{t}\left(Y_{t},\Omega_{t}\right)\mid\Omega_{t}\right]\rightarrow_{p}W_{1}\left(u\right)$ uniformly in $u$ .

In particular, under $H_{0}$ , $\delta\xi_{0}=0$ , the estimate $\widehat{\theta}_{T}$ is centered and $T^{1/2}\left(\widehat{\theta}_{T}-\theta_{0}\right)$ converges in distribution to $Z\left(\Psi\right)$ .

Assumptions 4(A) and 4(B) hold for the MLE of many popular discrete models, including dynamic probit and logit and general discrete choice models. As an example, consider estimates $\widehat{\theta}_{T}$ which are asymptotically equivalent to the (conditional) maximum likelihood estimates, i.e.,

[TABLE]

where $s_{t}\left(k,\Omega_{t}\right):=\dot{f}_{{t,\theta_{0}}}\left(k\mid\Omega_{t}\right)/f_{{t,\theta_{0}}}\left(k\mid\Omega_{t}\right)$ is the score function and $B_{0}$ is a symmetric $m\times m$ positive definite matrix given by the limit of the Hessian,

[TABLE]

Under $H_{1T}$ , $\mathop{\mathrm{E}}\nolimits\left[s_{t}\left(Y_{t},\Omega_{t}\right)\mid\Omega_{t}\right]=\delta T^{-1/2}\sum_{k=1}^{K}s_{t}\left(k,\Omega_{t}\right)h_{t}\left(k\mid\Omega_{t}\right)$ . Then equation (5) holds with $\xi_{0}=-\mathop{\mathrm{p}lim}_{T\rightarrow\infty}B_{0}^{-1}T^{-1}\sum_{t=1}^{T}\sum_{k=1}^{K}s_{t}\left(k,\Omega_{t}\right)h_{{t}}\left(k\mid\Omega_{t}\right)$ and

$\ell_{t}\left(Y_{t},\Omega_{t}\right)=-B_{0}^{-1}s_{t}\left(Y_{t},\Omega_{t}\right)+B_{0}^{-1}\sum_{k=1}^{K}s_{t}\left(k,\Omega_{t}\right)h_{{t}}\left(k\mid\Omega_{t}\right)$ .
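As a worked illustration of the score, which is ours and is not taken from the paper, consider the Poisson model with exponential link $\lambda_{t}=\exp(X_{t}^{\prime}\beta)$ and $\theta=\beta$, for which the support is infinite and sums over $k$ run over all nonnegative integers. Since $\log f_{t,\theta}\left(k\mid\Omega_{t}\right)=-\lambda_{t}+kX_{t}^{\prime}\beta-\log k!$, differentiating with respect to $\beta$ gives

$$s_{t}\left(k,\Omega_{t}\right)=\left(k-\lambda_{t}\right)X_{t},$$

so that, under $H_{0}$,

$$\mathop{\mathrm{E}}\nolimits\left[s_{t}\left(Y_{t},\Omega_{t}\right)\mid\Omega_{t}\right]=0\quad\text{and}\quad\mathop{\mathrm{E}}\nolimits\left[s_{t}\left(Y_{t},\Omega_{t}\right)s_{t}\left(Y_{t},\Omega_{t}\right)^{\prime}\mid\Omega_{t}\right]=\lambda_{t}X_{t}X_{t}^{\prime},$$

because the conditional mean and variance of $Y_{t}$ both equal $\lambda_{t}$. The first equality is the martingale difference property required of the summands in Assumption 4.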

We can derive the covariance matrix between the process $S_{1T}\left(u\right)$ and $T^{1/2}\left(\widehat{\theta}_{T}-\theta_{0}\right)$ and obtain joint convergence results, so under $H_{1T}$

[TABLE]

where the covariance function between $S_{1\infty}$ and $Z\left(\Psi\right)$ is $W_{1}(u)$ .

We can state now the result on the asymptotic distribution of the empirical process $\widehat{S}_{1T}$ under local alternatives, whose drift is different with respect to the case without estimated parameters.

Theorem 1.

Suppose Assumptions 1-4 hold. Under $H_{1T},$

[TABLE]

where $\widehat{S}_{1\infty}:=S_{1\infty}+Z\left(\Psi\right)^{\prime}L_{1}$ is a Gaussian process with zero mean and covariance function $V_{1\infty}\left(u,v\right)+L_{1}\left(u\right)^{\prime}\Psi L_{1}\left(v\right)+W_{1}\left(u\right)^{\prime}L_{1}\left(v\right)+W_{1}\left(v\right)^{\prime}L_{1}\left(u\right)$.

4 EMPIRICAL PROCESSES FOR DYNAMIC SPECIFICATION

Test statistics based on ${S}_{1T}$ , ${R}_{1T}$ and ${R}_{1T,M}$ verify that the conditional distribution of $Y_{t}$ is right on average across all possible $\Omega_{t}$ , so these tests might not capture all sources of misspecification. This issue is raised in Corradi and Swanson (2006), Delgado and Stute (2008) and Kheifets (2015) in relation to testing continuous distributions. However, it is not possible to develop specification tests conditioned on infinite dimensional values of $\Omega_{t}$ . Instead of truncating $\Omega_{t}$ or restricting the class of models, we consider $S_{2T}$ , a biparameter analog of $S_{1T}$ to control the possible dynamic misspecification. From Lemma 1, since under $H_{0}$ , $I_{t,\theta_{0}}\left(u_{1}\right)-u_{1}$ is a MDS, $I_{t,\theta_{0}}\left(u_{1}\right)I_{t-1,\theta_{0}}\left(u_{2}\right)-u_{1}u_{2}$ is centered around zero, and moreover

[TABLE]

This motivates us to develop tests based on $S_{2T}$ defined in (2). This process also has zero mean under the null and identifies not only departures from the null derived from deviations of the unconditional expectation of $I_{t,\theta_{0}}\left(u\right)$ from $u,$ but also those derived from a possible failure of the martingale property, so that $I_{t,\theta_{0}}\left(u_{1}\right)$ and $I_{t-1,\theta_{0}}\left(u_{2}\right)$ would become correlated. This idea is similar to that exploited in Kheifets (2015) in the context of conditional distribution testing for continuous distributions, where different methods of checking the independence property of the PIT are proposed. Alternative statistics exploiting the lack of correlation with any other lag could be proposed, but we expect that low lags are typically more useful for detecting general forms of misspecification.
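The role of the lagged product can be illustrated with a Python sketch of ours, in which a persistent binary Markov chain is tested against an iid Bernoulli(0.5) null whose marginal distribution is correct. The sketch rests on assumptions: the transform is taken to be piecewise linear, $S_{1T}$ is taken to be $T^{-1/2}\sum_{t}\{I_{t,\theta_{0}}(u)-u\}$, the biparameter process is taken to be the sum over $t=2,\ldots,T$ of $I_{t,\theta_{0}}(u_{1})I_{t-1,\theta_{0}}(u_{2})-u_{1}u_{2}$ scaled by $(T-1)^{-1/2}$, and the function names and design values are hypothetical.

```python
import numpy as np

def nonrandomized_transform(u_lo, u_hi, grid):
    """I_t(u) on a grid: 0 below U_t^-, 1 above U_t, linear in between."""
    width = np.maximum(u_hi - u_lo, 1e-300)[:, None]
    return np.clip((grid[None, :] - u_lo[:, None]) / width, 0.0, 1.0)

def s1_process(I, grid):
    """Assumed univariate process T^{-1/2} * sum_t (I_t(u) - u)."""
    return (I - grid[None, :]).sum(axis=0) / np.sqrt(I.shape[0])

def s2_process(I, grid):
    """Assumed biparameter process with (T-1)^{-1/2} scaling, on grid x grid."""
    n = I.shape[0] - 1
    cross = I[1:].T @ I[:-1]     # entry (a, b): sum_t I_t(grid[a]) * I_{t-1}(grid[b])
    return (cross - n * np.outer(grid, grid)) / np.sqrt(n)

def markov_binary(T, stay, rng):
    """Binary chain that repeats its previous state with probability `stay`."""
    y = np.empty(T, dtype=np.int64)
    y[0] = rng.integers(2)
    for t in range(1, T):
        y[t] = y[t - 1] if rng.random() < stay else 1 - y[t - 1]
    return y

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 21)
cdf = np.array([0.0, 0.5, 1.0])          # null model: iid Bernoulli(0.5)
y = markov_binary(2000, stay=0.9, rng=rng)
I = nonrandomized_transform(cdf[y], cdf[y + 1], grid)
s2 = s2_process(I, grid)
ks1 = float(np.max(np.abs(s1_process(I, grid))))
ks2 = float(np.max(np.abs(s2)))
print(ks1, ks2)
```

Under this design $I_{t,\theta_{0}}(0.5)$ equals the indicator of $Y_{t}=0$, so the product at $(0.5,0.5)$ has mean $0.45$ instead of $0.25$ and the biparameter process drifts at rate $T^{1/2}$, while the univariate process remains centered at zero.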

One could also consider a biparameter analog of ${R}_{1T,M}$ , i.e. for some $M=1,2,\ldots,$

[TABLE]

where $u=\left(u_{1},u_{2}\right)\in[0,1]^{2}$ . In particular, a bivariate analog of ${R}_{1T}$ , ${R}_{2T}\left(u\right):={R}_{2T,1}\left(u\right)$ , is introduced in Kheifets and Velasco (2013). Tests based on ${R}_{2T}$ and ${R}_{2T,M}$ involve randomized transforms and therefore suffer from power loss compared to tests based on the nonrandomized transform.

Note that $S_{2T}\left(u\right)-u_{1}S_{1T-1}\left(u_{2}\right)$ is a martingale. This observation will allow us to derive weak convergence of $S_{2T}$ by employing limit theorems for MDS. Properties of $R_{2T}$ were established in Kheifets and Velasco (2013) and could be extended to $R_{2T,M}$. Here we discuss the properties of $S_{2T}$ when we estimate $\theta_{0}.$

In practice we use the process

[TABLE]

where we can write under $H_{1T}$

[TABLE]

uniformly in $u$ , where

$\nabla_{2,t}\left(u\right):=I_{t-1,{\theta}_{0}}\left(u_{2}\right)\nabla\left(F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u_{1}\right)+u_{1}\nabla\left(F_{t-1,\theta_{0}}\left(\cdot\mid\Omega_{t-1}\right),u_{2}\right)$ and the asymptotic covariance function is $W_{2}\left(u\right):=\mathop{\mathrm{A}Cov}\nolimits\left(S_{2T}\left(u\right),T^{1/2}\left(\widehat{\theta}_{T}-\theta_{0}\right)\right)$ . To study the asymptotic properties of the biparameter process we introduce the next assumption, which extends Assumption 2.

Assumption 5.

Under $H_{1T}$ , there exist finite functions $D_{2}\left(u\right)$ and $L_{2}\left(u\right)$ , such that uniformly in $u$

(A)

$T^{-1}\sum_{t=2}^{T}\left\{I_{t-1,\theta_{0}}\left(u_{2}\right)d\left(H_{t}\left(\cdot\mid\Omega_{t}\right),F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u_{1}\right)\right.$

$+\left.u_{1}d\left(H_{t}\left(\cdot\mid\Omega_{t}\right),F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u_{2}\right)\right\}\rightarrow_{p}D_{2}\left(u\right)$ .

(B)

$T^{-1}\sum_{t=2}^{T}\nabla_{2,t}\left(u\right)\to_{p}L_{2}\left(u\right)$ .

Note that the second terms in the definitions of $D_{2}$ and $L_{2}$ correspond to $u_{1}D_{1}(u_{2})$ and $u_{1}L_{1}(u_{2})$ respectively, the equivalent for the single parameter process $S_{1T}$ , but the first ones are new. To state the next result, we need to assume existence of probabilistic limits of several random functions. For the sake of presentation, we defer precise statements to the Appendix, see Assumption A.

Theorem 2.

Suppose that in addition to the conditions of Theorem 1, Assumption 5 and Assumption A from the Appendix hold. Under $H_{1T}$ ,

[TABLE]

where $S_{2\infty}$ is a Gaussian process in $\left[0,1\right]^{2}$ with mean zero and covariance function $V_{2\infty}\left(u,v\right)$ defined in the Appendix. Under $H_{1T}$, if parameters are estimated,

[TABLE]

where $\widehat{S}_{2\infty}:=S_{2\infty}+Z\left(\Psi\right)^{\prime}L_{2}$ is a Gaussian process with zero mean and covariance function $V_{2\infty}\left(u,v\right)+L_{2}\left(u\right)^{\prime}\Psi L_{2}\left(v\right)+W_{2}\left(u\right)^{\prime}L_{2}\left(v\right)+W_{2}\left(v\right)^{\prime}L_{2}\left(u\right)$.

When $G_{t}\left(\cdot\mid\Omega_{t}\right)$ is different from $F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ such that $D_{2}$ is non-zero, the test based on $\widehat{S}_{2T}$ has nontrivial power in the direction of $H_{1T}$. In contrast to the univariate case with $S_{1T}$, the first term in the definition of $D_{2}$ contains correlation with the past information and can therefore capture dynamic misspecification when this induces such a correlation, even if the unconditional expectation of $d$, which appears in the second term $u_{1}D_{1}(u_{2})$, is zero. This fact is crucial if misspecification occurs in the dynamics and not only in the link function or other static aspects of the model.

5 BOOTSTRAP TESTS

To test $H_{0}$ we consider Cramér-von Mises, Kolmogorov-Smirnov or any other continuous functionals of $\widehat{S}_{jT}$, $j=1,2$, denoted $\eta\left(\widehat{S}_{jT}\right)$. The consistency properties of specification tests based on $\widehat{S}_{jT}$ can then be derived from the discussion in the previous sections by applying the continuous mapping theorem, so we omit the proof of the following result.

Theorem 3.

Suppose that conditions of Theorem 2 hold. Under $H_{1T}$ ,

[TABLE]

Since the asymptotic distributions of $S_{jT}\left(u\right)$ are model dependent, and those of $\widehat{S}_{jT}\left(u\right)$ further depend on the estimation effect, we need to resort to bootstrap methods to implement our tests in practice. In the literature, there are several resampling methods suitable for dependent data, but since under $H_{0}$ the parametric conditional distribution is fully specified, we apply a conditional parametric bootstrap algorithm that only requires drawing from $F_{t,\widehat{\theta}}\left(\cdot\mid\Omega_{t}\right)$ to mimic the null distribution of the test statistics. For a discussion of the parametric bootstrap see Stute et al. (1993) and Andrews (1997), which can be adapted to the complications with information truncation and initialization arising in the dynamic case using the discussion in Bai (2003).

To estimate the true $1-\alpha$ quantiles $c_{j}\left(\theta_{0}\right)$ of the null asymptotic distribution of the test statistics, given by some continuous functional $\eta$ applied to $\widehat{S}_{j\infty}$ with $\delta=0$ , we implement the following steps.

1. Estimate the model with data $\left(Y_{t},X^{\prime}_{t}\right)$, $t=1,2,...,T$, get the parameter estimator $\widehat{\theta}_{T}$ and compute the test statistics $\eta(\widehat{S}_{jT})$.

2. Simulate $Y_{t}^{\ast}$ with $F_{\widehat{\theta}_{T}}\left(\cdot\mid\Omega_{t}^{\ast}\right)$ recursively for $t=1,2,...,T$, where the bootstrap information set is $\Omega_{t}^{\ast}=\left(X_{t},Y_{t-1}^{\ast},X_{t-1},Y_{t-2}^{\ast},X_{t-2},...\right)$.

3. Estimate the model with simulated data $Y_{t}^{\ast}$, get $\widehat{\theta}_{T}^{\ast}$ using the same method as for $\widehat{\theta}_{T},$ and get the bootstrapped test statistics $\eta\left(\widehat{S}_{jT}^{\ast}\right)$.

4. Repeat steps 2-3 $B$ times and compute the percentiles of the empirical distribution of the $B$ bootstrapped test statistics.

5. Reject $H_{0}$ if $\eta\left(\widehat{S}_{jT}\right)$ is greater than the $(1-\alpha)$th percentile of the empirical distribution of the $B$ bootstrapped test statistics, denoted by $\widehat{c}^{\ast}_{jB}\left(\widehat{\theta}_{T}\right)$.
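The five steps above can be sketched in Python for a deliberately simple case. The example is ours and is not the implementation used in the simulations: it assumes a static Poisson regression with exponential link, so that $\Omega_{t}^{\ast}$ reduces to $X_{t}$ and no recursion is needed in step 2, a Newton-Raphson MLE, the Kolmogorov-Smirnov functional of the univariate process on a grid, and the piecewise-linear form of the transform. All function names and design values are hypothetical.

```python
import math
import numpy as np

def poisson_cdf_pair(y, lam):
    """Return (F(y-1), F(y)) for each observation under Poisson(lam_t)."""
    k = np.arange(int(y.max()) + 1)
    logfact = np.array([math.lgamma(j + 1.0) for j in k])
    pmf = np.exp(-lam[:, None] + k[None, :] * np.log(lam)[:, None] - logfact[None, :])
    cdf = np.concatenate((np.zeros((len(y), 1)), np.cumsum(pmf, axis=1)), axis=1)
    rows = np.arange(len(y))
    return cdf[rows, y], cdf[rows, y + 1]

def fit_poisson(y, X, iters=25):
    """Poisson MLE with exponential link via Newton-Raphson, started at zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        lam = np.exp(X @ beta)
        beta = beta + np.linalg.solve(X.T @ (lam[:, None] * X), X.T @ (y - lam))
    return beta

def ks_stat(y, X, beta, grid):
    """KS norm of the univariate process with estimated parameters."""
    lo, hi = poisson_cdf_pair(y, np.exp(X @ beta))
    width = np.maximum(hi - lo, 1e-300)[:, None]
    I = np.clip((grid[None, :] - lo[:, None]) / width, 0.0, 1.0)
    return float(np.max(np.abs((I - grid[None, :]).sum(axis=0) / np.sqrt(len(y)))))

def bootstrap_test(y, X, B, alpha, rng, grid=None):
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    beta_hat = fit_poisson(y, X)                      # step 1
    stat = ks_stat(y, X, beta_hat, grid)
    lam_hat = np.exp(X @ beta_hat)
    boot = np.empty(B)
    for b in range(B):                                # step 4: repeat steps 2-3
        y_star = rng.poisson(lam_hat)                 # step 2 (static model)
        beta_star = fit_poisson(y_star, X)            # step 3
        boot[b] = ks_stat(y_star, X, beta_star, grid)
    crit = float(np.quantile(boot, 1.0 - alpha))
    return stat, crit, bool(stat > crit)              # step 5

rng = np.random.default_rng(0)
T = 300
X = np.column_stack((np.ones(T), rng.normal(size=T)))
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))     # data generated under the null
stat, crit, reject = bootstrap_test(y, X, B=199, alpha=0.05, rng=rng)
print(stat, crit, reject)
```

Re-estimating the parameter on every bootstrap sample is what lets the simulated statistics reflect the parameter estimation effect; for a dynamic model, step 2 would instead generate $Y_{t}^{\ast}$ recursively from its own simulated past.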

To analyze the properties of our parametric bootstrap, we need to assume that the same conditions on the estimation method hold for both original and resampled data. More formally, we have

Assumption 6.

(A)

The conditional distribution of $Y_{t}$ conditional on $\Omega_{t}$ coincides with the conditional distribution of $Y_{t}$ conditional on $\Omega_{t}\cup\{X^{\prime}_{k}\}_{k=t+1}^{T}$ .

(B)

Suppose that the sample is generated by $F_{\theta_{T}}$, for some nonrandom sequence $\theta_{T}$ converging to $\theta_{0}$, i.e. we have a triangular array of random variables $\{Y_{Tt}:t=1,2,\ldots,T\}$ with $(T,t)$ element generated by $F_{\theta_{T}}(\cdot\mid\Omega_{Tt})$, where $\Omega_{Tt}=\left\{X_{t},Y_{Tt-1},X_{t-1},Y_{Tt-2},X_{t-2},\ldots\right\}$. Then the estimator $\widehat{\theta}_{T}$ of $\theta_{T}$ admits an asymptotic linear expansion as in Assumption 4. Moreover, assume that under the alternative $H_{1}$ there exists some $\theta_{1}\in\Theta$ such that $\theta_{1}=\mathop{\mathrm{plim}}_{T\rightarrow\infty}\widehat{\theta}_{T}$.

This assumption ensures that by simulating from the conditional distribution $F_{\theta_{T}}$ we obtain the correct joint distribution of $S_{jT}$ and $T^{1/2}\left(\widehat{\theta}_{T}-\theta_{T}\right)$, in parallel to those required in Theorems 1-2. Assumption 6 (A) says that $Y_{t}$ and future $X_{t}$ are independent conditionally on past information, i.e. that there is no direct feedback effect. For example, in a latent variable form of the ordered probit model, this assumption translates to strict exogeneity, i.e. that innovations are independent of future $X_{t}$. Dependence between $Y_{t}$ and future $X_{t}$ is still allowed through serial dependence in $X_{t}$ and $Y_{t}$. Assumption 6 (B) is similar to Condition (5.5) in Burke et al. (1979), Assumption (A1) in Stute et al. (1993) and Assumption E2 in Andrews (1997), and introduces a triangular array version of the expansion and central limit theorem for parameter estimates; see also the discussion in Section 4.1 in Andrews (1997).

We obtain the following result.

Theorem 4.

Suppose that in addition to conditions of Theorem 2, Assumption 6 holds. Under $H_{1T},$ as $B,T$ $\rightarrow\infty,$

[TABLE]

in probability, so $\widehat{c}_{jB}^{\ast}\left(\widehat{\theta}_{T}\right)\rightarrow_{p}c_{j}\left(\theta_{0}\right)$ , and therefore, under $H_{0},$ $\Pr\left(\eta\left(\widehat{S}_{jT}\right)>\widehat{c}_{jB}^{\ast}\left(\widehat{\theta}_{T}\right)\right)\rightarrow\alpha$ . Suppose also that the conditions of Theorem 2 hold for any $\theta_{0}\in\Theta$ . Under $H_{1},$ as $B,T$ $\rightarrow\infty,\ \widehat{c}_{jB}^{\ast}\left(\widehat{\theta}_{T}\right)=O_{p}\left(1\right)$ .

This theorem shows that the bootstrap test statistic has the same limit distribution as the original one under local alternatives, so that under the null we get the right asymptotic size using bootstrap-estimated critical values, and under local alternatives we get nontrivial power when the drifts of the stochastic processes $\widehat{S}_{1T}$ and $\widehat{S}_{2T}$ are nonnegligible. Similarly, under fixed alternatives the bootstrap test is consistent whenever the asymptotic test is itself consistent, i.e. $\lim_{T\rightarrow\infty}\Pr\left(\eta\left(\widehat{S}_{jT}\right)>\widehat{c}_{jB}^{\ast}\left(\widehat{\theta}_{T}\right)\right)=1$ if $\eta\left(\widehat{S}_{jT}\right)$ diverges asymptotically.

6 APPLICATION AND SIMULATIONS

In this section we use a Monte Carlo simulation exercise to investigate the finite sample properties of the tests proposed in this paper. We take as reference the dynamic ordered discrete choice models investigated in Basu and de Jong (2007) for the modeling of the monetary policy conducted by the Federal Reserve (FED). The dependent variable uses the following codification of the changes in the reference interest rate in the US, the federal funds rate $i_{t}$,

[TABLE]

Data are monthly and span January 1990 to December 2006, leading to $T=204$ complete observations. The explanatory variables that Basu and de Jong (2007) used to explain the decisions of the FED on $\Delta i_{t}$ are the current value and 4 lags of inflation $\left(\inf\right)$, the current value and a lag of four different measures of output gap $\left(out\right)$ and a series of dummies that describe the decision of the FED in the previous period, $dum1_{t}=I(\Delta i_{t-1}<0),\ dum2_{t}=I(\Delta i_{t-1}>0),\ dum3_{t}=I(\Delta i_{t-1}<-0.25),\ dum4_{t}=I(\Delta i_{t-1}>0.25).$ Instead of these four dummies, we implement an AR$\left(1\right)$ 'dynamic' version with one lag of the discrete $Y_{t}$ as explanatory variable, and a version without lags, which we refer to as 'static', that serves as a benchmark for the inclusion of lagged endogenous variables in $\Omega_{t}$. We consider both the Logit and Probit versions of the models. We fit four versions of the basic model based on different definitions of the output gap. Then, conditional on the inflation and output gap series and on the parameter estimates obtained, we simulate series $Y_{t}$ and conduct our tests on these (see the Monte Carlo scenarios in Table 1).
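The simulation design described above (draw $Y_{t}$ recursively given fixed covariates and fitted parameters) can be sketched as follows. This is a minimal illustration, not the authors' code: the latent-variable form, the category labels $0,\ldots,K$, the argument names and the numerical values used in any call are assumptions made for the example. Setting `rho=0` gives the 'static' version and a nonzero `rho` the AR(1) 'dynamic' version.

```python
import math
import random


def simulate_ordered_response(index, thresholds, rho=0.0, y0=0,
                              link="probit", seed=0):
    """Simulate an ordered response from a latent-variable model.

    index      : list of covariate indices x_t' beta (held fixed, as in the text)
    thresholds : increasing cut points tau_1 < ... < tau_K (hypothetical values)
    rho        : coefficient on the lagged discrete response (0 = static model)
    link       : "probit" (normal innovations) or "logit" (logistic innovations)
    """
    rng = random.Random(seed)
    ys, y_prev = [], y0
    for xb in index:
        if link == "probit":
            eps = rng.gauss(0.0, 1.0)
        elif link == "logit":
            p = rng.random()
            while p == 0.0:               # avoid log(0) in the inverse logistic cdf
                p = rng.random()
            eps = math.log(p / (1.0 - p))
        else:
            raise ValueError("link must be 'probit' or 'logit'")
        latent = xb + rho * y_prev + eps
        y = sum(latent > tau for tau in thresholds)   # category in 0..K
        ys.append(y)
        y_prev = y                        # feed the simulated value back in
    return ys
```

Because the previous simulated response enters the next latent index, the draw is recursive in exactly the sense used for the bootstrap information set $\Omega_{t}^{\ast}$.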

The four choices of output gap lead to Models I-IV. The output gap is the percentage deviation of the actual from the potential output, which is interpolated to obtain a series of monthly frequency by replicating the GDP observation for any quarter to all the months in that quarter. Then two different measures of potential output are used: the potential output series provided by the Congressional Budget Office and a potential output series constructed in a real-time setting using the HP filter, leading to Models I and II. Apart from output gap, other measures of economic activity are used, such as unemployment rate and capacity utilization, leading to Models III and IV. Data sources are described in Basu and de Jong (2007).
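As a small illustration of the interpolation just described, the following sketch computes the output gap as a percentage deviation from potential output and replicates each quarterly value over the three months of that quarter. The function name and the input numbers in any call are hypothetical and are not taken from the data of Basu and de Jong (2007).

```python
def monthly_output_gap(actual_quarterly, potential_quarterly):
    """Percentage output gap per quarter, repeated for each month of the quarter."""
    gaps = [100.0 * (a - p) / p
            for a, p in zip(actual_quarterly, potential_quarterly)]
    return [g for g in gaps for _ in range(3)]
```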

We compare the performance of our tests with an alternative test that is also omnibus and does not require smoothing (and choice of smoothing parameters). Two general approaches can be adapted to our setup: the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the Conditional Kolmogorov test of Andrews (1997), as discussed in Mora and Moro-Egido (2007). The first one is a test based on a marked empirical process for testing the null $H_{0}^{\prime}:\quad\mathop{\mathrm{E}}\nolimits\left[Y\mid\widetilde{X}=x\right]=m_{\widetilde{\beta}_{01}}\left(x^{{}^{\prime}}\widetilde{\beta}_{02}\right)$, where $m_{\widetilde{\beta}_{01}}(\cdot)$ is a parametric link function and $\widetilde{\beta}_{01},\widetilde{\beta}_{02}$ are finite-dimensional parameters. In the case where $Y$ takes only two values $\{0,1\}$, the conditional mean coincides with the conditional probability of $Y=1$ and the null would be similar to our $H_{0}$ in an i.i.d. setup. To test $Y_{t}\mid\widetilde{X}_{t}\sim P_{\widetilde{\beta}_{01}}\left(\cdot\mid\widetilde{X}_{t}^{{}^{\prime}}\widetilde{\beta}_{02}\right)$, define the process

[TABLE]

The second test, by Andrews, is obtained by replacing $\mathbbm{1}\!\left\{\widetilde{X}_{t}^{{}^{\prime}}\widetilde{\beta}\leq y\right\}$ by $\mathbbm{1}\!\left\{\widetilde{X}_{t}\leq\widetilde{x}\right\}$ (where $\widetilde{x}$ is a real vector of the same dimension as $\widetilde{X}_{t}$) in ${Z}_{T}$, but since it always underperforms according to the simulations of Mora and Moro-Egido (2007), it is not considered here. If $Y$ takes values $\{1,\ldots,K\}$, Mora and Moro-Egido (2007) replace the test of $H_{0}$ by $K$ tests of the hypotheses $Y_{jt}\mid\widetilde{X}_{t}\sim P_{j,\widetilde{\beta}_{01}}\left(Y_{t}\mid\widetilde{X}_{t}^{{}^{\prime}}\widetilde{\beta}_{02}\right)$, with corresponding processes ${Z}_{j,T}$, where $Y_{jt}=\mathbbm{1}\!\left\{Y_{t}=j\right\}$ and $j=1,2,\ldots,K$. The resulting pooled test statistics are

[TABLE]

and

[TABLE]

which we call the CvM and KS tests respectively. To apply these tests to our model, let $\widetilde{X}_{t}=\left(X_{t}^{{}^{\prime}},Y_{t-1},1\right)^{\prime}$ and $\widetilde{\beta}=\left(\beta^{{}^{\prime}},\rho,-\tau_{1}\right)^{\prime}$ and take the corresponding link functions.

We analyze tests based on $S_{1T}$, $R_{1T,M}$, $R_{1T}$ and $S_{2T}$, $R_{2T,M}$, $R_{2T}$ and $Z_{T}$. In all cases we use Kolmogorov-Smirnov (KS) and Cramer-von Mises (CvM) measures. We only consider feasible bootstrap versions of tests based on $\widehat{S}_{1T}$, $\widehat{R}_{1T,M}$, etc., where we replace $\theta_{0}$ by root-$T$ consistent estimates $\widehat{\theta}_{T}$, the ML estimator in our case. We are not aware of any theoretical results for bootstrap-assisted tests based on $\widehat{Z}_{T}$ in our setup, although Mora and Moro-Egido (2007) provide some simulations.
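For readers who want to see the two measures side by side, here is a minimal sketch of how KS and CvM functionals are commonly approximated once a single-parameter process has been evaluated on an equally spaced grid of $[0,1]$. The grid approximation and the function name are assumptions of this example; for a biparameter process the same idea would apply over a two-dimensional grid.

```python
def ks_and_cvm(process_values):
    """Grid approximations of sup_u |S(u)| and of the integral of S(u)^2 over [0, 1].

    process_values: values S(u_1), ..., S(u_n) on an equally spaced grid.
    """
    ks = max(abs(s) for s in process_values)
    cvm = sum(s * s for s in process_values) / len(process_values)
    return ks, cvm
```

The KS measure reacts to the single largest deviation, while the CvM measure averages squared deviations over the whole grid.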

Parameter estimates for real data are reported in Tables 2 and 3. The main question is whether the static Probit or Logit models are appropriate for changes in the interest rates, and we check this with our tests. The $p$-values in Tables 4 and 5 show that all these models are rejected even at the 1% significance level by the biparameter tests based on the nonrandomized transform. Note that single parameter static tests (e.g. $\widehat{R}_{1T}$, $\widehat{S}_{1T}$) cannot reject any proposed model, with the sole exception of $\widehat{S}_{1T}$, which rejects Model II at 5% with the Cramer-von Mises test statistic.

To study the reliability of these results we conduct a Monte Carlo experiment using the estimated models with the real data as data generating processes and obtain the simulations for the discrete response conditional on the covariates time series. In Tables 6 and 7 we provide the empirical size and power results of our tests across simulations for sample size $T=100$ and static Probit and Logit and output gap choices (Models I to IV). To speed up the simulation procedure, we use the warp bootstrap algorithm of Giacomini, Politis and White (2013). We see that all bootstrap tests provide reasonable size accuracy, tests based on single parameter empirical processes underrejecting slightly, while ones based on bivariate processes tend to overreject moderately. Kolmogorov-Smirnov and Cramer-von Mises tests perform similarly in all cases, and the choice of the output gap series does not make large differences either, nor does the introduction of lagged endogenous (discrete) variables in the information set.
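The warp-speed idea can be illustrated with a toy example. In this approach each Monte Carlo replication draws only one bootstrap sample, and the bootstrap statistics are pooled across replications to form the critical value. The sketch below is not the paper's simulation: it assumes a deliberately simple problem (testing a zero mean in a unit-variance normal sample, with the bootstrap sample drawn under the null), and all names are hypothetical.

```python
import math
import random


def warp_speed_rejection_rate(n=30, R=2000, alpha=0.05, mu=0.0, seed=0):
    """Estimate the rejection frequency with one bootstrap draw per replication."""
    rng = random.Random(seed)
    stats, boot_stats = [], []
    for _ in range(R):
        # Monte Carlo sample from the data generating process N(mu, 1).
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        stats.append(math.sqrt(n) * abs(sum(sample) / n))
        # A single bootstrap sample, generated under the null N(0, 1).
        star = [rng.gauss(0.0, 1.0) for _ in range(n)]
        boot_stats.append(math.sqrt(n) * abs(sum(star) / n))
    # The critical value is taken from the pooled bootstrap statistics.
    boot_stats.sort()
    crit = boot_stats[math.ceil((1 - alpha) * R) - 1]
    return sum(s > crit for s in stats) / R
```

The cost is of order $R$ rather than $R\times B$ statistic evaluations, which is what makes the procedure attractive when each evaluation involves re-estimating a model.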

The power of the tests for the static Probit model is analyzed against three different alternatives: static Logit, dynamic Probit and dynamic Logit. We see that the tests without randomization, $\widehat{S}_{1T}$ and $\widehat{S}_{2T}$, always perform better than the random continuous processes ${\widehat{R}_{1T,M}}$ and ${\widehat{R}}_{2T,M}$, which in turn dominate ${\widehat{R}_{1T}}$ and ${\widehat{R}}_{2T}$, thus confirming our theoretical findings. When we compare Probit and Logit specifications while keeping the dynamic aspect of the model well specified (static in both cases), we observe that with this sample size and these specifications it is almost impossible to distinguish Probit from Logit models. The power against dynamic Probit and Logit alternatives is very high. Since the nature of the misspecification is dynamic, bivariate processes should once again have more power than their single parameter counterparts, as is confirmed by our simulation results. It can also be observed that for these alternatives the Cramer-von Mises criterion provides more power than the Kolmogorov-Smirnov tests. As for the alternative tests based on $\widehat{Z}_{T}$, they have power comparable to $\widehat{S}_{1T}$, sometimes slightly better, and are always outperformed by any bivariate test. This is not surprising, since $\widehat{Z}_{T}$ has more structure, i.e. it assumes a single-index model for covariates, but averages across points, thus suffering from the same problems as the other single parameter tests considered here.

In Tables 8 and 9 we provide the empirical size and power results of our tests for the larger sample size $T=200$. Here the size properties are similar, while the rejection rates are noticeably higher for the dynamic alternatives.

7 CONCLUSIONS

In this paper we have proposed new specification tests for the conditional distribution of discrete data with possibly infinite support. The new tests are functionals of empirical processes based on a nonrandomized transform that solves the implementation problem of the usual PIT for discrete distributions and achieves consistency against a wide class of alternatives. We show the validity of a bootstrap algorithm for approximating the null distribution of the test statistics, which are model and parameter dependent. In our simulation study, we show that our method compares favorably in many relevant situations with other methods available in the literature, and we illustrate the new method in a small application.

8 APPENDIX

8.1 Properties of the nonrandomized transform

In this section we derive the basic properties of the nonrandomized transform, which are required prior to proving the weak convergence results for our empirical process. Without loss of generality and in order to make the exposition more transparent, we omit subscripts $t,\theta_{0}$ and conditioning set $\Omega_{t}$ , and use shortcuts $I_{F}\left(Y,u\right)=I_{t,\theta_{0}}\left(Y_{t},u\right)$ and $I_{F,M}\left(Y,u\right)=I_{t,\theta_{0},M}\left(Y_{t},u\right)$ .

For $F\in\mathop{\mathcal{M}}\nolimits$ , $F\left(F^{-1}(u)\right)\geq u>F\left(F^{-1}(u)-1\right)$ and equality holds iff $u=F(k)$ for some integer $k$ . For a random variable $Y\sim G\in\mathop{\mathcal{M}}\nolimits$ we find $\Pr_{G}\left(F\left(Y\right)<u\right)=G\left(F^{-1}\left(u\right)-1\right)$ and $g\left(F^{-1}\left(u\right)\right):=\Pr_{G}\left(Y=F^{-1}\left(u\right)\right)=G\left(F^{-1}\left(u\right)\right)-G\left(F^{-1}\left(u\right)-1\right)$ . For $G=F$ , we have that $\Pr_{F}\left(F\left(Y\right)<u\right)=F\left(F^{-1}\left(u\right)-1\right)<u$ , i.e. $F\left(Y\right)$ is not uniform and the expectation of the indicator function $I\left(F\left(Y\right)<u\right)$ is never $u$ as it is for continuous $F$ .

The nonrandomized transform can be written as

[TABLE]

where

[TABLE]

Note that $\delta_{F}\left(u\right)\in[0,1)$ . We see that $I_{F}\left(Y,u\right)$ is a piecewise linear (continuous) function increasing in $u$ .
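A minimal numerical sketch may help fix ideas. It codes the transform in the piecewise-linear form implied by the proof of Lemma A(i), namely zero for $u\leq F(Y-1)$, one for $u\geq F(Y)$ and linear in between, and compares its expectation with that of the naive indicator $I\left(F\left(Y\right)<u\right)$. The three-point distribution and the function names are hypothetical and chosen only for illustration.

```python
def make_cdf(pmf):
    """Return the cdf F(k) of a pmf supported on {0, 1, ..., len(pmf) - 1}."""
    cum, total = [], 0.0
    for p in pmf:
        total += p
        cum.append(total)

    def cdf(k):
        if k < 0:
            return 0.0
        return 1.0 if k >= len(cum) - 1 else cum[k]

    return cdf


def nonrandomized_transform(y, u, cdf):
    """I_F(y, u): 0 below F(y-1), 1 above F(y), linear in u in between."""
    lo, hi = cdf(y - 1), cdf(y)
    if u >= hi:
        return 1.0
    if u <= lo:
        return 0.0
    return (u - lo) / (hi - lo)


def naive_indicator(y, u, cdf):
    """The indicator 1{F(y) < u}, whose expectation under F falls short of u."""
    return 1.0 if cdf(y) < u else 0.0


def expectation(func, pmf, u, cdf):
    """E_G[func(Y, u)] for Y distributed according to pmf (the law G)."""
    return sum(p * func(k, u, cdf) for k, p in enumerate(pmf))
```

With `pmf = [0.2, 0.5, 0.3]` and `cdf = make_cdf(pmf)`, `expectation(nonrandomized_transform, pmf, u, cdf)` returns `u` up to rounding error, while the same call with `naive_indicator` returns a value strictly below `u`. Taking the expectation under a different pmf gives `u` plus a drift term, which is the quantity $d\left(G,F,u\right)$ of Lemma A(i).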

Let

[TABLE]

In Table 10 and Lemma A we list the properties of this transform.

Lemma A.

For $0\leq v\leq u\leq 1$ and $F,G,H\in\mathop{\mathcal{M}}\nolimits,$

(i)

$\mathop{\mathrm{E}}\nolimits_{G}\left[I_{F}\left(Y,u\right)\right]=u+d\left(G,F,u\right)$, where $\mathop{\mathrm{E}}\nolimits_{G}\left[\cdot\right]=\int(\cdot)dG$ and $d\left(G,F,u\right)\in[-u,1-u]$. When $G=F$, the expectation is $u$.

(ii)

$I_{F}\left(Y,u\right)I_{F}\left(Y,v\right)=I_{F}\left(Y,u\wedge v\right)-\left(\delta_{F}\left(u\vee v\right)-\delta_{F}\left(u\right)\delta_{F}\left(v\right)\right)\times\mathbbm{1}\!\left\{Y=F^{-1}\left(u\right)=F^{-1}\left(v\right)\right\}.$

(iii)

$\mathop{\mathrm{E}}\nolimits_{G}\left[I_{F}\left(Y,u\right)I_{F}\left(Y,v\right)\right]=u\wedge v-\delta_{F}\left(u,v\right)+d\left(G,F,u,v\right)$.

(iv)

$\left|I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right|\leq 1\wedge\frac{\left|F(Y)-H(Y)\right|\vee\left|F(Y-1)-H(Y-1)\right|}{f(Y)\vee h(Y)}$

Moreover, $\mathop{\mathrm{E}}\nolimits_{F}\left[\left|I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right|^{2}\right]\leq 9\sup_{k}{\left|{F}\left(k\right)-{H}\left(k\right)\right|}$.

(v)

$\left|I_{F}\left(Y,u\right)-u-I_{F}\left(Y,v\right)+v\right|\leq|u-v|\vee(1-f(Y))\text{ and }\left|I_{F}\left(Y,u\right)-u-I_{F}\left(Y,v\right)+v\right|=|u-v|\text{ if }u,v\leq F(Y-1)\text{ or }u,v\geq F(Y).$

Moreover, $\mathop{\mathrm{E}}\nolimits_{F}\left[\sup_{u,v\in\Psi(\varepsilon)}\left|I_{F}\left(Y,u\right)-u-I_{F}\left(Y,v\right)+v\right|^{2}\right]\leq 4\varepsilon^{2}$, for any interval $\Psi(\varepsilon)\subset[0,1]$ of length $\varepsilon^{2}$.

(vi)

$\mathop{\mathrm{E}}\nolimits_{F_{z}}\left[\mathbbm{1}\!\left\{F^{{\dagger}}\left(Y^{{\dagger}}\right)<u\right\}\right]=I_{F}\left(Y,u\right)$.

(vii)

$\mathop{\mathrm{E}}\nolimits_{F_{z}}\left[I_{F,M}\left(Y,u\right)I_{F,M}\left(Y,v\right)\right]=\frac{1}{M}I_{F}\left(Y,u\wedge v\right)+\left(1-\frac{1}{M}\right)I_{F}\left(Y,u\right)I_{F}\left(Y,v\right)$.
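Properties (ii) and (vi) lend themselves to a quick numerical check. The sketch below is illustrative only and rests on assumptions: the three-point cdf is made up, the transform is written as $\mathbbm{1}\{Y<F^{-1}(u)\}+(1-\delta_{F}(u))\mathbbm{1}\{Y=F^{-1}(u)\}$ with $\delta_{F}(u)=\left(F(F^{-1}(u))-u\right)/f(F^{-1}(u))$, as implied by the proof of (i), and the jittered PIT in (vi) is taken to be $F(Y-1)+Zf(Y)$ with $Z$ uniform on $[0,1]$.

```python
import random

CUM = [0.2, 0.7, 1.0]   # illustrative cdf values F(0), F(1), F(2); not from the paper


def quantile(u, cum=CUM):
    """F^{-1}(u): the smallest k with F(k) >= u."""
    for k, c in enumerate(cum):
        if c >= u:
            return k
    return len(cum) - 1


def delta(u, cum=CUM):
    """delta_F(u) = (F(F^{-1}(u)) - u) / f(F^{-1}(u))."""
    k = quantile(u, cum)
    lower = cum[k - 1] if k > 0 else 0.0
    return (cum[k] - u) / (cum[k] - lower)


def transform(y, u, cum=CUM):
    """I_F(y, u) = 1{y < F^{-1}(u)} + (1 - delta_F(u)) 1{y = F^{-1}(u)}."""
    k = quantile(u, cum)
    if y < k:
        return 1.0
    if y > k:
        return 0.0
    return 1.0 - delta(u, cum)


def max_gap_property_ii(cum=CUM, n_grid=20):
    """Largest violation of Lemma A(ii) over a grid of (u, v) and all y."""
    grid = [i / n_grid for i in range(1, n_grid + 1)]
    worst = 0.0
    for y in range(len(cum)):
        for u in grid:
            for v in grid:
                same_cell = (y == quantile(u, cum) == quantile(v, cum))
                correction = delta(max(u, v), cum) - delta(u, cum) * delta(v, cum)
                rhs = transform(y, min(u, v), cum) - correction * same_cell
                lhs = transform(y, u, cum) * transform(y, v, cum)
                worst = max(worst, abs(lhs - rhs))
    return worst


def jitter_average(y, u, cum=CUM, n_draws=20000, seed=0):
    """Monte Carlo analogue of (vi): average 1{F(y-1) + Z f(y) < u} over draws of Z."""
    rng = random.Random(seed)
    lower = cum[y - 1] if y > 0 else 0.0
    mass = cum[y] - lower
    hits = sum(lower + rng.random() * mass < u for _ in range(n_draws))
    return hits / n_draws
```

The first check is exact up to rounding, while the second carries Monte Carlo error of order $n^{-1/2}$ in the number of jitter draws, which is the extra noise that the nonrandomized transform integrates out.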

8.2 Functional weak convergence of discrete martingales

In this section we present Lindeberg-Feller-type sufficient conditions for functional weak convergence of discrete martingales. In general, to establish weak convergence one needs to check tightness and finite-dimensional convergence. In the case of martingales, both parts can be verified without imposing restrictive conditions. Here we state a result of Nishiyama (2000), which extends Theorem 2.11.9 of van der Vaart and Wellner (1996) to martingales; see also Theorem A.1 in Delgado and Escanciano (2007). Further details on notation and definitions can be found in the books by van der Vaart and Wellner (1996), for empirical processes and row-independent triangular arrays, and by Jacod and Shiryaev (2003), for finite-dimensional semimartingales. For every $T$, let $\left(\Omega^{T},\mathop{\mathcal{F}}\nolimits^{T},\{\mathop{\mathcal{F}}\nolimits^{T}_{t}\},P^{T}\right)$ be a discrete stochastic basis, where $\left(\Omega^{T},\mathop{\mathcal{F}}\nolimits^{T},P^{T}\right)$ is a probability space equipped with a filtration $\left\{\mathop{\mathcal{F}}\nolimits^{T}_{t}\right\}$. For a nonempty set $\Psi$, let $\{\xi^{T}_{t}\}_{t=1,2,\ldots}$ be an $\ell^{\infty}\left(\Psi\right)$-valued martingale difference array with respect to the filtration $\mathop{\mathcal{F}}\nolimits^{T}_{t}$, i.e. for every $t$, $\xi^{T}_{t}$ maps $\Omega^{T}$ to $\ell^{\infty}\left(\Psi\right)$, the space of bounded, $\mathop{\mathbb{R}}\nolimits$-valued functions on $\Psi$ with $\sup$-norm $\|\cdot\|=\|\cdot\|_{\infty}$, and for each $u\in\Psi$, $\xi^{T}_{t}(u)$ is an $\mathop{\mathbb{R}}\nolimits$-valued martingale difference array: $\xi^{T}_{t}(u)$ is $\mathop{\mathcal{F}}\nolimits^{T}_{t}$-measurable and $\mathop{\mathrm{E}}\nolimits\left[\xi^{T}_{t}(u)\mid\mathop{\mathcal{F}}\nolimits^{T}_{t}\right]=0$. We are interested in studying the weak convergence of discrete martingales $\sum_{t=1}^{T}\xi^{T}_{t}$.
Denote a decreasing series of finite partitions (DFP) of $\Psi$ as $\Pi=\left\{\Pi(\varepsilon)\right\}_{\varepsilon\in(0,1)\cap\mathop{\mathbb{Q}}\nolimits}$, where $\Pi(\varepsilon)=\left\{\Psi(\varepsilon;k)\right\}_{1\leq k\leq N_{\Pi}(\varepsilon)}$ such that $\Psi=\bigcup_{k=1}^{N_{\Pi}(\varepsilon)}\Psi(\varepsilon;k)$, $N_{\Pi}(1)=1$ and $\lim_{\varepsilon\to 0}N_{\Pi}(\varepsilon)=\infty$ monotonically in $\varepsilon$. The $\varepsilon$-entropy of the DFP $\Pi$ is $H_{\Pi}(\varepsilon)=\sqrt{\log N_{\Pi}(\varepsilon)}$. The quadratic $\Pi$-modulus of $\xi^{T}_{t}$ is the $\mathop{\mathbb{R}}\nolimits_{+}\cup\{\infty\}$-valued process

[TABLE]

Theorem A.

Let $\{\xi^{T}_{t}\}_{t=1,2,\ldots}$ be an $\ell^{\infty}\left(\Psi\right)$-valued martingale difference array such that

N1) (conditional variance convergence) $\sum_{t=1}^{T}\mathop{\mathrm{E}}\nolimits\left[\xi^{T}_{t}(u)\xi^{T}_{t}(v)\mid\mathop{\mathcal{F}}\nolimits^{T}_{t}\right]\to_{P^{T}}V(u,v)$ for every $u,v\in\Psi;$

N2) (Lindeberg condition) $\sum_{t=1}^{T}\mathop{\mathrm{E}}\nolimits\left[\left\|\xi^{T}_{t}\right\|^{2}1\left\{\left\|\xi^{T}_{t}\right\|>{\varepsilon}\right\}\mid\mathop{\mathcal{F}}\nolimits^{T}_{t}\right]\to_{P^{T}}0$ for every ${\varepsilon}>0$;

N3) (partitioning entropy condition) there exists a DFP $\Pi$ of $\Psi$ such that $\left\|\xi^{T}_{t}\right\|_{\Pi,T}=O_{P^{T}}(1)$ and $\int_{0}^{1}H_{\Pi}(\varepsilon)d\varepsilon<\infty$.

Then $\sum_{t=1}^{T}\xi^{T}_{t}\Rightarrow S$, where $S$ has normal marginals $\left(S\left(v_{1}\right),S\left(v_{2}\right),\ldots,S\left(v_{d}\right)\right)\sim_{d}N(0,\Sigma)$ with covariance $\Sigma=\left\{V\left(v_{i},v_{j}\right)\right\}_{ij}$.

8.3 Additional technical assumptions

To establish the asymptotic properties of the biparameter process $S_{2T}$ we need the following assumption for uniform convergence of different empirical quantities.

Assumption A.

Under $H_{1T}$, the following uniform limits to continuous functions exist:

1. $\mathop{\mathrm{plim}}_{T\rightarrow\infty}\frac{1}{T}\sum_{t=2}^{T}\gamma_{t-1,\theta_{0}}\left(u_{2},v_{2}\right)\gamma_{t,\theta_{0}}\left(u_{1},v_{1}\right)$,

2. $\mathop{\mathrm{plim}}_{T\rightarrow\infty}\frac{1}{T}\sum_{t=2}^{T}I_{t-1,\theta_{0}}\left(v_{2}\right)\gamma_{t,\theta_{0}}\left(u_{1},v_{1}\right)$,

3. $\mathop{\mathrm{plim}}_{T\rightarrow\infty}\frac{1}{T}\sum_{t=2}^{T}I_{t-1,\theta_{0}}\left(u_{2}\right)d\left(H_{t}\left(\cdot\mid\Omega_{t}\right),F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u_{1}\right)$,

4. $\mathop{\mathrm{plim}}_{T\rightarrow\infty}\frac{1}{T}\sum_{t=2}^{T}I_{t-1,\theta_{0}}\left(u_{2}\right)\mathop{\mathrm{E}}\nolimits\left[I_{t,\theta_{0}}\left(u_{1}\right)\ell_{t}\left(Y_{t},\Omega_{t}\right)\mid\Omega_{t}\right]$,

5. $\mathop{\mathrm{plim}}_{T\rightarrow\infty}\frac{1}{T}\sum_{t=2}^{T}I_{t-1,\theta_{0}}\left(u_{2}\right)\nabla\left(F_{t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u_{1}\right)$.

As discussed in the text, these conditions restrict the dynamics of the data process so that some LLN holds, which is the case, e.g., for stationary and ergodic processes.

8.4 Proofs

Proof of Lemma A.

(i) By definition of $I_{F}\left(Y,u\right)$ , $\mathop{\mathrm{E}}\nolimits_{G}\left[I_{F}\left(Y,u\right)\right]=(1-\delta_{F}(u))g(F^{-1}(u))+G(F^{-1}(u))-g(F^{-1}(u))=d\left(G,F,u\right)-\delta_{F}(u)f(F^{-1}(u))+F(F^{-1}(u))=d\left(G,F,u\right)+u$ . Similarly, by direct calculation we obtain (ii), (iii), (vi) and (vii). We now provide a detailed proof of (iv) and (v).

(iv) We prove a stronger result that for $G\in\mathop{\mathcal{M}}\nolimits$ , such that $\sup_{k}\left|F\left(k\right)-G\left(k\right)\right|\lor\left|H\left(k\right)-G\left(k\right)\right|\leq\sup_{k}\left|F\left(k\right)-H\left(k\right)\right|$ the expectation with respect to $G$ is bounded: $\mathop{\mathrm{E}}\nolimits_{G}\left[\left(I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right)^{2}\right]\leq 9\sup_{k}\left|F\left(k\right)-H\left(k\right)\right|$ . Then, the required bound is obtained by setting $G\equiv F$ .

Since $\left|I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right|$ never exceeds $1$ , we have that $\mathop{\mathrm{E}}\nolimits_{G}\left[\left(I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right)^{2}\right]\leq\mathop{\mathrm{E}}\nolimits_{G}\left[\left|I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right|\right]$ , therefore we bound the latter expectation.

Suppose that $F^{-1}(u)=H^{-1}(u)$ . Then $I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)=\delta_{H}(u)-\delta_{F}(u)$ for $Y=F^{-1}(u)$ , i.e. with probability $g\left(F^{-1}(u)\right)$ , and is zero for other $Y$ . Therefore,

[TABLE]

since $\delta_{F}(u),\delta_{H}(u)\in[0,1)$ and $\sup_{k}\left|h\left(k\right)-g\left(k\right)\right|\leq 2\sup_{k}\left|F\left(k\right)-G\left(k\right)\right|$ .

Suppose that $F^{-1}(u)<H^{-1}(u)$ . Note that $I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)=0$ for $Y\not\in[F^{-1}(u),H^{-1}(u)]$ . We separately bound each term in

[TABLE]

For $Y=F^{-1}(u)$ , $I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)=-\delta_{F}(u)$ . Then

[TABLE]

since $\delta_{F}(u)\in[0,1)$ and for $u\in[H\left(F^{-1}(u)\right),F\left(F^{-1}(u)\right)]$ we have that $F\left(F^{-1}(u)\right)-u\leq F\left(F^{-1}(u)\right)-H\left(F^{-1}(u)\right)$ .

For $Y=H^{-1}(u)$ , $I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)=-1+\delta_{H}(u)$ .

Then

[TABLE]

since $\delta_{H}(u)\in[0,1)$ and for $u\in[H\left(H^{-1}(u)-1\right),F\left(H^{-1}(u)-1\right)]$ we have that $u-H\left(H^{-1}(u)-1\right)\leq F\left(H^{-1}(u)-1\right)-H\left(H^{-1}(u)-1\right)$ .

For $F^{-1}(u)<Y<H^{-1}(u)$ , $I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)=-1$ . Then

[TABLE]

since $H(H^{-1}(u)-1)<u<F(F^{-1}(u))<F(H^{-1}(u)-1)$ .

Adding everything together, we get that $\mathop{\mathrm{E}}\nolimits_{G}\left[\left|I_{F}\left(Y,u\right)-I_{H}\left(Y,u\right)\right|^{2}\right]\leq 9\sup_{k}{\left|{F}\left(k\right)-{H}\left(k\right)\right|}$ for $F^{-1}(u)<H^{-1}(u)$. This bound is symmetric with respect to $F$ and $H$; therefore, it also holds for $F^{-1}(u)>H^{-1}(u)$.

(v) Let $[a,b]$ denote the interval $\Psi(\varepsilon)$ of length $\varepsilon^{2}$ , $\sup\xi^{2}$ denote the supremum of $\xi^{2}$ over $u,v\in[a,b]$ , where $\xi:=I_{F}\left(Y,u\right)-u-I_{F}\left(Y,v\right)+v$ .

Note that $\left|\xi\right|\leq 1$ ; moreover, if $[F(Y-1),F(Y)]\cap[a,b]=\emptyset$ , then $\sup\left|\xi\right|=\varepsilon^{2}$ and if $[a,b]\subset[F(Y-1),F(Y)]$ , then $\sup\left|\xi\right|=\frac{1-f(Y)}{f(Y)}\varepsilon^{2}$ .

Suppose that $F^{-1}(a)=F^{-1}(b)$ , i.e. $[a,b]\subset[F(F^{-1}(a)-1),F(F^{-1}(a))]$ . Then $\mathop{\mathrm{E}}\nolimits_{F}\left[\sup\xi^{2}\right]\leq\mathop{\mathrm{E}}\nolimits_{F}\left[\sup\left|\xi\right|\right]=\varepsilon^{2}\sum_{k\neq F^{-1}(a)}f(k)+\left(\frac{1-f(F^{-1}(a))}{f(F^{-1}(a))}\right)\varepsilon^{2}f(F^{-1}(a))=2(1-f(F^{-1}(a)))\varepsilon^{2}\leq 2\varepsilon^{2}$ .

Suppose that $F^{-1}(a)<F^{-1}(b)$ , i.e. $[a,b]$ contains at least one point $F(k)$ or even intervals $[F(k-1),F(k)]\subset[a,b]$ . On such intervals, $|\xi|$ goes up to $1-f(k)$ , but the probability of $Y$ taking all such $k$ is bounded by $b-a$ . More precisely,

[TABLE]

since the sum of the first and the last terms is below $\varepsilon^{2}$ , the second and the fourth terms each is bounded by $\varepsilon^{2}$ and the third term is $\sum_{k\in[F^{-1}(a)+1,F^{-1}(b)-1]}f(k)=F\left(F^{-1}(b)-1\right)-F\left(F^{-1}(a)+1\right)\leq b-a=\varepsilon^{2}$ .

Proof of Lemma 1.

Substitute $G=F=F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ in Lemma A(i) to demonstrate that $E\left[I_{t,\theta_{0}}\left(u\right)\mid\Omega_{t}\right]=E\left[I_{t,\theta_{0}}\left(u\right)\right]=u$ , therefore $I_{t,\theta_{0}}\left(u\right)-u$ is a martingale difference sequence for every $u\in\left[0,1\right]$ . The conditional variance expression follows from Lemma A(iii) by taking $G=F=F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ .

However, the $I_{t,\theta_{0}}\left(u\right)$ are not independent in general. To see this, note that bivariate independence requires that

[TABLE]

for all $u,$ $u_{1}$ and $u_{2}\in\left[0,1\right]$ . Now we see that the lhs is

[TABLE]

and now, for $u_{1},u\in\left(0,1\right)$ and under $H_{0},$

[TABLE]

which depends on $\Omega_{t},$ and therefore $E\left(\mathbbm{1}\!\left\{I_{t,\theta_{0}}\left(u\right)\leq u_{1}\right\}\mid\Omega_{t}\right)\neq E\left(\mathbbm{1}\!\left\{I_{t,\theta_{0}}\left(u\right)\leq u_{1}\right\}\right)$ with positive probability, and independence does not follow in general.

Proof of Lemma 2.

Because $U_{t}^{r}\left({\theta}_{0}\right)$ are continuous, $\widehat{F}_{{\theta}_{0}}^{r}\left(u\right)$ is a (uniformly) consistent estimate of the cdf of $U_{t}^{r}\left({\theta}_{0}\right)$. Then by Lemma A(vi) and A(vii) and the ULLN we get the uniform consistency of $\widehat{F}_{{\theta}_{0},M}^{r}\left(u\right)$ and $\widetilde{F}_{{\theta}_{0}}^{r}\left(u\right)$. The efficiency gain comes from Lemma A(ii).

Proof of Lemma 3.

We need to verify conditions N1-N3 of Theorem A. Fix $\varepsilon>0$ and take $\Psi=[0,1]$ with the usual norm and the equidistant partition $0=u_{0}<u_{1}<\ldots<u_{N_{\Pi}\left(\varepsilon\right)}=1$, i.e. a partition of $[0,1]$ into $N_{\Pi}\left(\varepsilon\right)=[\varepsilon^{-2}]+1$ equal intervals of length $\varepsilon^{2}$ (the last interval may be even smaller), $\Psi(\varepsilon;k)=[u_{k-1},u_{k}]$ and $\xi_{t}^{T}=\left(I_{F}\left(Y_{t},u\right)-u\right)/\sqrt{T}$, which is a square integrable martingale difference by Lemma 1. Then Condition N1 follows from Lemma 1 and Assumption 1. Condition N2 is satisfied because for $T>1+\left[{\varepsilon}^{-2}\right]$, the indicator $1\left\{\sup_{u\in[0,1]}\left|I_{F}\left(Y_{t},u\right)-u\right|/\sqrt{T}>{\varepsilon}\right\}=0$. Condition N3 follows from the bound in Lemma A(v). Indeed, $\int_{0}^{1}H_{\Pi}(\varepsilon)d\varepsilon<\infty$ and

[TABLE]
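The finiteness of the entropy integral for this equidistant partition is easy to confirm numerically. The sketch below uses $N_{\Pi}(\varepsilon)=[\varepsilon^{-2}]+1$ and $H_{\Pi}(\varepsilon)=\sqrt{\log N_{\Pi}(\varepsilon)}$ as defined above and approximates the integral with a midpoint rule; the rule, the number of nodes and the function names are choices made for this illustration only.

```python
import math


def partition_entropy(eps):
    """H(eps) = sqrt(log N(eps)) with N(eps) = floor(eps^-2) + 1 cells."""
    n_cells = math.floor(eps ** -2) + 1
    return math.sqrt(math.log(n_cells))


def entropy_integral(n_nodes=100_000):
    """Midpoint-rule approximation of the integral of H(eps) over (0, 1)."""
    h = 1.0 / n_nodes
    return h * sum(partition_entropy((i + 0.5) * h) for i in range(n_nodes))
```

The integrand grows only like $\sqrt{2\log(1/\varepsilon)}$ as $\varepsilon\to 0$, which is integrable, so the approximation stabilizes as the number of nodes increases.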

Proof of Lemma 4.

Apply the weak convergence result from Lemma 3 under $G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ with $\xi^{T}_{t}:=\left(I_{F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right)}\left(Y_{t},u\right)-u-d\left(G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u\right)\right)/\sqrt{T}$, which is a square integrable martingale difference because of Lemma A(i) with $G=G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ and $F=F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$. Then Condition N1 follows from Lemma A(iii) and the fact that $d\left(G,F,u,v\right)$ are bounded in absolute value by $T^{-1/2}$ a.s. Condition N2 is satisfied because for $T>1+\left[{\varepsilon}^{-2}\right]$, the indicator in N2 equals zero, as in the proof of Lemma 3. Condition N3 follows from the bound in Lemma A(v) and the fact that $\left(\mathop{\mathrm{E}}\nolimits_{G}\left[\cdot\right]-\mathop{\mathrm{E}}\nolimits_{F}\left[\cdot\right]\right)$ applied to a.s. bounded r.v. are bounded in absolute value by $T^{-1/2}$ a.s. We obtain that $\sum_{t=1}^{T}\xi^{T}_{t}\Rightarrow S$, the same limit as in Lemma 3. Finally, use additivity of $d\left(\cdot,\cdot,\cdot\right)$ in the first argument and apply the ULLN to $S_{T}-\sum_{t=1}^{T}\xi^{T}_{t}=\sum_{t=1}^{T}d\left(G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u\right)/\sqrt{T}=\delta\sum_{t=1}^{T}d\left(H\left(\cdot\mid\Omega_{t}\right),F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u\right)/{T}$.

Proof of Lemma 5.

Under $H_{1T}$, i.e. under $G_{T,\theta_{0}}$, Equation (4) can be established using standard methods, applying the Doob and Rosenthal inequalities for MDS (Hall and Heyde, 1980) to $\sqrt{T}\xi^{T}_{t}:=I_{F_{\widehat{\theta}_{T}}\left(\cdot\mid\Omega_{t}\right)}\left(Y_{t},u\right)-I_{F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right)}\left(Y_{t},u\right)-d\left(G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),F_{\widehat{\theta}_{T}}\left(\cdot\mid\Omega_{t}\right),u\right)+d\left(G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right),u\right).$ Define $z_{T}:=\sum_{t=1}^{T}\xi^{T}_{t}$. When necessary, we write the arguments explicitly: $z_{T}(u,\widehat{\theta}_{T})$. We show that $\sup_{u}\left|z_{T}\right|=o_{p}(1)$. Since $\sqrt{T}\left(\widehat{\theta}_{T}-\theta_{0}\right)=O_{P}(1)$, it is sufficient to establish that for some $\gamma<1/2$

[TABLE]

Note that for $T>\delta^{2}/\nu_{1}^{2}$ , by Assumption 3C,

[TABLE]

First, we will show that $\forall\ \eta,u\ \left|z_{T}\right|=o_{p}\left(1\right)$ . Since $\xi^{T}_{t}$ are bounded by 2 in absolute value and form a martingale difference sequence with respect to $\Omega_{t}$ , by the Doob inequality $\forall p\geq 1$ and $\forall\varepsilon>0$

[TABLE]

and by the Rosenthal inequality, $\forall p\geq 2\ \exists C_{1}$ such that

[TABLE]

Take $p=4$ . The first term is small because of bounds in Lemma A(iv) and (9). Because $\left|\xi^{T}_{t}\right|\leq 2/\sqrt{T}$ , $\sum E\left|\xi^{T}_{t}\right|^{p}\leq 2T^{1-p/2}$ . Therefore we have a pointwise bound. Uniformity in $u,\eta$ can be established using monotonicity of $I_{F_{\theta}\left(\cdot\mid\Omega_{t}\right)}\left(Y_{t},u\right)$ and continuity of $d\left(G_{T,\theta_{0}}\left(\cdot\mid\Omega_{t}\right),F_{\widehat{\theta}_{T}}\left(\cdot\mid\Omega_{t}\right),u\right)$ by employing bounds in Lemma A(iv) and (9).

Finally, use that uniformly in $u$

[TABLE]

Proof of Theorem 1.

The joint weak convergence (6) follows from finite-dimensional convergence by CLT for MDS, while tightness was established in the proof of Lemma 4.

Proof of Theorem 2.

Note that

[TABLE]

where

[TABLE]

is a square integrable martingale difference by Lemma 1. The rest is similar to the proof of Theorem 1. To obtain $S_{2T}\left(u\right)\Rightarrow S_{2\infty}\left(u\right)$ under $H_{0}$ , verify conditions N1-N3 of Theorem A for $\xi_{t}^{T}$ as it is done in the proof of Lemma 3. The covariance function of $S_{2\infty}\left(u\right)$ is

[TABLE]

Under $H_{1T}$ , apply the same weak convergence result under $G_{T,t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ with

[TABLE]

which is a square integrable martingale difference because of Lemma A(i) with $G=G_{T,t,\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ and $F=F_{\theta_{0}}\left(\cdot\mid\Omega_{t}\right)$ . Then proceed as in the proof of Lemma 4.

In order to establish (7), repeat the steps of the proof of Lemma 5 for $\widetilde{\zeta}^{T}_{t}:=\zeta^{T}_{t}-\widehat{\zeta}^{T}_{t}$ , where $\widehat{\zeta}^{T}_{t}$ is $\zeta^{T}_{t}$ with $F_{t,\widehat{\theta}_{T}}$ in place of $F_{t,\theta_{0}}$ .

Proof of Theorem 4.

Repeat the arguments of the proofs of Theorems 1 and 2 for a sample generated by $F_{\theta_{T}}$, defined in Assumption 6, to obtain conditional convergence. Then proceed as in the proof of Corollary 1 in Andrews (1997).

8.5 Checking assumptions for the Poisson model

Here we write $Y_{t}$ for $Y_{t}^{\star}$. For the Poisson model $Y_{t}\mid\Omega_{t}\sim\text{Poisson}(\lambda_{t})$, the probability distribution is $\Pr(Y_{t}=k\mid\Omega_{t})=P_{\lambda_{t}}(k)=\frac{\lambda_{t}^{k}\exp(-\lambda_{t})}{k!}$ and the cumulative distribution function is

$\Pr(Y_{t}\leq k\mid\Omega_{t})=\sum_{j=0}^{k}P_{\lambda_{t}}(j)=Q(k+1,\lambda_{t}),\quad k=0,1,2,\ldots,$

where $Q(\cdot,\cdot)$ is the regularized (upper incomplete) gamma function, and $\lambda_{t}=\lambda_{t}(\beta)=\exp(X_{t}^{\prime}\beta)$, $t=1,2,\ldots$. If the covariates $X_{t}$ are iid or stationary and ergodic, and $\Omega_{t}$ contains no lags of the dependent variable $Y_{t}$, then the LLN applies both under the null and under local alternatives (e.g., the local alternative considered in Eq. (2.12) of Cameron and Trivedi, 1990) and justifies Assumptions 2-6 and Assumption A, which involve functions of $\Omega_{t}$ that are uniformly continuous in $u$.

However, it is also of interest to allow the intensity to depend on lags of the dependent variable. For simplicity we consider $AR(1)$ dynamics; $AR(p)$ can be treated similarly, at the cost of longer arguments. The parameters enter through $\lambda_{t}=\lambda_{t}(\theta)=\alpha_{0}+\alpha_{1}\lambda_{t-1}+\rho Y_{t-1}$, $t=1,2,\ldots$, and are gathered in $\theta=(\alpha_{0},\alpha_{1},\rho)^{\prime}$. We assume that $\alpha_{0},\alpha_{1},\rho$ are positive, that $\lambda_{0}$ and $Y_{0}$ are fixed, and that $\alpha_{1}+\rho<1$. Under these conditions, there exists a unique stationary and ergodic solution to this model (Fokianos et al., 2009). Such data generating processes allow the use of (generic, uniform) LLN results, which facilitate the checking of the assumptions in the paper. Conditions for stationarity and ergodicity for nonlinear $\lambda_{t}(\theta)$ can be found in Neumann (2011) and are directly applicable to the analysis under the null hypothesis. However, we are not aware of LLN results for these models under local alternatives, although Fokianos and Neumann (2013, Proposition 2.3(ii)) use related arguments.
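To make the dynamic specification concrete, the following is a minimal simulation sketch of the recursion $\lambda_{t}=\alpha_{0}+\alpha_{1}\lambda_{t-1}+\rho Y_{t-1}$ under the stated parameter restrictions. The function name `simulate_poisson_ar1`, the starting values and the parameter values are illustrative assumptions, not taken from the paper. The comparison value $\alpha_{0}/(1-\alpha_{1}-\rho)$ is the stationary mean obtained by taking expectations on both sides of the recursion.

```python
import numpy as np

def simulate_poisson_ar1(T, alpha0, alpha1, rho, lam0=1.0, y0=0, seed=0):
    """Simulate Y_t | Omega_t ~ Poisson(lambda_t) with
    lambda_t = alpha0 + alpha1 * lambda_{t-1} + rho * Y_{t-1}.

    lam0 and y0 are the fixed initial values (illustrative defaults).
    """
    if min(alpha0, alpha1, rho) <= 0 or alpha1 + rho >= 1:
        raise ValueError("need positive parameters with alpha1 + rho < 1")
    rng = np.random.default_rng(seed)
    lam = np.empty(T + 1)
    y = np.empty(T + 1, dtype=np.int64)
    lam[0], y[0] = lam0, y0
    for t in range(1, T + 1):
        # The intensity depends on its own lag and on the lagged count.
        lam[t] = alpha0 + alpha1 * lam[t - 1] + rho * y[t - 1]
        y[t] = rng.poisson(lam[t])
    return y[1:], lam[1:]

# Hypothetical parameter values satisfying alpha1 + rho < 1.
y, lam = simulate_poisson_ar1(T=5000, alpha0=0.5, alpha1=0.3, rho=0.4)
stationary_mean = 0.5 / (1 - 0.3 - 0.4)  # alpha0 / (1 - alpha1 - rho)
print(y.mean(), stationary_mean)
```

For a long sample the time average of $Y_{t}$ should settle near the stationary mean, which is the kind of LLN behaviour the assumption checks below rely on.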

Let $\lambda_{t,0}=\lambda_{t}(\theta_{0})$, so that the null hypothesis is $Y_{t}\mid\Omega_{t}\sim\text{Poisson}(\lambda_{t,0})$ for some $\theta_{0}\in\Theta$. Then $U_{t}=Q(Y_{t}+1,\lambda_{t,0})$ and $U^{-}_{t}=Q(Y_{t},\lambda_{t,0})$, and the nonrandomized transform $Y_{t}\mapsto I_{t,\theta_{0}}\left(u\right)$ for $u\in[0,1]$ is

[TABLE]

from which one obtains the empirical processes and the test statistics defined in Sections 1-2.
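The transform can be sketched numerically as follows. This is an illustration under assumptions, not the authors' implementation: it takes $I_{t,\theta_{0}}(u)$ to be the nonrandomized PIT of Czado et al. (2009), that is, $(u-U^{-}_{t})/(U_{t}-U^{-}_{t})$ truncated to $[0,1]$, and it uses a centered, $\sqrt{T}$-scaled sum with a sup-norm over a grid as a stand-in for the paper's processes and statistics. The function names and the demo data are hypothetical.

```python
import numpy as np
from scipy.stats import poisson

def nonrandomized_transform(y, lam, u):
    """Return the T x m matrix of I_t(u) for counts y, intensities lam, grid u.

    Assumed form: I_t(u) = clip((u - U_t^-) / (U_t - U_t^-), 0, 1),
    where U_t^- = Q(y, lam) = F(y - 1) and U_t - U_t^- = P_lam(y).
    """
    y = np.asarray(y)[:, None]
    lam = np.asarray(lam, dtype=float)[:, None]
    u = np.asarray(u, dtype=float)[None, :]
    lower = poisson.cdf(y - 1, lam)   # U_t^-; equals 0 when y = 0
    mass = poisson.pmf(y, lam)        # U_t - U_t^-
    return np.clip((u - lower) / mass, 0.0, 1.0)

def empirical_process(y, lam, u):
    """S(u) = T^{-1/2} * sum_t (I_t(u) - u), evaluated on the grid u."""
    u = np.asarray(u, dtype=float)
    transformed = nonrandomized_transform(y, lam, u)
    return (transformed - u).sum(axis=0) / np.sqrt(transformed.shape[0])

# Illustration on data simulated under the null with a constant intensity.
rng = np.random.default_rng(0)
lam_demo = np.full(500, 2.0)
y_demo = rng.poisson(lam_demo)
grid = np.linspace(0.01, 0.99, 99)
ks_stat = np.abs(empirical_process(y_demo, lam_demo, grid)).max()
print(ks_stat)
```

No random jitter is drawn anywhere: each observation contributes a deterministic piecewise-linear function of $u$ whose conditional mean under the null is $u$, so the centered sum fluctuates around zero when the model is correct.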

Now consider Assumption 1. For the Poisson model

[TABLE]

where $k=k\left(u\right)=\min\{y:Q(y,\lambda_{t,0})\geq u\}$. For the Poisson DGP described above, $Y_{t}$ is stationary and ergodic, so $\gamma_{\infty}\left(u,v\right):=\mathop{\mathrm{E}}\nolimits\left[\gamma_{1,\theta_{0}}\left(u,v\right)\right]$ satisfies Assumption 1. By the same argument, Assumptions 2, 3D, 4C and 5 are fulfilled.
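The index $k(u)$ is easy to evaluate directly. The sketch below is illustrative only: the helper name `k_of_u` is hypothetical, and it uses the identity $Q(y,\lambda)=\Pr(Y\leq y-1)$ for integer $y\geq 1$, with $Q(0,\lambda)=0$, so that the Poisson CDF from SciPy can stand in for $Q$.

```python
from scipy.stats import poisson

def k_of_u(u, lam):
    """Smallest integer y >= 0 with Q(y, lam) = F(y - 1) >= u, for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    y = 0
    # poisson.cdf(-1, lam) is 0, which matches Q(0, lam) = 0.
    while poisson.cdf(y - 1, lam) < u:
        y += 1
    return y

# With lam = 1: F(0) ~ 0.368, F(1) ~ 0.736, F(2) ~ 0.920.
print([k_of_u(u, 1.0) for u in (0.0, 0.3, 0.5, 0.9)])  # [0, 1, 2, 3]
```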

Assumptions 3A and 3B hold trivially. For Assumption 3C note that

[TABLE]

where

[TABLE]

The last expression can be iterated from $t-1$ back to $t=1$ and, because $\alpha_{1}<1$, the resulting geometric progression sum of mean squares is bounded, as in the proof of Lemma 3.2 of Fokianos et al. (2009).
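As a sketch of how such an iteration works, assume the derivatives are taken directly from the recursion $\lambda_{t}=\alpha_{0}+\alpha_{1}\lambda_{t-1}+\rho Y_{t-1}$ with $\lambda_{0}$ and $Y_{0}$ fixed, so that $\partial\lambda_{0}/\partial\theta=0$ (the paper's own displays may be arranged differently). Differentiating term by term gives

$$\frac{\partial\lambda_{t}}{\partial\alpha_{0}}=1+\alpha_{1}\frac{\partial\lambda_{t-1}}{\partial\alpha_{0}},\qquad\frac{\partial\lambda_{t}}{\partial\alpha_{1}}=\lambda_{t-1}+\alpha_{1}\frac{\partial\lambda_{t-1}}{\partial\alpha_{1}},\qquad\frac{\partial\lambda_{t}}{\partial\rho}=Y_{t-1}+\alpha_{1}\frac{\partial\lambda_{t-1}}{\partial\rho},$$

and iterating back to $t=1$ yields

$$\frac{\partial\lambda_{t}}{\partial\alpha_{0}}=\sum_{j=0}^{t-1}\alpha_{1}^{j}\leq\frac{1}{1-\alpha_{1}},\qquad\frac{\partial\lambda_{t}}{\partial\alpha_{1}}=\sum_{j=0}^{t-1}\alpha_{1}^{j}\lambda_{t-1-j},\qquad\frac{\partial\lambda_{t}}{\partial\rho}=\sum_{j=0}^{t-1}\alpha_{1}^{j}Y_{t-1-j}.$$

Each derivative is a sum with geometrically decaying weights $\alpha_{1}^{j}$, which is why $\alpha_{1}<1$ keeps its moments bounded whenever the moments of $\lambda_{t}$ and $Y_{t}$ are bounded.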

Assumptions 4A, 4B and 6B are standard; see, e.g., Andrews (1997), whose arguments can be adapted to the Poisson model using Theorem 3.1 of Fokianos et al. (2009).

Assumption 6A holds trivially, because there are no explanatory variables other than own past values.

9 Acknowledgements

We thank Juan Mora for useful comments. Support from the Ministerio Economia y Competitividad (Spain), grants ECO2012-31748, ECO2014-57007p, MDM 2014-0431, Comunidad de Madrid, MadEco-CM (S2015/HUM-3444), and Fundación Ramón Areces is gratefully acknowledged.

Bibliography


[1] Andrews, D.W.K. (1997). A conditional Kolmogorov test. Econometrica 65, 1097-1128.
[2] Bai, J. (2003). Testing Parametric Conditional Distributions of Dynamic Models. Review of Economics and Statistics 85, 531-549.
[3] Basu, D. and R. de Jong (2007). Dynamic Multinomial Ordered Choice with an Application to the Estimation of Monetary Policy Rules. Studies in Nonlinear Dynamics and Econometrics 11, 1507-1507.
[4] Burke, M.D., Csörgő, M., Csörgő, S. and P. Révész (1978). Approximation of the empirical process when parameters are estimated. Annals of Probability 7, 790-810.
[5] Cameron, A.C. and P.K. Trivedi (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics 46, 347-364.
[6] Corradi, V. and R. Swanson (2006). Bootstrap conditional distribution test in the presence of dynamic misspecification. Journal of Econometrics 133, 779-806.
[7] Czado, C., T. Gneiting and L. Held (2009). Predictive model assessment for count data. Biometrics 65, 1254-1261.
[8] Davis, R.A., W.T.M. Dunsmuir and S.B. Streett (2003). Observation-Driven Models for Poisson Counts. Biometrika 90, 777-790.
