Estimation and adaptive-to-model testing for regressions with diverging number of predictors
Falong Tan, Lixing Zhu

TL;DR
This paper develops a new test for parametric single-index models with a diverging number of predictors, analyzing estimator properties and constructing an adaptive test statistic suitable for high-dimensional settings.
Contribution
It introduces an adaptive-to-model residual-marked empirical process and a martingale transformation for model checking in high-dimensional regressions with a diverging number of predictors.
Findings
The test maintains good size and power in simulations.
Asymptotic properties are established under null and alternative hypotheses.
Estimator properties are characterized for diverging dimensions.

| a | n=100, p=7 | n=200, p=10 | n=400, p=12 | n=800, p=16 |
|---|---|---|---|---|
| 0.0 | 0.0970 | 0.0905 | 0.0890 | 0.1020 | |
| 0.5 | 0.8650 | 0.9915 | 1.0000 | 1.0000 | |
| 0.0 | 0.0500 | 0.0530 | 0.0500 | 0.0505 | |
| 0.5 | 0.7770 | 0.9810 | 1.0000 | 1.0000 | |
| 0.0 | 0.0085 | 0.0105 | 0.0115 | 0.0130 | |
| 0.5 | 0.5620 | 0.9095 | 0.9975 | 1.0000 | |
| 0.0 | 0.0915 | 0.0995 | 0.1060 | 0.0985 | |
| 0.5 | 0.8675 | 0.9865 | 1.0000 | 1.0000 | |
| 0.0 | 0.0510 | 0.0470 | 0.0420 | 0.0495 | |
| 0.5 | 0.7825 | 0.9795 | 1.0000 | 1.0000 | |
| 0.0 | 0.0120 | 0.0090 | 0.0120 | 0.0100 | |
| 0.5 | 0.5290 | 0.9065 | 0.9990 | 1.0000 | |
| 0.0 | 0.1140 | 0.1220 | 0.0980 | 0.1190 | |
| 0.5 | 0.8850 | 0.9880 | 1.0000 | 1.0000 | |
| 0.0 | 0.0480 | 0.0590 | 0.0650 | 0.0490 | |
| 0.5 | 0.8110 | 0.9860 | 1.0000 | 1.0000 | |
| 0.0 | 0.0150 | 0.0100 | 0.0110 | 0.0090 | |
| 0.5 | 0.6190 | 0.9310 | 0.9970 | 1.0000 | |
| 0.0 | 0.0390 | 0.0010 | 0.0000 | 0.0000 | |
| 0.5 | 0.5490 | 0.2910 | 0.1760 | 0.0000 | |
| 0.0 | 0.0070 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.3900 | 0.0910 | 0.0180 | 0.0000 | |
| 0.0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.1220 | 0.0060 | 0.0020 | 0.0000 | |
| 0.0 | 0.0805 | 0.0950 | 0.1055 | 0.1060 | |
| 0.5 | 0.2240 | 0.2205 | 0.2420 | 0.2430 | |
| 0.0 | 0.0305 | 0.0300 | 0.0330 | 0.0310 | |
| 0.5 | 0.1460 | 0.1285 | 0.1445 | 0.0980 | |
| 0.0 | 0.0015 | 0.0020 | 0.0025 | 0.0025 | |
| 0.5 | 0.0420 | 0.0210 | 0.0225 | 0.0150 | |
| 0.0 | 0.0710 | 0.0755 | 0.0850 | 0.0830 | |
| 0.5 | 0.8170 | 0.9795 | 1.0000 | 1.0000 | |
| 0.0 | 0.0525 | 0.0430 | 0.0585 | 0.0475 | |
| 0.5 | 0.7690 | 0.9690 | 1.0000 | 1.0000 | |
| 0.0 | 0.0220 | 0.0170 | 0.0205 | 0.0170 | |
| 0.5 | 0.6510 | 0.9455 | 0.9995 | 1.0000 |

| a | n=100, p=7 | n=200, p=10 | n=400, p=12 | n=800, p=16 |
|---|---|---|---|---|
| 0.0 | 0.1010 | 0.0925 | 0.1055 | 0.0900 | |
| 0.5 | 0.2550 | 0.5135 | 0.9190 | 1.0000 | |
| 0.0 | 0.0520 | 0.0465 | 0.0445 | 0.0515 | |
| 0.5 | 0.1445 | 0.3225 | 0.7550 | 1.0000 | |
| 0.0 | 0.0095 | 0.0090 | 0.0120 | 0.0070 | |
| 0.5 | 0.0460 | 0.1060 | 0.3485 | 0.9140 | |
| 0.0 | 0.0980 | 0.0990 | 0.0865 | 0.0930 | |
| 0.5 | 0.2630 | 0.5265 | 0.9240 | 1.0000 | |
| 0.0 | 0.0530 | 0.0480 | 0.0515 | 0.0495 | |
| 0.5 | 0.1760 | 0.3235 | 0.7350 | 0.9970 | |
| 0.0 | 0.0100 | 0.0060 | 0.0085 | 0.0105 | |
| 0.5 | 0.0470 | 0.1145 | 0.3580 | 0.9350 | |
| 0.0 | 0.1080 | 0.1170 | 0.1230 | 0.1000 | |
| 0.5 | 0.2560 | 0.3390 | 0.5160 | 0.7590 | |
| 0.0 | 0.0530 | 0.0590 | 0.0440 | 0.0700 | |
| 0.5 | 0.1470 | 0.2320 | 0.4080 | 0.6250 | |
| 0.0 | 0.0130 | 0.0130 | 0.0080 | 0.0130 | |
| 0.5 | 0.0450 | 0.1020 | 0.2010 | 0.4080 | |
| 0.0 | 0.0370 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.1950 | 0.0330 | 0.0020 | 0.0000 | |
| 0.0 | 0.0110 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.0790 | 0.0020 | 0.0000 | 0.0000 | |
| 0.0 | 0.0020 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.0110 | 0.0000 | 0.0000 | 0.0000 | |
| 0.0 | 0.0805 | 0.0830 | 0.0800 | 0.1095 | |
| 0.5 | 0.1630 | 0.1515 | 0.1825 | 0.1665 | |
| 0.0 | 0.0325 | 0.0350 | 0.0320 | 0.0330 | |
| 0.5 | 0.0755 | 0.0775 | 0.0940 | 0.0615 | |
| 0.0 | 0.0045 | 0.0015 | 0.0035 | 0.0035 | |
| 0.5 | 0.0155 | 0.0095 | 0.0125 | 0.0060 | |
| 0.0 | 0.0820 | 0.0725 | 0.0810 | 0.0745 | |
| 0.5 | 0.6765 | 0.9460 | 1.0000 | 1.0000 | |
| 0.0 | 0.0495 | 0.0500 | 0.0500 | 0.0535 | |
| 0.5 | 0.6035 | 0.9335 | 0.9995 | 1.0000 | |
| 0.0 | 0.0190 | 0.0165 | 0.0180 | 0.0210 | |
| 0.5 | 0.4660 | 0.8705 | 0.9980 | 1.0000 |

| a | n=100, p=7 | n=200, p=10 | n=400, p=12 | n=800, p=16 |
|---|---|---|---|---|
| 0.00 | 0.0985 | 0.1050 | 0.1085 | 0.1090 | |
| 0.25 | 0.7130 | 0.9410 | 0.9955 | 1.0000 | |
| 0.00 | 0.0500 | 0.0455 | 0.0435 | 0.0450 | |
| 0.25 | 0.5970 | 0.8945 | 0.9980 | 1.0000 | |
| 0.00 | 0.0095 | 0.0090 | 0.0095 | 0.0090 | |
| 0.25 | 0.3470 | 0.7225 | 0.9840 | 1.0000 | |
| 0.00 | 0.0960 | 0.1055 | 0.1060 | 0.0960 | |
| 0.25 | 0.7190 | 0.9405 | 0.9975 | 1.0000 | |
| 0.00 | 0.0505 | 0.0420 | 0.0470 | 0.0495 | |
| 0.25 | 0.5940 | 0.8980 | 0.9945 | 1.0000 | |
| 0.00 | 0.0080 | 0.0125 | 0.0095 | 0.0115 | |
| 0.25 | 0.3310 | 0.7190 | 0.9705 | 0.9995 | |
| 0.00 | 0.1030 | 0.0980 | 0.1140 | 0.1210 | |
| 0.25 | 0.7180 | 0.9500 | 0.9970 | 1.0000 | |
| 0.00 | 0.0580 | 0.0600 | 0.0440 | 0.0570 | |
| 0.25 | 0.6160 | 0.8980 | 0.9970 | 1.0000 | |
| 0.00 | 0.0060 | 0.0150 | 0.0080 | 0.0070 | |
| 0.25 | 0.3870 | 0.7360 | 0.9780 | 1.0000 | |
| 0.00 | 0.0290 | 0.0010 | 0.0000 | 0.0000 | |
| 0.25 | 0.1590 | 0.0190 | 0.0030 | 0.0000 | |
| 0.00 | 0.0110 | 0.0000 | 0.0000 | 0.0000 | |
| 0.25 | 0.0590 | 0.0010 | 0.0000 | 0.0000 | |
| 0.00 | 0.0010 | 0.0000 | 0.0000 | 0.0000 | |
| 0.25 | 0.0140 | 0.0000 | 0.0000 | 0.0000 | |
| 0.00 | 0.0765 | 0.0810 | 0.0940 | 0.0970 | |
| 0.25 | 0.1135 | 0.1185 | 0.1400 | 0.1305 | |
| 0.00 | 0.0275 | 0.0310 | 0.0315 | 0.0340 | |
| 0.25 | 0.0730 | 0.0485 | 0.0745 | 0.0625 | |
| 0.00 | 0.0030 | 0.0020 | 0.0030 | 0.0010 | |
| 0.25 | 0.0055 | 0.0060 | 0.0080 | 0.0030 | |
| 0.00 | 0.0800 | 0.0735 | 0.0770 | 0.0765 | |
| 0.25 | 0.4580 | 0.7430 | 0.9795 | 0.9995 | |
| 0.00 | 0.0510 | 0.0505 | 0.0540 | 0.0490 | |
| 0.25 | 0.3840 | 0.6660 | 0.9465 | 1.0000 | |
| 0.00 | 0.0200 | 0.0225 | 0.0235 | 0.0240 | |
| 0.25 | 0.2590 | 0.5570 | 0.9040 | 0.9995 |

| a | n=100, p=7 | n=200, p=10 | n=400, p=12 | n=800, p=16 |
|---|---|---|---|---|
| 0.00 | 0.1130 | 0.1000 | 0.0970 | 0.0955 | |
| 0.25 | 0.9825 | 1.0000 | 1.0000 | 1.0000 | |
| 0.00 | 0.0520 | 0.0460 | 0.0545 | 0.0490 | |
| 0.25 | 0.9525 | 1.0000 | 1.0000 | 1.0000 | |
| 0.00 | 0.0110 | 0.0090 | 0.0075 | 0.0105 | |
| 0.25 | 0.8680 | 0.9950 | 1.0000 | 1.0000 | |
| 0.00 | 0.1090 | 0.0970 | 0.0910 | 0.1090 | |
| 0.25 | 0.9805 | 0.9990 | 1.0000 | 1.0000 | |
| 0.00 | 0.0475 | 0.0490 | 0.0460 | 0.0555 | |
| 0.25 | 0.9605 | 0.9995 | 1.0000 | 1.0000 | |
| 0.00 | 0.0095 | 0.0115 | 0.0075 | 0.0090 | |
| 0.25 | 0.8700 | 0.9970 | 1.0000 | 1.0000 | |
| 0.00 | 0.0950 | 0.1130 | 0.1110 | 0.1040 | |
| 0.25 | 0.9960 | 1.0000 | 1.0000 | 1.0000 | |
| 0.00 | 0.0580 | 0.0540 | 0.0570 | 0.0540 | |
| 0.25 | 0.9690 | 0.9990 | 1.0000 | 1.0000 | |
| 0.00 | 0.0140 | 0.0170 | 0.0080 | 0.0150 | |
| 0.25 | 0.8730 | 0.9980 | 1.0000 | 1.0000 | |
| 0.00 | 0.0290 | 0.0010 | 0.0000 | 0.0000 | |
| 0.25 | 0.5680 | 0.2420 | 0.1330 | 0.0000 | |
| 0.00 | 0.0050 | 0.0000 | 0.0000 | 0.0000 | |
| 0.25 | 0.3670 | 0.0740 | 0.0120 | 0.0000 | |
| 0.00 | 0.0010 | 0.0000 | 0.0000 | 0.0000 | |
| 0.25 | 0.1060 | 0.0040 | 0.0000 | 0.0000 | |
| 0.00 | 0.0700 | 0.0910 | 0.0875 | 0.0985 | |
| 0.25 | 0.2420 | 0.2125 | 0.2680 | 0.2210 | |
| 0.00 | 0.0320 | 0.0295 | 0.0325 | 0.0380 | |
| 0.25 | 0.1145 | 0.1195 | 0.1410 | 0.1145 | |
| 0.00 | 0.0015 | 0.0045 | 0.0050 | 0.0035 | |
| 0.25 | 0.0335 | 0.0230 | 0.0220 | 0.0095 | |
| 0.00 | 0.0780 | 0.0805 | 0.0815 | 0.0830 | |
| 0.25 | 0.8645 | 0.9935 | 1.0000 | 1.0000 | |
| 0.00 | 0.0455 | 0.0560 | 0.0540 | 0.0625 | |
| 0.25 | 0.8405 | 0.9870 | 1.0000 | 1.0000 | |
| 0.00 | 0.0210 | 0.0195 | 0.0225 | 0.0195 | |
| 0.25 | 0.7285 | 0.9735 | 1.0000 | 1.0000 |

| a | n=100, p=7 | n=200, p=10 | n=400, p=12 | n=800, p=16 |
|---|---|---|---|---|
| 0.00 | 0.1075 | 0.0965 | 0.0910 | 0.1035 | |
| 0.25 | 0.6185 | 0.8980 | 0.9955 | 1.0000 | |
| 0.00 | 0.0520 | 0.0490 | 0.0495 | 0.0570 | |
| 0.25 | 0.4895 | 0.8185 | 0.9925 | 1.0000 | |
| 0.00 | 0.0100 | 0.0085 | 0.0100 | 0.0115 | |
| 0.25 | 0.2505 | 0.5920 | 0.9450 | 0.9995 | |
| 0.00 | 0.0935 | 0.0935 | 0.1070 | 0.1055 | |
| 0.25 | 0.7005 | 0.9120 | 0.9965 | 1.0000 | |
| 0.00 | 0.0515 | 0.0425 | 0.0460 | 0.0445 | |
| 0.25 | 0.5600 | 0.8505 | 0.9940 | 1.0000 | |
| 0.00 | 0.0080 | 0.0100 | 0.0060 | 0.0100 | |
| 0.25 | 0.3180 | 0.6680 | 0.9665 | 1.0000 | |
| 0.00 | 0.1150 | 0.0910 | 0.1090 | 0.1050 | |
| 0.25 | 0.7080 | 0.9320 | 0.9990 | 1.0000 | |
| 0.00 | 0.0560 | 0.0480 | 0.0570 | 0.0430 | |
| 0.25 | 0.6230 | 0.9080 | 0.9960 | 1.0000 | |
| 0.00 | 0.0080 | 0.0120 | 0.0100 | 0.0090 | |
| 0.25 | 0.3810 | 0.7230 | 0.9820 | 1.0000 | |
| 0.00 | 0.0180 | 0.0010 | 0.0000 | 0.0000 | |
| 0.25 | 0.1220 | 0.0060 | 0.0000 | 0.0000 | |
| 0.00 | 0.0040 | 0.0000 | 0.0000 | 0.0000 | |
| 0.25 | 0.0470 | 0.0010 | 0.0000 | 0.0000 | |
| 0.00 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| 0.25 | 0.0070 | 0.0000 | 0.0000 | 0.0000 | |
| 0.00 | 0.1100 | 0.1020 | 0.0960 | 0.1110 | |
| 0.25 | 0.1420 | 0.1370 | 0.1550 | 0.1545 | |
| 0.00 | 0.0400 | 0.0410 | 0.0365 | 0.0390 | |
| 0.25 | 0.0710 | 0.0700 | 0.0610 | 0.0550 | |
| 0.00 | 0.0045 | 0.0035 | 0.0040 | 0.0035 | |
| 0.25 | 0.0140 | 0.0075 | 0.0065 | 0.0035 | |
| 0.00 | 0.1135 | 0.1045 | 0.1115 | 0.1240 | |
| 0.25 | 0.5275 | 0.8140 | 0.9860 | 0.9995 | |
| 0.00 | 0.0790 | 0.0760 | 0.0775 | 0.0750 | |
| 0.25 | 0.4625 | 0.7300 | 0.9610 | 1.0000 | |
| 0.00 | 0.0340 | 0.0345 | 0.0310 | 0.0305 | |
| 0.25 | 0.3175 | 0.6015 | 0.9295 | 0.9985 |

| a | n=100, p=7 | n=200, p=10 | n=400, p=12 | n=800, p=16 |
|---|---|---|---|---|
| 0.0 | 0.1180 | 0.1190 | 0.1095 | 0.1060 | |
| 0.5 | 0.2255 | 0.3090 | 0.4805 | 0.7390 | |
| 0.0 | 0.0575 | 0.0550 | 0.0585 | 0.0530 | |
| 0.5 | 0.1295 | 0.1895 | 0.3030 | 0.5790 | |
| 0.0 | 0.0110 | 0.0135 | 0.0115 | 0.0120 | |
| 0.5 | 0.0325 | 0.0605 | 0.1155 | 0.2830 | |
| 0.0 | 0.1110 | 0.1075 | 0.0980 | 0.1010 | |
| 0.5 | 0.1335 | 0.1480 | 0.1580 | 0.1920 | |
| 0.0 | 0.0650 | 0.0535 | 0.0550 | 0.0550 | |
| 0.5 | 0.0755 | 0.0970 | 0.0835 | 0.1195 | |
| 0.0 | 0.0085 | 0.0140 | 0.0095 | 0.0120 | |
| 0.5 | 0.0205 | 0.0285 | 0.0180 | 0.0330 | |
| 0.0 | 0.1110 | 0.1160 | 0.1010 | 0.1180 | |
| 0.5 | 0.2370 | 0.3480 | 0.4730 | 0.6630 | |
| 0.0 | 0.0470 | 0.0560 | 0.0690 | 0.0510 | |
| 0.5 | 0.1310 | 0.2000 | 0.2760 | 0.4450 | |
| 0.0 | 0.0070 | 0.0100 | 0.0240 | 0.0100 | |
| 0.5 | 0.0430 | 0.0580 | 0.0930 | 0.1700 | |
| 0.0 | 0.0200 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.0980 | 0.0140 | 0.0030 | 0.0020 | |
| 0.0 | 0.0050 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.0210 | 0.0020 | 0.0000 | 0.0000 | |
| 0.0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| 0.5 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| 0.0 | 0.0940 | 0.0915 | 0.0985 | 0.1135 | |
| 0.5 | 0.1325 | 0.1455 | 0.1625 | 0.1455 | |
| 0.0 | 0.0445 | 0.0365 | 0.0410 | 0.0380 | |
| 0.5 | 0.0690 | 0.0765 | 0.0770 | 0.0545 | |
| 0.0 | 0.0050 | 0.0035 | 0.0020 | 0.0020 | |
| 0.5 | 0.0125 | 0.0090 | 0.0070 | 0.0040 | |
| 0.0 | 0.1015 | 0.1020 | 0.0995 | 0.1125 | |
| 0.5 | 0.2380 | 0.3745 | 0.5450 | 0.8265 | |
| 0.0 | 0.0615 | 0.0675 | 0.0670 | 0.0580 | |
| 0.5 | 0.1700 | 0.2750 | 0.4560 | 0.7725 | |
| 0.0 | 0.0240 | 0.0270 | 0.0290 | 0.0335 | |
| 0.5 | 0.1015 | 0.1655 | 0.3360 | 0.6260 |
Estimation and adaptive-to-model testing for regressions with diverging number of predictors
Lixing Zhu is a Chair Professor in the Department of Mathematics at Hong Kong Baptist University, Hong Kong, China. He was supported by a grant from the University Grants Council of Hong Kong, Hong Kong, China.
Falong Tan and Lixing Zhu
Department of Mathematics, Hong Kong Baptist University, Hong Kong
Abstract
The research described in this paper is motivated by model checking for parametric single-index models with a diverging number of predictors. To construct a test statistic, we first study the asymptotic properties of the estimators of the parameters of interest under the null and alternative hypotheses when the dimension diverges to infinity as the sample size goes to infinity. For the testing problem, we study an adaptive-to-model residual-marked empirical process as the basis for constructing a test statistic. By modifying the approach in the literature to suit the diverging-dimension setting, we construct a martingale transformation. Under the null, local and global alternative hypotheses, the weak limits of the empirical process are derived and the asymptotic properties of the test statistic are then investigated. Simulation studies are carried out to examine the performance of the test.
Key words: Adaptive-to-model test; Empirical process; Martingale transformation; Parametric single-index models; Sufficient dimension reduction.
1 Introduction
Building an adequate model is a central problem in regression analysis. One important step is to check the adequacy of a model before it is used in further analysis, so as to prevent wrong conclusions. A number of proposals are available in the literature; they are reviewed below. However, one important issue has not been well studied. In high-dimensional data analysis, the dimension $p$ of the predictor vector is often large even though it is still small compared with the sample size $n$. In this case, we often regard $p$ as diverging as $n$ goes to infinity. A relevant reference is Huber (1973), who considered a problem in which $p$ goes to infinity at a rate controlled by $n$.
In this paper, we focus on inference for parametric single-index models. Although these are generalized linear models in form, we do not use that name because generalized linear models have their own definition in the literature. Let $Y$ be a response variable associated with a $p$-dimensional predictor vector $X$. If $Y$ is integrable, the regression function $E(Y|X)$ is well defined. Let $\{g(\cdot, \theta)\}$ be a given parametric family of functions. This study is motivated by checking whether $E(Y|X)$ belongs to this family or not. Thus the null hypothesis we want to test is that $Y$ follows a parametric single-index model:
[TABLE]
where $\varepsilon$ is the error term, the dimension of $\theta$ is fixed, $p$ diverges as the sample size $n$ tends to infinity, and $^\top$ denotes transposition.
We now review existing methodologies in the literature. There are two major classes of tests: locally smoothing tests and globally smoothing tests. Locally smoothing tests use nonparametric smoothing estimators to construct test statistics; see Härdle and Mammen (1993), Zheng (1996), Fan and Li (1996), Dette (1999), Fan and Huang (2001), Koul and Ni (2004), and Van Keilegom et al. (2008) as examples. Globally smoothing tests construct test statistics based on averages of functionals of empirical processes and thus avoid nonparametric estimation. They are called globally smoothing tests because averaging is itself a global smoothing step. Examples include Bierens (1982, 1990), Stute (1997), Stute, Thies, and Zhu (1998), Stute et al. (1998), and Khmaladze and Koul (2004).
All existing methods are limited to fixed-dimension settings, and the extension to the diverging-dimension case is by no means trivial. When the dimension is large, most existing tests, especially locally smoothing tests, perform badly. The test of Stute and Zhu (2002) can be regarded as a dimension reduction-based test; a martingale transformation makes it asymptotically distribution-free. This test has been shown to be powerful in many cases, even when $p$ is large. However, Stute and Zhu's (2002) test is not omnibus, i.e., it is not consistent against all alternative hypotheses, and is thus a directional test. Escanciano (2006) gave detailed comments on this issue and, like Lavergne and Patilea (2008, 2012), proposed tests based on projected covariates. Guo et al. (2016) also did so and put forward a model-adaptation notion in hypothesis testing. This innovative notion provides deep insight into model checking for regressions, and the adaptive-to-model approach can fully use the model structures under both the null and alternative hypotheses. Recently, with the help of sufficient dimension reduction techniques, Tan et al. (2017) generalized Stute and Zhu's (2002) method and obtained an omnibus test that is asymptotically distribution-free and inherits the dimension reduction properties. It performs very well, but still requires $p$ to be fixed. In this paper, we develop a consistent diagnostic test for checking the adequacy of a single-index model when the dimension of the predictor vector diverges to infinity as the sample size tends to infinity.
To make full use of the model structure under both the null hypothesis and the alternative hypothesis, we consider the following alternative model
[TABLE]
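where $E(\varepsilon|X) = 0$, $G(\cdot)$ is an unknown smooth function and $B$ is a $p \times q$ orthonormal matrix with an unknown $q$, $1 \le q \le p$. Model (1.2) covers the fully nonparametric model $E(Y|X) = G(X)$ as the special case in which $B$ is a $p \times p$ orthonormal matrix with $q = p$, while also allowing lower-dimensional structure.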
As in Stute and Zhu (2002), we use a residual-marked empirical process and a martingale transformation to construct a test statistic based on projected predictors. However, if only the projected predictor under the null hypothesis is used, as in Stute and Zhu (2002), the resulting test cannot be omnibus. Stute et al. (1998a) constructed a residual-marked empirical process using the original predictor vector; when $p$ diverges, such a test suffers severely from the curse of dimensionality, at least in theory. To alleviate these difficulties, we adopt the model adaptation strategy of Tan et al. (2017), which adaptively uses projected predictors under the null and alternative hypotheses. Under the null, only one projected predictor is used, as in Stute and Zhu's construction, while under the alternatives the procedure automatically uses all projections on the unit sphere to guarantee the omnibus property. Although this idea seems workable, the theoretical investigation becomes very complicated because of the diverging dimensionality. There are no relevant results in the literature on the convergence of residual-marked empirical processes with diverging $p$. Even when we can obtain the limiting Gaussian process, the shift term created by estimating the parameters of interest has no simple formula, so the martingale transformation proposed by Stute, Thies, and Zhu (1998) to make the test asymptotically distribution-free cannot easily be motivated. This problem is typical when $p$ diverges and does not arise when $p$ is fixed.
The paper is organized as follows. Section 2 contains the asymptotic properties of the ordinary least squares estimator in the diverging-dimension setting. Based on this, we define an adaptive-to-model residual-marked empirical process as the basis of the proposed test statistic. Since sufficient dimension reduction theory plays a crucial role in achieving the adaptive-to-model property, we give a brief review in this section and study the convergence rates of the relevant estimators. In Section 3, we present the limit of the adaptive-to-model empirical process under the null hypothesis and investigate its asymptotic behaviour. We then use a modified approach to define a martingale transformation, because the shift term has no closed form in the diverging-dimension setting. The asymptotic properties of the martingale transformation-based innovation process under both the null and alternatives are studied. We also show that when $p$ is fixed, this transformation is equivalent to Stute and Zhu's (2002) martingale transformation. In Section 4, we give the test statistic for practical use and conduct several simulation studies. A real data example is analysed in Section 5 for illustration. Section 6 contains a discussion. Technical proofs are deferred to the Appendix.
2 Adaptive-to-model residual-marked empirical process
2.1 Preliminary
Let $\{(X_i, Y_i)\}_{i=1}^{n}$ be an i.i.d. sample with the same distribution as $(X, Y)$, and let $\varepsilon = Y - E(Y|X)$ be the unpredictable part of $Y$ given $X$, so that $E(\varepsilon|X) = 0$. We want to test whether or not
[TABLE]
To estimate the unknown parameters $\beta$ and $\theta$, we restrict ourselves in this paper to the ordinary least squares method. Let
[TABLE]
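As a hedged illustration of the least-squares step, the sketch below fits a single-index model with a hypothetical link $g(u, \theta) = \theta + e^{u}$ (scalar $\theta$) by damped Gauss-Newton iterations in NumPy. The link, the simulated design, and the names `g` and `fit_ols_single_index` are assumptions made for illustration; the text above does not specify a numerical algorithm for computing the estimator.

```python
import numpy as np


def g(u, theta):
    # Hypothetical link used only for this illustration: g(u; theta) = theta + exp(u)
    return theta + np.exp(u)


def fit_ols_single_index(X, Y, n_iter=100, tol=1e-12):
    """Least-squares fit of Y ~ g(X @ beta, theta) by damped Gauss-Newton (illustrative)."""
    n, p = X.shape
    beta = np.zeros(p)
    theta = Y.mean() - 1.0  # exp(0) = 1 at the starting value beta = 0
    rss = np.sum((Y - g(X @ beta, theta)) ** 2)
    for _ in range(n_iter):
        eu = np.exp(X @ beta)
        resid = Y - (theta + eu)
        # Jacobian of the fitted values with respect to (beta, theta)
        J = np.column_stack([X * eu[:, None], np.ones(n)])
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        t = 1.0
        while t > 1e-8:  # halve the step until the residual sum of squares decreases
            beta_new, theta_new = beta + t * step[:p], theta + t * step[p]
            rss_new = np.sum((Y - g(X @ beta_new, theta_new)) ** 2)
            if rss_new < rss:
                break
            t /= 2
        else:
            break  # no decreasing step found: stop at the current estimate
        improvement = rss - rss_new
        beta, theta, rss = beta_new, theta_new, rss_new
        if improvement < tol * (1.0 + rss):
            break
    return beta, theta


# Usage on simulated data (true beta = 0.3 * ones, true theta = 1)
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.standard_normal((n, p))
beta_true, theta_true = np.full(p, 0.3), 1.0
Y = g(X @ beta_true, theta_true) + 0.1 * rng.standard_normal(n)
beta_hat, theta_hat = fit_ols_single_index(X, Y)
```

Step halving is used so that each iteration decreases the residual sum of squares; any standard nonlinear least-squares routine could be substituted.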
To analyze the asymptotic properties of $(\hat\beta_n, \hat\theta_n)$, define
[TABLE]
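It is easy to see that if the null hypothesis holds, then $(\tilde\beta_0, \tilde\theta_0) = (\beta_0, \theta_0)$. Under the alternative, $(\tilde\beta_0, \tilde\theta_0)$ typically depends on the distribution of $(X, Y)$. Let . Then under the null hypothesis we have .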
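To study the asymptotic properties of $(\hat\beta_n, \hat\theta_n)$ as $p$ diverges, we first introduce some notation; the regularity conditions are postponed to the Appendix. Suppose that the regression function $g$ of the null model is three times differentiable with respect to its parameters. Let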
[TABLE]
This matrix enters the following matrix, which plays a crucial role in deriving the asymptotic properties of $(\hat\beta_n, \hat\theta_n)$:
[TABLE]
The next two results give the norm consistency of $(\hat\beta_n, \hat\theta_n)$ with respect to $(\tilde\beta_0, \tilde\theta_0)$ and the decomposition of $\begin{pmatrix}\hat\beta_n - \tilde\beta_0 \\ \hat\theta_n - \tilde\theta_0\end{pmatrix}$ into a sum of independent and identically distributed terms. This decomposition generalizes the results of White (1981) to the case where the dimension of the predictor vector diverges. For simplicity, we define hereafter , and .
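**Proposition 1.**
Suppose that conditions (A1)-(A6) in the Appendix hold. If $p$ grows sufficiently slowly relative to $n$, then $(\hat\beta_n, \hat\theta_n)$ is a norm consistent estimator of $(\tilde\beta_0, \tilde\theta_0)$, in the sense that the estimation error, measured in the Frobenius norm $\|\cdot\|$, is of order $O_p(\sqrt{p/n})$.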
The convergence rate of order $\sqrt{p/n}$ is in line with the results for M-estimators obtained by Huber (1973) and Portnoy (1984) when the number of parameters diverges. For the asymptotic decomposition, we have the following result.
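For the linear special case mentioned in Remark 1, the familiar $\sqrt{p/n}$ rate of the least-squares error can be checked numerically. The Monte Carlo sketch below assumes a Gaussian linear model with identity design covariance, in which $E\|\hat\beta - \beta\|^2 = \sigma^2 E\,\mathrm{tr}\{(X^\top X)^{-1}\} \approx \sigma^2 p/n$, so the ratio $\|\hat\beta - \beta\| / \sqrt{p/n}$ should stay near $\sigma$ as $p$ grows with $n$. The growth rule $p \approx n^{0.4}$ and the function name `ols_error_ratio` are illustrative assumptions.

```python
import numpy as np


def ols_error_ratio(n, p, sigma=1.0, reps=200, seed=0):
    """Mean of ||beta_hat - beta|| / sqrt(p / n) for OLS in a Gaussian linear model."""
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)
    ratios = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        Y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
        ratios[r] = np.linalg.norm(beta_hat - beta) / np.sqrt(p / n)
    return float(ratios.mean())


# Let p grow with n; p = round(n ** 0.4) is an arbitrary illustrative choice.
results = {}
for n in (100, 400, 1600):
    p = int(round(n ** 0.4))
    results[(n, p)] = ols_error_ratio(n, p)
    print(n, p, round(results[(n, p)], 3))
```

A ratio that stays roughly constant across the three sample sizes is what the $\sqrt{p/n}$ rate predicts; this says nothing about the nonlinear single-index case, where the propositions require the conditions in the Appendix.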
**Proposition 2.**
If $p$ grows sufficiently slowly relative to $n$ and conditions (A1)-(A6) in the Appendix hold, then
[TABLE]
**Remark 1.**
The rate conditions on $p$ as $n \to \infty$ in Propositions 1 and 2 may seem restrictive. From the arguments used to prove Propositions 1 and 2 in the Appendix, if the regression function follows a linear model, some of the terms in these arguments simplify. Thus we can obtain the norm consistency of the estimator and its asymptotic decomposition under weaker rate conditions, respectively. These conditions are the same as those of Huber (1973), who only considered the linear model. Portnoy (1984, 1985) obtained norm consistency and asymptotic normality under weaker conditions, again for linear models. However, it is hard to check in practice which models other than linear models satisfy his conditions. Further, extending their results to the parametric single-index models considered here is, to the best of our knowledge, still an open question.
2.2 Basic test statistic construction
Recall the null hypothesis:
[TABLE]
against the alternative hypothesis:
[TABLE]
where $G(\cdot)$ is an unknown smooth function and the orthonormal matrix $B$ is given in (1.2). We assume that $\tilde\beta_0 \in S_{E(Y|X)}$ under both the null and alternative hypotheses, where $S_{E(Y|X)} = \mathrm{span}(B)$ is the central mean subspace. Under the null hypothesis, this is obvious. Under the alternative hypothesis, $\tilde\beta_0$ need not be parallel to any particular column of $B$, but it is reasonable to expect it to be a linear combination of the columns of $B$. Thus the assumption is not restrictive.
Also recall and . Under the null hypothesis, and with . Therefore, we obtain that . Under the alternative hypothesis, we have . Then it follows that under the null hypothesis
[TABLE]
Under the alternative, by Lemma 1 of Escanciano (2006), there exists an such that , where . Then it follows that
[TABLE]
Note that under the null we have and . Thus the quantity actually has the same form in both (2.2) and (2.3). Define an adaptive-to-model residual marked empirical process in the diverging dimension setting as below
[TABLE]
[TABLE]
where and are defined as before and is the sufficient dimension reduction estimator of with an estimated structural dimension of , which will be specified later. For , one can also use the integral over to define a test statistic.
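Since display (2.4) is not reproduced above, the sketch below assumes the single-direction case and a residual-marked process of the Stute and Zhu (2002) type, $V_n(t) = n^{-1/2}\sum_{i=1}^n \hat\varepsilon_i I(\hat B^\top X_i \le t)$, evaluated at the observed projections. The function name `residual_marked_process`, this exact form, and the absence of ties among the projections are assumptions for illustration.

```python
import numpy as np


def residual_marked_process(resid, index):
    """Return (t_grid, V) with V[k] = n^{-1/2} * sum_i resid_i * 1{index_i <= t_grid[k]}.

    Assumes one estimated direction, so `index` holds the projected predictors,
    and assumes no ties among the index values.
    """
    resid = np.asarray(resid, dtype=float)
    index = np.asarray(index, dtype=float)
    order = np.argsort(index, kind="stable")
    t_grid = index[order]
    V = np.cumsum(resid[order]) / np.sqrt(resid.size)  # cusum of residuals along the index
    return t_grid, V


# Toy usage: residuals and projections for n = 4 observations
t_grid, V = residual_marked_process([1.0, -1.0, 1.0, -1.0], [0.0, 1.0, 2.0, 3.0])
ks_type = np.max(np.abs(V))  # a Kolmogorov-Smirnov-type functional of the process
```

Here `V` equals `[0.5, 0.0, 0.5, 0.0]`; a supremum of this kind is the basic ingredient of the statistics discussed in Section 3.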
To achieve the model adaptation property of the process, we need sufficient dimension reduction (SDR) techniques to identify the structural dimension $q$ and the matrix $B$ when $p$ diverges to infinity. We give a brief review of this topic below.
2.3 Adaptive-to-model approach
In this methodology, we need to identify the dimension $q$ and the matrix $B$. This can be done with methods from sufficient dimension reduction, which we describe briefly. Recall that under the alternative hypothesis the model is
[TABLE]
where $E(\varepsilon|X) = 0$, $G(\cdot)$ is an unknown smooth function and $B$ is a $p \times q$ orthonormal matrix with $1 \le q \le p$. Under the null and alternative hypotheses, respectively, the following conditional independence relations hold:
[TABLE]
where $\perp\!\!\!\perp$ denotes statistical independence. Define $S_{E(Y|X)}$ as the central mean subspace of $Y$ with respect to $X$ (see Cook and Li, 2002), that is, the intersection of all subspaces spanned by the columns of a matrix $B$ such that $Y \perp\!\!\!\perp E(Y|X) \mid B^\top X$. The dimension of $S_{E(Y|X)}$ is called the structural dimension and is denoted by $q$. Under mild conditions, such a subspace always exists (see Cook and Li, 2002). Under the null hypothesis (1.1), $q = 1$ and $S_{E(Y|X)} = \mathrm{span}(\beta_0)$. Under the alternative (1.2), $1 \le q \le p$ and $S_{E(Y|X)} = \mathrm{span}(B)$. For simplicity, we assume throughout this paper that $S_{E(Y|X)} = S_{Y|X}$, where $S_{Y|X}$ is the central subspace of $Y$ with respect to $X$ (see Cook, 1998).
There are several estimation proposals in the literature, for instance sliced inverse regression (SIR, Li (1991)), sliced average variance estimation (SAVE, Cook and Weisberg (1991)), minimum average variance estimation (MAVE, Xia et al. (2002)), directional regression (DR, Li and Wang (2007)) and discretization-expectation estimation (DEE, Zhu et al. (2010a)). All these methods assume that $p$ is fixed. Zhu, Miao, and Peng (2006) first discussed the asymptotic properties of SIR when $p$ diverges to infinity. In this paper, we adopt cumulative slicing estimation (CSE, Zhu, Zhu, and Feng (2010b)) to identify the central subspace; it is similar to DEE. We choose it because both methods are easy to implement and easy to extend to the case where the dimension grows to infinity.
The procedure of CSE is as follows. For simplicity, assume for the moment that $E(X) = 0$ and $\mathrm{Cov}(X) = I_p$. If the linearity condition (see Li, 1991) holds, it is easy to see that $E[X f(Y)] \in S_{Y|X}$ for any function $f(\cdot)$. In theory, this yields infinitely many vectors in $S_{Y|X}$. Zhu et al. (2010b) suggested using the determining class of indicator functions $I(Y \le t)$ in place of $f(Y)$. Let $m(t) = E[X I(Y \le t)]$. It follows that
[TABLE]
Define the target matrix
[TABLE]
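where $F_Y$ denotes the cumulative distribution function of $Y$. If the rank of the target matrix $M$ is $q$, then $\mathrm{span}(M) = S_{Y|X}$. Based on this, the sample version of $M$ is easy to obtain. Let $Z_i$ be the standardized version of $X_i$, and let $\hat m(t) = n^{-1}\sum_{i=1}^{n} Z_i I(Y_i \le t)$. The estimator of $M$ is given by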
[TABLE]
If the structural dimension $q$ is given, an estimator of $B$ consists of the eigenvectors corresponding to the $q$ largest eigenvalues of the estimated target matrix. Throughout this paper, we assume that $q$ is fixed.
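The CSE steps above (standardize, average $Z_i I(Y_i \le Y_j)$, form the matrix of outer products, take leading eigenvectors) can be sketched in NumPy as follows. This is a minimal illustration under the assumption that the sample covariance of $X$ is nonsingular; the function name `cse_directions` and the simulated linear single-index model are hypothetical, and no weighting or refinements from Zhu, Zhu, and Feng (2010b) are included.

```python
import numpy as np


def cse_directions(X, Y, q):
    """Sketch of cumulative slicing estimation: p x q basis estimate and sorted eigenvalues."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ cov_inv_sqrt                            # standardized predictors
    ind = (Y[:, None] <= Y[None, :]).astype(float)   # ind[i, j] = 1{Y_i <= Y_j}
    m_hat = Z.T @ ind / n                            # column j = n^{-1} sum_i Z_i 1{Y_i <= Y_j}
    M_n = m_hat @ m_hat.T / n                        # n^{-1} sum_j m_hat(Y_j) m_hat(Y_j)^T
    lam, vec = np.linalg.eigh(M_n)
    order = np.argsort(lam)[::-1]                    # largest eigenvalues first
    lam, vec = lam[order], vec[:, order]
    B = cov_inv_sqrt @ vec[:, :q]                    # map directions back to the X scale
    B /= np.linalg.norm(B, axis=0)
    return B, lam


# Usage on a simulated linear single-index model
rng = np.random.default_rng(1)
n, p = 2000, 6
X = rng.standard_normal((n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
Y = X @ beta + 0.2 * rng.standard_normal(n)
B_hat, lam_hat = cse_directions(X, Y, q=1)
cosine = abs(B_hat[:, 0] @ beta)  # close to 1 when the direction is recovered
```

The $n \times n$ indicator matrix keeps the code short but uses $O(n^2)$ memory; sorting $Y$ and using cumulative sums would avoid this for large $n$.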
However, we need a consistent estimator of $q$, because $q$ is usually unknown under the alternative hypothesis. As we will see later, even when $q$ is given we still want a consistent estimator, because we wish the test to have the model adaptation property so that it fully uses the dimension reduction structure under the null hypothesis. Inspired by Xia et al. (2015), we suggest a minimum ridge-type eigenvalue ratio estimator (MRER) to determine $q$. Let $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be the eigenvalues of the estimated target matrix and of the target matrix, respectively. Since the target matrix has rank $q$, it follows that
[TABLE]
Hence we estimate the structural dimension by
[TABLE]
Here is defined as [math], and the ridge is a positive constant. The following result, proved in the Appendix, shows that the consistency of MRER adapts to the underlying model when the ridge is set to an appropriate constant.
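A minimal sketch of a ridge-type eigenvalue ratio rule follows. Because the display defining MRER is not reproduced above, the form used here, $\hat q = \arg\min_{1 \le i \le p-1} (\hat\lambda_{i+1}^2 + c)/(\hat\lambda_i^2 + c)$ with ridge $c > 0$, is an assumption modelled on ridge-ratio criteria of this kind; the symbol $c$, the function name `mrer_dimension`, and the value $c = 0.05$ in the example are illustrative only.

```python
import numpy as np


def mrer_dimension(eigvals, c):
    """Ridge-type eigenvalue ratio estimate of the structural dimension (assumed form).

    q_hat = argmin_{1 <= i <= p-1} (lam_{i+1}^2 + c) / (lam_i^2 + c),
    with eigenvalues sorted in decreasing order and a ridge c > 0.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratios = (lam[1:] ** 2 + c) / (lam[:-1] ** 2 + c)
    return int(np.argmin(ratios)) + 1, ratios  # +1 converts a 0-based index to a dimension


q1, _ = mrer_dimension([2.0, 0.01, 0.005, 0.001], c=0.05)  # one dominant eigenvalue
q2, _ = mrer_dimension([2.0, 1.0, 0.01, 0.005], c=0.05)    # two dominant eigenvalues
```

The ridge keeps ratios of tiny eigenvalues close to 1, so the minimum is attained at the gap between the signal eigenvalues and the noise eigenvalues; here `q1` is 1 and `q2` is 2.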
**Proposition 3.**
*Suppose that the regularity conditions of Theorem 3 in Zhu et al. (2010b) hold. Let be a matrix whose columns are the eigenvectors that are associated with the largest eigenvalues of . If , then
(1) under , we have and ;
(2) under , we have and .*
3 Main results
3.1 Basic properties of the process
First, we discuss the asymptotic properties of the process under the null hypothesis. Since the distributional limit theory becomes much simpler if we replace the estimators by their true values, we define the following process
[TABLE]
Put
[TABLE]
Then we have and where is the cumulative distribution function of . Obviously, is a nondecreasing and nonnegative function. Since is a centered residual cusum process, it is readily seen that
[TABLE]
By Theorem 2.11.22 in van der Vaart and Wellner (1996), we obtain that is asymptotically tight. If pointwise in , it follows that
[TABLE]
in the space , where is a centred Gaussian process with the covariance function . Since is also nondecreasing and nonnegative, it follows that in distribution, where is a standard Brownian motion.
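Because the limit is a standard Brownian motion run on the nondecreasing time scale given by the variance function, a supremum-type functional divided by the square root of the terminal variance has (for a continuous variance function) the law of $\sup_{0 \le u \le 1}|B(u)|$. The sketch below computes its upper quantiles in two ways: by simulating scaled Gaussian random walks, and via the classical reflection-principle series $P(\sup_{0\le u\le 1}|B(u)| \ge x) = 4\sum_{k \ge 0}(-1)^k\{1 - \Phi((2k+1)x)\}$. The function names and simulation sizes are illustrative choices.

```python
import math

import numpy as np


def sup_abs_bm_tail(x, terms=50):
    """P(sup_{0<=u<=1} |B(u)| >= x) = 4 * sum_k (-1)^k * P(N(0,1) > (2k+1) x)."""
    return 4.0 * sum((-1) ** k * 0.5 * math.erfc((2 * k + 1) * x / math.sqrt(2.0))
                     for k in range(terms))


def sup_abs_bm_quantiles(levels, n_steps=1000, n_sims=20000, seed=0, batch=2000):
    """Monte Carlo upper quantiles of sup_{0<=u<=1} |B(u)| from scaled Gaussian random walks."""
    rng = np.random.default_rng(seed)
    sups = np.empty(n_sims)
    for start in range(0, n_sims, batch):
        size = min(batch, n_sims - start)
        paths = np.cumsum(rng.standard_normal((size, n_steps)), axis=1) / math.sqrt(n_steps)
        sups[start:start + size] = np.abs(paths).max(axis=1)
    return {a: float(np.quantile(sups, 1.0 - a)) for a in levels}


crit = sup_abs_bm_quantiles([0.10, 0.05, 0.01])
print(crit)                     # Monte Carlo critical values (slightly low: discrete grid)
print(sup_abs_bm_tail(2.2414))  # close to 0.05
```

The series gives critical values of about 1.96, 2.24 and 2.81 at levels 0.10, 0.05 and 0.01; the random-walk values sit slightly below these because the path is only observed on a grid.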
For composite model checks, the unknown parameters should be replaced by their estimators, so we return to the process defined in (2.4). By Proposition 3, $P(\hat q = 1) \to 1$ under the null hypothesis. Thus we only need to work on the event $\{\hat q = 1\}$. Consequently, and can be rewritten as
[TABLE]
Under some regularity conditions stated in the Appendix and on the event $\{\hat q = 1\}$, we can show that under the null hypothesis
[TABLE]
uniformly in , where . A proof of (3.2) is given in the Appendix. Combining (3.2) with Proposition 2 and some elementary calculations, we have
[TABLE]
uniformly in . It is easy to see that the second term of the right hand side of (3.3) is also asymptotically tight. Altogether we then obtain the following result.
**Theorem 3.1.**
Suppose that the regularity conditions in the Appendix hold and that $p$ diverges sufficiently slowly relative to $n$. Then, under the null hypothesis, we have in distribution
[TABLE]
where is a zero mean Gaussian process with a covariance function that is the pointwise limit of as
[TABLE]
3.2 Martingale transformation
If $p$ is fixed, can be rewritten as in distribution, and its covariance function can be specified. The shift term is brought out from the second term in (3.3). Stute, Thies, and Zhu (1998) first proposed a martingale transformation to eliminate in and thereby obtain a tractable limiting distribution of a functional of . This has become one of the basic methodologies in model checking for deriving asymptotically distribution-free tests. It was motivated by the Khmaladze martingale transformation for constructing convenient goodness-of-fit tests for hypothetical distribution functions (Khmaladze, 1982). A number of follow-up studies extend this methodology to various high-dimensional models, such as Khmaladze and Koul (2004) and Stute, Xu and Zhu (2008). However, when $p$ diverges as $n$ goes to infinity, the form of the shift term that would be a limit of cannot be given explicitly, as stated in the above theorem, so the martingale transformation cannot directly target . We bypass this difficulty by handling the shift term at the sample level. Note that the shift term comes from the second term in (3.2), because in the fixed-$p$ case, is just its weak limit. Thus, we target that term directly at the sample level.
Following Stute, Thies, and Zhu (1998) or Stute and Zhu (2002), recall that and . Let
[TABLE]
be the Radon-Nikodym derivative of with respect to . Next, define a matrix
[TABLE]
It can also be written as
[TABLE]
Mimicking the martingale transformation in Stute and Zhu (2002) at the sample level, we have
[TABLE]
Here we assume that is nonsingular and that the process is either of bounded variation or a Brownian motion.
Some elementary computations show that . Next, we discuss the approximation properties of . Note that
[TABLE]
and
[TABLE]
Combining these two formulas, we obtain that
[TABLE]
Therefore, is also an i.i.d. centered residual cusum process with a covariance function
[TABLE]
This means that admits the same limiting distribution as that of , i.e.,
[TABLE]
Consequently, we get rid of the annoying shift term and obtain the process whose supremum over all has a tractable limiting distribution. The assertions (3.5) and (3.6) will be justified in Appendix (Lemma 1).
The transformation contains unknown quantities, which must be replaced by their empirical analogues. To this end, let and . It follows that
[TABLE]
Consequently, we have
[TABLE]
where . We conclude that
[TABLE]
Since depends on and , about which we make no assumption other than smoothness, they need to be estimated nonparametrically. For instance, we may adopt a standard Nadaraya-Watson estimator for :
[TABLE]
where is a univariate kernel function and is a bandwidth. Similarly for . Thus we obtain the empirical estimators and of and respectively:
[TABLE]
Finally, we can give an estimator of :
[TABLE]
where is the estimator of and is the empirical distribution function of . Making sure the columns of have the same direction as , we can assume and .
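The Nadaraya-Watson step used above can be sketched as follows. This is a minimal illustration, not the authors' code: the Gaussian kernel, the function name `nadaraya_watson`, and the bandwidth in the usage line are assumptions, since the kernel and bandwidth used in the paper are not specified here. The function also accepts vector-valued responses, which is convenient when several conditional means are needed on the same index.

```python
import numpy as np


def nadaraya_watson(t_eval, t, y, h):
    """Nadaraya-Watson estimate of E[Y | T = t] at the points t_eval.

    t : (n,) observed index values, y : (n,) or (n, q) responses,
    h : bandwidth. Uses a Gaussian kernel (an assumed choice).
    """
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (t_eval[:, None] - t[None, :]) / h
    w = np.exp(-0.5 * z**2)  # kernel weights; normalizing constants cancel in the ratio
    den = w.sum(axis=1)
    num = w @ y
    return num / (den[:, None] if y.ndim == 2 else den)


# Hypothetical example: a smooth conditional mean observed with noise
rng = np.random.default_rng(0)
t = rng.normal(size=500)
y = np.sin(t) + 0.1 * rng.normal(size=500)
print(nadaraya_watson([0.0, 1.0], t, y, h=0.3))
```

The estimate is a locally weighted average, so it reproduces constant responses exactly; its accuracy in the tails depends on how many index values fall near the evaluation point.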
Theorem 3.2**.**
Suppose that is nonsingular and is bounded away from zero for all . If , then under the null hypothesis and the regularity conditions in the Appendix, we have
[TABLE]
in distribution in the space for any .
Note that we use in the process . In practice, these matrices may be unbounded for large , so the distributional behavior of the underlying process may become very unstable in the extreme right tail. This may severely damage the approximation accuracy of a test statistic based on all . Therefore, we restrict to compact intervals and establish the convergence of in the space .
In the special case where the predictor follows a spherically contoured distribution or its extension, the elliptically contoured distribution, we can show that the calculations for the martingale transformation become much simpler. The idea is similar to that of Stute and Zhu (2002). Without loss of generality, we consider only spherically contoured distributions. Here we assume that the regression function does not depend on . Let be the derivative of with respect to . It follows that
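To make the sample-level transformation concrete, here is a minimal sketch of the generic empirical transform in the spirit of Stute, Thies, and Zhu (1998), applied to a residual-marked cusum process indexed by a scalar $u$ (for example, the estimated single index). It is an illustration under assumptions, not the paper's estimator: the vectors $g(u_i)$ are supplied by the user instead of being built from the nonparametric estimates above, ties in $u$ are assumed absent, and the function name `stz_transform` is hypothetical. With $A_n(z) = n^{-1}\sum_{u_k \ge z} g(u_k)g(u_k)^\top$, it evaluates $n^{-1/2}\sum_i \hat\varepsilon_i\,[\,1\{u_i \le x\} - n^{-1}\sum_{j:\,u_j \le \min(x,u_i)} g(u_j)^\top A_n(u_j)^{-1} g(u_i)\,]$ on a grid of $x \le x_0$. The sketch also shows why the process is restricted to compact intervals: $A_n(z)$ is averaged over fewer and fewer points in the right tail and eventually becomes singular.

```python
import numpy as np


def stz_transform(u, resid, g, x_grid):
    """Empirical martingale-type transform of a residual-marked cusum process.

    u      : (n,) scalar index values (assumed free of ties)
    resid  : (n,) residuals
    g      : (n, q) rows g(u_i), supplied by the user
    x_grid : points x <= x0 at which the transformed process is evaluated
    """
    order = np.argsort(u)
    u = np.asarray(u, dtype=float)[order]
    e = np.asarray(resid, dtype=float)[order]
    g = np.asarray(g, dtype=float)[order]
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    n = u.size

    # A_n(u_j) = n^{-1} sum_{k >= j} g_k g_k^T via reverse cumulative sums over sorted u
    outer = np.einsum("ki,kj->kij", g, g)
    a_mat = np.cumsum(outer[::-1], axis=0)[::-1] / n
    # Only indices with u_j <= max(x_grid) are needed; A_n must be nonsingular there
    m = np.searchsorted(u, x_grid.max(), side="right")
    w = np.linalg.solve(a_mat[:m], g[:m, :, None])[..., 0]  # A_n(u_j)^{-1} g(u_j)
    # G_j = sum_{i >= j} g(u_i) e_i
    g_e = np.cumsum((g * e[:, None])[::-1], axis=0)[::-1]
    comp = np.einsum("jq,jq->j", w, g_e[:m]) / n  # compensator increments

    cum_e = np.concatenate([[0.0], np.cumsum(e)])
    cum_comp = np.concatenate([[0.0], np.cumsum(comp)])
    idx = np.searchsorted(u, x_grid, side="right")
    return (cum_e[idx] - cum_comp[idx]) / np.sqrt(n)


rng = np.random.default_rng(0)
u = rng.normal(size=400)
g = np.column_stack([np.ones_like(u), u])  # e.g. gradient of a linear null m(u) = a + b u
x_grid = np.quantile(u, [0.25, 0.5, 0.75, 0.9])
print(stz_transform(u, rng.normal(size=400), g, x_grid))
```

If the marks lie exactly in the span of $g$, the transform returns zero; this is the sample analogue of removing the shift term caused by parameter estimation.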
[TABLE]
where is an orthonormal matrix whose first row is (or ). Since, for a spherically contoured distribution, the conditional expectation of the other components of given the first is zero, it follows that
[TABLE]
whence,
[TABLE]
Note that is a matrix of rank and is therefore singular when . Thus the martingale transformation cannot be applied directly. However, if we go back to (3.2) and set
[TABLE]
then (3.2) can be rewritten as
[TABLE]
It follows that the new and become real-valued:
[TABLE]
Clearly, Theorem 3.2 can be applied to these new functions.
Hall and Li (1993) showed that, if as , an expectation over a large number of random variables behaves approximately like an expectation under the multivariate normal distribution. Note that , and the multivariate normal distribution is elliptically contoured. Consequently, even when is not multivariate normally distributed, can be viewed as an expectation under a multivariate normal distribution, and the martingale transformation can then be applied to the real-valued and in practice for large .
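The key fact behind the simplification above, that for a spherically contoured $X$ and a unit vector $\beta$ the conditional mean of $X$ given $\beta^\top X$ equals $\beta\beta^\top X$, can be checked numerically. The sketch below, with hypothetical function names, draws from a scale mixture of normals (spherical but not Gaussian; the mixing law $U(0.5, 2)$ is an arbitrary choice) and estimates $E[(X - \beta\beta^\top X)\,h(\beta^\top X)]$ for the bounded nonlinear $h = \tanh$. Every component should be close to zero; a clearly nonzero value would signal that $E(X \mid \beta^\top X)$ departs from $\beta\beta^\top X$.

```python
import numpy as np


def spherical_sample(n, p, rng):
    """Scale mixture of normals X = S * Z with S ~ U(0.5, 2): spherical, not Gaussian."""
    s = rng.uniform(0.5, 2.0, size=(n, 1))
    return s * rng.standard_normal((n, p))


def orthogonal_moment(x, beta, h=np.tanh):
    """Sample version of E[(X - beta beta^T X) h(beta^T X)] for a unit vector beta."""
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)
    t = x @ beta
    resid = x - np.outer(t, beta)  # component of X orthogonal to beta
    return (resid * h(t)[:, None]).mean(axis=0)


rng = np.random.default_rng(1)
x = spherical_sample(200_000, 5, rng)
moment = orthogonal_moment(x, [1.0, 2.0, 0.0, -1.0, 0.5])
print(moment)  # each entry should be near zero
```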
3.3 The properties under the alternative hypothesis
Now we discuss the asymptotic properties of under a sequence of local alternatives converging to the null hypothesis at a parametric rate . Consider
[TABLE]
where , is a random variable with zero mean and satisfies . To derive the asymptotic distribution of under , we need the asymptotic properties of and when diverges to infinity.
Proposition 4**.**
Assume the regularity conditions of Theorem 3 in Zhu et al. (2010b) hold. Let be the eigenvectors associated with the largest eigenvalues of . Then, under , we have and .
Next, we derive the norm consistency of with respect to and an asymptotic decomposition of under . Here and as mentioned before.
Proposition 5**.**
Suppose the regularity conditions in the Appendix and (3.8) hold. If , then is a norm-consistent estimator for with . Moreover, if , we have
[TABLE]
The following theorem states the asymptotic results under various alternatives.
Theorem 3.3**.**
*Suppose the regularity conditions in the Appendix hold. If ,
(1) under the global alternative , we have in probability*
[TABLE]
*where is some nonzero function;
(2) under the local alternative , we have in distribution*
[TABLE]
where is a zero-mean Gaussian process given by (3.1), and and are the uniform limits of , , respectively, given as follows
[TABLE]
These results show that, under the global alternative, the process diverges to infinity at the rate of order , while under local alternatives converging to the null at the rate of order , the process converges to a stochastic process whose distribution differs from its limit under the null. Thus, a test based on this process can detect such alternatives.
4 Numerical studies
4.1 Test statistics in practical use
In this subsection, we use the Cramér-von Mises (CM) functional to construct the test statistic. Consider
[TABLE]
where is the empirical distribution function of , . According to Theorem 3.2 and the Extended Continuous Mapping Theorem (see Theorem 1.11.1 in Van Der Vaart and Wellner (1996)), we obtain, under the null,
[TABLE]
where is a standard Brownian motion and is the pointwise limit of . Since in distribution, it follows that
[TABLE]
Consequently, we consider
[TABLE]
Here we use as an estimator of . Therefore, we obtain
[TABLE]
In homoscedastic models, is free of , and thus we can estimate it by
[TABLE]
In this case we also have , so it can be estimated by . Consequently, becomes
[TABLE]
For , as suggested by Stute and Zhu (2002), we take the quantile of in the simulation studies.
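The practical statistic can be sketched as follows, under stated assumptions. We assume, as in Stute, Thies, and Zhu (1998), that under the null the transformed process behaves like $W(\psi(x))$ with $\psi(x) = \sigma^2 F(x)$ in the homoscedastic case. Then $\int_{-\infty}^{x_0} W(\psi(x))^2\,d\psi(x)/\psi(x_0)^2$ has the same distribution as $\int_0^1 W(u)^2\,du$ (substitute $v = \psi(x)$, then use Brownian scaling $W(\psi(x_0)u) \overset{d}{=} \psi(x_0)^{1/2}W(u)$). The function names, the argument `x0_level` (the quantile level defining $x_0$, whose value is not reproduced here), and the simulation settings are hypothetical. Critical values are approximated by Monte Carlo with a Riemann sum on a grid.

```python
import numpy as np


def cvm_statistic(u, proc_vals, resid, x0_level):
    """Normalized Cramer-von Mises statistic (homoscedastic sketch).

    u         : (n,) index values at which the transformed process is evaluated
    proc_vals : (n,) transformed process evaluated at each u_i
    resid     : (n,) residuals; sigma_hat^2 = mean(resid^2)
    x0_level  : quantile level of u that defines the cutoff x0 (hypothetical)
    """
    u = np.asarray(u, dtype=float)
    proc_vals = np.asarray(proc_vals, dtype=float)
    sigma2 = np.mean(np.asarray(resid, dtype=float) ** 2)
    keep = u <= np.quantile(u, x0_level)
    psi_x0 = sigma2 * keep.mean()  # psi_n(x0) = sigma_hat^2 * F_n(x0)
    # integral of T_n^2 d psi_n over (-inf, x0], with d psi_n = sigma_hat^2 dF_n
    integral = sigma2 * np.mean(np.where(keep, proc_vals**2, 0.0))
    return integral / psi_x0**2


def simulate_brownian_cvm(n_rep=10_000, n_grid=200, seed=0):
    """Monte Carlo draws of int_0^1 W(u)^2 du via a Riemann sum on a grid."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_grid
    paths = np.cumsum(rng.standard_normal((n_rep, n_grid)) * np.sqrt(dt), axis=1)
    return (paths**2).sum(axis=1) * dt


draws = simulate_brownian_cvm()
crit_05 = np.quantile(draws, 0.95)  # reject at level 0.05 if cvm_statistic(...) > crit_05
```

Because the limit is free of the model, the simulated critical values can be computed once and reused for every data set.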
4.2 Numerical studies
In this subsection we conduct simulation studies to examine the performance of the proposed test. Guided by the theoretical results, we set with , as in Fan and Peng (2004). As there are no existing tests designed for a divergent dimension, we compare our test with several existing tests developed for fixed dimension, since they remain workable in practice.
- Stute and Zhu’s (2002) test is given by
[TABLE]
where
[TABLE]
For , one can refer to their paper for details.
- Bierens (1982) proposed an integrated conditional moment (ICM) test which is based on the following statistic:
[TABLE]
where .
- Escanciano’s (2006) test statistic is defined as
[TABLE]
with critical values determined by the wild bootstrap. More details can be found in Escanciano (2006).
- Zheng (1996) proposed a locally smoothing test whose statistic is given by
[TABLE]
- An adaptive-to-model test defined in Guo et al. (2016) with the test statistic:
[TABLE]
Here we use the kernel function and the bandwidth as in Guo et al. (2016), and is a sufficient dimension reduction estimate of with an estimated structural dimension of .
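For readers who want to reproduce comparisons of this kind, here is a minimal sketch of some of the competitors. The exact formulas used in the paper are not reproduced above, so the versions below are assumptions based on commonly used forms: a Bierens-type ICM statistic with a standard normal weight, for which the integral has the closed form $n^{-1}\sum_{j,k}\hat\varepsilon_j\hat\varepsilon_k\exp(-\|X_j - X_k\|^2/2)$; Zheng's (1996) kernel statistic $V_n$ standardized by its usual variance estimate, here with a Gaussian product kernel; and a wild bootstrap with Mammen's two-point weights under a linear null model, the kind of resampling used for Escanciano's (2006) critical values. Function names and tuning values are hypothetical.

```python
import numpy as np


def icm_statistic(x, resid):
    """Bierens-type ICM statistic with an N(0, I) weight (assumed form)."""
    x = np.asarray(x, dtype=float)
    e = np.asarray(resid, dtype=float)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return float(e @ np.exp(-0.5 * sq_dist) @ e) / len(e)


def zheng_statistic(x, resid, h):
    """Zheng (1996)-type statistic n h^{p/2} V_n / sqrt(Sigma_hat), Gaussian product kernel."""
    x = np.asarray(x, dtype=float)
    e = np.asarray(resid, dtype=float)
    n, p = x.shape
    diff = (x[:, None, :] - x[None, :, :]) / h
    kern = np.exp(-0.5 * (diff**2).sum(axis=2)) / (2 * np.pi) ** (p / 2)
    np.fill_diagonal(kern, 0.0)  # the double sums run over i != j
    v_n = (e @ kern @ e) / (n * (n - 1) * h**p)
    sigma2_hat = 2.0 * ((e**2) @ kern**2 @ (e**2)) / (n * (n - 1) * h**p)
    return n * h ** (p / 2) * v_n / np.sqrt(sigma2_hat)


def mammen_weights(n, rng):
    """Two-point weights with mean 0, variance 1 and third moment 1."""
    s5 = np.sqrt(5.0)
    p_low = (s5 + 1) / (2 * s5)
    return np.where(rng.random(n) < p_low, -(s5 - 1) / 2, (s5 + 1) / 2)


def wild_bootstrap_pvalue(x, y, statistic, n_boot=199, seed=0):
    """Wild-bootstrap p-value under a linear null y = b0 + x @ b + error."""
    rng = np.random.default_rng(seed)
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    t_obs = statistic(x, resid)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        y_star = fitted + resid * mammen_weights(len(y), rng)  # data generated under the null
        coef_star, *_ = np.linalg.lstsq(design, y_star, rcond=None)
        t_boot[b] = statistic(x, y_star - design @ coef_star)
    return (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)


rng = np.random.default_rng(1)
x = rng.normal(size=(100, 4))
y = 1.0 + x @ np.array([0.5, -0.5, 0.0, 1.0]) + 0.5 * rng.normal(size=100)
print(wild_bootstrap_pvalue(x, y, icm_statistic, n_boot=99))
print(wild_bootstrap_pvalue(x, y, lambda xx, e: zheng_statistic(xx, e, h=1.0), n_boot=99))
```

Both statistics use all $p$ coordinates at once, so the kernel in `zheng_statistic` becomes very sparse as $p$ grows, which is one way to see the dimensionality problem discussed in the simulation results.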
The significance levels are set to be , , and . The simulation results are based on the averages of replications. In the following simulation studies, corresponds to the null while to the alternatives.
1. The data are generated from the following models:
[TABLE]
where , and with . The predictors are i.i.d. from and is Gaussian white noise with variance . is a high-frequency/oscillating model, and the other three are low-frequency models. In and , the structural dimension equals under both the null and the alternative, while in and , the structural dimension is under the alternatives.
The simulation results are reported in Tables 1 to 4. Both and maintain the significance levels very well. The empirical sizes of are also very close to the significance levels, but slightly less stable in some cases. can only maintain the significance level when it is . occasionally maintains the significance levels but is generally conservative, with smaller sizes. is the worst among these tests in terms of both size control and power. In our experience, when is smaller than , can work well. The powers of , , and are all very high for models , and , but ’s power grows slightly more slowly than that of the other three, while for model , beats the other competitors. This again supports the empirical observation in this area that locally smoothing tests perform better for high-frequency/oscillating models, while globally smoothing tests work better for low-frequency models. Nevertheless, , a representative locally smoothing test, has very low power for model . This is because severely suffers from the curse of dimensionality, while uses a dimension reduction technique to greatly alleviate it.
[TABLE]
The null models in the first study are all linear. We next consider nonlinear hypothetical models.
2. The data are generated from the following models
[TABLE]
where and with , is , and is independent of .
We report the empirical sizes and powers in Tables 5 and 6. For model , the conclusions are very similar to those in the first study. For model , the empirical sizes of , and are very close to the significance levels, while and can only control the level of . is still the worst. The empirical powers of and are higher than those of the other competitors, while ’s empirical powers grow very slowly in this case. This is consistent with the theoretical result that is not an omnibus test.
[TABLE]
Overall, the proposed test performs well and can detect different alternatives. Further, the dimension of the predictors has a smaller negative impact on its performance than on that of the competitors.
4.3 A real data example
In this subsection we analyze the baseball salary data set, which can be obtained from http://www4.stat.ncsu.edu/~boos/var.select/baseball.html. This data set contains the 1992 salaries of 337 Major League Baseball players and 16 performance measures from 1991. The performance measures are : batting average, : on-base percentage, : runs, : hits, : doubles, : triples, : home runs, : runs batted in, : walks, : strike-outs, : stolen bases, and : errors; and : indicator of free agency eligibility, : indicator of free agent in 1991/2, : indicator of arbitration eligibility, and : indicator of arbitration in 1991/2. The dummy variables measure a player’s freedom to move to another team. For ease of interpretation, we standardize all variables separately. To obtain the regression relationship between and the performance measures , we first test for a linear regression model with the proposed test, because the dimension and, in the simplest case of a linear model, the proposed test can theoretically handle . The value of the test statistic is with the p-value equal to . Since the p-value is small, although larger than, say, , an often-used significance level, we consider a more plausible model to better fit this dataset. Hence we apply dimension reduction techniques. Recall from Section 2.3 that the CSE method is used to estimate the central subspace. The estimated structural dimension of this dataset is . This means that may be conditionally independent of given the projected covariate , where
[TABLE]
is the first direction obtained by CSE. The scatter plot of against is presented in Figure 1(a). It indicates that a linear regression model for is not reasonable.
[TABLE]
To further exhaust possible projected covariates, we consider the second projected covariate obtained by CSE. The scatter plot of against is presented in Figure 2.
[TABLE]
This figure shows that the second projected covariate carries no information for predicting the response , as the plot is nearly flat along . This means that the projection of the data onto the subspace already contains most of the regression information about . Figure 1(a) suggests fitting the data with a quadratic polynomial in . Hence we use the following regression model:
[TABLE]
Figure 1(b) adds the fitted curve to the scatter plot. The value of the test statistic and the p-value is about . Therefore, the above regression model is plausible.
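Fitting the quadratic model in the single projected covariate is ordinary least squares on $(1, t, t^2)$ with $t = \hat\beta^\top X$. Because the estimated CSE direction and the standardized data are not reproduced here, the sketch below uses synthetic data of the same shape (337 observations, 16 predictors) and a hypothetical direction `beta_hat`; it shows the mechanics only and does not reproduce the fitted curve in Figure 1(b). The residuals it returns are what a model-checking statistic would then be computed from.

```python
import numpy as np


def fit_quadratic_index(x, y, beta_hat):
    """OLS fit of y = b0 + b1 * t + b2 * t^2 with t = x @ beta_hat."""
    t = np.asarray(x, dtype=float) @ np.asarray(beta_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(t), t, t**2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


# Synthetic illustration only (not the baseball data)
rng = np.random.default_rng(0)
x = rng.normal(size=(337, 16))
beta_hat = rng.normal(size=16)
beta_hat /= np.linalg.norm(beta_hat)
t = x @ beta_hat
y = 0.5 + 1.2 * t - 0.4 * t**2 + 0.3 * rng.normal(size=337)
coef, resid = fit_quadratic_index(x, y, beta_hat)
print(coef)
```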
5 Discussions
In this paper, we investigate model checking for regressions when the dimension of the predictors diverges to infinity as the sample size tends to infinity. Three features are worth discussing. First, although the empirical process is similar to that in Stute and Zhu (2002), its construction involves much more difficult estimation issues. Second, since the Khmaladze martingale transformation has become an important methodology for model checking because of its asymptotically distribution-free property, we suggest another way to construct the transformation, rather than directly targeting the limit of the shift term as in the fixed-dimension case. The transformed process still has the same limiting Gaussian process as in the fixed-dimension case, which provides an easy way to handle divergent dimension. Third, the model-adaptation property shows its advantage in maintaining the significance level and enhancing power. The research also leaves some open problems. An important one is how to relax the condition on the divergence rate of the dimension. In this paper, we cannot allow a faster rate than in general, although for some special regression models, such as linear models, the rate can reach . This is mainly because of technical difficulties in estimation. To attack this problem, we need to improve the asymptotic properties of the involved estimators, which is beyond the scope of this paper and deserves further study.
6 Appendix
6.1 Regularity Conditions
In this subsection we present some regularity conditions for the theoretical results. Although these conditions may not be the weakest possible, they make technical arguments easy to understand. In the following, always stands for a constant which may be different in different cases.
First, we give some regularity conditions for the norm consistency of to and the decomposition of $\begin{pmatrix}\hat{\beta}_{n}-\tilde{\beta}_{0}\\ \hat{\theta}_{n}-\tilde{\theta}_{0}\end{pmatrix}$.
(A1) The matrix is positive definite and satisfies the following condition
[TABLE]
where and are the smallest and largest eigenvalues of , respectively.
The first to third derivatives of the regression function satisfy the conditions:
(A2) , ; ;
(A3) with for all ;
(A4) with for all and ;
(A5) with for all , and ;
(A6) with for all , and ;
where is the -th component of , is the -element of , and is the -element of .
Condition (A1) is similar to the regularity condition on the Fisher information matrix proposed by Fan and Peng (2004), where the Fisher information matrix plays the same role in deriving the asymptotic theory as the matrix does here. Conditions (A2)-(A6) are standard for nonlinear least squares estimation, see, e.g., Jennrich (1969) and White (1981).
Next, we present some regularity conditions for the convergence of the adaptive-to-model residual-marked empirical process.
(B1) There exists a constant such that if , then
[TABLE]
where denotes the symmetric difference of two sets. This condition was given by Zhu (1993), who showed that distributions satisfying it exist.
(B2) If , uniformly in .
(B3) For any unit non-random vector , there exist -integrable functions such that
[TABLE]
where is given by (3.8) and is the conditional density of given .
6.2 Lemmas
In this subsection we present some Lemmas that will be needed in proving the propositions and theorems. Since we consider the empirical process with diverging dimension, there are no relevant results available in the literature. Thus, in the following Lemmas, we give the results about the convergence rate of the involved empirical process, which are different from the classical ones with fixed dimension in the literature.
Lemma 1**.**
Suppose is nonsingular for all , then we have
[TABLE]
that is, (3.5) and thus (3.6) hold.
Proof. Assume . By the definition of and the Fubini Theorem, the left-hand side of (3.5) equals
[TABLE]
It is easy to see that the sum of the last three terms is equal to zero. Thus we complete the proof.
Next we consider the convergence rate of the involved empirical processes when the dimension diverges. Let be a fixed function and be a VC-class of functions with a VC-index which may depend on . Let be the covering number of with respect to the seminorm . See, e.g., Pollard (1984) for details. Suppose for any and and
[TABLE]
Set . By some elementary calculations, we have
[TABLE]
whence
[TABLE]
Lemma 2**.**
Let and be positive sequences. If and for large enough, then
[TABLE]
where and are constants which may depend on .
Proof. The proof is similar to Theorem 37 in Chapter 2 of Pollard (1984) and Theorem 3.1 in Zhu (1993). Since for large enough, by the formula (30) in Chapter 2 of Pollard (1984), we have
[TABLE]
Conditionally on , using the same argument as that for proving inequality (31) in Chapter 2 of Pollard (1984), it follows that
[TABLE]
Taking expectation, we obtain that
[TABLE]
Consequently,
[TABLE]
Therefore, we complete the proof.
Lemma 3**.**
If and , then we have
[TABLE]
where .
Proof. Fix and let . We have
[TABLE]
For every term in the last sum, we use Lemma 2. Let
[TABLE]
It is easy to see that is a VC-class with VC-index . By Theorem 2.6.7 in Van Der Vaart and Wellner (1996), we obtain that
[TABLE]
where is a universal constant. Set and . Lemma 2 leads to
[TABLE]
whence
[TABLE]
Therefore, we obtain the result.
Lemma 4**.**
Let be a permissible class of functions with and for all . Then
[TABLE]
For the definition of “a permissible class of functions ”, one can refer to Chapter 2 of Pollard (1984) for details.
Proof. This Lemma is a slightly modified version of Lemma 33 in Chapter 2 of Pollard (1984) as we need the result with diverging . But the proof can be very similar and thus is omitted here.
Lemma 5**.**
Let and be positive real valued sequences. Suppose for all and for large enough. If , then
[TABLE]
where and are constants which may depend on .
Proof. The proof is similar to Theorem 37 in Chapter 2 of Pollard (1984) and Theorem 3.1 of Zhu (1993). Since , similar to the proof for Lemma 2, we have
[TABLE]
Conditionally on , we obtain
[TABLE]
Take expectation to obtain
[TABLE]
The last inequality is due to Lemma 4. Altogether we complete the proof.
Lemma 6**.**
Suppose and condition (B1) hold. If and , then we have
[TABLE]
Proof. Fix and set . Since , by condition (B1), it suffices to prove
[TABLE]
Let
[TABLE]
Then it is easy to see that
[TABLE]
Since and are both VC-classes with the VC-index and respectively, by Theorem 2.6.7 in Van Der Vaart and Wellner (1996), we obtain that
[TABLE]
whence
[TABLE]
Let . It follows that
[TABLE]
Let , and . Since
[TABLE]
by Lemma 5, we have
[TABLE]
Since , it follows that , which completes the proof.
Next, we consider the convergence rate of the following process
[TABLE]
Lemma 7**.**
Let . Suppose conditions (A2) and (B1) hold. If , then
[TABLE]
Proof. Fix and set , , and . Similar to the proof for Lemma 6, it suffices to prove
[TABLE]
By the same argument for proving Lemma 3, we obtain
[TABLE]
For every term in the last sum, we use Lemma 5 to derive the result. Let
[TABLE]
Then we have
[TABLE]
where is a universal constant free of .
Recall and . By conditions (A2) and (B1), we have
[TABLE]
[TABLE]
By Lemma 5, we obtain that
[TABLE]
whence
[TABLE]
Since , it follows that the right-hand side of the above inequality tends to zero. Hence we complete the proof.
In the next lemma, we give the convergence rate of the kernel regression function estimator . Let be a sample from , be the density function of with a support and . Suppose that
[TABLE]
It follows that uniformly in . Set
[TABLE]
Then . Here is the kernel function and is a bandwidth.
Lemma 8**.**
Suppose the above conditions hold. If , then we have
[TABLE]
Proof. Let , and be the -th component of , and respectively. For fixed , set , , and . Then
[TABLE]
Define
[TABLE]
Without loss of generality, assume and . By the arguments in Example 38 of Chapter 2 of Pollard (1984), we obtain that
[TABLE]
where and are free of . Let . Then
[TABLE]
Since
[TABLE]
and
[TABLE]
for large enough, Lemma 5 yields that
[TABLE]
whence
[TABLE]
Since , it is easy to see that the right-hand side of the inequality tends to zero. Thus . By the arguments for proving Lemma 3.3 of Zhu and Fang (1996), we obtain that
[TABLE]
Consequently,
[TABLE]
Thus we obtain the first result. For the second, note that
[TABLE]
and
[TABLE]
Combining these with the uniform boundedness of concludes the proof.
6.3 Proofs of the Propositions and Theorems
For notational simplicity, we consider a parametric family of functions . Let and
Proof of Proposition 1. Let and . Then it suffices to show that there is a root of such that . Applying the results in (6.3.4) of Ortega and Rheinboldt (1970), it in turn suffices to show that for , where is a sufficiently large constant.
Let with , and . Using Taylor’s expansion we obtain
[TABLE]
where lie between and and
[TABLE]
Then we have . Since , it follows that
[TABLE]
Thus . Recall that , and . Then we decompose the term as follows
[TABLE]
By condition (A2), we obtain that
[TABLE]
It follows that . By the same argument, we have
[TABLE]
Therefore . For the first term of , by the triangle inequality and condition (A6), we have
[TABLE]
For the second term of , we have
[TABLE]
By the same argument for the third and fourth terms of , we obtain that . Therefore
[TABLE]
If is large enough, then for any , we have
[TABLE]
Thus our result follows from (6.3.4) of Ortega and Rheinboldt (1970).
If follows a linear regression model, then and . According to the proof of Proposition 1, we can obtain the norm consistency of under the weaker condition .
Proof of Proposition 2. We use the same notations as those in the proof of Proposition 1. Let . Then . Applying Taylor’s expansion around , we obtain
[TABLE]
where lies between and . Therefore
[TABLE]
Note that
[TABLE]
Following the same arguments in Proposition 1, we obtain that and . Since , it follows that
[TABLE]
Because , the result follows. Indeed, if , it is easy to see that . Consequently,
[TABLE]
Therefore only the convergence rate is needed to obtain the result in Proposition 2.
Proof of Proposition 3. (1) Suppose that and for . Similar to the arguments of Theorem 2.2 in Zhu and Fang (1996), we have
[TABLE]
By Theorem 3 in Zhu et al. (2010b), we obtain that is asymptotically normal. Thus . Following the arguments of Lemma 1 in Tan et al. (2017), we obtain . Again, by Theorem 2.2 in Zhu and Fang (1996), we obtain
[TABLE]
Note that and under . Then we have
[TABLE]
Since and , it follows that .
(2) Note that is free of under . The proof is concluded from the argument for proving (1).
Proof of Theorem 3.1. Under the null hypothesis, we have . Thus we need only work on the event . It follows that and we can rewrite as
[TABLE]
Let . Then we obtain that
[TABLE]
where lies between and . For the third term in , note that
[TABLE]
Therefore uniformly in . For , recall that . Then we decompose as follows
[TABLE]
For the second term in , by Lemma 3, we have
[TABLE]
We conclude that
[TABLE]
Since uniformly in , by Proposition 2, we have
[TABLE]
Therefore, we obtain that
[TABLE]
Now we consider the term . It can be decomposed as follows
[TABLE]
By Lemma 6, we obtain that uniformly in . For the second term , let
[TABLE]
By Lemma 7, we have
[TABLE]
Therefore, we derive that
[TABLE]
Let . By condition (B1), it is easy to see that
[TABLE]
Consequently,
[TABLE]
It follows that uniformly in .
Similar to the term , we obtain that uniformly in . Combining these with (6.1), we obtain that
[TABLE]
It is easy to see that the first and second terms of the right-hand side of (6.2) are asymptotically tight.
Now we consider the convergence of finite-dimensional distributions. Let where
[TABLE]
For any , we have
[TABLE]
Since
[TABLE]
and
[TABLE]
it follows that . For , it is easy to see that
[TABLE]
Since
[TABLE]
it follows that . Hence
For the covariance matrix , we only need to consider . It is easy to see that
[TABLE]
Thus . Since , it follows that satisfies the conditions of the Lindeberg-Feller central limit theorem. Hence the finite-dimensional distributions converge. Altogether, we have
[TABLE]
where is a zero mean Gaussian process with covariance function . Hence we complete the proof.
Proof of Theorem 3.2. Similar to the proof for Theorem 3.1, we only need to work on the event . Let
[TABLE]
On the event , we have and then . Consequently, can be rewritten as
[TABLE]
Next we divide the whole proof of Theorem 3.2 into three parts.
(I) First, we prove that uniformly in . Recall that
[TABLE]
Since
[TABLE]
by the same arguments in the proof of Theorem 3.1, we obtain that
[TABLE]
uniformly in . The two integrals in and differ by
[TABLE]
It equals
[TABLE]
where and both lie between and . Recall that
[TABLE]
Then the two integrals differ by
[TABLE]
Since , it follows that uniformly in .
(II) Second, we prove that uniformly in . Indeed,
[TABLE]
Putting
[TABLE]
it follows that
[TABLE]
By the uniform boundedness of , the sequence is asymptotically tight. According to Lemma 3.4 in Stute, Thies, and Zhu (1998) and the arguments thereafter, we obtain that uniformly in . For , since both and depend on , we rewrite and as and , respectively, and define
[TABLE]
By the boundedness of and Condition (B1), we obtain that . By Lemma 8, we show that
[TABLE]
Combining this with the uniform boundedness of , we obtain that tends to zero in probability.
(III) Finally, we prove that uniformly in .
[TABLE]
Since
[TABLE]
by the same argument in Theorem 3.1, we obtain that uniformly in . For the integrals in , note that the two integrals differ by
[TABLE]
Since , similar to the arguments in Lemma 6, the difference between the two integrals in tends to zero. Hence uniformly in . Altogether, we conclude that
[TABLE]
in distribution.
Proof of Proposition 4. Let , , , , and . Then the space and the space . If we show that is asymptotically normal for any unit vector , the result of this proposition follows from exactly the same arguments used to prove Proposition 3.
We now prove the above asymptotic normality. Under , we have
[TABLE]
where is the conditional distribution of given . By Taylor’s expansion, we derive
[TABLE]
Here lies between and and is the conditional density function of given . Therefore,
[TABLE]
Note that . Consequently,
[TABLE]
By Theorem 3 in Zhu et al. (2010b), we have is asymptotically normal. By condition (B3) in Appendix, is also asymptotically normal.
Proof of Proposition 5. The proof is similar to that for proving Propositions 1 and 2 with and .
Proof of Theorem 3.3. (1) Under , Proposition 1 asserts that . Thus we only need to work on the event . It follows that .
Putting
[TABLE]
and
[TABLE]
Following the arguments in Theorem 3.2, we obtain that
[TABLE]
where
[TABLE]
and is the cumulative distribution function of . Consider
[TABLE]
Since
[TABLE]
it follows that . For the two integrals in , we have
[TABLE]
Therefore, we obtain that
[TABLE]
Note that
[TABLE]
It follows that
[TABLE]
where
[TABLE]
Therefore, we obtain that
[TABLE]
where is a nonzero function.
(2) We use the same notations as in the arguments of Theorem 3.2. Under the local alternatives (3.8), by Proposition 3, we have . Thus we just work on this event . Hence and .
Following the same arguments for Theorem 3.2, we obtain that
[TABLE]
Next, we consider . Recall that
[TABLE]
Under , we have
[TABLE]
Then . For the integrals in , since
[TABLE]
by the same arguments for Theorem 3.2, we have
[TABLE]
Hence we obtain that .
To complete the proof, it remains to derive the asymptotic distribution of . Under the alternatives, note that
[TABLE]
It follows that
[TABLE]
By the Glivenko-Cantelli theorem, we have
[TABLE]
Since and
[TABLE]
we conclude that
[TABLE]
where is a zero-mean Gaussian process given by (3.6).
- Bierens, H. J. (1982). Consistent model specification tests. *Journal of Econometrics*, 20, 105-134.
- Bierens, H. J. (1990). A consistent conditional moment test of functional form. *Econometrica*, 58, 1443-1458.
- Cook, R. D. (1998). *Regression Graphics: Ideas for Studying Regressions Through Graphics*. New York: Wiley.
- Cook, R. D. and Weisberg, S. (1991). Discussion of "Sliced inverse regression for dimension reduction," by K. C. Li. *Journal of the American Statistical Association*, 86, 316-342.
