A copula approach for dependence modeling in multivariate nonparametric time series

Natalie Neumeyer; Marek Omelka; Šárka Hudecová

arXiv:1705.07605·math.ST·December 11, 2018·J. Multivar. Anal.

TL;DR

This paper introduces a copula-based method for modeling dependence in multivariate nonparametric time series, accounting for covariates affecting mean and variance but assuming stable innovation distributions.

Contribution

It develops nonparametric and semiparametric estimators for the copula of innovations, demonstrating their asymptotic equivalence to unobserved innovation-based estimators.

Findings

01

The residual-based copula estimators are asymptotically equivalent to estimators based on the unobserved innovations.

02

Method performs well in simulations.

03

Application to real data illustrates practical utility.

Tables

Table 1. Estimation for the Clayton copula with normal marginals (bias, SD and RMSE multiplied by 100)

Model	$\tau$	Estimator	Bias (n = 200)	SD (n = 200)	RMSE (n = 200)	Bias (n = 500)	SD (n = 500)	RMSE (n = 500)	Bias (n = 1000)	SD (n = 1000)	RMSE (n = 1000)
Known innovations	0.25	${\hat{θ}}_{n}^{(i k, o r)}$	-0.03	3.25	3.25	0.10	2.47	2.47	0.12	1.86	1.87
	0.25	${\hat{θ}}_{n}^{(p l, o r)}$	0.51	3.00	3.04	0.35	2.15	2.18	0.24	1.69	1.70
	0.50	${\hat{θ}}_{n}^{(i k, o r)}$	0.01	2.64	2.64	0.06	2.03	2.03	0.07	1.52	1.52
	0.50	${\hat{θ}}_{n}^{(p l, o r)}$	0.09	2.47	2.47	0.08	1.84	1.85	0.04	1.39	1.39
	0.75	${\hat{θ}}_{n}^{(i k, o r)}$	0.01	1.58	1.58	0.05	1.19	1.19	0.02	0.89	0.89
	0.75	${\hat{θ}}_{n}^{(p l, o r)}$	-0.28	1.48	1.50	-0.17	1.10	1.11	-0.12	0.80	0.81
1	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.08	4.66	4.66	-0.22	2.97	2.97	-0.16	2.06	2.06
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.62	4.15	4.19	0.07	2.62	2.62	-0.02	1.82	1.82
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.46	3.94	3.97	-0.41	2.48	2.51	-0.25	1.74	1.76
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.90	3.59	3.70	-0.81	2.25	2.39	-0.55	1.60	1.69
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.04	2.45	2.66	-0.85	1.55	1.77	-0.59	1.07	1.22
	0.75	${\hat{θ}}_{n}^{(p l)}$	-3.00	2.66	4.01	-2.23	1.59	2.74	-1.57	1.15	1.94
2	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.43	4.78	4.79	-0.05	2.93	2.92	0.07	2.08	2.08
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.26	4.30	4.31	0.25	2.58	2.59	0.15	1.90	1.90
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.91	3.93	4.03	-0.24	2.40	2.41	-0.09	1.71	1.72
	0.50	${\hat{θ}}_{n}^{(p l)}$	-1.50	3.62	3.92	-0.57	2.21	2.29	-0.36	1.60	1.64
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.96	2.63	3.27	-0.70	1.52	1.68	-0.39	1.05	1.12
	0.75	${\hat{θ}}_{n}^{(p l)}$	-4.63	3.19	5.62	-2.14	1.84	2.82	-1.27	1.16	1.72
3	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.43	4.81	4.83	-0.09	2.91	2.91	0.03	2.09	2.09
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.24	4.37	4.38	0.19	2.56	2.57	0.11	1.90	1.90
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.93	3.97	4.07	-0.32	2.41	2.43	-0.16	1.72	1.72
	0.50	${\hat{θ}}_{n}^{(p l)}$	-1.52	3.70	4.00	-0.66	2.20	2.30	-0.46	1.61	1.67
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.85	2.61	3.20	-0.82	1.53	1.73	-0.53	1.04	1.16
	0.75	${\hat{θ}}_{n}^{(p l)}$	-4.39	3.05	5.35	-2.25	1.78	2.86	-1.46	1.14	1.85
4	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.49	4.85	4.87	-0.09	2.93	2.93	0.02	2.10	2.10
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.13	4.37	4.37	0.14	2.58	2.59	0.06	1.90	1.90
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.82	3.99	4.07	-0.25	2.40	2.41	-0.12	1.73	1.73
	0.50	${\hat{θ}}_{n}^{(p l)}$	-1.54	3.70	4.01	-0.80	2.22	2.36	-0.53	1.60	1.69
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.22	2.57	2.84	-0.49	1.48	1.56	-0.28	1.04	1.08
	0.75	${\hat{θ}}_{n}^{(p l)}$	-3.43	2.76	4.40	-1.93	1.65	2.54	-1.20	1.10	1.62

Table 2. Estimation for the Frank copula with normal marginals (bias, SD and RMSE multiplied by 100)

Model	$\tau$	Estimator	Bias (n = 200)	SD (n = 200)	RMSE (n = 200)	Bias (n = 500)	SD (n = 500)	RMSE (n = 500)	Bias (n = 1000)	SD (n = 1000)	RMSE (n = 1000)
Known innovations	0.25	${\hat{θ}}_{n}^{(i k, o r)}$	-0.01	3.16	3.16	-0.05	2.33	2.33	-0.14	1.70	1.71
	0.25	${\hat{θ}}_{n}^{(p l, o r)}$	0.04	3.16	3.15	-0.03	2.32	2.32	-0.12	1.70	1.70
	0.50	${\hat{θ}}_{n}^{(i k, o r)}$	-0.02	2.37	2.37	-0.01	1.73	1.73	-0.09	1.28	1.28
	0.50	${\hat{θ}}_{n}^{(p l, o r)}$	0.00	2.34	2.34	-0.02	1.72	1.72	-0.08	1.27	1.27
	0.75	${\hat{θ}}_{n}^{(i k, o r)}$	-0.02	1.18	1.18	0.00	0.87	0.87	-0.03	0.64	0.64
	0.75	${\hat{θ}}_{n}^{(p l, o r)}$	-0.13	1.17	1.17	-0.07	0.87	0.87	-0.07	0.64	0.64
1	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.23	4.54	4.54	-0.11	2.82	2.82	-0.05	1.92	1.92
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.12	4.52	4.52	-0.05	2.81	2.81	-0.03	1.90	1.90
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.49	3.46	3.50	-0.32	2.18	2.20	-0.22	1.43	1.44
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.47	3.40	3.43	-0.30	2.15	2.17	-0.21	1.42	1.43
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.97	1.87	2.11	-0.69	1.15	1.34	-0.53	0.74	0.91
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.22	1.84	2.21	-0.81	1.16	1.41	-0.60	0.75	0.96
2	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.28	4.48	4.49	-0.15	2.78	2.79	-0.21	1.88	1.89
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.17	4.47	4.47	-0.12	2.77	2.77	-0.19	1.88	1.89
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.77	3.44	3.53	-0.29	2.13	2.14	-0.24	1.41	1.43
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.75	3.40	3.48	-0.31	2.10	2.12	-0.24	1.40	1.42
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.65	2.20	2.75	-0.66	1.18	1.35	-0.38	0.75	0.84
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.90	2.20	2.91	-0.78	1.18	1.41	-0.43	0.75	0.87
3	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.33	4.53	4.54	-0.17	2.77	2.77	-0.24	1.89	1.91
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.23	4.53	4.53	-0.14	2.75	2.75	-0.22	1.89	1.90
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.83	3.48	3.58	-0.37	2.09	2.12	-0.32	1.42	1.45
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.81	3.44	3.53	-0.38	2.06	2.10	-0.32	1.41	1.44
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.62	2.15	2.70	-0.77	1.14	1.37	-0.51	0.76	0.92
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.86	2.14	2.84	-0.89	1.14	1.44	-0.57	0.77	0.96
4	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.37	4.56	4.57	-0.16	2.79	2.80	-0.22	1.90	1.91
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.26	4.54	4.54	-0.13	2.79	2.79	-0.20	1.90	1.91
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.76	3.48	3.56	-0.30	2.13	2.15	-0.25	1.43	1.46
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.73	3.43	3.50	-0.31	2.11	2.13	-0.25	1.42	1.44
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.11	2.05	2.34	-0.48	1.15	1.24	-0.30	0.76	0.81
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.33	2.01	2.41	-0.58	1.14	1.28	-0.35	0.76	0.83

Table 3. Estimation for the Gumbel copula with normal marginals (bias, SD and RMSE multiplied by 100)

Model	$\tau$	Estimator	Bias (n = 200)	SD (n = 200)	RMSE (n = 200)	Bias (n = 500)	SD (n = 500)	RMSE (n = 500)	Bias (n = 1000)	SD (n = 1000)	RMSE (n = 1000)
Known innovations	0.25	${\hat{θ}}_{n}^{(i k, o r)}$	0.01	3.19	3.19	0.13	2.43	2.44	0.08	1.88	1.88
	0.25	${\hat{θ}}_{n}^{(p l, o r)}$	0.44	3.01	3.04	0.38	2.37	2.40	0.24	1.81	1.82
	0.50	${\hat{θ}}_{n}^{(i k, o r)}$	0.02	2.58	2.58	0.11	1.96	1.97	0.02	1.49	1.49
	0.50	${\hat{θ}}_{n}^{(p l, o r)}$	0.24	2.42	2.43	0.27	1.89	1.91	0.12	1.44	1.44
	0.75	${\hat{θ}}_{n}^{(i k, o r)}$	0.02	1.48	1.48	0.06	1.12	1.12	0.00	0.84	0.84
	0.75	${\hat{θ}}_{n}^{(p l, o r)}$	-0.06	1.35	1.36	0.02	1.05	1.05	-0.03	0.78	0.78
1	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.36	4.76	4.78	0.06	3.06	3.05	-0.09	2.06	2.06
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.24	4.68	4.68	0.37	2.92	2.94	0.08	2.01	2.01
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.56	3.92	3.96	-0.17	2.45	2.46	-0.22	1.69	1.70
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.36	3.83	3.84	-0.10	2.35	2.35	-0.20	1.65	1.66
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.85	2.36	2.50	-0.52	1.42	1.51	-0.49	1.01	1.12
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.35	2.32	2.69	-0.84	1.36	1.60	-0.73	0.99	1.22
2	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.16	4.58	4.58	0.02	2.91	2.91	0.04	2.10	2.10
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.49	4.42	4.45	0.32	2.86	2.88	0.20	2.03	2.04
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.66	3.77	3.82	-0.14	2.36	2.36	-0.09	1.67	1.68
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.50	3.61	3.64	-0.09	2.30	2.30	-0.05	1.62	1.62
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.61	2.50	2.97	-0.52	1.43	1.52	-0.32	0.99	1.04
	0.75	${\hat{θ}}_{n}^{(p l)}$	-2.37	2.52	3.46	-0.95	1.45	1.73	-0.55	0.98	1.13
3	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.18	4.57	4.57	0.01	2.93	2.92	0.02	2.11	2.11
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.46	4.41	4.43	0.31	2.87	2.88	0.18	2.03	2.03
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.66	3.73	3.78	-0.18	2.36	2.37	-0.16	1.69	1.70
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.50	3.59	3.62	-0.13	2.31	2.32	-0.13	1.63	1.64
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.52	2.48	2.90	-0.58	1.41	1.53	-0.42	0.98	1.07
	0.75	${\hat{θ}}_{n}^{(p l)}$	-2.20	2.44	3.29	-0.98	1.40	1.71	-0.64	0.96	1.15
4	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.26	4.60	4.60	-0.06	2.97	2.97	0.04	2.12	2.12
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.30	4.47	4.47	0.19	2.89	2.89	0.18	2.04	2.05
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.63	3.79	3.84	-0.13	2.36	2.37	-0.11	1.69	1.69
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.56	3.63	3.67	-0.16	2.31	2.32	-0.13	1.65	1.65
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.83	2.38	2.52	-0.29	1.40	1.43	-0.21	0.97	0.99
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.51	2.35	2.79	-0.71	1.41	1.57	-0.45	0.95	1.05

Table 4. Estimation for the normal copula with normal marginals (bias, SD and RMSE multiplied by 100)

Model	$\tau$	Estimator	Bias (n = 200)	SD (n = 200)	RMSE (n = 200)	Bias (n = 500)	SD (n = 500)	RMSE (n = 500)	Bias (n = 1000)	SD (n = 1000)	RMSE (n = 1000)
Known innovations	0.25	${\hat{θ}}_{n}^{(i k, o r)}$	-0.02	3.13	3.13	-0.05	2.32	2.31	-0.03	1.78	1.77
	0.25	${\hat{θ}}_{n}^{(p l, o r)}$	0.38	2.99	3.02	0.22	2.19	2.20	0.13	1.66	1.67
	0.50	${\hat{θ}}_{n}^{(i k, o r)}$	-0.01	2.44	2.44	-0.04	1.81	1.81	-0.02	1.39	1.39
	0.50	${\hat{θ}}_{n}^{(p l, o r)}$	0.32	2.26	2.28	0.19	1.67	1.68	0.12	1.27	1.27
	0.75	${\hat{θ}}_{n}^{(i k, o r)}$	-0.01	1.36	1.36	-0.02	1.01	1.01	-0.01	0.77	0.77
	0.75	${\hat{θ}}_{n}^{(p l, o r)}$	-0.04	1.23	1.23	-0.03	0.91	0.91	-0.01	0.69	0.69
1	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.29	4.65	4.66	-0.07	2.83	2.83	-0.15	1.99	2.00
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.35	4.49	4.50	0.19	2.72	2.72	0.02	1.89	1.89
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.48	3.67	3.70	-0.23	2.22	2.23	-0.25	1.56	1.58
	0.50	${\hat{θ}}_{n}^{(p l)}$	0.00	3.40	3.40	-0.05	2.08	2.08	-0.13	1.44	1.44
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.78	2.17	2.30	-0.53	1.27	1.38	-0.47	0.88	1.00
	0.75	${\hat{θ}}_{n}^{(p l)}$	-0.94	2.02	2.23	-0.64	1.19	1.35	-0.52	0.81	0.96
2	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.34	4.39	4.40	-0.12	2.80	2.80	-0.10	1.94	1.94
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.38	4.21	4.22	0.22	2.72	2.72	0.10	1.83	1.83
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.70	3.47	3.54	-0.25	2.20	2.21	-0.16	1.53	1.54
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.20	3.21	3.22	-0.01	2.06	2.06	-0.01	1.40	1.40
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.54	2.25	2.73	-0.59	1.31	1.43	-0.34	0.86	0.93
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.80	2.14	2.80	-0.71	1.23	1.43	-0.39	0.79	0.88
3	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.38	4.41	4.42	-0.15	2.80	2.81	-0.13	1.95	1.96
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.33	4.23	4.24	0.18	2.72	2.73	0.06	1.83	1.83
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.71	3.48	3.55	-0.32	2.19	2.21	-0.22	1.52	1.53
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.21	3.20	3.21	-0.08	2.05	2.06	-0.07	1.39	1.39
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.45	2.19	2.63	-0.70	1.29	1.46	-0.43	0.87	0.97
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.70	2.07	2.67	-0.81	1.21	1.46	-0.48	0.79	0.93
4	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.34	4.40	4.41	-0.15	2.81	2.81	-0.11	1.96	1.97
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.30	4.24	4.25	0.16	2.72	2.72	0.07	1.84	1.84
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.69	3.47	3.53	-0.27	2.20	2.22	-0.18	1.54	1.55
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.26	3.23	3.24	-0.09	2.07	2.07	-0.07	1.41	1.41
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.82	2.19	2.34	-0.35	1.29	1.34	-0.22	0.87	0.89
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.14	2.08	2.37	-0.52	1.23	1.33	-0.31	0.81	0.86

Table 5. Estimation for the Student copula with normal marginals (bias, SD and RMSE multiplied by 100)

Model	$\tau$	Estimator	Bias (n = 200)	SD (n = 200)	RMSE (n = 200)	Bias (n = 500)	SD (n = 500)	RMSE (n = 500)	Bias (n = 1000)	SD (n = 1000)	RMSE (n = 1000)
Known innovations	0.25	${\hat{θ}}_{n}^{(i k, o r)}$	-0.28	3.53	3.53	0.00	2.68	2.68	0.07	1.99	1.99
	0.25	${\hat{θ}}_{n}^{(p l, o r)}$	0.03	3.48	3.48	0.23	2.61	2.62	0.21	1.97	1.98
	0.50	${\hat{θ}}_{n}^{(i k, o r)}$	-0.19	2.81	2.82	-0.01	2.14	2.14	0.05	1.60	1.60
	0.50	${\hat{θ}}_{n}^{(p l, o r)}$	0.05	2.66	2.66	0.19	2.00	2.00	0.17	1.51	1.52
	0.75	${\hat{θ}}_{n}^{(i k, o r)}$	-0.10	1.62	1.62	-0.01	1.23	1.23	0.02	0.93	0.93
	0.75	${\hat{θ}}_{n}^{(p l, o r)}$	-0.16	1.46	1.47	-0.02	1.09	1.09	0.02	0.83	0.83
1	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.25	4.93	4.93	-0.18	3.30	3.30	-0.11	2.28	2.28
	0.25	${\hat{θ}}_{n}^{(p l)}$	0.24	4.96	4.96	0.08	3.32	3.32	0.00	2.27	2.27
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.48	3.95	3.97	-0.34	2.62	2.64	-0.24	1.81	1.83
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.17	3.82	3.82	-0.18	2.57	2.57	-0.20	1.74	1.75
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.79	2.33	2.46	-0.64	1.56	1.68	-0.49	1.06	1.17
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.13	2.22	2.48	-0.83	1.47	1.69	-0.66	0.99	1.19
2	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.61	4.99	5.03	-0.20	3.23	3.24	0.02	2.22	2.22
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.21	4.98	4.98	0.03	3.18	3.18	0.15	2.19	2.20
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.89	4.01	4.11	-0.35	2.62	2.64	-0.08	1.79	1.79
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.80	3.86	3.94	-0.24	2.45	2.46	-0.01	1.69	1.69
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.66	2.57	3.06	-0.70	1.55	1.70	-0.30	1.06	1.10
	0.75	${\hat{θ}}_{n}^{(p l)}$	-2.37	2.48	3.42	-0.99	1.44	1.75	-0.46	0.97	1.07
3	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.59	5.01	5.05	-0.24	3.22	3.23	-0.01	2.23	2.23
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.21	4.97	4.98	-0.01	3.18	3.18	0.12	2.20	2.20
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.90	4.07	4.16	-0.43	2.59	2.63	-0.14	1.79	1.79
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.79	3.88	3.96	-0.33	2.44	2.46	-0.08	1.68	1.69
	0.75	${\hat{θ}}_{n}^{(i k)}$	-1.60	2.61	3.06	-0.76	1.55	1.73	-0.39	1.06	1.13
	0.75	${\hat{θ}}_{n}^{(p l)}$	-2.20	2.48	3.31	-1.05	1.43	1.77	-0.56	0.97	1.12
4	0.25	${\hat{θ}}_{n}^{(i k)}$	-0.63	5.03	5.07	-0.23	3.26	3.27	0.00	2.22	2.22
	0.25	${\hat{θ}}_{n}^{(p l)}$	-0.28	4.97	4.97	-0.01	3.22	3.22	0.12	2.19	2.20
	0.50	${\hat{θ}}_{n}^{(i k)}$	-0.91	4.06	4.16	-0.38	2.61	2.64	-0.09	1.78	1.78
	0.50	${\hat{θ}}_{n}^{(p l)}$	-0.81	3.82	3.91	-0.32	2.45	2.47	-0.07	1.68	1.68
	0.75	${\hat{θ}}_{n}^{(i k)}$	-0.97	2.51	2.69	-0.42	1.55	1.61	-0.17	1.05	1.06
	0.75	${\hat{θ}}_{n}^{(p l)}$	-1.47	2.34	2.76	-0.70	1.44	1.60	-0.33	0.96	1.02

Equations

Y_{ji}=m_{j}(\boldsymbol{X}_{i})+\sigma_{j}(\boldsymbol{X}_{i})\,\varepsilon_{ji},\quad i=1,\dots,n,\ j=1,\dots,k,

\mathsf{P}\big(Y_{1i}\leq y_{1},\dots,Y_{ki}\leq y_{k}\mid\boldsymbol{X}_{i}=\mathbf{x}\big)=\mathsf{P}\big(\varepsilon_{1i}\leq z_{1},\dots,\varepsilon_{ki}\leq z_{k}\big)=C\big(F_{1\varepsilon}(z_{1}),\dots,F_{k\varepsilon}(z_{k})\big),

Y_{1i}=m_{1}(\boldsymbol{X}_{i})+\sigma_{1}(\boldsymbol{X}_{i})\,\varepsilon_{1i},\quad Y_{2i}=m_{2}(\boldsymbol{X}_{i})+\sigma_{2}(\boldsymbol{X}_{i})\,\varepsilon_{2i},\quad i=1,\dots,n,

C(u_{1},u_{2})=F_{\varepsilon}\big{(}F_{1\varepsilon}^{-1}(u_{1}),F_{2\varepsilon}^{-1}(u_{2})\big{)},\quad(u_{1},u_{2})\in[0,1]^{2}.

\widehat{\varepsilon}_{ji}=\frac{Y_{ji}-\widehat{m}_{j}(\boldsymbol{X}_{i})}{\widehat{\sigma}_{j}(\boldsymbol{X}_{i})},\quad i=1,\dots,n,\ j=1,2,

\min_{\boldsymbol{\beta}=(\beta_{\mathbf{i}})_{\mathbf{i}\in\mathbb{I}}}\,\sum_{\ell=1}^{n}\Big{[}Y_{j\ell}-\sum_{\mathbf{i}\in\mathbb{I}}\beta_{\mathbf{i}}\,\psi_{\mathbf{i},\mathbf{h}_{n}}\big{(}\boldsymbol{X}_{\ell}-\mathbf{x}\big{)}\Big{]}^{2}K_{\mathbf{h}_{n}}(\boldsymbol{X}_{\ell}-\mathbf{x}).

K_{\mathbf{h}_{n}}(\boldsymbol{X}_{\ell}-\mathbf{x})=\prod_{k=1}^{d}\tfrac{1}{h_{n}^{(k)}}\,k\Big{(}\tfrac{X_{\ell k}-x_{k}}{h_{n}^{(k)}}\Big{)},

\widehat{\sigma}_{j}^{2}(\mathbf{x})=\widehat{s}_{j}(\mathbf{x})-\widehat{m}_{j}^{2}(\mathbf{x}),

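\|f\|_{\ell+\delta}=\max_{\mathbf{i}\in\mathbb{I}(d,\ell)}\sup_{\mathbf{x}\in\mathbf{J}}\big|D^{\mathbf{i}}f(\mathbf{x})\big|+\max_{\mathbf{i}\in\mathbb{I}(d,\ell),\,i.=\ell}\ \sup_{\mathbf{x},\mathbf{x}^{\prime}\in\mathbf{J},\,\mathbf{x}\neq\mathbf{x}^{\prime}}\frac{\big|D^{\mathbf{i}}f(\mathbf{x})-D^{\mathbf{i}}f(\mathbf{x}^{\prime})\big|}{\|\mathbf{x}-\mathbf{x}^{\prime}\|^{\delta}},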

\widetilde{C}_{n}(u_{1},u_{2})=\widehat{F}_{\widehat{\varepsilon}}\Big{(}\widehat{F}_{1\widehat{\varepsilon}}^{-1}(u_{1}),\widehat{F}_{2\widehat{\varepsilon}}^{-1}(u_{2})\Big{)},

\widehat{F}_{\widehat{\varepsilon}}(y_{1},y_{2})=\frac{1}{W_{n}}\sum_{i=1}^{n}w_{ni}\,\mathbbm{1}\big{\{}\widehat{\varepsilon}_{1i}\leq y_{1},\widehat{\varepsilon}_{2i}\leq y_{2}\big{\}},

\widehat{F}_{j\widehat{\varepsilon}}(y)=\frac{1}{W_{n}}\sum_{i=1}^{n}w_{ni}\,\mathbbm{1}\big{\{}\widehat{\varepsilon}_{ji}\leq y\big{\}},\quad j=1,2,
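
A minimal Python sketch of the residual-based copula estimator $\widetilde{C}_{n}$ built from the weighted empirical distribution functions above. It assumes $W_{n}=\sum_{i=1}^{n}w_{ni}$ (the weights $w_{ni}$ are not defined in this excerpt; setting all $w_{ni}=1$ gives the ordinary empirical distribution functions) and uses the generalized inverse $F^{-1}(u)=\inf\{y:F(y)\geq u\}$ for $u\in(0,1]$. The function names are hypothetical.

```python
import numpy as np

def weighted_quantile(values, weights, u):
    """Generalized inverse inf{y : F(y) >= u} of a weighted ECDF, for u in (0, 1]."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    cum = np.cumsum(np.asarray(weights, dtype=float)[order]) / np.sum(weights)
    idx = int(np.searchsorted(cum, u, side="left"))  # first index with cum >= u
    return v[min(idx, len(v) - 1)]                   # clamp against round-off in cum[-1]

def residual_copula(e1, e2, w, u1, u2):
    """C_tilde_n(u1, u2) = F_hat(F_hat_1^{-1}(u1), F_hat_2^{-1}(u2)) from residual pairs."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    w = np.asarray(w, dtype=float)
    q1 = weighted_quantile(e1, w, u1)
    q2 = weighted_quantile(e2, w, u2)
    # weighted joint ECDF of the residuals evaluated at the marginal quantiles
    return np.sum(w * ((e1 <= q1) & (e2 <= q2))) / np.sum(w)
```

For example, with residuals $\widehat{\varepsilon}_{1i}=\widehat{\varepsilon}_{2i}\in\{0,\dots,9\}$ and unit weights, residual_copula(e, e, w, 0.35, 0.75) returns 0.4, slightly above the comonotone value $\min(0.35,0.75)=0.35$ because of the step structure of the empirical distribution functions.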

C_{n}^{(or)}(u_{1},u_{2})=\widehat{F}_{\varepsilon}\Big{(}\widehat{F}_{1\varepsilon}^{-1}(u_{1}),\widehat{F}_{2\varepsilon}^{-1}(u_{2})\Big{)},

\max_{j,k\in\{1,2\}}\sup_{y_{1},y_{2}\in\mathbb{R}}\big|F_{\varepsilon}^{(j,k)}(y_{1},y_{2})\big|(1+|y_{j}|)(1+|y_{k}|)<\infty.

\qquad\lim_{u\to 0_{+}}\big{(}1+F_{j\varepsilon}^{-1}(u)\big{)}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}=0\quad\text{ and }\quad\lim_{u\to 1_{-}}\big{(}1+F_{j\varepsilon}^{-1}(u)\big{)}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}=0.

\sup_{\mathbf{x}_{0},\mathbf{x}_{i}}\sigma_{j}^{2}(\mathbf{x}_{0})\,\sigma_{j}^{2}(\mathbf{x}_{i})\,f_{\boldsymbol{X}_{0},\boldsymbol{X}_{i}}(\mathbf{x}_{0},\mathbf{x}_{i})\leq B,

\displaystyle\sup_{\mathbf{x}_{0},\mathbf{x}_{i}}\big{|}m_{j}(\mathbf{x}_{0})m_{j}(\mathbf{x}_{i})\big{|}\,\sigma_{j}(\mathbf{x}_{0})\sigma_{j}(\mathbf{x}_{i})f_{\boldsymbol{X}_{0},\boldsymbol{X}_{i}}(\mathbf{x}_{0},\mathbf{x}_{i})\leq B,

n\,h_{n}^{2p+2}(\log n)^{D}=o(1),\qquad n\,h_{n}^{3d+2\delta}(\log n)^{-D}\to\infty

\max_{j,k\in\{1,2\}}\sup_{u_{1},u_{2}\in[0,1]}\Big{|}C^{(j,k)}(u_{1},u_{2})\,f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u_{j})\big{)}\,f_{k\varepsilon}\big{(}F_{k\varepsilon}^{-1}(u_{k})\big{)}\\ +C^{(j)}(u_{1},u_{2})\,f^{\prime}_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u_{j})\big{)}\mathbbm{1}\{j=k\}\Big{|}\Big{(}1+\big{|}F_{j\varepsilon}^{-1}(u_{j})\big{|}\Big{)}\Big{(}1+\big{|}F_{k\varepsilon}^{-1}(u_{k})\big{|}\Big{)}<\infty,

C^{(j,k)}(u_{1},u_{2})=O\Big{(}\tfrac{1}{u_{j}^{2\,\eta}(1-u_{j})^{2\,\eta}u_{k}^{2\,\eta}(1-u_{k})^{2\,\eta}}\Big{)},

\sup_{(u_{1},u_{2})\in[0,1]^{2}}\Big{|}\sqrt{n}\,\big{[}\widetilde{C}_{n}(u_{1},u_{2})-C_{n}^{(or)}(u_{1},u_{2})\big{]}\Big{|}=o_{P}(1).

G_{C} (u_{1}, u_{2}) = B_{C} (u_{1}, u_{2}) - C^{(1)} (u_{1}, u_{2}) B_{C} (u_{1}, 1) - C^{(2)} (u_{1}, u_{2}) B_{C} (1, u_{2}),

\mathsf{E}\,\big{[}B_{C}(u_{1},u_{2})B_{C}(u_{1}^{\prime},u_{2}^{\prime})\big{]}=C(u_{1}\wedge u_{1}^{\prime},u_{2}\wedge u_{2}^{\prime})-C(u_{1},u_{2})\,C(u_{1}^{\prime},u_{2}^{\prime})\,.

\widehat{\theta}_{n}^{(ik)}=\tau^{-1}(\tau_{n}),

τ (θ) = 4 \int_{0}^{1} \int_{0}^{1} C (u_{1}, u_{2}; θ) d C (u_{1}, u_{2}; θ) - 1
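
The estimator $\widehat{\theta}_{n}^{(ik)}$ needs the map $\tau(\theta)$ and its inverse. For the Clayton family, used here only as an illustration, the integral above has the well-known closed form $\tau(\theta)=\theta/(\theta+2)$, so $\tau^{-1}(t)=2t/(1-t)$; the values $\tau=0.25, 0.50, 0.75$ in the tables would correspond to Clayton parameters $\theta=2/3, 2, 6$. The Python sketch below assumes that $\tau_{n}$ is the sample Kendall's tau of the residual pairs; since Kendall's tau depends only on ranks, it can be computed directly from the residuals. Function names are hypothetical.

```python
import numpy as np

def kendall_tau(a, b):
    """Sample Kendall's tau (no ties assumed), using all O(n^2) pairs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # +1 for concordant pairs, -1 for discordant pairs, 0 on the diagonal
    s = np.sign(a[:, None] - a[None, :]) * np.sign(b[:, None] - b[None, :])
    n = len(a)
    return s.sum() / (n * (n - 1))  # each unordered pair is counted twice

def clayton_tau(theta):
    """Kendall's tau of the Clayton copula: tau(theta) = theta / (theta + 2)."""
    return theta / (theta + 2.0)

def clayton_theta_from_tau(tau):
    """Inverse map tau^{-1}: theta = 2 tau / (1 - tau), for tau in (0, 1)."""
    return 2.0 * tau / (1.0 - tau)
```

Given residual vectors e1_hat and e2_hat, clayton_theta_from_tau(kendall_tau(e1_hat, e2_hat)) gives the inversion estimate; the quadratic pairwise computation is adequate for samples of a few thousand observations.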

\sqrt{n}\,\big{(}\widehat{\theta}_{n}^{(ik)}-\theta\big{)}\xrightarrow[n\rightarrow\infty]{d}\mathsf{N}\Big{(}0,\tfrac{\sigma_{\tau}^{2}}{[\tau^{\prime}(\theta)]^{2}}\Big{)},\quad\text{where}\quad\sigma_{\tau}^{2}=\mathsf{var}\Big{\{}8\,C(U_{11},U_{21};\theta)-4\,U_{11}-4\,U_{21}\Big{\}},

\big{(}U_{11},U_{21}\big{)}=\big{(}F_{1\varepsilon}(\varepsilon_{11}),F_{2\varepsilon}(\varepsilon_{21})\big{)}.

\boldsymbol{\widehat{\theta}}_{n}^{(md)}=\operatorname*{arg\,min}_{\mathbf{t}\in\Theta}\iint_{[0,1]^{2}}\big{(}\widetilde{C}_{n}(u_{1},u_{2})-C(u_{1},u_{2};\mathbf{t})\big{)}^{2}\,du_{1}\,du_{2}
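
The minimum distance estimator can be approximated numerically by replacing the double integral with a midpoint rule and the minimization over $\Theta$ with a grid search. The Python sketch below does this for the one-parameter Clayton family, an illustrative choice; the grid resolution, the parameter grid and the function names are assumptions, and in practice a numerical optimizer would typically replace the grid search. The argument c_tilde stands for any vectorized copula estimate, such as $\widetilde{C}_{n}$.

```python
import numpy as np

def clayton(u1, u2, theta):
    """Clayton copula C(u1, u2; theta) for theta > 0 (vectorized)."""
    return (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta)

def min_distance_theta(c_tilde, theta_grid, m=50):
    """Approximate argmin over theta_grid of the L2 distance between c_tilde and C(.; t).

    The integral over [0, 1]^2 is replaced by a midpoint rule on an m x m grid,
    i.e. by the mean of the squared differences over the cell midpoints.
    """
    mid = (np.arange(m) + 0.5) / m
    u1, u2 = np.meshgrid(mid, mid, indexing="ij")
    target = c_tilde(u1, u2)
    losses = [np.mean((target - clayton(u1, u2, t)) ** 2) for t in theta_grid]
    return theta_grid[int(np.argmin(losses))]

theta_grid = np.linspace(0.5, 5.0, 91)  # step 0.05, contains 2.0
est = min_distance_theta(lambda a, b: clayton(a, b, 2.0), theta_grid)
```

Feeding the true Clayton copula with $\theta=2$ as c_tilde returns the grid point 2.0, where the approximate objective vanishes.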

\sqrt{n}\Big{(}\boldsymbol{\widehat{\theta}}_{n}^{(md)}-\boldsymbol{\theta}\Big{)}\xrightarrow[n\rightarrow\infty]{d}\mathsf{N}\Big{(}\mathbf{0}_{p},\boldsymbol{\Sigma}^{(md)}\Big{)},

\boldsymbol{\Sigma}^{(md)}

\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad-\sum_{j=1}^{2}C^{(j)}(u_{1},u_{2};\boldsymbol{\theta})\,\mathbbm{1}\{U_{j1}\leq u_{j}\}\Big{]}\,du_{1}\,du_{2}\Bigg{\}},

Natalie Neumeyer (a), Marek Omelka (b), Šárka Hudecová (b)

Abstract.

This paper is concerned with modeling the dependence structure of two (or more) time series in the presence of a (possibly multivariate) covariate which may include past values of the time series. We assume that the covariate influences only the conditional mean and the conditional variance of each of the time series but the distribution of the standardized innovations is not influenced by the covariate and is stable in time. The joint distribution of the time series is then determined by the conditional means, the conditional variances and the marginal distributions of the innovations, which we estimate nonparametrically, and the copula of the innovations, which represents the dependency structure. We consider a nonparametric as well as a semiparametric estimator based on the estimated residuals. We show that under suitable assumptions these copula estimators are asymptotically equivalent to estimators that would be based on the unobserved innovations. The theoretical results are illustrated by simulations and a real data example.

a Department of Mathematics, University of Hamburg, Bundesstrasse 55, 20146 Hamburg, Germany

b Department of Probability and Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Praha 8, Czech Republic

Keywords and phrases: Asymptotic representation; CHARN model; empirical copula process; goodness-of-fit testing; nonparametric AR-ARCH model; nonparametric SCOMDY model; weak convergence.

NOTICE: This is the version of the manuscript accepted to Journal of Multivariate Analysis. DOI: 10.1016/j.jmva.2018.11.016

1. Introduction

Modeling the dependence between $k$ observed time series can be of utmost importance in applications, e.g., in risk management (for instance, to model the dependence between several exchange rates). We model $k$ dependent nonparametric AR-ARCH time series

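$$Y_{ji}=m_{j}(\boldsymbol{X}_{i})+\sigma_{j}(\boldsymbol{X}_{i})\,\varepsilon_{ji},\quad i=1,\dots,n,\ j=1,\dots,k,$$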

where the covariate $\boldsymbol{X}_{i}$ may include past values of the process, $Y_{j\,i-1},Y_{j\,i-2},\dots$ ($j=1,\dotsc,k$), or other exogenous variables. Further, the innovations $(\varepsilon_{1i},\dotsc,\varepsilon_{ki})$, $i\in\mathbb{Z}$, are assumed to be independent and identically distributed random vectors, and $(\varepsilon_{1i},\dotsc,\varepsilon_{ki})$ is independent of the past and present covariates $\boldsymbol{X}_{\ell}$, $\ell\leq i$, for all $i$. For identifiability we assume $\mathsf{E}\,\varepsilon_{ji}=0$ and $\mathsf{var}(\varepsilon_{ji})=1$ ($j=1,\dotsc,k$), so that the functions $m_{j}$ and $\sigma_{j}$ represent the conditional mean and volatility functions of the $j$th time series. Such models are also called multivariate nonparametric CHARN (conditional heteroscedastic autoregressive nonlinear) models and have received much attention over the last decades; see Fan and Yao (2005) and Gao (2007) for extensive overviews.

Note that, due to the structure of the model and Sklar’s theorem (see, e.g., Nelsen, 2006), for $z_{j}=(y_{j}-m_{j}(\mathbf{x}))/\sigma_{j}(\mathbf{x})$ ($j=1,\dotsc,k$) one has

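$$\mathsf{P}\big(Y_{1i}\leq y_{1},\dots,Y_{ki}\leq y_{k}\mid\boldsymbol{X}_{i}=\mathbf{x}\big)=\mathsf{P}\big(\varepsilon_{1i}\leq z_{1},\dots,\varepsilon_{ki}\leq z_{k}\big)=C\big(F_{1\varepsilon}(z_{1}),\dots,F_{k\varepsilon}(z_{k})\big),$$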

where $F_{j\varepsilon}$ ($j=1,\dotsc,k$) denote the marginal distribution functions of the innovations and $C$ their copula. Thus the joint conditional distribution of the observations, given the covariate, is completely specified by the individual conditional mean and variance functions, the marginal distributions of the innovations, and their copula. The copula $C$ describes the dependence structure of the $k$ time series, conditional on the covariates, after removing the influence of the conditional means and variances as well as of the marginal distributions.
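
As a small numerical illustration of this decomposition, the Python sketch below evaluates the conditional joint distribution function for $k=2$: it standardizes each $y_{j}$ with $m_{j}(\mathbf{x})$ and $\sigma_{j}(\mathbf{x})$, applies the marginal distribution functions and then the copula. The mean and volatility functions, the standard normal marginals, the Clayton copula with parameter 2 and all function names are hypothetical choices made only for illustration.

```python
import math
from typing import Callable, Sequence

def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clayton_copula(u1: float, u2: float, theta: float) -> float:
    """Clayton copula C(u1, u2) = (u1^-theta + u2^-theta - 1)^(-1/theta), theta > 0."""
    return (u1 ** (-theta) + u2 ** (-theta) - 1.0) ** (-1.0 / theta)

def conditional_joint_cdf(y: Sequence[float], x: Sequence[float],
                          m: Sequence[Callable], sigma: Sequence[Callable],
                          marginal_cdfs: Sequence[Callable[[float], float]],
                          copula: Callable[..., float]) -> float:
    """P(Y_1 <= y_1, ..., Y_k <= y_k | X = x) = C(F_1(z_1), ..., F_k(z_k))."""
    # standardize: z_j = (y_j - m_j(x)) / sigma_j(x)
    z = [(yj - mj(x)) / sj(x) for yj, mj, sj in zip(y, m, sigma)]
    # move to the copula scale: u_j = F_{j eps}(z_j)
    u = [F(zj) for F, zj in zip(marginal_cdfs, z)]
    return copula(*u)

# hypothetical conditional means and volatilities for k = 2, d = 2
m = [lambda x: 0.5 * math.sin(x[0]), lambda x: 0.2 * x[1]]
sigma = [lambda x: math.sqrt(0.5 + 0.2 * x[0] ** 2), lambda x: 1.0]
p = conditional_joint_cdf((0.0, 0.0), (0.0, 0.0), m, sigma,
                          [std_normal_cdf, std_normal_cdf],
                          lambda a, b: clayton_copula(a, b, 2.0))
print(p)  # both z_j are 0, so p = C(0.5, 0.5) = 7^(-1/2), about 0.378
```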

We model the conditional mean and variance functions nonparametrically, as in Härdle et al. (1998), among others. Semiparametric estimation, e.g., with an additive structure for $m_{j}$ and a multiplicative structure for $\sigma_{j}^{2}$ as in Yang et al. (1999), can be considered as well, and all presented results remain valid after appropriate changes to the estimators and assumptions. Further, we model the marginal distributions of the innovations nonparametrically, whereas we take two different approaches to estimating the copula $C$: nonparametric and parametric. As the innovations are not observable, both estimators are based on estimated residuals. We show that the asymptotic distribution is not affected by the necessary pre-estimation of the mean and variance functions. This remarkable property is intrinsic to copula estimation and has already been observed in (semi)parametric copula estimation (see the references in the next paragraph). By contrast, the asymptotic distribution of empirical distribution functions of the residuals is typically influenced by pre-estimation of the mean and variance functions. Moreover, comparing the nonparametric and parametric copula estimators makes it possible to test the goodness of fit of a parametric class of copulas.

Our approach extends the following parametric and semiparametric approaches in time series contexts. Chen and Fan (2006) introduced SCOMDY (semiparametric copula-based multivariate dynamic) models, which are very similar to the model considered here. However, in SCOMDY models the conditional mean and variance functions are modeled parametrically, while the marginal distributions of the innovations are estimated nonparametrically and a parametric copula model is applied to model the dependence. See also Kim et al. (2007) for similar methods for some parametric time series models, including nonlinear GARCH models, as well as Rémillard et al. (2012), Kim et al. (2008) and the review by Patton (2012). Chan et al. (2009) provide, in addition to parametric estimation of a copula, a goodness-of-fit test for the innovation copula in the GARCH context. Further, in an i.i.d. setting, Gijbels et al. (2015) show that in nonparametric location-scale models the asymptotic distribution of the empirical copula is not influenced by pre-estimation of the mean and variance functions. This result was further generalized by Portier and Segers (2018) to a completely nonparametric model for the marginals.

The remainder of the paper is organized as follows. In Section 2 we define the estimators and state some regularity assumptions. In Subsection 2.1 we show the weak convergence of the copula process, while in Subsection 2.2 we show asymptotic normality of a parameter estimator when considering a parametric class of copulas. Subsection 2.3 is devoted to goodness-of-fit testing. In Section 3 we present simulation results and in Section 4 a real data example. All proofs are given in the Appendix.

2. Main results

For ease of presentation we focus on the case of two time series, i.e., $k=2$, but all results can be extended to general $k\geq 2$ in an obvious manner. Suppose that for $i=1,\dotsc,n$ we observe a stretch of the stationary stochastic process $\big{\{}Y_{1i},Y_{2i},\boldsymbol{X}_{i}\big{\}}_{i\in\mathbb{Z}}$ that satisfies

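$$Y_{1i}=m_{1}(\boldsymbol{X}_{i})+\sigma_{1}(\boldsymbol{X}_{i})\,\varepsilon_{1i},\quad Y_{2i}=m_{2}(\boldsymbol{X}_{i})+\sigma_{2}(\boldsymbol{X}_{i})\,\varepsilon_{2i},\quad i=1,\dots,n,$$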

where $\boldsymbol{X}_{i}=(X_{i1},\dotsc,X_{id})^{\mathsf{T}}$ is a $d$-dimensional covariate and the innovations $\big{\{}(\varepsilon_{1i},\varepsilon_{2i})\big{\}}_{i\in\mathbb{Z}}$ are independent and identically distributed random vectors. Further, $(\varepsilon_{1i},\varepsilon_{2i})$ is independent of the past and present covariates $\boldsymbol{X}_{k}$, $k\leq i$, for all $i$, and $\mathsf{E}\,\varepsilon_{1i}=\mathsf{E}\,\varepsilon_{2i}=0$, $\mathsf{var}(\varepsilon_{1i})=\mathsf{var}(\varepsilon_{2i})=1$. If the marginal distribution functions $F_{1\varepsilon}$ and $F_{2\varepsilon}$ of the innovations are continuous, then the copula $C$ of the innovations is unique and can be expressed as
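
To make the data-generating mechanism concrete, here is a hedged Python sketch that simulates a stretch of a bivariate process of this form. All ingredients are assumptions for illustration, not the models of the paper's simulation study: the covariate is taken as $\boldsymbol{X}_{i}=(Y_{1,i-1},Y_{2,i-1})^{\mathsf{T}}$ (so $d=2$), $m_{j}$ and $\sigma_{j}$ are hypothetical functions, and the innovations have a Clayton copula with standard normal marginals, which satisfies $\mathsf{E}\,\varepsilon_{ji}=0$ and $\mathsf{var}(\varepsilon_{ji})=1$. Because $\boldsymbol{X}_{i}$ is built only from past observations, each innovation vector is independent of the past and present covariates.

```python
import numpy as np
from statistics import NormalDist

def clayton_sample(n, theta, rng):
    """Draw n pairs (U1, U2) from a Clayton copula (theta > 0) by conditional inversion."""
    u1 = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
    w = np.clip(rng.uniform(size=n), 1e-12, 1 - 1e-12)
    # solve dC(u1, u2)/du1 = w for u2
    u2 = (u1 ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u1, np.clip(u2, 1e-12, 1 - 1e-12)

def m(x):
    """Hypothetical conditional means (m_1(x), m_2(x))."""
    return np.array([0.5 * np.sin(x[0]), 0.3 * x[1]])

def sigma(x):
    """Hypothetical conditional volatilities (sigma_1(x), sigma_2(x))."""
    return np.sqrt(np.array([0.5 + 0.2 * x[0] ** 2, 0.4 + 0.3 * x[1] ** 2]))

def simulate(n, theta, burn_in=200, seed=0):
    """Simulate Y_ji = m_j(X_i) + sigma_j(X_i) eps_ji with X_i = (Y_1,i-1, Y_2,i-1)."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    u1, u2 = clayton_sample(total, theta, rng)
    nd = NormalDist()  # standard normal marginals: mean 0, variance 1
    eps = np.column_stack([[nd.inv_cdf(float(p)) for p in u1],
                           [nd.inv_cdf(float(p)) for p in u2]])
    y = np.zeros((total, 2))
    x = np.zeros((total, 2))
    prev = np.zeros(2)
    for i in range(total):
        x[i] = prev                          # covariate uses only the past
        y[i] = m(x[i]) + sigma(x[i]) * eps[i]
        prev = y[i]
    return y[burn_in:], x[burn_in:], eps[burn_in:]

Y, X, E = simulate(500, theta=2.0)
```

The burn-in period reduces the influence of the arbitrary starting value $\boldsymbol{X}_{1}=\mathbf{0}$, and returning the true innovations allows residual-based estimators to be compared with their counterparts computed from the known innovations.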

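$$C(u_{1},u_{2})=F_{\varepsilon}\big(F_{1\varepsilon}^{-1}(u_{1}),F_{2\varepsilon}^{-1}(u_{2})\big),\qquad (u_{1},u_{2})\in[0,1]^{2},\tag{2}$$
where $F_{\varepsilon}$ denotes the joint distribution function of $(\varepsilon_{1i},\varepsilon_{2i})$.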

As the innovations $(\varepsilon_{1i},\varepsilon_{2i})$ are unobserved, the inference about the copula function $C$ is based on the estimated residuals

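$$\widehat{\varepsilon}_{ji}=\frac{Y_{ji}-\widehat{m}_{j}(\boldsymbol{X}_{i})}{\widehat{\sigma}_{j}(\boldsymbol{X}_{i})},\qquad j=1,2,\quad i=1,\dotsc,n,\tag{3}$$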

where $\widehat{m}_{j}$ and $\widehat{\sigma}_{j}$ are estimates of the unknown functions $m_{j}$ and $\sigma_{j}$. In what follows we consider local polynomial estimators of order $p$; see Fan and Gijbels, (1996) or Masry, (1996), among others. Here, for a given $\mathbf{x}=(x_{1},\dotsc,x_{d})^{\mathsf{T}}$, $\widehat{m}_{j}(\mathbf{x})$ is defined as $\widehat{\beta}_{\mathbf{0}}$, the component of $\widehat{\boldsymbol{\beta}}$ with multi-index $\mathbf{0}=(0,\dotsc,0)$, where $\widehat{\boldsymbol{\beta}}$ is the solution to the minimization problem

[TABLE]

Here $\mathbb{I}=\mathbb{I}(d,p)$ denotes the set of multi-indices $\mathbf{i}=(i_{1},\dotsc,i_{d})$ with $i.=i_{1}+\dots+i_{d}\leq p$ and $\psi_{\mathbf{i},\mathbf{h}_{n}}(\mathbf{x})=\prod_{k=1}^{d}\big{(}\frac{x_{k}}{h_{n}^{(k)}}\big{)}^{i_{k}}\frac{1}{i_{k}!}$ . Further

[TABLE]

with $k$ being a kernel function and $\mathbf{h}_{n}=\big{(}h_{n}^{(1)},\dotsc,h_{n}^{(d)}\big{)}$ the smoothing parameter.

Further $\sigma_{j}^{2}(\mathbf{x})$ is estimated as

[TABLE]

where $\widehat{s}_{j}(\mathbf{x})$ is obtained in the same way as $\widehat{m}_{j}(\mathbf{x})$ but with $Y_{j\ell}$ replaced with $Y_{j\ell}^{2}$ .
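For $d=1$ and $p=1$ (the configuration used later in the simulations and the application) the estimators above reduce to kernel-weighted linear regressions. The following Python sketch computes $\widehat m_j$, $\widehat s_j$ and the residuals (3) with the triweight kernel. It assumes that $\widehat\sigma_j^2=\widehat s_j-\widehat m_j^2$, floors this quantity at a small positive value, and uses hypothetical function names; it is an illustration under these assumptions, not the authors' code.

```python
import numpy as np


def triweight(t):
    """Triweight kernel k(t) = 35/32 * (1 - t^2)^3 on [-1, 1], zero outside."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 35.0 / 32.0 * (1.0 - t**2) ** 3, 0.0)


def local_linear(x_obs, y_obs, x_eval, h):
    """Local linear (p = 1, d = 1) estimate of E[Y | X = x] at each point of x_eval."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    out = np.empty(x_eval.size)
    for k, x0 in enumerate(x_eval):
        u = (x_obs - x0) / h                      # scaled distances (X_l - x) / h
        w = triweight(u)                          # kernel weights
        design = np.column_stack([np.ones_like(u), u])
        gram = design.T @ (w[:, None] * design)   # weighted normal equations
        rhs = design.T @ (w * y_obs)
        out[k] = np.linalg.solve(gram, rhs)[0]    # beta_0 = fitted value at x0
    return out


def standardized_residuals(x, y, h):
    """Residuals (Y - m_hat(X)) / sigma_hat(X), assuming sigma^2 = s - m^2."""
    y = np.asarray(y, dtype=float)
    m_hat = local_linear(x, y, x, h)
    s_hat = local_linear(x, y**2, x, h)               # same smoother applied to Y^2
    sigma2_hat = np.maximum(s_hat - m_hat**2, 1e-8)   # guard against negative values
    return (y - m_hat) / np.sqrt(sigma2_hat)
```

In sparse regions of the covariate the weighted normal equations can become singular; in practice one would only evaluate the estimators at covariate values with $w_{ni}>0$, where the data are dense enough.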

For any function $f$ defined on an interval $\mathbf{J}\subset\mathbb{R}^{d}$, define for $\ell\in\mathbb{N}$ and $\delta\in(0,1]$

[TABLE]

where $D^{\mathbf{i}}=\frac{\partial^{i.}}{\partial x_{1}^{i_{1}}\ldots\partial x_{d}^{i_{d}}},$ and $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^{d}$ . Denote by $C_{M}^{\ell+\delta}(\mathbf{J})$ the set of $\ell$ -times differentiable functions $f$ on $\mathbf{J}$ , such that $\|f\|_{\ell+\delta}\leq M$ . Denote by $\widetilde{C}_{2}^{\ell+\delta}(\mathbf{J})$ the subset of $C_{2}^{\ell+\delta}(\mathbf{J})$ of the functions that satisfy $\inf_{\mathbf{x}\in\mathbf{J}}f(\mathbf{x})\geq\frac{1}{2}$ .

In what follows we prove that, under appropriate regularity assumptions, using the estimated residuals (3) instead of the (true) unobserved innovations $\varepsilon_{ji}$ affects neither the asymptotic distribution of the empirical copula estimator nor that of the parametric estimator of a copula.

2.1. Empirical copula estimation

Mimicking (2), the copula function $C$ can be estimated nonparametrically as

[TABLE]

where

[TABLE]

is the estimate of the joint distribution function $F_{\varepsilon}(y_{1},y_{2})$ and

[TABLE]

the corresponding marginal empirical cumulative distribution functions. Here we make use of a weight function $w_{n}(\mathbf{x})=\mathbbm{1}\{\mathbf{x}\in\mathbf{J}_{n}\}$ and put $w_{ni}=w_{n}(\boldsymbol{X}_{i})$ as well as $W_{n}=\sum_{j=1}^{n}w_{nj}$ . For some real positive sequence $c_{n}\to\infty$ we set $\mathbf{J}_{n}=[-c_{n},c_{n}]^{d}$ .

Now let $C_{n}^{(or)}$ be the ‘oracle’ estimator based on the unobserved innovations, i.e.

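$$C_{n}^{(or)}(u_{1},u_{2})=\widehat{F}_{\varepsilon}\big(\widehat{F}_{1\varepsilon}^{-1}(u_{1}),\widehat{F}_{2\varepsilon}^{-1}(u_{2})\big),\qquad (u_{1},u_{2})\in[0,1]^{2},$$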

where $\widehat{F}_{\varepsilon}(z_{1},z_{2})=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}\big{\{}\varepsilon_{1i}\leq z_{1},\varepsilon_{2i}\leq z_{2}\big{\}}$ is the estimator of $F_{\varepsilon}(z_{1},z_{2})$ based on the unobserved innovations and $\widehat{F}_{j\varepsilon}$ $(j=1,2)$ the corresponding marginal empirical cumulative distribution functions.
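Because $w_n$ is an indicator, the residual-based estimator is the ordinary empirical copula of the residual pairs with $\boldsymbol X_i\in\mathbf J_n$. The Python sketch below evaluates it at one point $(u_1,u_2)$, assuming that the weighted empirical cdfs are normalised by $W_n$ and inverted with the generalised inverse $F^{-1}(u)=\inf\{y:F(y)\ge u\}$, in the spirit of (2). All function names are hypothetical.

```python
import numpy as np


def weights(x, c_n):
    """w_ni = 1{X_i in [-c_n, c_n]^d}; x has shape (n,) or (n, d)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    return np.all(np.abs(x) <= c_n, axis=1)


def generalized_inverse(sample_sorted, u):
    """inf{y : F(y) >= u} for the empirical cdf of a sorted sample."""
    if u <= 0.0:
        return -np.inf
    m = sample_sorted.size
    # small tolerance: if u * m is an integer up to rounding (e.g. 0.3 * 10),
    # do not jump to the next order statistic
    k = int(np.ceil(u * m - 1e-9))
    return sample_sorted[max(k, 1) - 1]


def residual_copula(e1, e2, x, c_n, u1, u2):
    """Empirical copula of the retained residuals at (u1, u2); also returns W_n."""
    w = weights(x, c_n)
    a = np.asarray(e1, dtype=float)[w]
    b = np.asarray(e2, dtype=float)[w]
    q1 = generalized_inverse(np.sort(a), u1)
    q2 = generalized_inverse(np.sort(b), u2)
    return float(np.mean((a <= q1) & (b <= q2))), int(w.sum())
```

The oracle estimator $C_n^{(or)}$ is obtained from the same code by passing the innovations instead of the residuals and a $c_n$ large enough that all weights equal one.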

Regularity assumptions

$\mathbf{(\boldsymbol{\beta})}$

The process $(\boldsymbol{X}_{i},Y_{1i},Y_{2i})_{i\in\mathbb{Z}}$ is strictly stationary and absolutely regular ( $\beta$ -mixing) with the mixing coefficient $\beta_{i}$ that satisfies $\beta_{i}=O(i^{-b})$ with $b>d+3$ .

$\mathbf{(F_{\boldsymbol{\varepsilon}})}$

The second-order partial derivatives $F_{\varepsilon}^{(1,1)}$ , $F_{\varepsilon}^{(1,2)}$ and $F_{\varepsilon}^{(2,2)}$ of the joint cumulative distribution function $F_{\varepsilon}(y_{1},y_{2})=\mathsf{P}(\varepsilon_{1}\leq y_{1},\varepsilon_{2}\leq y_{2})$ , with $F_{\varepsilon}^{(j,k)}(y_{1},y_{2})=\frac{\partial^{2}F_{\varepsilon}(y_{1},y_{2})}{\partial y_{j}\partial y_{k}}$ , satisfy

[TABLE]

Further the innovation density $f_{j\varepsilon}$ $(j=1,2)$ satisfies

[TABLE]

$\mathbf{(F_{\boldsymbol{X}})}$

The covariates $\boldsymbol{X}_{i}$ ($i\in\mathbb{Z}$) have a density $f_{\boldsymbol{X}}$ that is bounded and differentiable with bounded, uniformly continuous first-order partial derivatives. The sequence $c_{n}$, which is of order $O\big{(}(\log n)^{1/d}\big{)}$, is chosen in such a way that $\inf_{\mathbf{x}\in\mathbf{J}_{n}}f_{\boldsymbol{X}}(\mathbf{x})$ converges to zero not faster than some negative power of $\log n$.

$\mathbf{(M)}$

For some $s>\frac{2b-2-d}{b-3-d}$ with $b$ from assumption ( $\boldsymbol{\beta}$ ), for $j=1,2$ , $\mathsf{E}\,|\varepsilon_{j0}|^{2s}<\infty$ , the functions $\sigma_{j}^{2s}f_{\boldsymbol{X}}$ and $|m_{j}\sigma_{j}|^{s}f_{\boldsymbol{X}}$ are bounded and there are some $i^{*}\in\mathbb{N}$ , $B>0$ such that for all $i\geq i^{*}$ ,

[TABLE]

where $f_{\boldsymbol{X}_{0},\boldsymbol{X}_{i}}$ denotes the joint density of $(\boldsymbol{X}_{0},\boldsymbol{X}_{i})$ and is bounded (for $i\geq i^{*}$ ).

$\mathbf{(m\boldsymbol{\sigma})}$

Let, for $j=1,2$ and for each $n\in\mathbb{N}$ , $m_{j}$ and $\sigma_{j}$ be elements of $C_{M_{n}}^{p+1}(\mathbf{J}_{n})$ for some sequence $M_{n}$ that is either bounded or diverges to infinity not faster than some power of $\log n$ . Further, assume $\mathsf{E}[\sigma_{j}^{4}(\boldsymbol{X}_{1})]<\infty$ and that $\min_{j=1,2}\inf_{\mathbf{x}\in\mathbf{J}_{n}}\sigma_{j}(\mathbf{x})$ is either bounded away from zero or converges to zero not faster than a negative power of $\log n$ .

$\mathbf{(Bw)}$

There exists a sequence $h_{n}$ such that $\tfrac{h_{n}^{(k)}}{h_{n}}\to a_{k}$ , where $a_{k}\in(0,\infty)$ , $k=1,\dotsc,d$ . Further, there exists some $\delta>\tfrac{d}{b-1}$ such that

[TABLE]

for all $D>0$ .

$\mathbf{(k)}$

$k:\mathbb{R}\to\mathbb{R}$ is a symmetric, $(d+2)$-times continuously differentiable probability density function supported on $[-1,1]$.

*Remark 1.*

Using $F_{\varepsilon}(y_{1},y_{2})=C\big{(}F_{1\varepsilon}(y_{1}),F_{2\varepsilon}(y_{2})\big{)}$, one sees that assumption $\mathbf{(F_{\boldsymbol{\varepsilon}})}$ requires that

[TABLE]

where $C^{(j)}(u_{1},u_{2})=\frac{\partial C(u_{1},u_{2})}{\partial u_{j}}$ and $C^{(j,k)}(u_{1},u_{2})=\frac{\partial^{2}C(u_{1},u_{2})}{\partial u_{j}\partial u_{k}}$ stand for the first and second order partial derivatives of the copula function.

Thus, provided that for some $\eta>0$

[TABLE]

we need the functions $f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}(1+\big{|}F_{j\varepsilon}^{-1}(u)\big{|})$ to be of order $O(u^{\eta}(1-u)^{\eta})$ and the functions $f^{\prime}_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u_{j})\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u_{j})|\big{)}^{2}$ to be bounded.

*Remark 2.*

Parts of our assumptions are reproduced from Hansen, (2008) because we apply his results on uniform rates of convergence for kernel estimators several times in our proofs. Note that in his Theorem 2 we set $q=\infty$ to simplify the assumptions. Further note that if the $\beta$-mixing coefficients decay exponentially fast, then it is sufficient to assume $s>2$ in $\mathbf{(M)}$.

*Remark 3.*

Note that the bandwidth conditions (8) can be fulfilled if and only if $2p+2>3d+2\delta$, i.e., in view of assumption $\mathbf{(Bw)}$, if and only if $2p+2>3d+\frac{2d}{b-1}$. Thus if $b>2d+1$ (so that $\frac{2d}{b-1}<1$), then for $d=1$ it is sufficient to take $p=1$, and for $d=2$ one can take $p=3$. In general, with increasing dimension $d$, higher smoothness of the unknown functions has to be assumed and higher-order local polynomial estimators have to be used. This phenomenon is well known in nonparametric inference.

So, in general, one can choose the bandwidth as $h_{n}\sim n^{-\frac{1}{a}}$, where $a\in(3d+\tfrac{2d}{b-1},2p+2)$. The problem is that if one wants to take $p$ as small as possible, the range of admissible values of $a$ is rather narrow, which makes the choice of $a$ delicate. To make the choice of $a$ more flexible in practice one can, for instance, assume that $b>20d+1$, which (among others) covers models with exponentially decaying $\beta$-mixing coefficients. Then $\tfrac{2d}{b-1}<0.1$, so for $d=1$ and $p=1$ one can take $a$ in the interval $(3.1,4)$. See also the bandwidth choice in our simulation study in Section 3.
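The conditions in Remark 3 are simple arithmetic, which the following Python sketch encodes (function names are illustrative only). It returns the admissible range of $a$ in $h_n\sim n^{-1/a}$ and the smallest order $p$ for which this range is non-empty. It checks only the inequality of Remark 3; the requirement $b>d+3$ from $(\boldsymbol\beta)$ has to be verified separately.

```python
def exponent_interval(d, b, p):
    """Open interval (lo, hi) of a with h_n ~ n^(-1/a), or None if it is empty."""
    lo = 3 * d + 2 * d / (b - 1)   # from 3d + 2*delta with delta > d / (b - 1)
    hi = 2 * p + 2
    return (lo, hi) if lo < hi else None


def smallest_order(d, b):
    """Smallest local polynomial order p with 2p + 2 > 3d + 2d / (b - 1)."""
    p = 0
    while exponent_interval(d, b, p) is None:
        p += 1
    return p
```

For example, with $d=1$ and $p=1$ the lower end of the interval lies below $3.1$ exactly when $2/(b-1)<0.1$, i.e. when $b>21$.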

*Remark 4.*

The choice of $c_{n}$ is a delicate problem in practice. As far as we know, even in analogous settings (see e.g. Müller et al., 2009; Dette et al., 2009; Koul and Zhu, 2015, and the references therein) this problem has not yet been addressed. Note that the weight function $w_{n}(\mathbf{x})$ is chosen in the simplest possible form in order to simplify the presentation of the proof. In practice, however, more general forms of $\mathbf{J}_{n}$ are of interest. Further, as the density $f_{\boldsymbol{X}}$ is unknown, data-driven procedures for choosing $\mathbf{J}_{n}$ are of interest. In the simulation study in Section 3 we suggest a data-driven procedure for choosing the weight function in the case $d=1$. Nevertheless, the data-driven choice of $\mathbf{J}_{n}$ (in particular for general $d$) and its theoretical justification call for further research.

Theorem 1.

Suppose that assumptions $\mathbf{(\boldsymbol{\beta})}$ , $\mathbf{(F_{\boldsymbol{\varepsilon}})}$ , $\mathbf{(F_{\boldsymbol{X}})}$ , $\mathbf{(Bw)}$ , $\mathbf{(M)}$ , $\mathbf{(k)}$ , $\mathbf{(J_{n})}$ and $\mathbf{(m\boldsymbol{\sigma})}$ are satisfied. Then

[TABLE]

Note that Theorem 1 together with the weak convergence of $\sqrt{n}\,\big{[}C_{n}^{(or)}-C\big{]}$ (see e.g. Proposition 3.1 of Segers, 2012) implies that the process $\widetilde{\mathbb{C}}_{n}=\sqrt{n}\,\big{[}\widetilde{C}_{n}-C\big{]}$ converges weakly in the space of bounded functions $\ell^{\infty}([0,1]^{2})$ to a centred Gaussian process $G_{C}$, which can be written as

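$$G_{C}(u_{1},u_{2})=B_{C}(u_{1},u_{2})-C^{(1)}(u_{1},u_{2})\,B_{C}(u_{1},1)-C^{(2)}(u_{1},u_{2})\,B_{C}(1,u_{2}),$$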

where $B_{C}$ is a Brownian bridge on $[0,1]^{2}$ with covariance function

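$$\mathsf{cov}\big(B_{C}(u_{1},u_{2}),B_{C}(v_{1},v_{2})\big)=C(u_{1}\wedge v_{1},u_{2}\wedge v_{2})-C(u_{1},u_{2})\,C(v_{1},v_{2}).$$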

Nevertheless, when this result is used for statistical inference in applications, we recommend replacing the sample size $n$ with $W_{n}=\sum_{i=1}^{n}w_{ni}$ in the formulas. The reason is that the copula is in fact estimated from only $W_{n}$ observations, and this should be taken into account in order to improve the finite-sample performance of asymptotic inference procedures.

2.2. Semiparametric copula estimation

The copula $C$ describes the dependence between the two time series of interest, given the covariate. In applications, modeling this dependence structure parametrically is advantageous because a parametric model is often easier to interpret. Goodness-of-fit testing is considered in Section 2.3.

Suppose that the joint distribution of $(\varepsilon_{1i},\varepsilon_{2i})$ is given by the copula function $C(u_{1},u_{2};\boldsymbol{\theta})$, where $\boldsymbol{\theta}=(\theta_{1},\dotsc,\theta_{p})^{\mathsf{T}}$ is an unknown parameter that belongs to a parameter space $\Theta\subset\mathbb{R}^{p}$. In copula settings one is often interested in semiparametric estimation of $\boldsymbol{\theta}$, i.e. estimation of $\boldsymbol{\theta}$ without any parametric assumption on the marginal distributions $F_{1\varepsilon}$ and $F_{2\varepsilon}$. Methods of semiparametric estimation in i.i.d. settings are summarized in Tsukahara, (2005). The question of interest is what happens if we use the estimated residuals (3) instead of the unobserved innovations $\varepsilon_{ji}$. Generally speaking, thanks to Theorem 1 the answer is that using $\widehat{\varepsilon}_{ji}$ instead of $\varepsilon_{ji}$ does not change the asymptotic distribution, provided that the parameter of interest can be written as a Hadamard differentiable functional of the copula.

2.2.1. Method-of-Moments using rank correlation

This method is described in general terms, for instance, in McNeil et al., (2005, Section 5.5.1). To illustrate the application of Theorem 1 to this method, suppose that the parameter $\theta$ is one-dimensional. A very popular estimation method is then the inversion of Kendall's tau, for which the estimator of $\theta$ is given by

$$\widehat{\theta}_{n}^{(ik)}=\tau^{-1}(\widehat{\tau}_{n}),$$

where

$$\tau(\theta)=4\int_{[0,1]^{2}}C(u_{1},u_{2};\theta)\,\mathrm{d}C(u_{1},u_{2};\theta)-1$$

is the theoretical Kendall's tau and $\widehat{\tau}_{n}$ is an estimate of Kendall's tau. In our setting, Kendall's tau is computed from the estimated residuals $(\widehat{\varepsilon}_{1i},\widehat{\varepsilon}_{2i})$ for which $w_{ni}>0$. By Theorem 1 and the Hadamard differentiability of Kendall's tau proved in Veraverbeke et al., (2011, Lemma 1), the estimators of Kendall's tau based on $\widehat{\varepsilon}_{ji}$ and on $\varepsilon_{ji}$ are asymptotically equivalent. Thus, provided that $\tau^{\prime}(\theta)\neq 0$, one gets that

[TABLE]

and

[TABLE]

Analogously one can show that working with residuals has asymptotically negligible effects also for the method of moments introduced in Brahimi and Necir, (2012).
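For families with a closed-form $\tau(\theta)$ the inversion is immediate once $\widehat\tau_n$ is available. The Python sketch below (hypothetical helper names) computes Kendall's tau from the residual pairs with $w_{ni}>0$ by direct pair counting, which assumes no ties and costs $O(W_n^2)$, and then applies the standard inverses $\theta=2\tau/(1-\tau)$ (Clayton), $\theta=1/(1-\tau)$ (Gumbel, $\tau\ge 0$) and $\rho=\sin(\pi\tau/2)$ (normal and Student copulas). The Frank copula is omitted because its $\tau(\theta)$ involves the Debye function and has to be inverted numerically.

```python
import numpy as np


def kendall_tau(a, b):
    """Kendall's tau from pair counts (no ties assumed)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    concord = np.sign(a[:, None] - a[None, :]) * np.sign(b[:, None] - b[None, :])
    # every unordered pair appears twice in the n x n matrix
    return float(concord.sum() / (n * (n - 1)))


# Closed-form inverses of tau(theta) for selected families.
TAU_INVERSE = {
    "clayton": lambda t: 2.0 * t / (1.0 - t),
    "gumbel": lambda t: 1.0 / (1.0 - t),
    "normal": lambda t: np.sin(np.pi * t / 2.0),   # also the Student correlation
}


def ik_estimate(e1, e2, w, family):
    """Inversion-of-Kendall's-tau estimate from residual pairs with w_ni > 0."""
    keep = np.asarray(w) > 0
    tau_hat = kendall_tau(np.asarray(e1)[keep], np.asarray(e2)[keep])
    return TAU_INVERSE[family](tau_hat), tau_hat
```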

2.2.2. Minimum distance estimation

Here one can follow, for instance, Tsukahara, (2005, Section 3.2). Note that, thanks to Theorem 1, the proof of Theorem 3 of Tsukahara, (2005) does not change when $C_{n}^{(or)}$ is replaced with $\widetilde{C}_{n}$. Thus, provided that assumptions (B.1)-(B.5) of Tsukahara, (2005) are satisfied with $\boldsymbol{\delta}(u_{1},u_{2};\boldsymbol{\theta})=\frac{\partial C(u_{1},u_{2};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}$, the estimator defined as

[TABLE]

is asymptotically normal and satisfies

[TABLE]

where

[TABLE]

with

[TABLE]

2.2.3. M-estimator, rank approximate Z-estimators

To define a general $M$ -estimator let us introduce

[TABLE]

which can be viewed as estimates of the unobserved $(U_{1i},U_{2i})=\big(F_{1\varepsilon}(\varepsilon_{1i}),F_{2\varepsilon}(\varepsilon_{2i})\big)$. Note that the multiplier $\tfrac{W_{n}}{W_{n}+1}$ is introduced so that both coordinates of the vector $\big{(}\widetilde{U}_{1i},\widetilde{U}_{2i}\big{)}$ are bounded away from zero and one. The $M$-estimator of the parameter $\boldsymbol{\theta}$ is now defined as

[TABLE]

where $\rho(u_{1},u_{2};\boldsymbol{\theta})$ is a given loss function. This class of estimators includes among others the pseudo-maximum likelihood estimators ( $\boldsymbol{\widehat{\theta}}_{n}^{(pl)}$ ), for which $\rho(u_{1},u_{2};\boldsymbol{\theta})=-\log c(u_{1},u_{2};\boldsymbol{\theta})$ , with $c(\cdot)$ being the copula density function.
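The following Python sketch illustrates the pseudo-maximum likelihood estimator $\boldsymbol{\widehat\theta}_n^{(pl)}$ for the Clayton family. It assumes that $\widetilde U_{ji}$ equals the rank of $\widehat\varepsilon_{ji}$ among the retained residuals divided by $W_n+1$ (which is what $\frac{W_n}{W_n+1}$ times an empirical cdf normalised by $W_n$ gives when there are no ties) and that the loss is summed over the observations with $w_{ni}>0$. The Clayton density $c(u_1,u_2;\theta)=(1+\theta)(u_1u_2)^{-\theta-1}(u_1^{-\theta}+u_2^{-\theta}-1)^{-1/\theta-2}$, $\theta>0$, is standard; the search bounds are arbitrary choices. Note that for the Clayton family $\boldsymbol\phi$ is unbounded, a case not covered by the general assumptions of this section.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata


def pseudo_observations(e1, e2, w):
    """Ranks / (W_n + 1) of the residuals with w_ni > 0; values lie in (0, 1)."""
    keep = np.asarray(w) > 0
    a = np.asarray(e1, dtype=float)[keep]
    b = np.asarray(e2, dtype=float)[keep]
    W = a.size
    return rankdata(a) / (W + 1), rankdata(b) / (W + 1)


def clayton_neg_loglik(theta, u, v):
    """Sum of rho = -log c over the sample, Clayton copula, theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    log_c = (np.log1p(theta) - (1.0 + theta) * (np.log(u) + np.log(v))
             - (2.0 + 1.0 / theta) * np.log(s))
    return -np.sum(log_c)


def clayton_mpl(u, v, upper=50.0):
    """Maximum pseudo-likelihood estimate of theta on [1e-6, upper]."""
    res = minimize_scalar(clayton_neg_loglik, bounds=(1e-6, upper),
                          args=(u, v), method="bounded")
    return res.x
```

Since only ranks enter, the residuals can be passed on any monotone scale; the estimate can be mapped to the Kendall's tau scale via $\tau=\theta/(\theta+2)$.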

Note that the estimator $\boldsymbol{\widehat{\theta}}_{n}$ is usually computed as a solution to the estimating equations

[TABLE]

where $\boldsymbol{\phi}(u_{1},u_{2};\boldsymbol{\theta})=\partial\rho(u_{1},u_{2};\boldsymbol{\theta})/\partial\boldsymbol{\theta}$ . In Tsukahara, (2005) the estimator defined as the solution of (11) is called a rank approximate $Z$ -estimator.

In what follows we give general assumptions under which there exists a consistent root ( $\boldsymbol{\widehat{\theta}}_{n}$ ) of the estimating equations (11) that is asymptotically equivalent to the consistent root ( $\boldsymbol{\widehat{\theta}}_{n}^{(or)}$ ) of the ‘oracle’ estimating equations given by

[TABLE]

where

[TABLE]

are the standard pseudo-observations calculated from the unobserved innovations and their marginal empirical distribution functions $\widehat{F}_{j\varepsilon}(y)$ .

Unfortunately, these general assumptions exclude some useful models (e.g. the pseudo-maximum likelihood estimator in the Clayton family of copulas) for which the function $\boldsymbol{\phi}(u_{1},u_{2};\boldsymbol{\theta})$, viewed as a function of $(u_{1},u_{2})$, is unbounded. The reason is that for empirical distribution functions calculated from estimated residuals $\widehat{\varepsilon}_{ji}$ we lack some of the sophisticated results that are available for empirical distribution functions calculated from the (true) innovations $\varepsilon_{ji}$. For such copula families one can use, for instance, the method of moments using rank correlation (see Section 2.2.1) to stay on the safe side. Nevertheless, the simulation study in Section 3 suggests that pseudo-maximum likelihood estimation can also be used for the Clayton copula (and probably also for other families of copulas with non-zero tail dependence) provided that the dependence is not very strong.

Regularity assumptions

In what follows let $\boldsymbol{\theta}$ stand for the true value of the parameter and $V(\boldsymbol{\theta})$ for an open neighbourhood of $\boldsymbol{\theta}$ .

$\mathbf{(Id)}$

$\boldsymbol{\theta}$ is the unique minimizer of the function $r(\mathbf{t})=\mathsf{E}\,\rho(U_{1i},U_{2i};\mathbf{t})$ and $\boldsymbol{\theta}$ is an interior point of $\Theta$.

$\mathbf{(\boldsymbol{\phi})}$

There exists $V(\boldsymbol{\theta})$ such that for each $l_{1},l_{2}\in\{1,\dotsc,p\}$ the functions $\phi_{l_{1}}(u_{1},u_{2};\mathbf{t})=\tfrac{\partial\rho(u_{1},u_{2};\mathbf{t})}{\partial t_{l_{1}}}$ and $\phi_{l_{1},l_{2}}(u_{1},u_{2};\mathbf{t})=\tfrac{\partial^{2}\rho(u_{1},u_{2};\mathbf{t})}{\partial t_{l_{1}}\partial t_{l_{2}}}$ are uniformly continuous in $(u_{1},u_{2})$, uniformly in $\mathbf{t}\in V(\boldsymbol{\theta})$, and of uniformly bounded Hardy-Krause variation (see e.g. Berghaus et al., 2017).

$\mathbf{(\boldsymbol{\phi}^{(j)})}$

There exists $V(\boldsymbol{\theta})$ and a function $h(u_{1},u_{2})$ such that for each $\mathbf{t}\in V(\boldsymbol{\theta})$

[TABLE]

and $\mathsf{E}\,h(U_{11},U_{21})<\infty$ .

$\mathbf{(\boldsymbol{\Gamma})}$

Each element of the matrix function $\boldsymbol{\Gamma}(\mathbf{t})=\mathsf{E}\,\frac{\partial\boldsymbol{\phi}(U_{1},U_{2};\mathbf{t})}{\partial\mathbf{t}^{\mathsf{T}}}$ is a continuous function on $V(\boldsymbol{\theta})$ and the matrix $\boldsymbol{\Gamma}=\boldsymbol{\Gamma}(\boldsymbol{\theta})$ is positive definite.

Theorem 2.

Suppose that the assumptions of Theorem 1 are satisfied and that also $\mathbf{(Id)}$ , $\mathbf{(\boldsymbol{\phi})}$ , $\mathbf{(\boldsymbol{\phi}^{(j)})}$ , and $\mathbf{(\boldsymbol{\Gamma})}$ hold. Then with probability going to one there exists a consistent root $\boldsymbol{\widehat{\theta}}_{n}$ of the estimating equations (11), which satisfies

[TABLE]

where

[TABLE]

The proof of Theorem 2 is given in Appendix B. Note that the asymptotic distribution of the estimator $\boldsymbol{\widehat{\theta}}_{n}$ coincides with the distribution given in Section 4 of Genest et al., (1995) for the consistent root $\boldsymbol{\widehat{\theta}}_{n}^{(or)}$ of the estimating equations (12). Thus using the residuals instead of the true innovations has an asymptotically negligible effect on the (first-order) asymptotic properties. In fact, it can even be shown that $\boldsymbol{\widehat{\theta}}_{n}$ and $\boldsymbol{\widehat{\theta}}_{n}^{(or)}$ have the same asymptotic representation and thus

[TABLE]

2.3. Goodness-of-fit testing

When multivariate data are modeled parametrically with copulas, a suitable copula family has to be chosen, and goodness-of-fit tests are often a useful tool for this choice. Thus we are interested in testing $H_{0}:\,C\in\mathcal{C}_{0}$, where $\mathcal{C}_{0}=\{C_{\boldsymbol{\theta}},\boldsymbol{\theta}\in\Theta\}$ is a given parametric family of copulas.

Many testing methods have been proposed (see e.g. Genest et al., 2009; Kojadinovic and Holmes, 2009, and the references therein). The most common ones are based on a comparison of nonparametric and parametric estimators of the copula. For instance, the Cramér-von Mises statistic is given by

[TABLE]

where $\boldsymbol{\widehat{\theta}}_{n}$ is an estimate of the unknown parameter $\boldsymbol{\theta}$. As the asymptotic distributions of $\widetilde{C}_{n}(u_{1},u_{2})$ and $\boldsymbol{\widehat{\theta}}_{n}$ are the same as those of $C_{n}^{(or)}(u_{1},u_{2})$ and $\boldsymbol{\widehat{\theta}}_{n}^{(or)}$, we suggest that the significance of the test statistic can be assessed in the same way as in i.i.d. settings. Thus one can use, for instance, the parametric bootstrap, simply generating independent and identically distributed observations from the copula function $C(u_{1},u_{2};\boldsymbol{\widehat{\theta}}_{n})$. The test statistic is then recalculated from these observations in the same way as if we directly observed the innovations. The only difference is that instead of $n$ observations we recommend generating only $W_{n}$ observations.

Similar remarks hold when testing other hypotheses about the copula, such as symmetry. Note that testing $H_{0}:C(u_{1},u_{2})\equiv u_{1}u_{2}$ provides a test of conditional independence of the two time series given the covariate.
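The parametric bootstrap described above can be sketched in Python for the Clayton family with the inversion of Kendall's tau. Assumptions (not taken from the text): the statistic has the common form $S_n=\sum_{i}\{C_n(\widetilde U_{1i},\widetilde U_{2i})-C(\widetilde U_{1i},\widetilde U_{2i};\widehat\theta_n)\}^2$ over the retained observations, pseudo-observations are ranks divided by $W_n+1$, there are no ties, and $\widehat\theta_n$ is truncated at a small positive value so that it stays in the Clayton parameter space. Bootstrap samples are drawn by conditional inversion and have size $W_n$, as recommended above. Function names are hypothetical.

```python
import numpy as np


def kendall_tau(a, b):
    n = a.size
    s = np.sign(a[:, None] - a[None, :]) * np.sign(b[:, None] - b[None, :])
    return s.sum() / (n * (n - 1))


def pseudo_obs(a, b):
    n = a.size
    ra = np.argsort(np.argsort(a)) + 1        # ranks 1..n (no ties assumed)
    rb = np.argsort(np.argsort(b)) + 1
    return ra / (n + 1), rb / (n + 1)


def clayton_cdf(u, v, theta):
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def clayton_sample(n, theta, rng):
    """Conditional-inversion sampler for the Clayton copula."""
    u, t = rng.uniform(size=n), rng.uniform(size=n)
    v = (u ** (-theta) * (t ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def cvm_statistic(a, b):
    u, v = pseudo_obs(a, b)
    # empirical copula evaluated at each pseudo-observation
    c_n = np.mean((u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None]), axis=1)
    tau = kendall_tau(a, b)
    theta = max(2.0 * tau / (1.0 - tau), 1e-6)    # Clayton: tau = theta / (theta + 2)
    return np.sum((c_n - clayton_cdf(u, v, theta)) ** 2), theta


def clayton_gof_pvalue(e1, e2, w, n_boot=199, seed=0):
    keep = np.asarray(w) > 0
    a = np.asarray(e1, dtype=float)[keep]
    b = np.asarray(e2, dtype=float)[keep]
    s_obs, theta_hat = cvm_statistic(a, b)
    rng = np.random.default_rng(seed)
    w_n = a.size                                   # bootstrap sample size W_n
    s_boot = np.array([cvm_statistic(*clayton_sample(w_n, theta_hat, rng))[0]
                       for _ in range(n_boot)])
    return (1.0 + np.sum(s_boot >= s_obs)) / (n_boot + 1.0)
```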

3. Simulation study

A small Monte Carlo study was conducted in order to compare the semiparametric estimators based on the residuals with the ‘oracle’ estimators based on the (unobserved) innovations. The inversion of Kendall's tau (IK) method and the maximum pseudo-likelihood (MPL) method were considered for the following five copula families: Clayton, Frank, Gumbel, normal, and Student with 4 degrees of freedom. The values of the parameters are chosen so that they correspond to Kendall's tau values $\tau=0.25$, $0.50$ and $0.75$. The data were simulated from the following four models:

[TABLE]

where the innovations $\varepsilon_{ji}$, $j=1,2$, have standard normal marginal distributions, and $X_{i}$ is an exogenous variable following the AR model $X_{i}=0.6X_{i-1}+\xi_{i}$ with $\xi_{i}$ i.i.d. standard normal. The simulations were also conducted for innovations $\varepsilon_{ji}$, $j=1,2$, with Student marginals with 5 degrees of freedom, but the results are very similar; for brevity we do not present them here.

The nonparametric estimates $\widehat{m}_{j}$ and $\widehat{\sigma}_{j}$ are constructed as local polynomial estimators of order $p=1$ with $k$ being the triweight kernel. The bandwidth $h_{n}$ is chosen separately for each estimation by cross-validation from the interval $(D,H)$, where $D=\widehat{\sigma}_{Z}/n^{1/(3+\varepsilon)}$ and $H=\widehat{\sigma}_{Z}\log^{2}(n)/n^{1/(4-\varepsilon)}$ for $\varepsilon=0.1$ (cf. Remark 3), and $\widehat{\sigma}_{Z}$ is an estimate of the standard deviation of the explanatory variable $Z$ (being $X_{i}$ or $Y_{i-1}$, depending on the model) given by $\widehat{\sigma}_{Z}=\min\{S_{Z},\mathrm{IQR}_{Z}/1.34\}$, where $S_{Z}$ stands for the sample standard deviation and $\mathrm{IQR}_{Z}$ is the interquartile range.
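The bandwidth range $(D,H)$ is straightforward to compute. The Python sketch below also selects $h_n$ from a geometric grid over $[D,H]$; the cross-validation criterion used here, ordinary leave-one-out least squares for the local linear smoother with the triweight kernel, is our assumption, as are the function names and the grid size.

```python
import numpy as np


def triweight(t):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 35.0 / 32.0 * (1.0 - t**2) ** 3, 0.0)


def robust_scale(z):
    """sigma_hat_Z = min{sample SD, IQR / 1.34}."""
    q75, q25 = np.percentile(z, [75, 25])
    return min(np.std(z, ddof=1), (q75 - q25) / 1.34)


def bandwidth_range(z, eps=0.1):
    """(D, H) with D = s / n^(1/(3+eps)) and H = s log^2(n) / n^(1/(4-eps))."""
    n, s = len(z), robust_scale(z)
    return s / n ** (1.0 / (3.0 + eps)), s * np.log(n) ** 2 / n ** (1.0 / (4.0 - eps))


def loo_cv(z, y, h):
    """Leave-one-out squared prediction error of the local linear smoother."""
    diff = z[None, :] - z[:, None]            # diff[i, l] = z_l - z_i
    w = triweight(diff / h)
    np.fill_diagonal(w, 0.0)                  # drop observation i when predicting it
    s0, s1, s2 = w.sum(1), (w * diff).sum(1), (w * diff**2).sum(1)
    t0, t1 = (w * y).sum(1), (w * diff * y).sum(1)
    det = s0 * s2 - s1**2
    ok = det > 1e-12                          # skip points with too few neighbours
    pred = (s2[ok] * t0[ok] - s1[ok] * t1[ok]) / det[ok]
    return np.mean((y[ok] - pred) ** 2)


def cv_bandwidth(z, y, n_grid=25):
    z, y = np.asarray(z, dtype=float), np.asarray(y, dtype=float)
    lo, hi = bandwidth_range(z)
    grid = np.geomspace(lo, hi, n_grid)       # grid over [D, H]
    scores = [loo_cv(z, y, h) for h in grid]
    return grid[int(np.argmin(scores))], (lo, hi)
```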

The weights are given by $w_{n}(z)=\mathbbm{1}\{z\in[c_{n}^{L},c_{n}^{U}]\}$, where $[c_{n}^{L},c_{n}^{U}]$ is the largest possible interval such that $\inf_{z\in[c_{n}^{L},c_{n}^{U}]}\widehat{f}_{Z}(z)\geq(\widehat{\sigma}_{Z}\log^{2}(n))^{-1}$, where $\widehat{f}_{Z}$ is the kernel density estimator of the marginal density of $Z$ (with the triweight kernel and the bandwidth chosen by the standard normal reference rule, see e.g. Fan and Yao, 2005, p. 201).
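A possible Python implementation of this weight choice is sketched below. Assumptions: the kernel density estimate uses the triweight kernel with the normal-reference bandwidth $h=\big(8\sqrt{\pi}R(k)/(3\mu_2(k)^2)\big)^{1/5}\,\widehat\sigma_Z\,n^{-1/5}$, where $R(k)=\int k^2=350/429$ and $\mu_2(k)=\int t^2k(t)\,\mathrm dt=1/9$ for the triweight kernel; and "largest possible interval" is read as the longest run of grid points between the smallest and the largest observation on which $\widehat f_Z$ reaches the threshold. Function names are hypothetical.

```python
import numpy as np


def triweight(t):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 35.0 / 32.0 * (1.0 - t**2) ** 3, 0.0)


def robust_scale(z):
    q75, q25 = np.percentile(z, [75, 25])
    return min(np.std(z, ddof=1), (q75 - q25) / 1.34)


def weight_interval(z, grid_size=2001):
    """Longest grid interval on which the triweight KDE is >= (s log^2 n)^(-1)."""
    z = np.asarray(z, dtype=float)
    n, s = z.size, robust_scale(z)
    r_k, mu2 = 350.0 / 429.0, 1.0 / 9.0       # int k^2 and int t^2 k(t) dt
    h = (8.0 * np.sqrt(np.pi) * r_k / (3.0 * mu2**2)) ** 0.2 * s * n ** (-0.2)
    grid = np.linspace(z.min(), z.max(), grid_size)
    f_hat = triweight((grid[:, None] - z[None, :]) / h).sum(axis=1) / (n * h)
    threshold = 1.0 / (s * np.log(n) ** 2)
    ok = np.append(f_hat >= threshold, False)  # sentinel closes a final run
    best, start = None, None
    for k, flag in enumerate(ok):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            if best is None or k - 1 - start > best[1] - best[0]:
                best = (start, k - 1)
            start = None
    if best is None:
        return None
    return grid[best[0]], grid[best[1]]
```

The weights then follow as `w = (z >= c_lo) & (z <= c_hi)` for the returned endpoints.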

For each setting, we compute the estimate of the copula parameter $\theta$ from the true (but unobserved) innovations using the inversion of Kendall's tau method ($\widehat{\theta}_{n}^{(ik,or)}$) and the maximum pseudo-likelihood method ($\widehat{\theta}_{n}^{(pl,or)}$). These oracle estimators are compared with their counterparts computed from the residuals, $\widehat{\theta}_{n}^{(ik)}$ and $\widehat{\theta}_{n}^{(pl)}$. To make the results comparable across copula families, all parameter estimates are reported on the Kendall's tau scale. That is, we are in fact comparing nonparametric estimates of Kendall's tau with parametric estimates obtained by plugging the maximum pseudo-likelihood estimate of the parameter into $\tau(\theta)$. The performance of the estimators is measured by the bias, standard deviation (SD) and root mean square error (RMSE), which are estimated from $1\,000$ random samples for the sample sizes $n=200$, $500$ and $1000$. Since the obtained quantities are of order $10^{-2}$ and smaller, we report $100$ multiples of bias, SD and RMSE in Tables 1, 2, 3, 4 and 5. As $\widehat{\theta}_{n}^{(ik)}$ and $\widehat{\theta}_{n}^{(pl)}$ are natural competitors, the larger of the two corresponding performance measures (bias, SD, RMSE) is shown in bold.

In agreement with the results of Genest et al., (1995) and Tsukahara, (2005), the results for the (oracle) estimates based on the (unobserved) innovations favour the MPL method. This continues to hold when working with estimated residuals, provided that the dependence is not very strong (i.e. $\tau=0.25$ or $\tau=0.50$). But if the dependence is strong (i.e. $\tau=0.75$), then one should consider using the IK method. This seems to be true in particular for the Clayton copula and to some extent also for the Frank and Gumbel copulas. A closer inspection of the results reveals that while the standard deviation of the MPL method is almost always slightly smaller than that of the IK method, its bias can be substantially larger. On the other hand, the results suggest that for the normal and Student copulas one can stay with the MPL method even in the case of strong dependence.

Finally, note that for large sample sizes the estimates based on residuals usually perform almost as well as the oracle estimates based on the (unobserved) innovations. There is, however, still some price to pay even for the sample size $n=1000$, and this price, in relative terms, increases with the level of dependence. An open question for further research is how to explain the poor performance of the residual-based MPL method for the Clayton copula with strong dependence.

4. Application

To illustrate the proposed methods, we consider daily log returns of the USD/CZK (US Dollar/Czech Koruna) and GBP/CZK (British Pound/Czech Koruna) exchange rates from 4th January 2010 to 31st December 2012. We use data only up to the end of 2012 (a total of 758 observations for each series) because in November 2013 the Czech National Bank started its intervention targeting the CZK/EUR exchange rate.

Daily foreign exchange rates have been successfully modelled using nonparametric autoregression, e.g., in Härdle et al., (1998) and Yang et al., (1999). Here we apply a simple model of two separate nonparametric autoregressions of order 1 and search for a feasible copula for the innovations. The conditional means and variances are modelled using local polynomials of degree $p=1$. The weights and the smoothing parameters are chosen as in Section 3. The fitted conditional means and standard deviations are plotted together with the data in Figure 1. The conditional mean functions are rather flat and close to zero.

We use the goodness-of-fit test proposed in Section 2.3 to decide which copula should be used for modeling the innovations from the two autoregressions. The copula parameter is estimated using the inversion of Kendall's tau method. The significance of the test statistics is assessed with the bootstrap test based on $B=999$ bootstrap samples. We test the Clayton, Frank, Gumbel, normal and Student (with 4 degrees of freedom) copulas and obtain $p$-values of $0.000$, $0.000$, $0.001$, $0.055$ and $0.305$, respectively. Hence, we conclude that the Student copula seems to be the best choice for the innovations. The normal copula is also not rejected at the 5% level, but the corresponding $p$-value is rather borderline, so the Student copula seems to provide a better fit. The maximum pseudo-likelihood method estimates $5.156$ degrees of freedom and the parameter $\rho=0.778$. Figure 2 shows a plot of the pseudo-observations $(\widetilde{U}_{1i},\widetilde{U}_{2i})$ given by (10), together with contours of the fitted Student copula.

Acknowledgement

The authors thank the editor, an associate editor and two referees for their very valuable comments that led to a considerable improvement of the contribution. The second and the third authors gratefully acknowledge support from the grant GACR 15-04774Y.

Appendix A - Proof of Theorem 1

Recall the definition $W_{n}=\sum_{j=1}^{n}w_{nj}$ . Introduce

[TABLE]

and note that

[TABLE]

where $\widehat{G}_{1n}$ and $\widehat{G}_{2n}$ denote the marginals of $\widehat{G}_{n}$. Further, $\widehat{G}_{n}$ is a distribution function on $[0,1]^{2}$ with marginal cdfs satisfying $\widehat{G}_{1n}(0)=\widehat{G}_{2n}(0)=0$. Thus one can make use of the Hadamard differentiability of the ‘copula mapping’ $\Phi:G\mapsto G(G_{1}^{-1},G_{2}^{-1})$ proved in Theorem 2.4 of Bücher and Volgushev, (2013), provided we show that the process

[TABLE]

converges in distribution in the space $\ell^{\infty}([0,1]^{2})$ to a process $\mathbb{G}$ with continuous trajectories such that $\mathbb{G}(u,0)=\mathbb{G}(0,u)=\mathbb{G}(1,1)=0$ for each $u\in[0,1]$ .

A1: Decomposition and weak convergence of $\widehat{\mathbb{G}}_{n}$

Denote

[TABLE]

Now one can decompose the process $\widehat{\mathbb{G}}_{n}$ as

[TABLE]

where $\widetilde{\mathbb{G}}_{n}$ , $\widetilde{\mathbb{G}}_{n}^{(or)}$ and $\mathbb{G}_{n}^{(or)}$ stand for the first, second and third term respectively on the right-hand side of the first equation in (A2).

In Section A2 it will be shown that the first term on the right-hand side of (A2) satisfies uniformly in $(u_{1},u_{2})\in[0,1]^{2}$ ,

[TABLE]

where (in agreement with the last two conditions in $\mathbf{(F_{\boldsymbol{\varepsilon}})}$ ) for $u_{1}\in\{0,1\}$ the first term on the right-hand side of (A3) is defined as zero and analogously for $u_{2}\in\{0,1\}$ .

In Section A3, we will show the asymptotic negligibility of the second term on the right-hand side of (A2), i.e.

[TABLE]

Now combining (A2) with (A3) and (A4) yields that uniformly in $(u_{1},u_{2})$

[TABLE]

where

[TABLE]

The asymptotic representation (A5) together with standard techniques yields the weak convergence of the process $\widehat{\mathbb{G}}_{n}$ .

Now thanks to Hadamard differentiability of the copula functional and Theorem 3.9.4 of van der Vaart and Wellner, (1996),

[TABLE]

Note that for each $(u_{1},u_{2})\in[0,1]^{2}$

[TABLE]

Further combining (A7) with (A5), (A6) and (A8) gives

[TABLE]

Now the right-hand side of (A9) coincides with the asymptotic representation of the ‘oracle’ copula process $\sqrt{n}\big{[}C_{n}^{(or)}-C\big{]}$ , which implies the statement of Theorem 1.

A2: Showing (A3)

Let us introduce the process

[TABLE]

that is indexed by the following set of functions

[TABLE]

where

[TABLE]

and for $\delta$ from assumption $\mathbf{(Bw)}$ and some $\nu$ large enough such that

[TABLE]

Denote the centred process as

[TABLE]

and note that $f$ may be formally identified with $(c,z_{1},z_{2},a_{1},b_{1},a_{2},b_{2})$. We will use the notation $f\hat{=}(c,z_{1},z_{2},a_{1},b_{1},a_{2},b_{2})$. Further, in agreement with the notation used in van der Vaart and Wellner, (2007), by $\bar{Z}_{n}(f_{n})$ for random $f_{n}$ we understand the value of the mapping $f\mapsto\bar{Z}_{n}(f)$ evaluated at $f_{n}$.

Consider the semi-norm given by

[TABLE]

where

[TABLE]

From assumption $\mathbf{(\boldsymbol{\beta})}$ one obtains that $\beta^{-1}(u)\leq cu^{-1/b}$ for some constant $c$ . Further denote

[TABLE]

As $\mathcal{F}$ consists of indicator functions, for $f,g\in\mathcal{F}$ one has $Q_{f-g}(u)=\mathbbm{1}\{0<u<P|f-g|\}$. Thus one obtains, for $\epsilon<1$,

[TABLE]

Starting with brackets of $\|\cdot\|_{2}$-length $\epsilon^{2b/(b-1)}$ of the function classes $\mathcal{G}$, $\widetilde{\mathcal{G}}$ and $\{\mathbf{x}\mapsto\mathbbm{1}\{\mathbf{x}\in[-c,c]^{d}\}\mid c\in\mathbb{R}^{+}\}$, it is then easy to construct brackets for $\mathcal{F}$ with $\|\cdot\|_{2,\beta}$-length $\epsilon$ (compare with the proof of Lemma 1 in Dette et al., 2009). Thus one obtains

[TABLE]

where the rate follows from Lemma 2 in Appendix C. Further, a single bracket suffices for $\epsilon\geq 1$. Thus, by (A15) and (A12),

[TABLE]

From Dedecker and Louhichi, (2002), Section 4.3, it follows that the centred process $\bar{Z}_{n}$ given by (A13) is asymptotically $\|\cdot\|_{2,\beta}$-equicontinuous.

To apply this result to the proof of (A3), note that

[TABLE]

and introduce the process

[TABLE]

with $\widehat{a}_{j},\widehat{b}_{j}$ , $j=1,2$ , from Lemma 1 and Remark 5 in Appendix C. Then one obtains by monotonicity arguments applying Lemma 1(i) that, on an event with probability converging to one

[TABLE]

for some deterministic positive sequence $\gamma_{n}=o(n^{-1/2})$ . Here,

[TABLE]

We only consider the upper bound; the lower one can be handled analogously. First note that $Z_{n}(f_{n}^{u})-Z_{n}(g_{n})=\bar{Z}_{n}(f_{n}^{u})-\bar{Z}_{n}(g_{n})+R_{n}$, where, with probability converging to one,

[TABLE]

where the last equality follows by a Taylor expansion, assumption $\mathbf{(F_{\boldsymbol{\varepsilon}})}$ and $\gamma_{n}=o(n^{-1/2})$ . Now introducing the notation ( $j=1,2$ )

[TABLE]

one can show as in (A14) that for a sufficiently large $M$

[TABLE]

and this can be bounded by $Mn^{-(1-1/b)/2}$ times the bound on the right-hand side of (A16) and thus converges to zero in probability, uniformly in $u_{1},u_{2}$. Therefore there exists a deterministic sequence $\delta_{n}\searrow 0$ with $\mathsf{P}\big(\sup_{u_{1},u_{2}}\|f_{n}^{u}-g_{n}\|_{2,\beta}\leq\delta_{n}\big)\to 1$ as $n\to\infty$. Further, by Lemma 1 and Remark 5 of Appendix C, one has $\mathsf{P}\big(f_{n}^{u},f_{n}^{\ell},g_{n}\in\mathcal{F}\big)\to 1$ as $n\to\infty$. Now, from the $\|\cdot\|_{2,\beta}$-equicontinuity of $\bar{Z}_{n}$, one obtains for every $\epsilon>0$ that

[TABLE]

and thus $\big{|}\bar{Z}_{n}(f_{n}^{u})-\bar{Z}_{n}(g_{n})\big{|}=o_{P}(1)$ uniformly with respect to $u_{1},u_{2}$ . In combination with (A16), analogous considerations for the lower bound $Z_{n}\big{(}f_{n}^{\ell}\big{)}-Z_{n}\big{(}g_{n})$ and the fact that

[TABLE]

we obtain

[TABLE]

Further, thanks to (A18), it suffices to show that the process $\check{\mathbb{G}}_{n}(u_{1},u_{2})$ has the asymptotic representation given by the right-hand side of (A3).

Thus the remaining proof of (A3) is divided into two parts. First we prove that

[TABLE]

and then we calculate $\mathsf{E}\,\!^{\ast}[\check{\mathbb{G}}_{n}(u_{1},u_{2})]$. Here, with a slight abuse of notation, $\mathsf{E}\,\!^{\ast}$ denotes the expectation computed while treating the functions $\widehat{a}_{j}$, $\widehat{b}_{j}$ as deterministic.

Showing (A19)

Note that we have

[TABLE]

where

[TABLE]

with $0$ and $1$ standing for the functions that are identically equal to zero and one, respectively. Similarly to before, one can show that for a sufficiently large $M$

[TABLE]

using notation (A17).

Now note that with Lemma 1 (iii) in Appendix C we obtain $\|f_{n}-g_{n}\|_{2,\beta}=o_{P}(1)$ uniformly in $u_{1},u_{2}$. Finally, with the help of (A18), (A20) and the asymptotic $\|\cdot\|_{2,\beta}$-equicontinuity of the process $\bar{Z}_{n}$, one can conclude (A19).

Calculating $\mathsf{E}\,\!^{\ast}[\check{\mathbb{G}}_{n}(u_{1},u_{2})]$

To simplify the notation and to avoid confusion, let the random vector $\boldsymbol{X}$ have the same distribution as $\boldsymbol{X}_{1}$. With the help of a second-order Taylor series expansion of the right-hand side, one gets

[TABLE]

where

[TABLE]

and the point $u_{j\mathbf{x}}$ lies between the points $F_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u_{j},\mathbf{x},0)\big{)}$ and $u_{j}$. Now, using Lemma 1 (iv) in Appendix C, for $j=1,2$

[TABLE]

uniformly in $(u_{1},u_{2})$ .

To conclude the proof of (A3), we need to show that ‘the second-order terms’ in (A21) are asymptotically negligible. To see this, note that by assumption $\mathbf{(F_{\boldsymbol{\varepsilon}})}$ and Lemma 1 (iii) there exists a finite constant $M$ such that, with probability going to one,

[TABLE]

uniformly in $(u_{1},u_{2})\in[0,1]^{2}$ and $\mathbf{x}\in\mathbf{J}_{n}$ .

Thus to prove

[TABLE]

it suffices to apply Lemma 1 (iii) once more.

A3: Showing (A4)

Recall that $W_{n}=\sum_{i=1}^{n}w_{ni}$ and decompose

[TABLE]

where $B_{n1}(u_{1},u_{2})$ stands for the first term on the right-hand side of (A22) (without the factor $\frac{n}{W_{n}}-1$) and $B_{n2}(u_{1},u_{2})$ for the second term. Using standard techniques, one can show that both $B_{n1}(u_{1},u_{2})$ and $B_{n2}(u_{1},u_{2})$, viewed as processes on $[0,1]^{2}$, are asymptotically equicontinuous. To this end, note that $B_{n1}(u_{1},u_{2})$ corresponds to the process $\bar{Z}_{n}(f)$ as defined in Section A2 above with $f\hat{=}\big{(}c_{n},F_{1\varepsilon}^{-1}(u_{1}),F_{2\varepsilon}^{-1}(u_{2}),0,1,0,1\big{)}$. Alternatively, results by Bickel and Wichura, (1971) can be applied. Moreover, as

[TABLE]

one can conclude that both processes $\big{(}\tfrac{n}{W_{n}}-1\big{)}\,B_{n1}(u_{1},u_{2})$ and $B_{n2}(u_{1},u_{2})$ are uniformly asymptotically negligible in probability, which together with (A22) implies (A4).

Appendix B - Proof of Theorem 2

Thanks to assumption $\mathbf{(\boldsymbol{\phi})}$ , the estimator $\boldsymbol{\widehat{\theta}}_{n}$ is a solution to the estimating equations (11).

In what follows, we first prove the existence of a consistent root of the estimating equations (11) and then show that this root satisfies

[TABLE]

where $(\widehat{U}_{1i},\widehat{U}_{2i})$ are introduced in (13). The statement of the theorem now follows for $p=1$ by Proposition A.1(ii) of Genest et al., (1995) and for $p>1$ by Theorem 1 of Gijbels et al., (2017).

Proving consistency

Put $\widetilde{C}_{n}^{\prime}(u_{1},u_{2})=\frac{1}{W_{n}}\sum_{i=1}^{n}w_{ni}\,\mathbbm{1}\big{\{}\widetilde{U}_{1i}\leq u_{1},\widetilde{U}_{2i}\leq u_{2}\big{\}}$, where the pseudo-observations $(\widetilde{U}_{1i},\widetilde{U}_{2i})$ are defined in (10). Note that

[TABLE]
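The weighted empirical copula $\widetilde{C}_{n}^{\prime}$ is a weighted fraction of pseudo-observations that fall into a lower-left quadrant. What follows is a minimal NumPy sketch. It assumes that the pseudo-observations are column-wise rescaled ranks $R_{ji}/(n+1)$ of the residuals, computed over all $n$ observations, and that the user supplies the weights, for instance 0/1 trimming weights. Both choices are assumptions for illustration, since (10) may define the pseudo-observations differently. The helper names `pseudo_observations` and `weighted_empirical_copula` are hypothetical.

```python
import numpy as np


def pseudo_observations(resid):
    """Column-wise rescaled ranks R_i / (n + 1); an assumed stand-in for (10)."""
    resid = np.asarray(resid, dtype=float)
    n = resid.shape[0]
    # argsort of argsort yields 0-based ranks; ties are broken by order of appearance
    ranks = resid.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable") + 1
    return ranks / (n + 1.0)


def weighted_empirical_copula(U, w, u1, u2):
    """(1 / W_n) * sum_i w_i * 1{U_1i <= u1, U_2i <= u2}, with W_n = sum_i w_i."""
    U = np.asarray(U, dtype=float)
    w = np.asarray(w, dtype=float)
    inside = (U[:, 0] <= u1) & (U[:, 1] <= u2)
    return float(np.sum(w * inside) / np.sum(w))


# usage with hypothetical residuals, covariate and 0/1 trimming weights
rng = np.random.default_rng(0)
resid = rng.standard_normal((500, 2))
X = rng.uniform(-3.0, 3.0, size=500)
w = (np.abs(X) <= 2.5).astype(float)
C_val = weighted_empirical_copula(pseudo_observations(resid), w, 0.5, 0.5)
```

With all weights equal to one, the function reduces to the ordinary empirical copula. For independent residuals, as in the usage lines, its value at $(u_1,u_2)$ should be close to $u_1u_2$.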

Fix $l\in\{1,\dotsc,p\}$ . By Corollary A.7 of Berghaus et al., (2017) one gets

[TABLE]

Note that thanks to assumption $\mathbf{(\boldsymbol{\phi}^{(j)})}$ (uniformly in $\mathbf{t}\in V(\boldsymbol{\theta})$ )

[TABLE]

and analogously also

[TABLE]

Now combining (B3), (B4) and (B5) yields

[TABLE]

where

[TABLE]

Analogously one gets

[TABLE]

Now using (B2), (B6), (B8) and assumption $\mathbf{(\boldsymbol{\phi})}$ gives that uniformly in $\mathbf{t}\in V(\boldsymbol{\theta})$

[TABLE]

where we have used Theorem 1 and assumption $\mathbf{(\boldsymbol{\phi})}$ . The existence of a consistent root of estimating equations (11) now follows by assumptions $\mathbf{(Id)}$ and $\mathbf{(\boldsymbol{\Gamma})}$ .

Analogously one can show the existence of a consistent root of estimating equations (12).
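For a concrete one-parameter ($p=1$) illustration of a rank-based estimating equation, take the Clayton family that appears in the simulation tables. Its Kendall's tau is $\tau=\theta/(\theta+2)$, so equating the sample version $\tau_n$ to $t/(t+2)$ gives an estimating equation with the explicit root $t=2\tau_n/(1-\tau_n)$. The sketch below assumes this method-of-moments choice. It does not reproduce the function $\boldsymbol{\phi}$ in (11), and the names `kendall_tau`, `clayton_theta_from_tau` and `sample_clayton` are hypothetical. Kendall's tau depends on the data only through ranks, so it takes the same value whether it is computed from residuals or from their rescaled ranks.

```python
import numpy as np


def kendall_tau(x, y):
    """Sample Kendall's tau for continuous data (no ties), O(n^2) memory."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # s[i, j] = +1 for a concordant pair, -1 for a discordant pair, 0 on the diagonal
    s = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    # every unordered pair appears twice in s, hence n * (n - 1) in the denominator
    return float(s.sum() / (n * (n - 1)))


def clayton_theta_from_tau(tau):
    """Root in t of the moment equation tau - t / (t + 2) = 0."""
    return 2.0 * tau / (1.0 - tau)


def sample_clayton(n, theta, rng):
    """Draw n pairs from the Clayton copula (theta > 0) by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


rng = np.random.default_rng(0)
u, v = sample_clayton(1000, 2.0, rng)       # true tau = 2 / (2 + 2) = 0.5
theta_hat = clayton_theta_from_tau(kendall_tau(u, v))
```

The quadratic-memory implementation of Kendall's tau keeps the sketch short. For large $n$, an $O(n\log n)$ algorithm would be preferable.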

Showing (B1)

Let $\boldsymbol{\widehat{\theta}}_{n}$ be a consistent root of the estimating equations (11). Then by the mean value theorem applied to each coordinate of the vector-valued function

[TABLE]

one gets

[TABLE]

where $\mathbb{D}_{\boldsymbol{\phi}}$ stands for $\tfrac{\partial\boldsymbol{\phi}(u_{1},u_{2};\mathbf{t})}{\partial\mathbf{t}}$ and $\boldsymbol{\theta}_{n}^{*}$ lies between $\boldsymbol{\widehat{\theta}}_{n}$ and $\boldsymbol{\theta}$. Note that, because the mean value theorem is applied to a vector-valued function, there are in fact $p$ different points $\boldsymbol{\theta}_{n}^{*,1},\dotsc,\boldsymbol{\theta}_{n}^{*,p}$, one for each coordinate of the function $\boldsymbol{\Psi}_{n}(\mathbf{t})$. Since all of them are consistent, for simplicity of notation we do not distinguish between them.
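Assume that $\boldsymbol{\Psi}_{n}=(\Psi_{n,1},\dotsc,\Psi_{n,p})^{\mathsf T}$ is the estimating function whose root defines $\widehat{\boldsymbol{\theta}}_{n}$. Written out row by row, the linearization then reads

$$0=\Psi_{n,l}(\widehat{\boldsymbol{\theta}}_{n})=\Psi_{n,l}(\boldsymbol{\theta})+\frac{\partial\Psi_{n,l}}{\partial\mathbf{t}^{\mathsf T}}\big(\boldsymbol{\theta}_{n}^{*,l}\big)\,\big(\widehat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta}\big),\qquad l=1,\dotsc,p.$$

Collect the $p$ gradient rows into a matrix $\mathbb{M}_{n}$ (notation introduced only for this display) and assume that it is invertible with probability tending to one. This gives $\sqrt{n}\,(\widehat{\boldsymbol{\theta}}_{n}-\boldsymbol{\theta})=-\mathbb{M}_{n}^{-1}\sqrt{n}\,\boldsymbol{\Psi}_{n}(\boldsymbol{\theta})$, so the asymptotics of $\widehat{\boldsymbol{\theta}}_{n}$ reduce to a limit for $\mathbb{M}_{n}$ and an expansion of $\sqrt{n}\,\boldsymbol{\Psi}_{n}(\boldsymbol{\theta})$.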

Thus to finish the proof of (B1) it is sufficient to show that

[TABLE]

and

[TABLE]

When proving (B9), one can mimic the proof of consistency of $\boldsymbol{\widehat{\theta}}_{n}$ and show that there exists an open neighbourhood $V(\boldsymbol{\theta})$ of $\boldsymbol{\theta}$ such that

[TABLE]

Using the consistency of $\boldsymbol{\widehat{\theta}}_{n}$ and assumption $\mathbf{(\boldsymbol{\Gamma})}$ yields (B9).

Thus one can concentrate on proving (B10). Put $C_{n}^{\prime(or)}(u_{1},u_{2})=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}\big{\{}\widehat{U}_{1i}\leq u_{1},\widehat{U}_{2i}\leq u_{2}\big{\}},$ where $(\widehat{U}_{1i},\widehat{U}_{2i})$ are defined in (13). Note that

[TABLE]

Analogously to (B6), one can also show that for $l=1,\dotsc,p$

[TABLE]

where $A_{l}(\boldsymbol{\theta})$ is given in (B7).

Now using (B2), (B6), (B11), (B12), Theorem 1 and $\mathbf{(\boldsymbol{\phi})}$ one gets

[TABLE]

which verifies (B10) and finishes the proof of (B1).

Appendix C - Auxiliary results

Lemma 1.

Assume that $\mathbf{(\boldsymbol{\beta})}$ , $\mathbf{(F_{\boldsymbol{\varepsilon}})}$ , $\mathbf{(M)}$ , $\mathbf{(F_{\boldsymbol{X}})}$ , $\mathbf{(Bw)}$ , $\mathbf{(k)}$ , $\mathbf{(J_{n})}$ and $\mathbf{(m\boldsymbol{\sigma})}$ are satisfied. Then there exist random functions $\widehat{a}_{j}$ and $\widehat{b}_{j}$ on $\mathbf{J}_{n}$ such that for $j=1,2$

(i) $\displaystyle\sup_{\mathbf{x}\in\mathbf{J}_{n}}\Big{|}\frac{\widehat{m}_{j}(\mathbf{x})-m_{j}(\mathbf{x})}{\sigma_{j}(\mathbf{x})}-\widehat{a}_{j}(\mathbf{x})\Big{|}=o_{P}\big{(}n^{-1/2}\big{)},\ \sup_{\mathbf{x}\in\mathbf{J}_{n}}\Big{|}\frac{\widehat{\sigma}_{j}(\mathbf{x})}{\sigma_{j}(\mathbf{x})}-\widehat{b}_{j}(\mathbf{x})\Big{|}=o_{P}\big{(}n^{-1/2}\big{)},$

(ii) $\displaystyle\|\widehat{a}_{j}\|_{d+\delta}=o_{P}(1),\quad\|\widehat{b}_{j}-1\|_{d+\delta}=o_{P}(1)$ for $\delta>0$ from assumption $\mathbf{(Bw)}$,

(iii) $\displaystyle\sup_{\mathbf{x}\in\mathbf{J}_{n}}\big{|}\widehat{a}_{j}(\mathbf{x})\big{|}=o_{P}\big{(}n^{-1/4}\big{)},\qquad\sup_{\mathbf{x}\in\mathbf{J}_{n}}\big{|}\widehat{b}_{j}(\mathbf{x})-1\big{|}=o_{P}\big{(}n^{-1/4}\big{)},$

(iv) $\displaystyle\int_{\mathbf{J}_{n}}\widehat{a}_{j}(\mathbf{x})f_{\boldsymbol{X}}(\mathbf{x})\,d\mathbf{x}=\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{ji}+o_{P}\big{(}n^{-1/2}\big{)}$,
$\displaystyle\int_{\mathbf{J}_{n}}\Big{(}\widehat{b}_{j}(\mathbf{x})-1\Big{)}\,f_{\boldsymbol{X}}(\mathbf{x})\,d\mathbf{x}=\frac{1}{2n}\sum_{i=1}^{n}\big{(}\varepsilon_{ji}^{2}-1\big{)}+o_{P}\big{(}n^{-1/2}\big{)}.$

Proof.

For ease of presentation we set $j=1$ and assume $\mathbf{h}_{n}=\big{(}h_{n},\dotsc,h_{n}\big{)}$. We first prove the assertions for $\widehat{m}_{1}$. The proof basically follows the lines of the proof of Lemma 1 by Müller et al., (2009). Changes are necessary, however, because the observations in our model are dependent and because our covariate density is not assumed to be bounded away from zero on its support. Recall that $\mathbb{I}(d,p)$ denotes the set of multi-indices $\mathbf{i}=(i_{1},\dotsc,i_{d})$ with $i.=i_{1}+\dots+i_{d}\leq p$, and we set $\mathbb{I}=\mathbb{I}(d,p)$, where $p$ is the order of the polynomials used in the local polynomial estimation. Further, introduce $\mathbf{J}_{n}^{+}=[-c_{n}-h_{n},c_{n}+h_{n}]^{d}$ and note that, thanks to assumption $\mathbf{(Bw)}$,

[TABLE]

since for all sufficiently large $n$ the set $\mathbf{J}_{n}^{+}$ is a subset of $\mathbf{J}_{2n}$. Finally, define $\alpha_{n}^{(2)}:=\min_{j=1,2}\inf_{\mathbf{x}\in\mathbf{J}_{n}}\sigma_{j}(\mathbf{x})$. By assumption $\mathbf{(m\boldsymbol{\sigma})}$, this quantity is either bounded away from zero or converges to zero no faster than a negative power of $\log n$.

Proof of assertion (i) for $\widehat{m}_{1}$ . Fix some $\mathbf{x}\in\mathbf{J}_{n}$ and let $\widehat{\boldsymbol{\beta}}$ denote the solution of the minimization problem (4). Then $\widehat{\boldsymbol{\beta}}$ satisfies the normal equations

[TABLE]

where

[TABLE]
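The normal equations are those of a kernel-weighted least-squares fit of a polynomial in the scaled offsets $(\boldsymbol{X}_\ell-\mathbf{x})/h_n$. A minimal sketch for $d=1$ follows. The Epanechnikov kernel, the normalization by $nh$ and the helper names `epanechnikov` and `local_polynomial` are assumptions for illustration, and the displays that define $\widehat{Q}_{\mathbf{ik}}$ above may use a different scaling. The first component of the solution estimates $m(x_0)$.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel on [-1, 1] (an illustrative choice of K)."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)


def local_polynomial(x0, X, Y, h, p):
    """Local polynomial estimate of m(x0) for d = 1 via the normal equations.

    Solves Q_hat @ beta = r, where, with t_l = (X_l - x0) / h and K_l = K(t_l),
        Q_hat[i, k] = sum_l K_l * t_l**(i + k) / (n * h),
        r[i]        = sum_l K_l * t_l**i * Y_l / (n * h),
    and returns the intercept beta[0] = e_1^T Q_hat^{-1} r.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.size
    t = (X - x0) / h
    k = epanechnikov(t)
    T = np.vander(t, N=p + 1, increasing=True)   # columns t^0, t^1, ..., t^p
    Q_hat = (T * k[:, None]).T @ T / (n * h)
    r = (T * k[:, None]).T @ Y / (n * h)
    beta = np.linalg.solve(Q_hat, r)
    return float(beta[0])


# usage on hypothetical data: local linear fit of sin(pi x) at x0 = 0.25
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=2000)
Y = np.sin(np.pi * X) + 0.3 * rng.standard_normal(2000)
m_hat = local_polynomial(0.25, X, Y, h=0.15, p=1)
```

Scaling the offsets by $h$ keeps the entries of the Gram matrix of order one as $h\to 0$. A polynomial of degree at most $p$ without noise is reproduced exactly.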

From Theorem 2 in Hansen, (2008) we obtain for $\varrho_{n}=\big{(}\log n/(nh_{n}^{d})\big{)}^{1/2}$ ,

[TABLE]

where we define $Q_{\mathbf{i}\mathbf{k}}(\mathbf{x})=\mathsf{E}\,\big{[}\widehat{Q}_{\mathbf{i}\mathbf{k}}(\mathbf{x})\big{]}$ , $\mathbf{i},\mathbf{k}\in\mathbb{I}$ . Note that

[TABLE]

and, for $\mathbf{x}\in\mathbf{J}_{n}$, consider the matrix $\mathbf{Q}(\mathbf{x})$ with entries $Q_{\mathbf{i}\mathbf{k}}(\mathbf{x})$, $\mathbf{i},\mathbf{k}\in\mathbb{I}$. Analogously, let $\widehat{\mathbf{Q}}(\mathbf{x})$ denote the matrix with entries $\widehat{Q}_{\mathbf{i}\mathbf{k}}(\mathbf{x})$, $\mathbf{i},\mathbf{k}\in\mathbb{I}$.

It follows from (C1) that $0<\lambda_{n}\leq\mathbf{a}^{\mathsf{T}}\mathbf{Q}(\mathbf{x})\,\mathbf{a}\leq\Lambda<\infty$ for all vectors $\mathbf{a}$ of unit Euclidean length, where $\lambda_{n}$ is a sequence of positive real numbers of the same rate as $\alpha_{n}^{(1)}$ in (C1). Thus $\mathbf{Q}(\mathbf{x})$ has eigenvalues in the interval $[\lambda_{n},\Lambda]$ , and on the event

[TABLE]

one has $\mathbf{a}^{\mathsf{T}}\widehat{\mathbf{Q}}(\mathbf{x})\,\mathbf{a}\geq\lambda_{n}/2$ for all $\mathbf{a}$ of unit Euclidean length, so that the matrix $\widehat{\mathbf{Q}}(\mathbf{x})$ is invertible as well. Here and throughout, $\|\mathbf{Q}\|$ denotes the spectral norm of a matrix $\mathbf{Q}$. Note that $\mathsf{P}(E_{n})\to 1$ by (C2) and $\varrho_{n}=o\big{(}\alpha_{n}^{(1)}\big{)}$, which holds under assumption $\mathbf{(Bw)}$. For the remainder of the proof we assume that the event $E_{n}$ occurs. Its complement does not matter, because $\mathsf{P}(E_{n})\to 1$ and the assertions of the lemma are statements in probability. It follows from the normal equations that, for $\mathbf{x}\in\mathbf{J}_{n}$,

[TABLE]

where $\mathbf{e}_{1}=(1,0,\dotsc,0)^{\mathsf{T}}$ and $\mathbf{A}(\mathbf{x})$ and $\mathbf{B}(\mathbf{x})$ denote the vectors with components $A_{\mathbf{i}}(\mathbf{x})$ and $B_{\mathbf{i}}(\mathbf{x})$ , $\mathbf{i}\in\mathbb{I}$ , respectively. Now define

[TABLE]

then we have the decomposition

[TABLE]

with remainder terms

[TABLE]

where $\overline{\boldsymbol{\beta}}(\mathbf{x})$ is the vector with components $\overline{\beta}_{\mathbf{i}}(\mathbf{x})=h_{n}^{i.}\,D^{\mathbf{i}}m_{1}(\mathbf{x})$ , $\mathbf{i}\in\mathbb{I}$ . From Theorem 2 in Hansen, (2008) we obtain

[TABLE]

For the treatment of the inverse matrices in $r_{1}(\mathbf{x})$ we use Cramer’s rule and write

[TABLE]

where $\widehat{\mathbb{C}}(\mathbf{x})$ and ${\mathbb{C}}(\mathbf{x})$ denote the cofactor matrices of $\widehat{\mathbf{Q}}(\mathbf{x})$ and ${\mathbf{Q}}(\mathbf{x})$, respectively. By the boundedness of the functions $Q_{\mathbf{i}\mathbf{k}}$ and by (C2), each element of $\widehat{\mathbb{C}}(\mathbf{x})-{\mathbb{C}}(\mathbf{x})$ is bounded in absolute value by $O_{P}(\varrho_{n})$. The same rate holds for $\big{|}\det\big{(}{\mathbf{Q}}(\mathbf{x})\big{)}-\det\big{(}\widehat{\mathbf{Q}}(\mathbf{x})\big{)}\big{|}$, uniformly in $\mathbf{x}$. Using the lower bound $\lambda_{n}^{|\mathbb{I}|}$ for the determinant of $\mathbf{Q}(\mathbf{x})$ and assumption $\mathbf{(m\boldsymbol{\sigma})}$ to bound $1/\sigma_{1}$ gives the rate

[TABLE]

by assumption $\mathbf{(Bw)}$. To show the negligibility of $r_{2}(\mathbf{x})$, first note that the spectral norm of $\widehat{\mathbf{Q}}^{-1}(\mathbf{x})$ is the reciprocal of the square root of the smallest eigenvalue of $\widehat{\mathbf{Q}}(\mathbf{x})^{\mathsf{T}}\widehat{\mathbf{Q}}(\mathbf{x})$. With

[TABLE]

(on $E_{n}$) for all $\mathbf{a}$ with $\|\mathbf{a}\|=1$, we obtain the rate $O(\lambda_{n}^{-1})$ for $\big{\|}\widehat{\mathbf{Q}}^{-1}(\mathbf{x})\big{\|}$. Further, by a Taylor expansion of $m_{1}(\boldsymbol{X}_{\ell})$ of order $p+1$ in the definition of $B_{\mathbf{i}}(\mathbf{x})$ and using assumption $\mathbf{(m\boldsymbol{\sigma})}$, we have

[TABLE]

where the kernel density estimator $\widehat{f}_{\boldsymbol{X}}(\mathbf{x})=\frac{1}{n}\sum_{\ell=1}^{n}K_{\mathbf{h}_{n}}(\boldsymbol{X}_{\ell}-\mathbf{x})$ converges to $f_{\boldsymbol{X}}(\mathbf{x})$ uniformly in $\mathbf{x}\in\mathbf{J}_{n}$; see Theorem 6 of Hansen, (2008). Altogether we have

[TABLE]

using assumption $\mathbf{(Bw)}$ .

Now assertion (i) for $\widehat{m}_{1}$ follows from (C3), (C4), (C6) and (C7).
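Three elementary matrix facts carry the argument for assertion (i). The first is Weyl's perturbation bound, which keeps $\widehat{\mathbf{Q}}(\mathbf{x})$ invertible on $E_n$, assuming that $E_n$ is, or implies, the event $\sup_{\mathbf{x}}\|\widehat{\mathbf{Q}}(\mathbf{x})-\mathbf{Q}(\mathbf{x})\|\le\lambda_n/2$. The second is the identity $\|\widehat{\mathbf{Q}}^{-1}\|=1/\sqrt{\lambda_{\min}(\widehat{\mathbf{Q}}^{\mathsf T}\widehat{\mathbf{Q}})}$, and the third is Cramer's rule $\widehat{\mathbf{Q}}^{-1}=\widehat{\mathbb{C}}^{\mathsf T}/\det\widehat{\mathbf{Q}}$. The NumPy sketch below checks all three numerically on a random symmetric positive definite matrix. It illustrates the facts only and is not part of the proof. The helper `cofactor_matrix` is hypothetical.

```python
import numpy as np


def cofactor_matrix(Q):
    """Cofactors C[i, k] = (-1)^(i+k) * det(Q without row i and column k)."""
    m = Q.shape[0]
    C = np.empty_like(Q)
    for i in range(m):
        for k in range(m):
            minor = np.delete(np.delete(Q, i, axis=0), k, axis=1)
            C[i, k] = (-1) ** (i + k) * np.linalg.det(minor)
    return C


rng = np.random.default_rng(1)
m = 4
G = rng.standard_normal((m, m))
Q = G @ G.T + m * np.eye(m)              # symmetric positive definite stand-in for Q(x)
lam_min = np.linalg.eigvalsh(Q).min()

# symmetric perturbation whose spectral norm is exactly lam_min / 2
S = rng.standard_normal((m, m))
S = (S + S.T) / 2.0
S *= (lam_min / 2.0) / np.linalg.norm(S, 2)
Q_hat = Q + S

# Fact 1 (Weyl): lambda_min(Q_hat) >= lambda_min(Q) - ||S|| = lambda_min(Q) / 2
lam_min_hat = np.linalg.eigvalsh(Q_hat).min()

# Fact 2: ||Q_hat^{-1}|| = 1 / sqrt(lambda_min(Q_hat^T Q_hat))
norm_inv = np.linalg.norm(np.linalg.inv(Q_hat), 2)
norm_inv_via_eig = 1.0 / np.sqrt(np.linalg.eigvalsh(Q_hat.T @ Q_hat).min())

# Fact 3 (Cramer's rule): Q_hat^{-1} = C^T / det(Q_hat), with C the cofactor matrix
inv_cramer = cofactor_matrix(Q_hat).T / np.linalg.det(Q_hat)
```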

Proof of assertion (ii) for $\widehat{a}_{1}$ . Note that $p\geq d$ and thus $\widehat{a}_{1}$ is ( $d+1$ )-times partially differentiable and

[TABLE]

by the mean value theorem. Again by Theorem 2 of Hansen, (2008) we have

[TABLE]

Further note that

[TABLE]

and that the spectral norm of $\mathbf{Q}^{-1}(\mathbf{x})$ can be bounded by $O\big{(}1/\alpha_{n}^{(1)}\big{)}$ by the same arguments as before. We apply the product rule for derivatives to obtain

[TABLE]

by assumption $\mathbf{(Bw)}$ . Assertion (ii) for $\widehat{a}_{1}$ follows.

Proof of assertion (iii) for $\widehat{a}_{1}$ . From the definition of $\widehat{a}_{1}$ and (C5) we obtain that

[TABLE]

and thus (iii) follows for $\widehat{a}_{1}$ .

Proof of assertion (iv) for $\widehat{a}_{1}$ . To prove (iv) note that

[TABLE]

with

[TABLE]

From the support properties of the kernel function it follows that $\Delta_{n}(\boldsymbol{X}_{i})\mathbbm{1}\{\boldsymbol{X}_{i}\not\in\mathbf{J}_{n}^{+}\}=0$ . Further, for $\mathbf{J}_{n}^{-}=[-c_{n}+h_{n},c_{n}-h_{n}]^{d}$ note that

[TABLE]

because the expectation is zero and the variance is bounded by

[TABLE]

It remains to consider

[TABLE]

with $\Delta_{n}(\boldsymbol{X}_{i})=\Delta_{n}^{(1)}(\boldsymbol{X}_{i})+\Delta_{n}^{(2)}(\boldsymbol{X}_{i})$ , where

[TABLE]

Now, by applying the mean value theorem to $\sigma_{1}$, for $\boldsymbol{X}_{i}\in\mathbf{J}_{n}^{-}$ the term $\Delta_{n}^{(2)}(\boldsymbol{X}_{i})$ is bounded in absolute value by $O\big{(}M_{n}\,h_{n}/(\alpha_{n}^{(1)}\alpha_{n}^{(2)})\big{)}=o(1)$. Thus, analogously to the proof of (C8), Markov’s inequality gives

[TABLE]

To obtain the desired negligibility it remains to show $\mathsf{E}\big{[}\big{(}\Delta_{n}^{(1)}(\boldsymbol{X}_{i})-1\big{)}^{2}\mathbbm{1}\{\boldsymbol{X}_{i}\in\mathbf{J}_{n}^{-}\}\big{]}\to 0$ . To this end we write

[TABLE]

where the matrix $\mathbf{Q}_{*}(\mathbf{x})$ has entries

[TABLE]

Note that the smallest eigenvalue of $\mathbf{Q}_{*}(\mathbf{x})$ is of order $\lambda_{n}$. Thus we obtain the bound

[TABLE]

Now, with bounds on the matrix norms similar to those above and inserting the definitions of $\mathbf{Q}$ and $\mathbf{Q}_{*}$, we obtain

[TABLE]

by the mean value theorem and assumptions $\mathbf{(F_{\boldsymbol{X}})}$ and $\mathbf{(k)}$.

Proof of assertions (i)–(iv) for $\widehat{\sigma}_{1}$ . Recall the definition $\widehat{\sigma}_{1}^{2}=\widehat{s}_{1}-\widehat{m}_{1}^{2}$ , where $\widehat{s}_{1}$ is the local polynomial estimator based on $(\boldsymbol{X}_{i},Y_{1i}^{2})$ , $i=1,\dotsc,n$ . With the notation $s_{1}(\mathbf{x})=\mathsf{E}[Y_{1i}^{2}\mid\boldsymbol{X}_{i}=\mathbf{x}]=\sigma_{1}^{2}(\mathbf{x})+m_{1}^{2}(\mathbf{x})$ we obtain

[TABLE]

where

[TABLE]

Put

[TABLE]

where $\widetilde{\mathbf{A}}(\mathbf{x})$ denotes the vector with components

[TABLE]

Along the lines of the proof of (i) and (ii) for $\widehat{m}_{1}$ one can prove that

[TABLE]

and

[TABLE]

Now noticing that $\widehat{\sigma}_{1}^{2}(\mathbf{x})-\sigma_{1}^{2}(\mathbf{x})=\widehat{s}_{1}(\mathbf{x})-s_{1}(\mathbf{x})-(\widehat{m}_{1}(\mathbf{x})-m_{1}(\mathbf{x}))(\widehat{m}_{1}(\mathbf{x})+m_{1}(\mathbf{x}))$ we obtain the rate

[TABLE]

and (i) follows for $\widehat{\sigma}_{1}$ .

If we define $\widehat{b}_{1}(\mathbf{x})=1+\widehat{c}_{1}(\mathbf{x})-\widehat{a}_{1}(\mathbf{x})m_{1}(\mathbf{x})/\sigma_{1}(\mathbf{x})$, then (ii) and (iii) follow as before. The only difference is an additional factor $\sigma_{1}(\mathbf{x})$ in the denominator, which needs to be taken into account.
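A first-order expansion suggests this form of $\widehat{b}_{1}$. Using $\sigma_{1}^{2}=s_{1}-m_{1}^{2}$ and $\widehat{\sigma}_{1}^{2}=\widehat{s}_{1}-\widehat{m}_{1}^{2}$, and suppressing the argument $\mathbf{x}$, one has

$$\frac{\widehat{\sigma}_{1}}{\sigma_{1}}-1=\frac{\widehat{\sigma}_{1}^{2}-\sigma_{1}^{2}}{\sigma_{1}(\widehat{\sigma}_{1}+\sigma_{1})}\approx\frac{\widehat{s}_{1}-s_{1}}{2\sigma_{1}^{2}}-\frac{(\widehat{m}_{1}-m_{1})(\widehat{m}_{1}+m_{1})}{2\sigma_{1}^{2}}\approx\frac{\widehat{s}_{1}-s_{1}}{2\sigma_{1}^{2}}-\frac{\widehat{m}_{1}-m_{1}}{\sigma_{1}}\cdot\frac{m_{1}}{\sigma_{1}},$$

where the approximations replace $\widehat{\sigma}_{1}$ by $\sigma_{1}$ and $\widehat{m}_{1}$ by $m_{1}$ in the second factors. By (i) for $\widehat{m}_{1}$, the last term is approximated by $-\widehat{a}_{1}m_{1}/\sigma_{1}$. On this heuristic reading, and assuming $\widehat{c}_{1}$ is defined along these lines, $\widehat{c}_{1}$ plays the role of the linear part of $(\widehat{s}_{1}-s_{1})/(2\sigma_{1}^{2})$. This part carries $\sigma_{1}^{2}$ rather than $\sigma_{1}$ in the denominator, which accounts for the extra factor.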

To show the validity of (iv), note that the regression model $Y_{1i}^{2}=s_{1}(\boldsymbol{X}_{i})+\eta_{i}$ holds with error term $\eta_{i}=\sigma_{1}^{2}(\boldsymbol{X}_{i})(\varepsilon_{1i}^{2}-1)+2m_{1}(\boldsymbol{X}_{i})\sigma_{1}(\boldsymbol{X}_{i})\varepsilon_{1i}$. From this, one obtains analogously to the derivation of (iv) for $\widehat{a}_{1}$ that

[TABLE]

But the second sum is also the dominating term in $\int_{\mathbf{J}_{n}}\widehat{a}_{1}(\mathbf{x})m_{1}(\mathbf{x})/\sigma_{1}(\mathbf{x})f_{\boldsymbol{X}}(\mathbf{x})\,d\mathbf{x}$ , which is again shown analogously to the proof of (iv) for $\widehat{a}_{1}$ . Thus (iv) follows for $\widehat{b}_{1}$ . ∎
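The decomposition $Y_{1i}^{2}=s_{1}(\boldsymbol{X}_{i})+\eta_{i}$ is pure algebra. Starting from $Y_{1i}=m_{1}(\boldsymbol{X}_{i})+\sigma_{1}(\boldsymbol{X}_{i})\varepsilon_{1i}$, expanding the square and adding and subtracting $\sigma_{1}^{2}(\boldsymbol{X}_{i})$ gives the stated $\eta_{i}$. Moreover, $\mathsf{E}[\eta_{i}\mid\boldsymbol{X}_{i}]=0$ whenever $\varepsilon_{1i}$ is independent of $\boldsymbol{X}_{i}$ with mean zero and unit variance. The sketch below checks the identity and the zero-mean property numerically. The functions `m1` and `sigma1` and the normal innovations are hypothetical choices made for illustration only.

```python
import numpy as np


def m1(x):
    """Hypothetical conditional mean."""
    return np.sin(np.pi * x)


def sigma1(x):
    """Hypothetical conditional standard deviation (bounded away from zero)."""
    return 0.5 + x ** 2


rng = np.random.default_rng(2)
n = 200_000
X = rng.uniform(-1.0, 1.0, size=n)
eps = rng.standard_normal(n)            # mean 0, variance 1, independent of X

Y = m1(X) + sigma1(X) * eps
s1 = sigma1(X) ** 2 + m1(X) ** 2        # s_1(x) = E[Y^2 | X = x]
eta = sigma1(X) ** 2 * (eps ** 2 - 1.0) + 2.0 * m1(X) * sigma1(X) * eps

# the decomposition Y^2 = s_1(X) + eta holds exactly, up to rounding
max_identity_error = float(np.max(np.abs(Y ** 2 - (s1 + eta))))
# E[eta | X] = 0, so the sample mean of eta should be close to zero
mean_eta = float(eta.mean())
```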

Remark 5.

Note that due to property (iii) of Lemma 1 and (C1) we have for $\mathbf{x}\in\mathbf{J}_{n}=[-c_{n},c_{n}]^{d}$ ,

[TABLE]

for every $\nu>0$. In the proof of Lemma 1, $\widehat{a}_{1}(\mathbf{x})$ was only defined for $\mathbf{x}\in\mathbf{J}_{n}$. We now extend $\widehat{a}_{1}$ to $\mathbb{R}^{d}$ in such a way that, if $\widehat{a}_{1}\in C_{1}^{d+\delta}(\mathbf{J}_{n})$ and $\|\mathbf{x}\|^{\nu}|\widehat{a}_{1}(\mathbf{x})|\leq 1$, then $\widehat{a}_{1}\in\mathcal{G}$ defined in (A10). Then $\mathsf{P}\big{(}\widehat{a}_{1}\in\mathcal{G}\big{)}\to 1$ by Lemma 1. Analogously, $\widehat{b}_{1}$ is defined on $\mathbb{R}^{d}$ such that $\mathsf{P}\big{(}\widehat{b}_{1}\in\widetilde{\mathcal{G}}\big{)}\to 1$ for $\widetilde{\mathcal{G}}$ from (A11).

Lemma 2.

Let $\mathcal{H}=\mathcal{G}$ or $\widetilde{\mathcal{G}}$ denote one of the function classes defined in (A10) and (A11) (depending on $\nu>0$ and $\delta\in(0,1]$). Then

[TABLE]

for $\epsilon\searrow 0$ , and thus the same bound holds for $\log N_{[\,]}\left(\epsilon,\mathcal{H},\|\cdot\|_{2}\right)$ .

Proof.

Let $\mathcal{H}=\mathcal{G}$ (the proof is similar for $\widetilde{\mathcal{G}}$ ) and let $\epsilon>0$ . Choose $D=D(\epsilon)=\epsilon^{-1/\nu}$ . Let $B$ denote the ball of radius $D$ around the origin. Let $a_{1},\dotsc,a_{m}:B\to\mathbb{R}$ denote the centers of $\epsilon$ -balls with respect to the supremum norm that cover $C_{1}^{d+\delta}(B)$ , that is $m=N(\epsilon,C_{1}^{d+\delta}(B),\|\cdot\|_{\infty})$ . Then for each $a\in\mathcal{G}$ we have $a|_{B}\in C_{1}^{d+\delta}(B)$ and thus there exists $j_{0}\in\{1,\dotsc,m\}$ such that $\sup_{\mathbf{x}\in B}|a(\mathbf{x})-a_{j_{0}}(\mathbf{x})|\leq\epsilon$ . Now define $a_{j}(\mathbf{x})=0$ for $\mathbf{x}\in\mathbb{R}^{d}\setminus B$ , $j=1,\dotsc,m$ . Then

[TABLE]

because $\|\mathbf{x}\|^{\nu}|a(\mathbf{x})|\leq 1$ by definition of $\mathcal{G}$ . We obtain $N(\epsilon,\mathcal{G},\|\cdot\|_{\infty})\leq m$ and due to van der Vaart and Wellner, (1996), Theorem 2.7.1, we have for some universal $K$

[TABLE]

where $B^{1}=\big{\{}\mathbf{x}:\,\|\mathbf{x}-B\|<1\big{\}}$ is the set of points at distance less than one from $B$.

Thus the first assertion follows. The second assertion follows by van der Vaart and Wellner, (1996), proof of Cor. 2.7.2. ∎
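The exponent can be traced as follows, with constants not tracked. The set $B^{1}$ is contained in the ball of radius $D+1$. For $\epsilon\le 1$ one has $D=\epsilon^{-1/\nu}\ge 1$, so the Lebesgue measure $|B^{1}|$ satisfies $|B^{1}|\le c_d\,(2D)^{d}=c_d\,2^{d}\epsilon^{-d/\nu}$ for a constant $c_d$ depending only on $d$. Theorem 2.7.1 of van der Vaart and Wellner, (1996), applied with smoothness index $\alpha=d+\delta$, then gives

$$\log N\big(\epsilon,\mathcal{G},\|\cdot\|_{\infty}\big)\le\log m\le K\,|B^{1}|\,\epsilon^{-d/(d+\delta)}\le K\,c_d\,2^{d}\,\epsilon^{-d/\nu-d/(d+\delta)},\qquad 0<\epsilon\le 1.$$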

Bibliography

Berghaus, B., Bücher, A., and Volgushev, S. (2017). Weak convergence of the empirical copula process with respect to weighted metrics. Bernoulli, 23(1):743–772.
Bickel, P. J. and Wichura, M. J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist., 42:1656–1670.
Brahimi, B. and Necir, A. (2012). A semiparametric estimation of copula models based on the method of moments. Stat. Methodol., 9(4):467–477.
Bücher, A. and Volgushev, S. (2013). Empirical and sequential empirical copula processes under serial dependence. J. Multivariate Anal., 119:61–70.
Chan, N.-H., Chen, J., Chen, X., Fan, Y., and Peng, L. (2009). Statistical inference for multivariate residual copula of GARCH models. Statist. Sinica, 19:53–70.
Chen, X. and Fan, Y. (2006). Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. J. Econometrics, 135:125–154.
Dedecker, J. and Louhichi, S. (2002). Maximal inequalities and empirical central limit theorems. In Dehling, H., Mikosch, T., and Sorensen, M., editors, Empirical process techniques for dependent data, pages 137–160. Birkhäuser Boston.
Dette, H., Pardo-Fernández, J. C., and Van Keilegom, I. (2009). Goodness-of-fit tests for multiplicative models with dependent data. Scand. J. Statist., 36(4):782–799.
