Maximum pseudo-likelihood estimation based on estimated residuals in copula semiparametric models

Marek Omelka; Šárka Hudecová; Natalie Neumeyer

arXiv:1903.04221·math.ST·March 12, 2019

TL;DR

This paper investigates the maximum pseudo-likelihood estimation of copula models with residual-based data, demonstrating asymptotic equivalence to unobserved error-based estimators under certain conditions, and exploring limitations via simulations.

Contribution

It establishes the asymptotic properties of residual-based pseudo-likelihood estimators in copula models and examines their performance when regularity conditions fail.

Findings

1. The residual-based estimator is asymptotically equivalent to the error-based estimator under regularity conditions.
2. Simulations show poor behavior of the pseudo-likelihood estimator when the assumptions are violated.
3. Moment estimation of copula parameters can be preferable in irregular cases.

Abstract

This paper deals with the situation in which one is interested in the dependence structure of a multidimensional response variable in the presence of a multivariate covariate. It is assumed that the covariate affects only the marginal distributions through regression models, while the dependence structure, which is described by a copula, is unaffected. Parametric estimation of the copula function is considered, with a focus on the maximum pseudo-likelihood method. It is proved that, under appropriate regularity assumptions, the estimator calculated from the residuals is asymptotically equivalent to the estimator based on the unobserved errors. In such a case one can ignore the fact that the response is first adjusted for the effect of the covariate. A Monte Carlo simulation study explores (among others) situations where the regularity assumptions are not satisfied and the claimed result does…

Tables

Table 1. Model (12) with Clayton copula, quantities multiplied by 100.

$\tau$	margins	estimator	n = 100			n = 1000			n = 10000
			bias	SD	RMSE	bias	SD	RMSE	bias	SD	RMSE
0.50	inov	$\tilde{\alpha}^{(ik)}$	-0.03	5.54	5.54	0.00	1.64	1.64	-0.01	0.53	0.53
		$\tilde{\alpha}^{(pl)}$	0.33	4.90	4.91	0.01	1.49	1.49	0.00	0.48	0.48
	N+E	$\hat{\alpha}^{(ik)}$	-1.25	5.62	5.76	-0.27	1.64	1.67	-0.05	0.53	0.53
		$\hat{\alpha}^{(pl)}$	-3.91	5.54	6.78	-2.26	2.08	3.08	-0.80	0.75	1.10
		$\hat{\alpha}^{(pl*)}$	-1.94	5.30	5.65	-1.23	1.81	2.19	-0.44	0.63	0.77
	N+U	$\hat{\alpha}^{(ik)}$	-0.21	5.55	5.55	-0.03	1.63	1.63	-0.02	0.53	0.53
		$\hat{\alpha}^{(pl)}$	-0.84	4.86	4.93	-0.61	1.53	1.65	-0.22	0.51	0.55
		$\hat{\alpha}^{(pl*)}$	0.02	5.00	5.00	-0.13	1.50	1.51	-0.05	0.49	0.49
	t	$\hat{\alpha}^{(ik)}$	-0.15	5.58	5.58	-0.01	1.64	1.64	-0.02	0.53	0.53
		$\hat{\alpha}^{(pl)}$	0.10	4.96	4.96	-0.02	1.50	1.50	-0.01	0.48	0.48
		$\hat{\alpha}^{(pl*)}$	0.38	5.05	5.06	0.06	1.51	1.51	0.02	0.48	0.48
0.75	inov	$\tilde{\alpha}^{(ik)}$	0.02	3.40	3.40	-0.01	1.01	1.01	0.01	0.31	0.31
		$\tilde{\alpha}^{(pl)}$	-0.77	3.12	3.21	-0.16	0.93	0.94	-0.01	0.28	0.28
	N+E	$\hat{\alpha}^{(ik)}$	-2.14	3.70	4.27	-0.48	1.08	1.18	-0.07	0.32	0.33
		$\hat{\alpha}^{(pl)}$	-9.19	5.85	10.89	-4.19	2.88	5.09	-1.57	1.14	1.94
		$\hat{\alpha}^{(pl*)}$	-6.26	4.95	7.98	-2.86	2.36	3.71	-1.07	0.94	1.43
	N+U	$\hat{\alpha}^{(ik)}$	-0.24	3.39	3.40	-0.06	1.01	1.01	0.00	0.31	0.31
		$\hat{\alpha}^{(pl)}$	-2.99	3.27	4.43	-1.22	1.18	1.70	-0.44	0.41	0.60
		$\hat{\alpha}^{(pl*)}$	-1.63	3.15	3.55	-0.60	1.01	1.17	-0.20	0.33	0.39
	t	$\hat{\alpha}^{(ik)}$	-0.22	3.45	3.45	-0.05	1.01	1.01	0.01	0.31	0.31
		$\hat{\alpha}^{(pl)}$	-1.21	3.21	3.43	-0.22	0.93	0.95	-0.02	0.28	0.28
		$\hat{\alpha}^{(pl*)}$	-1.04	3.24	3.40	-0.17	0.93	0.95	-0.01	0.28	0.28

Table 2. Model (12) with Frank copula, quantities multiplied by 100.

$\tau$	margins	estimator	n = 100			n = 1000			n = 10000
			bias	SD	RMSE	bias	SD	RMSE	bias	SD	RMSE
0.50	inov	$\tilde{\alpha}^{(ik)}$	-0.03	4.62	4.62	0.01	1.44	1.43	0.01	0.45	0.45
		$\tilde{\alpha}^{(pl)}$	-0.03	4.51	4.50	0.01	1.42	1.42	0.01	0.45	0.45
	N+E	$\hat{\alpha}^{(ik)}$	-0.45	4.68	4.70	-0.05	1.44	1.44	0.00	0.45	0.45
		$\hat{\alpha}^{(pl)}$	-0.45	4.55	4.57	-0.05	1.43	1.43	0.00	0.45	0.45
		$\hat{\alpha}^{(pl*)}$	-0.21	4.84	4.84	-0.04	1.46	1.46	0.00	0.45	0.45
	N+U	$\hat{\alpha}^{(ik)}$	-0.08	4.65	4.65	0.00	1.44	1.43	0.00	0.45	0.45
		$\hat{\alpha}^{(pl)}$	-0.08	4.53	4.53	0.00	1.42	1.42	0.01	0.45	0.45
		$\hat{\alpha}^{(pl*)}$	0.09	4.85	4.85	0.01	1.45	1.45	0.01	0.45	0.45
0.75	inov	$\tilde{\alpha}^{(ik)}$	-0.11	2.50	2.50	0.00	0.74	0.74	0.00	0.23	0.23
		$\tilde{\alpha}^{(pl)}$	-0.53	2.45	2.50	-0.06	0.74	0.74	0.00	0.23	0.22
	N+E	$\hat{\alpha}^{(ik)}$	-1.17	2.79	3.02	-0.14	0.76	0.77	-0.01	0.23	0.23
		$\hat{\alpha}^{(pl)}$	-1.59	2.77	3.19	-0.19	0.76	0.78	-0.02	0.23	0.23
		$\hat{\alpha}^{(pl*)}$	-1.42	2.90	3.23	-0.17	0.77	0.79	-0.01	0.23	0.23
	N+U	$\hat{\alpha}^{(ik)}$	-0.25	2.53	2.54	-0.01	0.74	0.74	0.00	0.23	0.23
		$\hat{\alpha}^{(pl)}$	-0.69	2.50	2.59	-0.07	0.74	0.74	0.00	0.23	0.23
		$\hat{\alpha}^{(pl*)}$	-0.57	2.62	2.68	-0.05	0.76	0.76	0.00	0.23	0.23

Table 3. Model (12) with Gaussian copula, quantities multiplied by 100.

$\tau$	margins	estimator	n = 100			n = 1000			n = 10000
			bias	SD	RMSE	bias	SD	RMSE	bias	SD	RMSE
0.50	inov	$\tilde{\alpha}^{(ik)}$	0.03	4.94	4.94	-0.07	1.53	1.53	0.00	0.48	0.48
		$\tilde{\alpha}^{(pl)}$	1.07	4.51	4.63	0.10	1.39	1.40	0.03	0.44	0.44
	N+E	$\hat{\alpha}^{(ik)}$	-0.43	4.97	4.99	-0.17	1.53	1.54	-0.02	0.48	0.48
		$\hat{\alpha}^{(pl)}$	0.32	4.54	4.55	-0.21	1.41	1.43	-0.06	0.45	0.45
		$\hat{\alpha}^{(pl*)}$	0.99	4.90	5.00	0.08	1.46	1.46	0.03	0.45	0.45
	N+U	$\hat{\alpha}^{(ik)}$	-0.06	4.97	4.97	-0.08	1.53	1.53	0.00	0.48	0.48
		$\hat{\alpha}^{(pl)}$	0.87	4.53	4.62	-0.01	1.40	1.39	-0.01	0.44	0.44
		$\hat{\alpha}^{(pl*)}$	1.36	4.86	5.05	0.22	1.46	1.47	0.07	0.45	0.45
	t	$\hat{\alpha}^{(ik)}$	0.04	4.99	4.98	-0.07	1.53	1.53	0.00	0.48	0.48
		$\hat{\alpha}^{(pl)}$	1.08	4.55	4.67	0.09	1.40	1.40	0.02	0.44	0.44
		$\hat{\alpha}^{(pl*)}$	1.42	4.90	5.10	0.21	1.46	1.48	0.06	0.45	0.45
0.75	inov	$\tilde{\alpha}^{(ik)}$	0.16	2.79	2.80	-0.02	0.89	0.89	0.00	0.27	0.27
		$\tilde{\alpha}^{(pl)}$	0.06	2.53	2.53	-0.02	0.80	0.80	0.00	0.25	0.25
	N+E	$\hat{\alpha}^{(ik)}$	-1.02	2.93	3.10	-0.24	0.90	0.93	-0.04	0.27	0.27
		$\hat{\alpha}^{(pl)}$	-1.81	2.81	3.34	-0.73	0.95	1.20	-0.21	0.30	0.37
		$\hat{\alpha}^{(pl*)}$	-1.01	2.77	2.95	-0.40	0.88	0.97	-0.10	0.27	0.29
	N+U	$\hat{\alpha}^{(ik)}$	-0.08	2.80	2.80	-0.05	0.89	0.89	-0.01	0.27	0.27
		$\hat{\alpha}^{(pl)}$	-0.48	2.52	2.56	-0.27	0.82	0.86	-0.09	0.25	0.27
		$\hat{\alpha}^{(pl*)}$	0.00	2.61	2.60	-0.05	0.81	0.81	-0.01	0.25	0.25
	t	$\hat{\alpha}^{(ik)}$	0.14	2.82	2.82	-0.02	0.89	0.89	-0.01	0.27	0.27
		$\hat{\alpha}^{(pl)}$	0.03	2.56	2.56	-0.02	0.79	0.79	0.00	0.25	0.25
		$\hat{\alpha}^{(pl*)}$	0.20	2.62	2.62	0.02	0.80	0.80	0.02	0.25	0.25

Equations

H_{\mathbf{x}}(y_{1},\ldots,y_{d})=\mathsf{P}(Y_{1}\leq y_{1},\ldots,Y_{d}\leq y_{d}\mid\boldsymbol{X}=\mathbf{x})=C\big{(}F_{1\mathbf{x}}(y_{1}),\ldots,F_{d\mathbf{x}}(y_{d})\big{)}

Y_{j}=m_{j}(\boldsymbol{X})+s_{j}(\boldsymbol{X})\,\varepsilon_{j},\quad\text{where }\varepsilon_{j}\text{ is independent of }\boldsymbol{X}.

\sup_{(u_{1},u_{2})\in[0,1]^{2}}\sqrt{n}\,\big{|}\widehat{C}_{n}(u_{1},u_{2})-\widetilde{C}_{n}(u_{1},u_{2})\big{|}=o_{P}(1).

\sqrt{n}\,\big{(}\widehat{\boldsymbol{\alpha}}_{n}-\widetilde{\boldsymbol{\alpha}}_{n}\big{)}=o_{P}(1).

\varepsilon_{j}=\frac{T_{j}(Y_{j})-m_{j}(\boldsymbol{X};\boldsymbol{\theta}_{j})}{s_{j}(\boldsymbol{X};\boldsymbol{\theta}_{j})},

\widehat{\varepsilon}_{ji}=\frac{T_{j}(Y_{ji})-m_{j}(\boldsymbol{X}_{i};\widehat{\boldsymbol{\theta}}_{j})}{s_{j}(\boldsymbol{X}_{i};\widehat{\boldsymbol{\theta}}_{j})},\quad i=1,\dotsc,n;\ j=1,\dotsc,d,

\widehat{F}_{j\widehat{\varepsilon}}(y)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\{\widehat{\varepsilon}_{ji}\leq y\}.

\widehat{\boldsymbol{\alpha}}_{n}=\operatorname*{arg\,max}_{\mathbf{a}\in\Theta}\sum_{i=1}^{n}\log\big{\{}c\big{(}\widehat{\mathbf{U}}_{i};\mathbf{a}\big{)}\big{\}},

\widehat{\mathbf{U}}_{i}=\big{(}\widehat{U}_{1i},\dotsc,\widehat{U}_{di}\big{)}^{\top}=\tfrac{n}{n+1}\big{(}\widehat{F}_{1\widehat{\varepsilon}}(\widehat{\varepsilon}_{1i}),\dotsc,\widehat{F}_{d\widehat{\varepsilon}}(\widehat{\varepsilon}_{di})\big{)}^{\top}

\sum_{i=1}^{n}\boldsymbol{\psi}(\widehat{\mathbf{U}}_{i};\widehat{\boldsymbol{\alpha}}_{n})=\mathbf{0}_{p},\quad\text{where}\quad\boldsymbol{\psi}(\mathbf{u};\mathbf{a})=\frac{\partial\log\{c\big{(}\mathbf{u};\mathbf{a}\big{)}\}}{\partial\mathbf{a}}.

\sum_{i=1}^{n}\boldsymbol{\psi}(\widetilde{\mathbf{U}}_{i};\widetilde{\boldsymbol{\alpha}}_{n})=\mathbf{0}_{p},

\widetilde{\mathbf{U}}_{i}=\big{(}\widetilde{U}_{1i},\dotsc,\widetilde{U}_{di}\big{)}^{\top}=\tfrac{n}{n+1}\big{(}\widehat{F}_{1\varepsilon}(\varepsilon_{1i}),\dotsc,\widehat{F}_{d\varepsilon}(\varepsilon_{di})\big{)}^{\top}

\widehat{F}_{j\varepsilon}(y)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\{\varepsilon_{ji}\leq y\},\quad j=1,\dotsc,d.

\sup_{u\in(0,1)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}}{u^{\beta}(1-u)^{\beta}}<\infty

\sup_{u\in(0,1/2)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(2u)\big{)}}{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}}<\infty\quad\text{ and }\quad\sup_{u\in(0,1/2)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(1-2u)\big{)}}{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(1-u)\big{)}}<\infty.

\sup_{u\in(0,1)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}}{u^{\beta}(1-u)^{\beta}}<\infty.

\sup_{u\in(0,1)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\,\big{|}F_{j\varepsilon}^{-1}(u)\big{|}}{u^{\beta}(1-u)^{\beta}}<\infty.

\lim_{|x|\to\infty}|x|\,f_{j\varepsilon}(x)=0.

\lim_{u\to 0_{+}}F_{j\varepsilon}^{-1}(u)=-\infty\quad\Big{(}\;\lim_{u\to 1_{-}}F_{j\varepsilon}^{-1}(u)=\infty\;\Big{)},

\lim_{u\to 0_{+}}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}=0\quad\Big{(}\;\lim_{u\to 1_{-}}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}=0\;\Big{)}.

\lim_{u\to 0_{+}}F_{j\varepsilon}^{-1}(u)>-\infty\quad\Big{(}\;\lim_{u\to 1_{-}}F_{j\varepsilon}^{-1}(u)<\infty\;\Big{)}.

\sup_{\mathbf{t}\in U(\boldsymbol{\theta}_{j})}\big{\|}\tfrac{m^{\prime}_{j}(\mathbf{x};\mathbf{t})}{s_{j}(\mathbf{x};\mathbf{t})}\big{\|}\leq M_{j}(\mathbf{x}),\qquad\sup_{\mathbf{t}\in U(\boldsymbol{\theta}_{j})}\big{\|}\tfrac{s^{\prime}_{j}(\mathbf{x};\mathbf{t})}{s_{j}(\mathbf{x};\mathbf{t})}\big{\|}\leq M_{j}(\mathbf{x}),

\left|\varphi(u_{1},\dotsc,u_{d})\right|\leq\sum_{j=1}^{d}\frac{M_{1}}{\big{[}{\min\{u_{j},1-u_{j}\}}\big{]}^{\eta}}\,.

\left|\varphi(u_{1},\dotsc,u_{d})\right|\leq\sum_{j=1}^{d}\frac{M_{2}}{\big{[}{\min\{u_{j},1-u_{j}\}}\big{]}^{\beta_{1}}}\,.

\varphi^{(j)}(u_{1},\dotsc,u_{d})=\frac{\partial\varphi(u_{1},\dotsc,u_{d})}{\partial u_{j}}.

\max_{k,\ell\in\{1,\dotsc,p\}}\,\sup_{\mathbf{a}\in\mathcal{U}}\big{|}\tfrac{\partial\psi_{k}(\mathbf{u};\mathbf{a})}{\partial a_{\ell}}\big{|}\leq h(\mathbf{u}).

\mathbf{U}=\big{(}U_{1},\dotsc,U_{d}\big{)}^{\mathsf{T}}=\big{(}F_{1\varepsilon}(\varepsilon_{1}),\dotsc,F_{d\varepsilon}(\varepsilon_{d})\big{)}^{\mathsf{T}},

|\psi(u_{1},u_{2};a)|\leq M_{3}\sum_{j=1}^{2}\big{|}\log(u_{j})+\log(1-u_{j})\big{|}

|\psi^{(j)}(u_{1},u_{2};a)|\leq\frac{M_{3}}{\big{[}{\min\{u_{j},1-u_{j}\}}\big{]}}+M_{3}\sum_{j^{\prime}=1}^{2}\big{|}\log(u_{j^{\prime}})+\log(1-u_{j^{\prime}})\big{|},\quad j=1,2

\max_{k,\ell\in\{1,\dotsc,p\}}\,\sup_{\mathbf{a}\in\mathcal{U}}\sup_{\mathbf{u}\in(0,1)^{d}}\big{|}\tfrac{\partial\psi_{k}(\mathbf{u};\mathbf{a})}{\partial a_{\ell}}\big{|}<\infty\quad\text{and}\quad\max_{j\in\{1,\dotsc,d\}}\max_{k\in\{1,\dotsc,p\}}\,\sup_{\mathbf{u}\in(0,1)^{d}}\big{|}\tfrac{\partial\psi_{k}(\mathbf{u};\boldsymbol{\alpha})}{\partial u_{j}}\big{|}<\infty.

Full text

Marek Omelka¹, Šárka Hudecová¹, Natalie Neumeyer²

Abstract.

This paper deals with the situation in which one is interested in the dependence structure of a multidimensional response variable in the presence of a multivariate covariate. It is assumed that the covariate affects only the marginal distributions through regression models, while the dependence structure, which is described by a copula, is unaffected. Parametric estimation of the copula function is considered, with a focus on the maximum pseudo-likelihood method. It is proved that, under appropriate regularity assumptions, the estimator calculated from the residuals is asymptotically equivalent to the estimator based on the unobserved errors. In such a case one can ignore the fact that the response is first adjusted for the effect of the covariate. A Monte Carlo simulation study explores (among others) situations where the regularity assumptions are not satisfied and the claimed result does not hold. It shows that in such situations the maximum pseudo-likelihood estimator may behave poorly and moment estimation of the copula parameter is of interest. Our results complement the results available for nonparametric estimation of the copula function.

¹ Department of Probability and Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Praha 8, Czech Republic

² Department of Mathematics, University of Hamburg, Bundesstrasse 55, 20146 Hamburg, Germany

Keywords and phrases: asymptotic normality, copula, moment estimation, pseudo-likelihood, residuals.

1. Introduction

Consider a $d$-dimensional vector $\boldsymbol{Y}=(Y_{1},\ldots,Y_{d})^{\mathsf{T}}$ of responses and an associated $q$-dimensional vector of covariates $\boldsymbol{X}=(X_{1},\ldots,X_{q})^{\mathsf{T}}$. For instance, in insurance applications the response can represent various types of payments related to a given car accident (medical benefits, income replacement benefits, and allocated expenses for a claimant), while the covariates provide additional information (claimant’s age, gravity of the accident, number of people injured in the accident, …).

Often we are interested in the conditional distribution of $\boldsymbol{Y}$ given the value of the covariate. To simplify the situation, it is often assumed that $\boldsymbol{X}$ affects only the marginal distributions of $Y_{j}\ (j=1,\dotsc,d)$ but does not affect the dependence structure of $\boldsymbol{Y}$. More formally, it is assumed that there exists a copula $C$ such that, for all $\mathbf{x}\in S_{\boldsymbol{X}}$ (the support of $\boldsymbol{X}$), the joint conditional distribution of $\boldsymbol{Y}$ given $\boldsymbol{X}=\mathbf{x}$ can be written as

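$$H_{\mathbf{x}}(y_{1},\ldots,y_{d})=\mathsf{P}(Y_{1}\leq y_{1},\ldots,Y_{d}\leq y_{d}\mid\boldsymbol{X}=\mathbf{x})=C\big{(}F_{1\mathbf{x}}(y_{1}),\ldots,F_{d\mathbf{x}}(y_{d})\big{)},$$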

where $F_{j\mathbf{x}}(y_{j})=\mathsf{P}(Y_{j}\leq y_{j}\mid\boldsymbol{X}=\mathbf{x})$, $j=1,\dotsc,d$. Under this assumption one can proceed in two steps. In the first step, one models the effect of the covariate on each of the marginal distributions separately (i.e. one estimates $F_{j\mathbf{x}}$ for each $j\in\{1,\dotsc,d\}$ separately). Having obtained $\widehat{F}_{j\mathbf{x}}$, one estimates the copula function $C$ in the second step.

Nonparametric estimation of the copula function $C$ (for $d=2$ and $q=1$) was considered in detail in Gijbels et al. (2015). Their most interesting result is the following. Suppose that the marginal distributions follow parametric or even nonparametric location-scale models, i.e.

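$$Y_{j}=m_{j}(\boldsymbol{X})+s_{j}(\boldsymbol{X})\,\varepsilon_{j},\quad\text{where }\varepsilon_{j}\text{ is independent of }\boldsymbol{X}.\tag{1}$$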

Note that $C$ is then the copula of the random vector $(\varepsilon_{1},\varepsilon_{2})^{\mathsf{T}}$. Gijbels et al. (2015) proved that (under some regularity assumptions) the empirical copula $\widehat{C}_{n}$ based on the estimated residuals from model (1) is asymptotically equivalent to the empirical copula $\widetilde{C}_{n}$ calculated from the unobserved errors $\varepsilon_{ji}$. More precisely, they proved that

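$$\sup_{(u_{1},u_{2})\in[0,1]^{2}}\sqrt{n}\,\big{|}\widehat{C}_{n}(u_{1},u_{2})-\widetilde{C}_{n}(u_{1},u_{2})\big{|}=o_{P}(1).\tag{2}$$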

This result was generalized to the time-series setting by Neumeyer et al. (2019). Portier and Segers (2018) were even able to drop the location-scale assumption (1), but at the cost of a slightly weaker result (the supremum in (2) is replaced with $\sup_{[\gamma,1-\gamma]^{2}}$, where $\gamma$ can be taken arbitrarily small but positive). On the other hand, Côté et al. (2019) concentrated on the parametric form of the location-scale model (1), generalized the results to $d>2$ and $q>1$, and at the same time relaxed the assumptions on $f_{j\varepsilon}$ (the density of $\varepsilon_{ji}$).
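The equivalence (2) can be explored numerically. The Python sketch below is our own illustration, not the simulation design of the paper. It assumes $d=2$, a scalar covariate $X\sim\mathrm{U}(0,1)$, the hypothetical homoscedastic model $Y_j=1+2X+\varepsilon_j$ fitted by least squares, and errors with a Clayton copula ($\theta=2$) and standard logistic margins. It evaluates $\sqrt{n}\,|\widehat C_n-\widetilde C_n|$ on a grid, which bounds the supremum in (2) from below. Only the standard library is used.

```python
import math
import random

def pseudo_obs(z):
    """rank(z_i) / (n + 1), i.e. n/(n+1) times the empirical CDF at z_i (no ties)."""
    order = sorted(range(len(z)), key=z.__getitem__)
    ranks = [0] * len(z)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (len(z) + 1.0) for r in ranks]

def empirical_copula(u1, u2, a, b):
    """C_n(a, b) = (1/n) * #{i : u1[i] <= a and u2[i] <= b}."""
    return sum(1 for p, q in zip(u1, u2) if p <= a and q <= b) / len(u1)

def sample_clayton(n, theta, rng):
    """Bivariate Clayton sample generated by conditional inversion."""
    out = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        out.append((u, v))
    return out

def ols_residuals(x, y):
    """Residuals from the least-squares fit of y on (1, x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(1)
n, theta = 1000, 2.0
logit = lambda p: math.log(p / (1.0 - p))          # standard logistic quantile function
errors = [(logit(u), logit(v)) for u, v in sample_clayton(n, theta, rng)]
x = [rng.random() for _ in range(n)]
eps = [[e[j] for e in errors] for j in range(2)]
# Hypothetical location model, the same for both responses: Y_j = 1 + 2 X + eps_j.
res = [ols_residuals(x, [1.0 + 2.0 * xi + ej for xi, ej in zip(x, eps[j])]) for j in range(2)]

u_hat = [pseudo_obs(r) for r in res]     # pseudo-observations from residuals
u_tilde = [pseudo_obs(e) for e in eps]   # pseudo-observations from unobserved errors
grid = [k / 20 for k in range(1, 20)]
stat = math.sqrt(n) * max(
    abs(empirical_copula(*u_hat, a, b) - empirical_copula(*u_tilde, a, b))
    for a in grid for b in grid)
print(f"sqrt(n) * max |C_hat - C_tilde| over the grid: {stat:.3f}")
```

Increasing `n` should make the statistic drift towards zero, as (2) predicts, when the model is correctly specified.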

To complement the results on nonparametric estimation of $C$, one is naturally interested in whether analogous results also hold for parametric estimation of $C$. More precisely, suppose that the copula function $C$ belongs to the family $\mathcal{C}=\big{\{}C(\cdot;\mathbf{a}):\mathbf{a}\in\Theta\big{\}}$ and that we are interested in estimating the unknown parameter. Denote by $\boldsymbol{\alpha}$ the true value of the parameter, by $\widehat{\boldsymbol{\alpha}}_{n}$ the estimator based on the residuals ($\widehat{\varepsilon}_{ji}$), and by $\widetilde{\boldsymbol{\alpha}}_{n}$ its counterpart based on the true (but unobserved) errors ($\varepsilon_{ji}$) from the location-scale model (1). Then, in analogy to (2), one would expect $\widehat{\boldsymbol{\alpha}}_{n}$ to be first-order asymptotically equivalent to $\widetilde{\boldsymbol{\alpha}}_{n}$, i.e.

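$$\sqrt{n}\,\big{(}\widehat{\boldsymbol{\alpha}}_{n}-\widetilde{\boldsymbol{\alpha}}_{n}\big{)}=o_{P}(1).\tag{3}$$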

Although conjecture (3) seems natural, to the best of our knowledge there are only limited results specifying the regularity assumptions needed for (3) to hold. Some results for moment-like estimators that can be deduced from the convergence of the empirical copula $\widehat{C}_{n}$ can be found in Neumeyer et al. (2019) and Côté et al. (2019).

In this paper (similarly to Côté et al., 2019) we assume the parametric form of the location-scale model (1) and concentrate on maximum pseudo-likelihood estimation. In the context of copula models, this method of estimation was popularised by Genest et al. (1995) and investigated in more detail by Tsukahara (2005). It is often preferred to moment-like estimation because the resulting estimator usually has a lower asymptotic variance.
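To make the contrast with moment-like estimation concrete, the sketch below implements one common moment-like estimator, inversion of Kendall's tau. It uses the Clayton family, chosen here only for illustration, through the known relation $\tau=\theta/(\theta+2)$, i.e. $\theta=2\tau/(1-\tau)$. Kendall's tau is computed by brute force over all pairs, which costs $O(n^2)$ but is transparent. Because it depends only on ranks, the same code applies to errors or to residuals.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Sample Kendall's tau over all pairs (O(n^2)); assumes there are no ties."""
    pairs = list(combinations(range(len(x)), 2))
    s = 0
    for i, j in pairs:
        # +1 for a concordant pair, -1 for a discordant one
        s += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return s / len(pairs)

def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2) for the Clayton family (0 < tau < 1)."""
    return 2.0 * tau / (1.0 - tau)

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]   # exactly one discordant pair out of six
tau = kendall_tau(x, y)
print(tau, clayton_theta_from_tau(tau))
```

In this toy example five of the six pairs are concordant, so $\tau=(5-1)/6=2/3$ and the inverted Clayton parameter is $4$.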

In the econometric (time-series) literature, inference based on the residuals is also known as univariate (marginal) filtering (see, e.g., Bücher et al., 2015), and the result (3) is supported by many simulation studies. The result was already formulated in Chen and Fan (2006a), but there it is presented at a rather intuitive level and the precise assumptions (as well as the reasoning) are missing. This lack of rigour was to some extent remedied in the subsequent paper of Chan et al. (2009), where the authors concentrated on multivariate GARCH models and presented many interesting ideas on how to deal with the technical difficulties. However, a careful reading of that paper reveals that (probably due to the broad scope of the presented results) some crucial steps in the proofs are missing.

In this paper we explore in detail the assumptions needed for (3) to hold in the standard i.i.d. setting. Even in this relatively simple setting one has to handle many technical difficulties. The main issue is that it is not clear how to make use of the recent deep results on empirical copula estimation (see, e.g., Berghaus et al., 2017; Radulović et al., 2017), as the densities of many standard copulas are unbounded. The only notable exception in this respect is Theorem 3.3 of Berghaus et al. (2017), but the authors considered only two-dimensional copulas and no covariates.

We show that, although the assumptions guaranteeing (3) are mild, they are not satisfied for some combinations of commonly used copula functions and marginal densities. Roughly speaking, we illustrate that an unbounded copula density has to be compensated by marginal densities that are well behaved not only on the supports of the corresponding distributions but also at the boundary points of these supports. We believe that exploring this problem in this setting is not only of independent interest but also provides insight into what might go wrong when switching to more complicated econometric or time-series models (see also the discussion in Section 4).

The paper is organised as follows. The main result and the needed assumptions are formulated in Section 2. The theoretical results are illustrated in a simulation study in Section 3. All the proofs are given in the Appendices.

2. Main result

In what follows we assume that for each $j\in\{1,\dotsc,d\}$ there exists a known transformation $T_{j}$ increasing on the support of $Y_{j}$ and known functions $m_{j}(\mathbf{x};\boldsymbol{\theta}_{j})$ and $s_{j}(\mathbf{x};\boldsymbol{\theta}_{j})$ depending only on an unknown (finite-dimensional) parameter $\boldsymbol{\theta}_{j}$ such that the random variable

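$$\varepsilon_{j}=\frac{T_{j}(Y_{j})-m_{j}(\boldsymbol{X};\boldsymbol{\theta}_{j})}{s_{j}(\boldsymbol{X};\boldsymbol{\theta}_{j})}$$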

is independent of $\boldsymbol{X}$ and has cumulative distribution function $F_{j\varepsilon}$. The distribution of the random vector $\boldsymbol{\varepsilon}=(\varepsilon_{1},\dotsc,\varepsilon_{d})^{\mathsf{T}}$ has continuous margins, and the copula of $\boldsymbol{\varepsilon}$ belongs to the family of copulas $\mathcal{C}=\big{\{}C(\cdot;\mathbf{a}):\mathbf{a}\in\Theta\big{\}}$, where $\Theta\subset\mathbb{R}^{p}$.

Our task is to estimate the true value of the copula parameter (say $\boldsymbol{\alpha}$ ) based on the observations $\binom{\boldsymbol{Y}_{1}}{\boldsymbol{X}_{1}},\dotsc,\binom{\boldsymbol{Y}_{n}}{\boldsymbol{X}_{n}}$ that are assumed to be mutually independent copies of the vector $\binom{\boldsymbol{Y}}{\boldsymbol{X}}$ .

Let $\boldsymbol{Y}_{i}=(Y_{1i},\dotsc,Y_{di})^{\mathsf{T}}$ . As the parameters $\boldsymbol{\theta}_{j}$ ( $j\in\{1,\dotsc,d\}$ ) are in practice unknown, we work with the residuals

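$$\widehat{\varepsilon}_{ji}=\frac{T_{j}(Y_{ji})-m_{j}(\boldsymbol{X}_{i};\widehat{\boldsymbol{\theta}}_{j})}{s_{j}(\boldsymbol{X}_{i};\widehat{\boldsymbol{\theta}}_{j})},\quad i=1,\dotsc,n;\ j=1,\dotsc,d,$$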

where $\widehat{\boldsymbol{\theta}}_{j}$ is a suitable estimate of ${\boldsymbol{\theta}}_{j}$ . For $j\in\{1,\dotsc,d\}$ let $\widehat{F}_{j\widehat{\varepsilon}}$ be the marginal empirical distribution function of the estimated residuals, i.e.

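$$\widehat{F}_{j\widehat{\varepsilon}}(y)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\{\widehat{\varepsilon}_{ji}\leq y\}.$$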

Then the maximum pseudo-likelihood estimator based on the residuals is defined as

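$$\widehat{\boldsymbol{\alpha}}_{n}=\operatorname*{arg\,max}_{\mathbf{a}\in\Theta}\sum_{i=1}^{n}\log\big{\{}c\big{(}\widehat{\mathbf{U}}_{i};\mathbf{a}\big{)}\big{\}},$$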

where

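$$\widehat{\mathbf{U}}_{i}=\big{(}\widehat{U}_{1i},\dotsc,\widehat{U}_{di}\big{)}^{\top}=\tfrac{n}{n+1}\big{(}\widehat{F}_{1\widehat{\varepsilon}}(\widehat{\varepsilon}_{1i}),\dotsc,\widehat{F}_{d\widehat{\varepsilon}}(\widehat{\varepsilon}_{di})\big{)}^{\top}$$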

are the estimated pseudo-observations and $c(\mathbf{u};\mathbf{a})$ is the density of the assumed copula family. As is common in maximum likelihood theory, we take the estimator $\widehat{\boldsymbol{\alpha}}_{n}$ to be an appropriately chosen root of the estimating equations

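$$\sum_{i=1}^{n}\boldsymbol{\psi}(\widehat{\mathbf{U}}_{i};\widehat{\boldsymbol{\alpha}}_{n})=\mathbf{0}_{p},\quad\text{where}\quad\boldsymbol{\psi}(\mathbf{u};\mathbf{a})=\frac{\partial\log\{c\big{(}\mathbf{u};\mathbf{a}\big{)}\}}{\partial\mathbf{a}}.$$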

Analogously, let $\widetilde{\boldsymbol{\alpha}}_{n}$ be the corresponding estimator based on the true (but unobserved) errors $\varepsilon_{ji}$. That is, $\widetilde{\boldsymbol{\alpha}}_{n}$ is defined as an appropriately chosen root of the estimating equations

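$$\sum_{i=1}^{n}\boldsymbol{\psi}(\widetilde{\mathbf{U}}_{i};\widetilde{\boldsymbol{\alpha}}_{n})=\mathbf{0}_{p},$$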

where

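$$\widetilde{\mathbf{U}}_{i}=\big{(}\widetilde{U}_{1i},\dotsc,\widetilde{U}_{di}\big{)}^{\top}=\tfrac{n}{n+1}\big{(}\widehat{F}_{1\varepsilon}(\varepsilon_{1i}),\dotsc,\widehat{F}_{d\varepsilon}(\varepsilon_{di})\big{)}^{\top}$$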

and $\widehat{F}_{j\varepsilon}$ is the marginal empirical distribution function of the (unobserved) errors, i.e.

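$$\widehat{F}_{j\varepsilon}(y)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\{\varepsilon_{ji}\leq y\},\quad j=1,\dotsc,d.$$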
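The two estimators can be compared in a small experiment. The following sketch is our own illustration, not the authors' code, and every modelling choice in it is an assumption:

- bivariate Clayton errors with $\theta=2$ (so $\tau=0.5$) and standard logistic margins;
- the hypothetical location model $Y_j=1+2X+\varepsilon_j$ with $X\sim\mathrm{U}(0,1)$, fitted by least squares.

Both $\widehat{\boldsymbol\alpha}_n$ and $\widetilde{\boldsymbol\alpha}_n$ are found as roots of their estimating equations by bisection. The score is obtained by differentiating the logarithm of the Clayton density $c(u,v;\theta)=(1+\theta)(uv)^{-\theta-1}(u^{-\theta}+v^{-\theta}-1)^{-2-1/\theta}$ with respect to $\theta$. The bracket $[10^{-3},30]$ is an assumption that suits positively dependent data.

```python
import math
import random

def pseudo_obs(z):
    """rank(z_i) / (n + 1), i.e. n/(n+1) times the marginal EDF (no ties)."""
    order = sorted(range(len(z)), key=z.__getitem__)
    ranks = [0] * len(z)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (len(z) + 1.0) for r in ranks]

def clayton_score(u, v, theta):
    """psi(u, v; theta) = d/dtheta log c(u, v; theta) for the Clayton copula."""
    a = u ** -theta + v ** -theta - 1.0
    da = -(u ** -theta) * math.log(u) - v ** -theta * math.log(v)
    return (1.0 / (1.0 + theta) - math.log(u) - math.log(v)
            + math.log(a) / theta ** 2 - (2.0 + 1.0 / theta) * da / a)

def mple_clayton(u1, u2, lo=1e-3, hi=30.0, iters=60):
    """Root of sum_i psi(U_i; theta) = 0, located by bisection on [lo, hi]."""
    total = lambda t: sum(clayton_score(p, q, t) for p, q in zip(u1, u2))
    if total(lo) <= 0 or total(hi) >= 0:
        raise ValueError("estimating equation does not change sign on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 0:   # score still positive: the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_clayton(n, theta, rng):
    """Bivariate Clayton sample generated by conditional inversion."""
    out = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        out.append((u, v))
    return out

def ols_residuals(x, y):
    """Residuals from the least-squares fit of y on (1, x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(7)
n, theta = 1000, 2.0
logit = lambda p: math.log(p / (1.0 - p))           # standard logistic quantile
errors = [(logit(u), logit(v)) for u, v in sample_clayton(n, theta, rng)]
x = [rng.random() for _ in range(n)]
eps = [[e[j] for e in errors] for j in range(2)]
res = [ols_residuals(x, [1.0 + 2.0 * xi + ej for xi, ej in zip(x, eps[j])]) for j in range(2)]

alpha_tilde = mple_clayton(*[pseudo_obs(e) for e in eps])   # unobserved errors
alpha_hat = mple_clayton(*[pseudo_obs(r) for r in res])     # estimated residuals
print(alpha_tilde, alpha_hat, math.sqrt(n) * (alpha_hat - alpha_tilde))
```

The last printed number is $\sqrt{n}(\widehat\alpha_n-\widetilde\alpha_n)$, the quantity that (3) says should be negligible. Logistic margins were chosen because their densities are well behaved at both tails.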

2.1. Regularity assumptions on the marginal distributions

In general, we need the density of the error term $\varepsilon_{j}$ to be ‘well-behaved’ at the border of its support. The following assumption is close to assumption F(iii) in Appendix A of Einmahl and Van Keilegom (2008), but it is weaker, as it allows for distributions whose support is not the whole real line.

Assumption $(\mathbf{F}_{j\varepsilon})$ : For each $j\in\{1,\dotsc,d\}$ the density function $f_{j\varepsilon}$ of $\varepsilon_{j}$ is continuous on the support of $\varepsilon_{j}$ and there exists $\beta\in[0,\frac{1}{2})$ such that

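$$\sup_{u\in(0,1)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}}{u^{\beta}(1-u)^{\beta}}<\infty\tag{8}$$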

and

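$$\sup_{u\in(0,1/2)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(2u)\big{)}}{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}}<\infty\quad\text{ and }\quad\sup_{u\in(0,1/2)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(1-2u)\big{)}}{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(1-u)\big{)}}<\infty.$$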

Further for some $u_{1}$ , $u_{2}$ in $(0,1)$ the function $f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}$ is non-decreasing on $(0,u_{1})$ and non-increasing on $(u_{2},1)$ .

Note that assumption $(\mathbf{F}_{j\varepsilon})$ with $\beta=0$ also allows for distributions with bounded but discontinuous densities (e.g. exponential and uniform). However, as we show later, copula families with unbounded densities require $\beta>0$.
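The role of $\beta$ can be checked numerically. This sketch is our own illustration and uses only Python's standard library. It evaluates the quantity bounded in the first display of assumption $(\mathbf{F}_{j\varepsilon})$, namely $f_{j\varepsilon}(F_{j\varepsilon}^{-1}(u))(1+|F_{j\varepsilon}^{-1}(u)|)/(u(1-u))^{\beta}$, near $u=0$, with $\beta=0.25$, for two distributions:

- Standard normal: the quantity tends to zero.
- Standard exponential: $f_{j\varepsilon}(F_{j\varepsilon}^{-1}(u))=1-u\to1$ at the finite lower end of the support, so the quantity grows like $u^{-\beta}$.

So for the exponential distribution that bound can hold only with $\beta=0$.

```python
import math
from statistics import NormalDist

def tail_ratio(density_at_quantile, quantile, u, beta):
    """f(F^{-1}(u)) * (1 + |F^{-1}(u)|) / (u (1 - u))**beta."""
    q = quantile(u)
    return density_at_quantile(u) * (1.0 + abs(q)) / (u * (1.0 - u)) ** beta

std_normal = NormalDist()
normal = (lambda u: std_normal.pdf(std_normal.inv_cdf(u)), std_normal.inv_cdf)
# Standard exponential: F^{-1}(u) = -log(1 - u) and f(F^{-1}(u)) = 1 - u.
exponential = (lambda u: 1.0 - u, lambda u: -math.log1p(-u))

beta = 0.25
us = [10.0 ** (-k) for k in range(2, 9)]
normal_vals = [tail_ratio(*normal, u, beta) for u in us]
exp_vals = [tail_ratio(*exponential, u, beta) for u in us]
for u, a, b in zip(us, normal_vals, exp_vals):
    print(f"u={u:.0e}  normal={a:.3e}  exponential={b:.3e}")
```

Only the lower tail is probed here. The upper tails of both distributions behave well.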

Remark 1.

Assumption $(\mathbf{F}_{j\varepsilon})$ is formulated so that it covers the general case in which both the conditional mean and the conditional variance of $T_{j}(Y_{ji})$ depend on $\boldsymbol{X}_{i}$. The proofs given in the appendix show that if one correctly assumes that the conditional variance does not depend on $\boldsymbol{X}_{i}$, then only a location adjustment is performed (i.e. $\widehat{\varepsilon}_{ji}=T_{j}(Y_{ji})-m_{j}(\boldsymbol{X}_{i};\widehat{\boldsymbol{\theta}}_{j})$) and assumption (8) simplifies to

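$$\sup_{u\in(0,1)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}}{u^{\beta}(1-u)^{\beta}}<\infty.$$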

On the other hand, if one correctly assumes that the conditional mean is zero, then only a scale adjustment is performed (i.e. $\widehat{\varepsilon}_{ji}=\frac{T_{j}(Y_{ji})}{s_{j}(\boldsymbol{X}_{i};\widehat{\boldsymbol{\theta}}_{j})}$) and it is sufficient to assume

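$$\sup_{u\in(0,1)}\frac{f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\,\big{|}F_{j\varepsilon}^{-1}(u)\big{|}}{u^{\beta}(1-u)^{\beta}}<\infty.$$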

This last assumption is close to assumption 2 stated just before Theorem 2.1 of Chan et al. (2009). As in the comparison with assumption F(iii) in Appendix A of Einmahl and Van Keilegom (2008), our assumption does not require the support of the distribution to be the whole real line.

Remark 2.

Since assumption $(\mathbf{F}_{j\varepsilon})$ requires the function $f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}$ to be monotone for $u$ close to zero and for $u$ close to one, the integrability of $f_{j\varepsilon}$ (see Lemma 12) implies that

$$\lim_{|x|\to\infty}|x|\,f_{j\varepsilon}(x)=0.$$

Thus if

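$$\lim_{u\to 0_{+}}F_{j\varepsilon}^{-1}(u)=-\infty\quad\Big{(}\;\lim_{u\to 1_{-}}F_{j\varepsilon}^{-1}(u)=\infty\;\Big{)},\tag{9}$$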

then one gets

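$$\lim_{u\to 0_{+}}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}=0\quad\Big{(}\;\lim_{u\to 1_{-}}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}=0\;\Big{)}.\tag{10}$$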

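Note that the equations above are also satisfied automatically if $\beta>0$, even if (9) does not hold. This is because (8) then bounds $f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}\big{(}1+|F_{j\varepsilon}^{-1}(u)|\big{)}$ by a constant multiple of $u^{\beta}(1-u)^{\beta}$, which vanishes at both ends. Thus one can conclude that if (10) does not hold, then $\beta=0$ and the corresponding border of the support is finite, i.e.,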

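$$\lim_{u\to 0_{+}}F_{j\varepsilon}^{-1}(u)>-\infty\quad\Big{(}\;\lim_{u\to 1_{-}}F_{j\varepsilon}^{-1}(u)<\infty\;\Big{)}.$$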
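The step in Remark 2 from monotonicity and integrability to a vanishing tail product can be made explicit with a standard argument. The sketch below is ours. It covers the upper tail when the upper end of the support is infinite; the lower tail is analogous. By the monotonicity part of assumption $(\mathbf{F}_{j\varepsilon})$, $f_{j\varepsilon}$ is non-increasing beyond $F_{j\varepsilon}^{-1}(u_{2})$. So for every $x>2\max\{F_{j\varepsilon}^{-1}(u_{2}),0\}$ we have $f_{j\varepsilon}(t)\geq f_{j\varepsilon}(x)$ on $[x/2,x]$, and therefore

$$\frac{x}{2}\,f_{j\varepsilon}(x)\;\leq\;\int_{x/2}^{x}f_{j\varepsilon}(t)\,\mathrm{d}t\;\leq\;1-F_{j\varepsilon}(x/2)\;\xrightarrow[x\to\infty]{}\;0,$$

which gives $x\,f_{j\varepsilon}(x)\to0$ as $x\to\infty$.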

2.2. Regularity assumptions on $m_{j}$ and $s_{j}$

The next assumption states that the parametric models can be estimated at the standard $\sqrt{n}$ -rate and that the location and scale functions are sufficiently smooth and integrable.

Assumption $\boldsymbol{(ms)}$ : For each $j\in\{1,\dotsc,d\}$ $\widehat{\boldsymbol{\theta}}_{j}$ is a $\sqrt{n}$ -consistent estimate of the parameter $\boldsymbol{\theta}_{j}\in\mathbb{R}^{p_{j}}$ . The functions $m_{j}(\mathbf{x};\mathbf{t})$ and $s_{j}(\mathbf{x};\mathbf{t})$ are (once) differentiable with respect to $\mathbf{t}$ and the derivatives are denoted as $m^{\prime}_{j}(\mathbf{x};\mathbf{t})$ and $s^{\prime}_{j}(\mathbf{x};\mathbf{t})$ . Further there exists a neighborhood $U(\boldsymbol{\theta}_{j})$ of the true value of the parameter $\boldsymbol{\theta}_{j}$ such that $\inf_{\mathbf{x}\in S_{\boldsymbol{X}},\mathbf{t}\in U(\boldsymbol{\theta}_{j})}s_{j}(\mathbf{x};\mathbf{t})>0$ and there exists a function $M_{j}:S_{\boldsymbol{X}}\to\mathbb{R}$ such that for each $\mathbf{x}\in S_{\boldsymbol{X}}$ :

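$$\sup_{\mathbf{t}\in U(\boldsymbol{\theta}_{j})}\big{\|}\tfrac{m^{\prime}_{j}(\mathbf{x};\mathbf{t})}{s_{j}(\mathbf{x};\mathbf{t})}\big{\|}\leq M_{j}(\mathbf{x}),\qquad\sup_{\mathbf{t}\in U(\boldsymbol{\theta}_{j})}\big{\|}\tfrac{s^{\prime}_{j}(\mathbf{x};\mathbf{t})}{s_{j}(\mathbf{x};\mathbf{t})}\big{\|}\leq M_{j}(\mathbf{x}),$$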

and $\mathsf{E}\big{[}M_{j}(\boldsymbol{X})\big{]}^{r}<\infty$ for some $r\geq 2$ . Finally, for each $K>0$ the derivatives $m^{\prime}_{j}(\mathbf{x};\mathbf{t})$ and $s^{\prime}_{j}(\mathbf{x};\mathbf{t})$ viewed as functions of $\mathbf{t}$ are continuous at $\boldsymbol{\theta}_{j}$ uniformly in $\mathbf{x}\in\{\tilde{\mathbf{x}}\in S_{\boldsymbol{X}}:\|\tilde{\mathbf{x}}\|\leq K\}$ .
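As a concrete illustration of assumption $\boldsymbol{(ms)}$ (our own example, not taken from the paper), take a homoscedastic linear model with parameter vector $\mathbf{t}=(t_{0},\mathbf{t}_{1}^{\mathsf{T}},t_{2})^{\mathsf{T}}$. Let $m_{j}(\mathbf{x};\mathbf{t})=t_{0}+\mathbf{t}_{1}^{\mathsf{T}}\mathbf{x}$ and $s_{j}(\mathbf{x};\mathbf{t})=t_{2}$, writing $\theta_{j2}>0$ for the true scale component. If $U(\boldsymbol{\theta}_{j})$ is chosen so that $t_{2}\geq\theta_{j2}/2$ on it, then $\inf s_{j}\geq\theta_{j2}/2>0$ and

$$m^{\prime}_{j}(\mathbf{x};\mathbf{t})=(1,\mathbf{x}^{\mathsf{T}},0)^{\mathsf{T}},\qquad s^{\prime}_{j}(\mathbf{x};\mathbf{t})=(0,\mathbf{0}^{\mathsf{T}},1)^{\mathsf{T}},$$

$$\sup_{\mathbf{t}\in U(\boldsymbol{\theta}_{j})}\Big\|\frac{m^{\prime}_{j}(\mathbf{x};\mathbf{t})}{s_{j}(\mathbf{x};\mathbf{t})}\Big\|\leq\frac{2\sqrt{1+\|\mathbf{x}\|^{2}}}{\theta_{j2}}\leq\frac{2(1+\|\mathbf{x}\|)}{\theta_{j2}}=:M_{j}(\mathbf{x}),\qquad\sup_{\mathbf{t}\in U(\boldsymbol{\theta}_{j})}\Big\|\frac{s^{\prime}_{j}(\mathbf{x};\mathbf{t})}{s_{j}(\mathbf{x};\mathbf{t})}\Big\|\leq\frac{2}{\theta_{j2}}\leq M_{j}(\mathbf{x}).$$

The derivatives do not depend on $\mathbf{t}$, so the continuity requirement holds trivially. Moreover, $\mathsf{E}[M_{j}(\boldsymbol{X})]^{r}<\infty$ as soon as $\mathsf{E}\|\boldsymbol{X}\|^{r}<\infty$. What remains is a $\sqrt{n}$-consistent $\widehat{\boldsymbol{\theta}}_{j}$, which least-squares type estimates typically provide under standard moment conditions.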

2.3. Regularity assumptions about the copula family $\mathcal{C}$

To formulate the main regularity assumptions about the copula family it is useful to introduce the following set of functions.

Definition (Class of $\mathcal{J}$- and $\widetilde{\mathcal{J}}^{\beta_{1},\beta_{2}}$-functions).

A function $\varphi:(0,1)^{d}\to\mathbb{R}$ is called a $\mathcal{J}$ -function if $\varphi$ is continuous on $(0,1)^{d}$ and there exist $\eta\in[0,1)$ and a finite constant $M_{1}$ such that for all $\mathbf{u}\in(0,1)^{d}$

[TABLE]

Let $\beta_{1}\in[0,1/2)$ and $\beta_{2}\geq 0$ be fixed. We say that a function $\varphi:(0,1)^{d}\to\mathbb{R}$ is a $\widetilde{\mathcal{J}}^{\beta_{1},\beta_{2}}$ -function if it is continuous on $(0,1)^{d}$ and there exists a finite constant $M_{2}$ such that for all $\mathbf{u}\in(0,1)^{d}$

[TABLE]

Further $\left|\varphi^{(j)}(u_{1},\dotsc,u_{d})\right|\,u_{j}^{\beta_{2}}(1-u_{j})^{\beta_{2}}$ is a $\mathcal{J}$ -function for all $j\in\{1,\dotsc,d\}$ , where

[TABLE]

Now we are ready to formulate the needed regularity assumptions about the copula family. Recall that $\Theta\subset\mathbb{R}^{p}$ , $\boldsymbol{\alpha}$ is the true value of the parameter, and $c(\mathbf{u};\mathbf{a})$ is a density corresponding to the copula function $C(\mathbf{u};\mathbf{a})$ .

Assumptions C:

C1.

$c(\mathbf{u};\mathbf{a}_{1})=c(\mathbf{u};\mathbf{a}_{2})$ for almost all $\mathbf{u}\in(0,1)^{d}$ only if $\mathbf{a}_{1}=\mathbf{a}_{2}$ .

C2.

The function $\log\{c(\mathbf{u};\mathbf{a})\}$ is continuously differentiable with respect to $\mathbf{a}$ for all $\mathbf{u}\in(0,1)^{d}$ .

Denote the $k$ th element of the vector function $\boldsymbol{\psi}(\mathbf{u};\mathbf{a})={\partial\log\{c(\mathbf{u};\mathbf{a})\}}/{\partial\mathbf{a}}$ by $\psi_{k}(\mathbf{u};\mathbf{a})$ .

C3.

For each $k\in\{1,\dotsc,p\}$, the function $\psi_{k}(\cdot;\boldsymbol{\alpha})$ is a $\widetilde{\mathcal{J}}^{\beta_{1},\beta_{2}}$-function, where $\beta_{1}$ and $\beta_{2}$ satisfy $\beta>\max\{\beta_{1}+\tfrac{1}{r-1},\beta_{2}\}$, with $\beta$ introduced in assumption $(\mathbf{F}_{j\varepsilon})$ and $r$ in assumption $\boldsymbol{(ms)}$.

C4.

The function $\boldsymbol{\psi}(\mathbf{u};\mathbf{a})$ is assumed to be continuously differentiable with respect to $\mathbf{a}$ for all $\mathbf{u}\in(0,1)^{d}$ . Further there exist an open neighborhood $\mathcal{U}\subset\Theta$ of $\boldsymbol{\alpha}$ and a dominating function $h(\mathbf{u})\in\mathcal{J}$ such that $\partial\boldsymbol{\psi}(\mathbf{u};\mathbf{a})/\partial\mathbf{a}^{\mathsf{T}}$ is continuous in $(0,1)^{d}\times\mathcal{U}$ and

[TABLE]

C5.

The $p\times p$ (Fisher information) matrix $I(\boldsymbol{\alpha})=-\mathsf{E}\,\big{\{}\partial\boldsymbol{\psi}(\mathbf{U};\mathbf{a})/{\partial\mathbf{a}^{\mathsf{T}}}\big{|}_{\mathbf{a}=\boldsymbol{\alpha}}\big{\}}$ , where

[TABLE]

is finite and nonsingular.
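For a family with a closed-form density, the score in C2 and the matrix in C5 can be approximated numerically. The following minimal Python sketch (NumPy only; not code from the paper) takes the bivariate Frank copula with the illustrative value $\alpha=5$, samples from it by conditional inversion, approximates $\psi(\mathbf{u};a)$ and $\partial\psi(\mathbf{u};a)/\partial a$ by central finite differences, and estimates the scalar $I(\alpha)$ by Monte Carlo in two ways, as $-\mathsf{E}\,\partial\psi/\partial a$ and as $\mathsf{E}\,\psi^{2}$. By the information identity the two estimates should agree, and the sample mean of the score should be close to zero. The function names, step size, sample size and seed are hypothetical choices.

```python
import numpy as np


def frank_logdensity(u, v, theta):
    """Log-density of the bivariate Frank copula for theta > 0."""
    a = -np.expm1(-theta)                                    # 1 - exp(-theta)
    denom = a - (-np.expm1(-theta * u)) * (-np.expm1(-theta * v))
    return np.log(theta) + np.log(a) - theta * (u + v) - 2.0 * np.log(denom)


def sample_frank(n, theta, rng):
    """Conditional inversion: draw u, then solve C_{2|1}(v | u) = w for v."""
    u, w = rng.uniform(size=(2, n))
    z = w * np.expm1(-theta) / (w + (1.0 - w) * np.exp(-theta * u))
    v = -np.log1p(z) / theta
    return u, v


def score(u, v, theta, h=1e-4):
    """Central-difference approximation of psi = d log c / d theta."""
    return (frank_logdensity(u, v, theta + h)
            - frank_logdensity(u, v, theta - h)) / (2.0 * h)


def score_derivative(u, v, theta, h=1e-4):
    """Second central difference of log c in theta, i.e. d psi / d theta."""
    return (frank_logdensity(u, v, theta + h) - 2.0 * frank_logdensity(u, v, theta)
            + frank_logdensity(u, v, theta - h)) / h ** 2


rng = np.random.default_rng(1)
alpha = 5.0
u, v = sample_frank(200_000, alpha, rng)
psi = score(u, v, alpha)
info_hessian = -np.mean(score_derivative(u, v, alpha))   # -E d psi / d a
info_outer = np.mean(psi ** 2)                            # E psi^2
print(info_hessian, info_outer, psi.mean())
```

Because the Frank score is bounded, both Monte Carlo averages are well behaved; for families with unbounded scores the same check is more delicate.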

Remark 3.

Note that the score functions of the commonly used one-parameter bivariate copula families with unbounded densities (e.g. Clayton, Gumbel, Normal, Student, …) can be bounded by

[TABLE]

and their derivatives as

[TABLE]

for a sufficiently large but finite constant $M_{3}$ (see also Chen and Fan, 2006b). Thus in Assumption C3 one can take $\beta_{1}$ and $\beta_{2}$ positive but arbitrarily close to zero.

Assumption C3 is inspired by Chan et al. (2009). Generally speaking, this assumption is stricter than the corresponding assumptions of Tsukahara (2005), which are based on $U$-shaped functions. The advantage of assumption C3 is that it allows one to derive bounds that depend only on the marginal distributions. The price paid for this advantage does not seem to be high, as we are not aware of any standard copula family that fails to meet C3 with $\beta_{1}$ and $\beta_{2}$ arbitrarily small positive constants.

Note that assumption C3 implies that $\beta>0$, which rules out marginal densities $f_{j\varepsilon}$ that are bounded but possibly discontinuous at a border point of the support (e.g. exponential or uniform distributions). As shown by the simulations in Section 3, the desired result (3) indeed does not hold in general when the marginal densities $f_{j\varepsilon}$ are not continuous.

Nevertheless, a closer inspection of the proof shows that $\beta>0$ is needed to control a possibly unbounded score function $\boldsymbol{\psi}(\mathbf{u};\mathbf{a})$. However, for some commonly used copula families (e.g. Frank, Ali-Mikhail-Haq, Plackett) the score function $\boldsymbol{\psi}(\mathbf{u};\mathbf{a})$ and its derivatives are bounded. It is therefore of interest to formulate the following alternative to assumptions C3 and C4, as it allows for $\beta=0$ in assumption $(\mathbf{F}_{j\varepsilon})$:

C6.

The function $\boldsymbol{\psi}(\mathbf{u};\mathbf{a})$ is bounded and continuously differentiable with respect to $\mathbf{a}$ for all $\mathbf{u}\in(0,1)^{d}$ . Further there exists an open neighborhood $\mathcal{U}$ of $\boldsymbol{\alpha}$ such that $\partial\boldsymbol{\psi}(\mathbf{u};\mathbf{a})/\partial\mathbf{a}^{\mathsf{T}}$ is continuous in $(0,1)^{d}\times\mathcal{U}$ and

[TABLE]
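The contrast between Remark 3 and assumption C6 can be seen numerically. The following minimal Python sketch (an illustration, not the authors' code) evaluates the score $\psi(u,v;a)=\partial\log c(u,v;a)/\partial a$ by a central finite difference for the Clayton copula ($a=2$) and the Frank copula ($a=5$) along $v=1/2$ as $u\to 0$, using the standard closed-form densities of the two families. The Clayton score grows roughly like $|\log u|$, whereas the Frank score stays essentially constant, which is consistent with only the latter satisfying C6. The parameter values and grid points are arbitrary choices.

```python
import numpy as np


def clayton_logdensity(u, v, theta):
    """Log-density of the bivariate Clayton copula for theta > 0."""
    log_s = np.log(u ** (-theta) + v ** (-theta) - 1.0)
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * log_s)


def frank_logdensity(u, v, theta):
    """Log-density of the bivariate Frank copula for theta > 0."""
    a = -np.expm1(-theta)                                    # 1 - exp(-theta)
    denom = a - (-np.expm1(-theta * u)) * (-np.expm1(-theta * v))
    return np.log(theta) + np.log(a) - theta * (u + v) - 2.0 * np.log(denom)


def score(logdensity, u, v, theta, h=1e-5):
    """Central-difference approximation of psi(u, v; theta) = d log c / d theta."""
    return (logdensity(u, v, theta + h) - logdensity(u, v, theta - h)) / (2.0 * h)


# Move u towards the border of (0, 1) while keeping v = 1/2 fixed.
eps_grid = [1e-4, 1e-8, 1e-12]
clayton_scores = [abs(score(clayton_logdensity, e, 0.5, 2.0)) for e in eps_grid]
frank_scores = [abs(score(frank_logdensity, e, 0.5, 5.0)) for e in eps_grid]
print(clayton_scores)   # grows roughly like |log u|
print(frank_scores)     # essentially constant
```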

2.4. Main results

Now we are ready to formulate the main results of the paper.

Theorem 1.

Suppose that assumptions $\boldsymbol{(ms)}$ , C1-C5 and $(\mathbf{F}_{j\varepsilon})$ with $\beta>0$ are satisfied. Then with probability going to one there exist consistent roots (say $\widehat{\boldsymbol{\alpha}}_{n}$ and $\widetilde{\boldsymbol{\alpha}}_{n}$ ) of the estimating equations (5) and (6). Further $\widehat{\boldsymbol{\alpha}}_{n}$ and $\widetilde{\boldsymbol{\alpha}}_{n}$ satisfy (3).

The next theorem says that if assumption C6 is satisfied, then one can also include the case $\beta=0$ in assumption $(\mathbf{F}_{j\varepsilon})$. Thus, for instance, if one (rightly) assumes that $C$ is a Frank copula, then the marginal distributions of the errors may also be uniform or exponential.

Theorem 2.

Suppose that assumptions $\boldsymbol{(ms)}$ , C1, C2, C5, C6 and $(\mathbf{F}_{j\varepsilon})$ are satisfied. Then the statement of Theorem 1 holds.

The above theorems imply that when fitting the copula $C$ one can (under the stated assumptions) ignore the fact that one is working with estimated residuals ($\widehat{\varepsilon}_{ij}$) instead of unobserved errors ($\varepsilon_{ij}$). As is well known (and as also follows from the proof of Theorem 1), the asymptotic distribution of $\widetilde{\boldsymbol{\alpha}}_{n}$ is normal. Thus, thanks to (3), one can conclude that $\widehat{\boldsymbol{\alpha}}_{n}$ is also asymptotically normal.

Corollary 1.

Suppose that the assumptions of either Theorem 1 or Theorem 2 hold. Then with probability going to one there exists a consistent root $\widehat{\boldsymbol{\alpha}}_{n}$ of (5). This root satisfies

[TABLE]

where $\widetilde{\boldsymbol{\psi}}\big{(}\mathbf{u})=\big{(}\widetilde{\psi}_{1}(\mathbf{u}),\dotsc,\widetilde{\psi}_{p}(\mathbf{u})\big{)}^{\mathsf{T}}$ with

[TABLE]

3. Simulation study

A Monte Carlo study was conducted in order to illustrate the theoretical conclusions and to show how the finite sample performance of the maximum pseudo-likelihood estimator depends on the level of violation of the regularity assumptions.

3.1. Settings

To keep the presentation as clear as possible we concentrate on a bivariate response variable (some results for a three-dimensional case can be found in the Supplementary material) following the model

[TABLE]

The joint cumulative distribution function $H(y_{1},y_{2})$ of the random vector $(\varepsilon_{1i},\varepsilon_{2i})^{\top}$ is $C\big{(}F_{1\varepsilon}(y_{1}),F_{2\varepsilon}(y_{2})\big{)}$, where $C$ is a copula and $F_{1\varepsilon}$, $F_{2\varepsilon}$ are marginal distribution functions. The following five copula families were considered for $C$: Clayton, Frank, Gumbel, Gaussian, and Student with 5 degrees of freedom. The copula parameter $\alpha$ is chosen such that the corresponding Kendall’s tau is $\tau=0.5$ or $\tau=0.75$. The marginal distributions were chosen as one of the following:

- $F_{1\varepsilon}$ is standard normal and $F_{2\varepsilon}$ exponential with mean 1 (denoted as N+E),
- $F_{1\varepsilon}$ is standard normal and $F_{2\varepsilon}$ uniform on $[-1,1]$ (denoted as N+U),
- $F_{1\varepsilon}$ and $F_{2\varepsilon}$ are both Student $t$ with 5 degrees of freedom (denoted as t).

The first two settings satisfy assumption $(\mathbf{F}_{j\varepsilon})$ only with $\beta=0$. Hence, the result of Theorem 2 applies only if C6 holds, which among the five considered copula families is the case only for the Frank copula. On the other hand, the $t$ marginals satisfy $(\mathbf{F}_{j\varepsilon})$ with $\beta>0$, and the assumptions of Theorem 1 hold. Hence, these marginals provide a useful regular benchmark for comparison with the first two settings.

The covariate $X_{i}$ is generated from the standard normal distribution (a Poisson distribution with mean 5 was considered as well, but the results are almost identical and are not reported). The presented results correspond to the particular choice $\theta_{10}=1$, $\theta_{20}=-1$, $\theta_{11}=1$, and $\theta_{21}=2$. The unobserved errors $\varepsilon_{ji}$ are estimated by the residuals from the (marginal) regression fits, with the parameters estimated by least squares assuming $s_{j}\equiv 1$, $j=1,2$ (cf. Remark 1).
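The data-generating step can be sketched as follows. This is a minimal Python (NumPy/SciPy) sketch, not the authors' R code. It assumes that model (12) is the pair of simple linear regressions $Y_{ji}=\theta_{j0}+\theta_{j1}X_{i}+\varepsilon_{ji}$, $j=1,2$, suggested by the parameter values above. It uses the Clayton copula with $\alpha=2$ (Kendall’s tau $0.5$), sampled by the Marshall–Olkin frailty method, together with the N+E margins. The function names and the seed are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def sample_clayton(n, theta, rng):
    """Marshall-Olkin sampler: U_j = (1 + E_j / V)^(-1/theta), V ~ Gamma(1/theta, 1)."""
    frailty = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
    e = rng.exponential(scale=1.0, size=(n, 2))
    return (1.0 + e / frailty[:, None]) ** (-1.0 / theta)


def simulate_ne(n, theta, rng):
    """Errors with a Clayton copula and N+E margins, plus the two regression responses."""
    u = sample_clayton(n, theta, rng)
    eps = np.column_stack([norm.ppf(u[:, 0]),        # standard normal margin
                           -np.log1p(-u[:, 1])])     # exponential margin with mean 1
    x = rng.standard_normal(n)
    intercepts = np.array([1.0, -1.0])               # theta_10, theta_20
    slopes = np.array([1.0, 2.0])                    # theta_11, theta_21
    y = intercepts + np.outer(x, slopes) + eps
    return x, y, eps


def ols_fit(x, y):
    """Marginal least-squares fits (s_j = 1); returns coefficients and residuals."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef


rng = np.random.default_rng(2019)
x, y, eps = simulate_ne(10_000, 2.0, rng)
coef, resid = ols_fit(x, y)
print(coef)   # rows: intercepts, slopes; columns: j = 1, 2
```

Because $\mathsf{E}\,\varepsilon_{2i}=1$ for the exponential margin, the fitted intercept of the second regression estimates $\theta_{20}+1=0$ rather than $\theta_{20}$. This is harmless here: the pseudo-observations are computed from ranks of the residuals, and these ranks are invariant to such a common shift.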

The following estimators of the parameter $\alpha$ are compared:

(i) (oracle) inversion of Kendall’s tau based on the unobserved errors, $\widetilde{\alpha}^{(ik)}$;

(ii) inversion of Kendall’s tau based on the residuals, $\widehat{\alpha}^{(ik)}$;

(iii) (oracle) maximum pseudo-likelihood estimator based on the unobserved errors, $\widetilde{\alpha}^{(pl)}$;

(iv) maximum pseudo-likelihood estimator based on the residuals, $\widehat{\alpha}^{(pl)}$;

(v) modified maximum pseudo-likelihood estimator based on the residuals, $\widehat{\alpha}^{(pl*)}$.

The last estimator, $\widehat{\alpha}^{(pl*)}$, is inspired by the estimator introduced by Fermanian and Lopez (2018) in the context of single-index conditional copulas. In our situation it coincides with the maximum pseudo-likelihood estimator computed only from those $\widehat{\mathbf{U}}_{i}$ that lie in $[\delta_{n},1-\delta_{n}]^{2}$, where $\delta_{n}=Dn^{-1/\lambda}$. This choice corresponds to the choice of $\delta_{n}$ in the proof of Theorem 1. In the presented simulations we take $D=1/4$ and $\lambda=1.9$; thus, in view of Remark 3, the statement of Theorem 1 (or 2) also holds for $\widehat{\alpha}^{(pl*)}$ provided that the corresponding regularity assumptions hold.
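A minimal Python sketch of $\widehat{\alpha}^{(pl)}$ and $\widehat{\alpha}^{(pl*)}$ for the Clayton copula follows. It assumes that the pseudo-observations in (4) are the rescaled ranks $R_{ji}/(n+1)$ of the residuals, uses the closed-form Clayton log-density, and maximizes the pseudo-log-likelihood with SciPy's bounded scalar optimizer. The trimmed version keeps only the pseudo-observations in $[\delta_{n},1-\delta_{n}]^{2}$ with $\delta_{n}=Dn^{-1/\lambda}$, $D=1/4$, $\lambda=1.9$. The demonstration uses normal margins (a regular setting); the sample size, seed and search interval $[0.01,20]$ are hypothetical choices, and this is not the R code used for the reported tables.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata


def clayton_logdensity(u, v, theta):
    """Log-density of the bivariate Clayton copula for theta > 0."""
    log_s = np.log(u ** (-theta) + v ** (-theta) - 1.0)
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * log_s)


def pseudo_observations(resid):
    """Rescaled ranks R_ji / (n + 1), computed column by column."""
    return rankdata(resid, axis=0) / (resid.shape[0] + 1.0)


def clayton_mpl(U, D=None, lam=None, bounds=(0.01, 20.0)):
    """Maximum pseudo-likelihood for Clayton; with D and lam set, keep only the
    pseudo-observations in [delta_n, 1 - delta_n]^2, delta_n = D * n^(-1/lam)."""
    if D is not None:
        delta_n = D * U.shape[0] ** (-1.0 / lam)
        inside = np.all((U >= delta_n) & (U <= 1.0 - delta_n), axis=1)
        U = U[inside]

    def neg_loglik(theta):
        return -np.sum(clayton_logdensity(U[:, 0], U[:, 1], theta))

    return minimize_scalar(neg_loglik, bounds=bounds, method="bounded").x


# Illustration in a regular setting: Clayton errors (theta = 2) with normal margins.
rng = np.random.default_rng(7)
n, theta = 5000, 2.0
frailty = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
u = (1.0 + rng.exponential(size=(n, 2)) / frailty[:, None]) ** (-1.0 / theta)
x = rng.standard_normal(n)
y = np.array([1.0, -1.0]) + np.outer(x, [1.0, 2.0]) + norm.ppf(u)
design = np.column_stack([np.ones(n), x])
resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]

U_hat = pseudo_observations(resid)
theta_pl = clayton_mpl(U_hat)                          # alpha-hat^(pl)
theta_pl_star = clayton_mpl(U_hat, D=0.25, lam=1.9)    # alpha-hat^(pl*)
print(theta_pl, theta_pl_star)
```

Replacing the second column of the errors by `-np.log1p(-u[:, 1])` (an exponential margin) gives the N+E setting, which is one way to explore the irregular cases discussed in Section 3.2.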

To make the results for the various copula families more comparable, the parameter estimates are presented on the Kendall’s tau scale. The performance of the estimators is measured by the bias, the standard deviation (SD), and the root mean square error (RMSE), estimated from $1\,000$ random samples of sizes $n=100,\ 1\,000,\ 10\,000$; all three quantities are reported multiplied by 100, because they are typically of order $10^{-2}$. The results for the Clayton, Frank and Gaussian copulas are listed in Tables 1, 2, and 3, while the tables for the Gumbel and Student copulas can be found in the Supplementary material. The Monte Carlo simulations were run in the R statistical computing environment (R Core Team, 2018). The same starting seed was always used, so that the estimates based on the true (but unobserved) errors $\varepsilon_{ij}$ are the same regardless of the choice of the marginals $F_{1\varepsilon}$ and $F_{2\varepsilon}$. These ‘oracle’ estimates are denoted as “inov” in the tables and provide benchmarks for the estimators calculated from the estimated residuals.
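To report estimates on the Kendall’s tau scale, each parameter estimate is mapped through the family's tau function before the bias, SD and RMSE are computed. The following Python sketch uses the standard relations $\tau=\alpha/(\alpha+2)$ (Clayton), $\tau=1-1/\alpha$ (Gumbel), $\tau=\tfrac{2}{\pi}\arcsin\rho$ (Gaussian, and also Student) and $\tau=1+4\{D_{1}(\alpha)-1\}/\alpha$ with the Debye function $D_{1}$ (Frank). Computing the SD with the divisor $n-1$ over the Monte Carlo replications is an assumption, since the text does not specify it; the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad


def tau_clayton(theta):
    return theta / (theta + 2.0)


def tau_gumbel(theta):
    return 1.0 - 1.0 / theta


def tau_gaussian(rho):
    """Also valid for the Student copula (elliptical families)."""
    return 2.0 / np.pi * np.arcsin(rho)


def tau_frank(theta):
    """tau = 1 + 4 (D_1(theta) - 1) / theta with the Debye function D_1 (theta > 0)."""
    integrand = lambda t: t / np.expm1(t) if t > 0 else 1.0
    debye1 = quad(integrand, 0.0, theta)[0] / theta
    return 1.0 + 4.0 * (debye1 - 1.0) / theta


def summary_x100(tau_hat, tau_true):
    """Bias, SD and RMSE of Monte Carlo estimates on the tau scale, times 100."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    err = tau_hat - tau_true
    return (100.0 * err.mean(),
            100.0 * tau_hat.std(ddof=1),
            100.0 * np.sqrt(np.mean(err ** 2)))


print(summary_x100([0.48, 0.52, 0.50], 0.5))
```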

3.2. Findings

As is well known (Genest et al., 1995; Tsukahara, 2005), in the absence of covariates the maximum pseudo-likelihood estimator is usually more efficient than moment-like estimators. This is illustrated by the performance of the estimators $\widetilde{\alpha}^{(ik)}$ and $\widetilde{\alpha}^{(pl)}$ that are calculated from the errors $\varepsilon_{ij}$. The question of interest is whether this property continues to hold for estimators calculated from the residuals (i.e., in the presence of covariates).

Generally speaking, in agreement with our theoretical results, the maximum pseudo-likelihood estimator $\widehat{\alpha}^{(pl)}$ outperforms $\widehat{\alpha}^{(ik)}$ in situations where our regularity assumptions are satisfied (see Table 2 and the rows corresponding to the $t$-marginals in Tables 1 and 3). In these situations the modified maximum pseudo-likelihood estimator $\widehat{\alpha}^{(pl*)}$ offers no advantage.

On the other hand, the performance of $\widehat{\alpha}^{(pl)}$ may deteriorate significantly if the regularity assumptions are not met. The problems are generally worse for larger values of Kendall’s tau (i.e., stronger dependence). It is also interesting that exponential margins (rows denoted as N+E) are much more problematic than uniform margins (rows denoted as N+U).

As illustrated in Table 1, one should be particularly careful when fitting the Clayton copula (and also the Gumbel copula, as illustrated in the Supplementary material). In this case $\widehat{\alpha}^{(pl)}$ performs significantly worse than $\widehat{\alpha}^{(ik)}$ for non-regular margins combined with strong dependence ($\tau=0.75$). These problems can be mitigated to some extent by the modified estimator $\widehat{\alpha}^{(pl*)}$, in particular for uniform margins (N+U). Thus, while the modified estimator $\widehat{\alpha}^{(pl*)}$ is of no interest for the Frank copula, for the Clayton (and the Gumbel) copula it presents an interesting alternative to the ‘standard’ maximum pseudo-likelihood estimator.

The results for the Gaussian copula (see Table 3) are of independent interest. Although the density of this copula is unbounded, the estimator $\widehat{\alpha}^{(pl)}$ performs better than $\widehat{\alpha}^{(ik)}$ for $\tau=0.5$ even with exponential margins (N+E), and with uniform margins (N+U) this holds even for $\tau=0.75$. This raises the question whether milder assumptions than $(\mathbf{F}_{j\varepsilon})$ would suffice for the Gaussian copula.

An analogous simulation study was also conducted for a system of three linear regressions, where the vector of innovations was sampled from $C\big{(}F_{1\varepsilon}(y_{1}),F_{2\varepsilon}(y_{2}),F_{3\varepsilon}(y_{3})\big{)}$ with the marginals $F_{1\varepsilon}$ and $F_{2\varepsilon}$ standard normal and $F_{3\varepsilon}$ either exponential (with mean 1) or uniform on $[-1,1]$. As the obtained results are very similar to those for model (12), they are not presented here but can be found in the Supplementary material. The key common finding is that the pseudo-likelihood estimator $\widehat{\alpha}^{(pl)}$ may perform poorly (and noticeably worse than $\widehat{\alpha}^{(ik)}$) for copula families with unbounded densities, even when only one of the marginals violates the regularity assumption while the remaining ones are regular.

4. Conclusions and further discussions

As illustrated in the previous section, one should be careful when a copula with an unbounded density is fitted by the maximum pseudo-likelihood method. Although the assumptions of Theorem 1 are not restrictive, one should keep in mind that they are not satisfied for distributions with a discontinuous error density $f_{j\varepsilon}$ (e.g., the uniform or the exponential distribution). Although such situations are probably rare in practice, there are applications in which, for instance, uniform errors arise naturally (see e.g., Schechtman and Schechtman, 1986).

One possible next step would be to generalize the results to the time-series context and to find assumptions under which the results claimed in Chen and Fan (2006a) hold. Based on our results for the i.i.d. setting and our simulation study, we conjecture that pseudo-likelihood estimation can be problematic when the marginal models have exponential innovations (or, more generally, positive or bounded innovations with a discontinuous density; see e.g. Lawrance and Lewis, 1985; Davis and McCormick, 1989; Anděl, 1989, 1992; Nielsen and Shephard, 2003) and one uses $\sqrt{n}$-consistent estimators of the model parameters.

In models where (based on our findings) the use of maximum pseudo-likelihood estimation is questionable, one can consider the method of moments (see e.g., Section 5.5.1 of McNeil et al., 2005; Brahimi and Necir, 2012). As proved in Côté et al. (2019), many moment estimators based on residuals satisfy (3) under less restrictive assumptions on the marginal error density $f_{j\varepsilon}$. In particular, for standard two-dimensional copulas the inversion of Kendall’s tau can present a ‘robust’ alternative. It is usually only slightly less efficient when no covariates are present, but in the presence of covariates it can perform significantly better than the maximum pseudo-likelihood estimator.
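The inversion of Kendall’s tau is simple to compute from residuals. A minimal Python sketch for the Clayton copula, for which $\tau=\alpha/(\alpha+2)$ and hence $\alpha=2\tau/(1-\tau)$: fit each marginal regression by least squares, compute Kendall’s tau of the residual pairs with `scipy.stats.kendalltau`, and invert. The demonstration data (Clayton errors with $\alpha=2$, N+E margins, linear regressions with the coefficients of Section 3.1) are an assumed illustration, not the paper's simulation code; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def clayton_from_tau(tau):
    """Invert tau = theta / (theta + 2) for the Clayton copula."""
    return 2.0 * tau / (1.0 - tau)


def tau_inversion_clayton(x, y):
    """Fit each margin by least squares, then invert Kendall's tau of the residuals."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    tau_hat, _ = kendalltau(resid[:, 0], resid[:, 1])
    return tau_hat, clayton_from_tau(tau_hat)


# Irregular N+E setting: Clayton errors (theta = 2, tau = 0.5), normal and exponential margins.
rng = np.random.default_rng(12)
n, theta = 2000, 2.0
frailty = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
u = (1.0 + rng.exponential(size=(n, 2)) / frailty[:, None]) ** (-1.0 / theta)
eps = np.column_stack([norm.ppf(u[:, 0]), -np.log1p(-u[:, 1])])
x = rng.standard_normal(n)
y = np.array([1.0, -1.0]) + np.outer(x, [1.0, 2.0]) + eps
tau_hat, theta_hat = tau_inversion_clayton(x, y)
print(tau_hat, theta_hat)
```

Since Kendall’s tau depends on the residuals only through their ranks, no pseudo-likelihood evaluation near the border of the unit square is needed, which fits its description above as a ‘robust’ alternative.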

For the sake of brevity we concentrated only on the estimation of the copula parameter. We conjecture that other procedures (e.g., procedures for goodness-of-fit testing) that make use of the maximum pseudo-likelihood estimator $\widehat{\boldsymbol{\alpha}}_{n}$ calculated from the residuals will also be valid, provided that, in addition to our assumptions, the standard regularity assumptions for these procedures are satisfied.

Acknowledgments

M. Omelka gratefully acknowledges support from the grant GACR 19-00015S. The research of Š. Hudecová was supported by the grant GACR 18-01781Y. N. Neumeyer gratefully acknowledges support from the DFG (Research Unit FOR 1735 Structural Inference in Statistics: Adaptation and Efficiency).

Appendix A Proofs of the main results

Note that the estimated pseudo-observations $\widehat{\mathbf{U}}_{i}$ given by (4) can be viewed as estimates of the ‘unobserved’ pseudo-observations $\widetilde{\mathbf{U}}_{i}$ (given in (7)), which in turn can be viewed as estimates of $\mathbf{U}_{i}$, given by

[TABLE]
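The three levels of pseudo-observations can be compared numerically in a single margin. The following Python sketch assumes that $\widetilde{U}_{ji}$ and $\widehat{U}_{ji}$ are the rescaled ranks $R/(n+1)$ of the errors and of the residuals, respectively, and that $U_{ji}=F_{j\varepsilon}(\varepsilon_{ji})$. The linear model with standard normal covariate and errors, the sample size and the seed are hypothetical choices. In this regular setting $\widehat{U}_{ji}$ stays uniformly close to $\widetilde{U}_{ji}$, which in turn is uniformly close to $U_{ji}$.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(3)
n = 10_000
x = rng.standard_normal(n)
eps = rng.standard_normal(n)            # unobserved errors, F_{j eps} standard normal
y = 1.0 + 1.0 * x + eps                 # one marginal regression with s_j = 1

design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef               # estimated residuals

U = norm.cdf(eps)                       # U_ji = F_{j eps}(eps_ji)
U_tilde = rankdata(eps) / (n + 1.0)     # rescaled ranks of the unobserved errors
U_hat = rankdata(resid) / (n + 1.0)     # rescaled ranks of the estimated residuals

print(np.max(np.abs(U_hat - U_tilde)), np.max(np.abs(U_tilde - U)))
```

The estimated intercept cancels when the residuals are ranked: only the error of the slope estimate, multiplied by differences of the covariate values, can change the order of the residuals relative to the order of the errors.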

To prove Theorem 1 we need some technical results about the ‘closeness’ of $\widehat{U}_{ji}$ (the $j$ -th element of $\widehat{\mathbf{U}}_{i}$ ) to $\widetilde{U}_{ji}$ and $U_{ji}$ .

As shown later, one does not need to control $\widehat{U}_{ji}$ if $U_{ji}$ is close to zero or one, or if $M_{j}(\boldsymbol{X}_{i})$ is too large. This is formalised as follows. Introduce the set of indices

[TABLE]

where

[TABLE]

The following lemma gives an upper bound on the number of indices $i$ for which it holds that $U_{ji}\not\in[\delta_{n},1-\delta_{n}]$ or $M_{j}(\boldsymbol{X}_{i})>a_{n}$ .

Lemma 1.

Let $\delta_{n}$ and $a_{n}$ satisfy (A2) and suppose that assumption $\boldsymbol{(ms)}$ holds. Then

[TABLE]

which further implies that

[TABLE]

Proof.

Denote

[TABLE]

and note that thanks to (A2) and Markov’s inequality (applied to $M_{j}^{r}(\boldsymbol{X}_{i})$ )

[TABLE]

Now, as the random variable $\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\big\{U_{ji}\not\in[\delta_{n},1-\delta_{n}]\text{ or }M_{j}(\boldsymbol{X}_{i})>a_{n}\big\}$ is non-negative, one can apply Markov’s inequality once more to conclude that

[TABLE]

∎

A.1. Some results on statistics with ranks calculated from residuals

Lemma 2.

Suppose that assumptions $(\mathbf{F}_{j\varepsilon})$ and $\boldsymbol{(ms)}$ hold and that $\varphi$ is a $\mathcal{J}$ -function. Then

[TABLE]

Proof.

As $\varphi$ is a $\mathcal{J}$ -function, it is easy to show that the expectation $\mathsf{E}\,\varphi(\mathbf{U})$ exists and is finite. Thus thanks to the law of large numbers it is sufficient to show

[TABLE]

Let $\mathrm{J}_{jn}^{X}$ and $\delta_{n}$ be as in (A1) and (A2), where $\lambda$ and $\lambda_{x}$ are chosen so that they satisfy the assumptions of Lemma 6. Then this lemma together with the standard Glivenko-Cantelli theorem for the empirical distribution function $\widehat{F}_{j\varepsilon}$ implies that

[TABLE]

Now introduce

[TABLE]

and note that with the help of (A4)

[TABLE]

As the above equation is not guaranteed to hold for $i\in\mathrm{K}_{n}^{X}$, the sets of indices $\mathrm{J}_{n}^{X}$ and $\mathrm{K}_{n}^{X}$ need to be treated separately. That is why we bound $D_{n}$ given by (A3) as

[TABLE]

In what follows we show that each term on the right-hand side of (A7) is asymptotically negligible.

Dealing with the first term in (A7)

As $\varphi$ is a $\mathcal{J}$ -function one can bound

[TABLE]

Now, by Lemma 1, with probability going to one there are at most $dn^{1-1/\lambda}\log n$ indices $i$ for which there exists $j\in\{1,\dotsc,d\}$ such that $U_{ji}\not\in[\delta_{n},1-\delta_{n}]$ or $M_{j}(\boldsymbol{X}_{i})>a_{n}$. Thus, bounding the corresponding sum by the sum of the $dn^{1-1/\lambda}\log n$ largest values of $\frac{1}{[\min\{\widehat{U}_{ji},1-\widehat{U}_{ji}\}]^{\eta}}$, one gets (with probability going to one)

[TABLE]
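To see why a term of this type vanishes, consider a single coordinate $j$. The following sketch assumes (this is not restated in this section) that $\widehat{U}_{ji}=R_{ji}/(n+1)$, where $R_{ji}$ is the rank of $\widehat{\varepsilon}_{ji}$ among $\widehat{\varepsilon}_{j1},\dotsc,\widehat{\varepsilon}_{jn}$ and there are no ties, so that every value $\ell/(n+1)$ of $\min\{\widehat{U}_{ji},1-\widehat{U}_{ji}\}$ occurs at most twice. For any set $S$ of at most $m_{n}=\lceil dn^{1-1/\lambda}\log n\rceil$ indices and $\eta\in[0,1)$,

$$
\frac{1}{n}\sum_{i\in S}\frac{1}{[\min\{\widehat{U}_{ji},1-\widehat{U}_{ji}\}]^{\eta}}
\le\frac{2}{n}\sum_{\ell=1}^{m_{n}}\Big(\frac{n+1}{\ell}\Big)^{\eta}
\le\frac{2(n+1)^{\eta}}{n}\int_{0}^{m_{n}}x^{-\eta}\,\mathrm{d}x
=\frac{2(n+1)^{\eta}\,m_{n}^{1-\eta}}{(1-\eta)\,n}
=O\Big(n^{-(1-\eta)/\lambda}(\log n)^{1-\eta}\Big),
$$

which tends to zero for every $\lambda>0$. The middle inequality uses $\ell^{-\eta}\le\int_{\ell-1}^{\ell}x^{-\eta}\,\mathrm{d}x$, and the rate follows from $n^{\eta-1}\big(n^{1-1/\lambda}\big)^{1-\eta}=n^{-(1-\eta)/\lambda}$.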

Dealing with the second term in (A7)

Note that $\mathsf{E}\big{|}\varphi(\mathbf{U}_{i})\big{|}<\infty$ implies that

[TABLE]

Thus $\frac{1}{n}\sum_{i\in\mathrm{K}_{n}^{X}}\big{|}\varphi(\mathbf{U}_{i})\big{|}=o_{P}(1)$ follows from Markov’s inequality.

Dealing with the third term in (A7)

We use the continuity of the function $\varphi$ . To be able to do that we need to stay in the interior of $[0,1]^{d}$ . Thus for a given $\delta\in(0,1/2)$ (that will be specified later on), consider the set

[TABLE]

and introduce the corresponding sets of indices

[TABLE]

where for simplicity of notation we do not stress that both $\mathrm{J}_{\delta}$ and $\mathrm{K}_{\delta}$ depend on $n$. Now one can bound

[TABLE]

Note that by the uniform continuity of the function $\varphi(\cdot)$ on $[\delta/2,1-\delta/2]^{d}$ and (A6) one gets that the first term on the right-hand side of (A11) converges to zero in probability.

To deal with the second term on the right-hand side of (A11) note that thanks to (A6) with probability going to one

[TABLE]

Thus one can bound

[TABLE]

which can be made arbitrarily small by taking $\delta$ small enough.

Finally, with the help of the law of large numbers, the third term on the right-hand side of (A11) can be bounded by

[TABLE]

which can be also made arbitrarily small by taking $\delta$ sufficiently small and $n$ sufficiently large. ∎

Lemma 3.

Suppose that assumptions $(\mathbf{F}_{j\varepsilon})$ and $\boldsymbol{(ms)}$ hold. Let $\varphi$ be a $\widetilde{\mathcal{J}}^{\beta_{1},\beta_{2}}$-function such that $\mathsf{E}\,\{\varphi(\mathbf{U})\}=0$ and $\beta>\max\{\beta_{1}+\frac{1}{r-1},\beta_{2}\}$. Then

[TABLE]

Proof.

Let $\mathrm{J}_{n}^{X}$ and $\mathrm{K}_{n}^{X}$ be defined as in (A5). Then similarly as in (A8) of the proof of Lemma 2 one can bound

[TABLE]

where the role of $\eta$ is now taken by $\beta_{1}$ .

In what follows we take $\lambda$ so that

[TABLE]

and $\lambda_{x}$ satisfies (B30). Such choices of $\lambda$ and $\lambda_{x}$ guarantee that the right-hand sides of (A13) are of order $o_{P}(1)$ and at the same time the assumptions of Lemma 5 are satisfied and one can make use of Lemmas 6 and 7.

It is sufficient to show that

[TABLE]

Note that

[TABLE]

where $\delta_{n}$ and $a_{n}$ are given in (A2).

Now by the mean value theorem

[TABLE]

where $U_{ji}^{*}$ lies between $\widehat{U}_{ji}$ and $\widetilde{U}_{ji}$. Thus, to prove the lemma, it is sufficient to show that the second term on the right-hand side of (A14) converges to zero in probability.

With the help of Lemma 6 for a fixed $j\in\{1,\dotsc,d\}$ one gets

[TABLE]

where

[TABLE]

and $\gamma>0$ is taken sufficiently small so that $\beta-\gamma>\beta_{2}$ . In what follows we show that $C_{n}$ and $A_{n}+B_{n}$ are asymptotically negligible.

Dealing with $C_{n}$. With the help of Lemma A3 of Shorack (1972) and Lemma 7, for each $\varepsilon>0$ there exists a positive constant $L$ such that, with probability at least $1-\varepsilon$, the quantity $C_{n}$ given by (A17) can be bounded by

[TABLE]

where the law of large numbers is used on the last line.

Thus one can concentrate on the quantities $A_{n}$ and $B_{n}$ .

Dealing with $A_{n}$ . Note that $A_{n}$ given by (A15) can be rewritten as

[TABLE]

Now analogously as in the proof of Lemma 2 one can show that

[TABLE]

and also

[TABLE]

Combining (A18), (A19), (A20) and the fact that the estimator $\boldsymbol{\widehat{\theta}}_{j}$ is $\sqrt{n}$ -consistent yields

[TABLE]

Dealing with $B_{n}$. Now consider the term $B_{n}$ defined in (A16). Proceeding analogously as above, one can show that

[TABLE]

where

[TABLE]

Now similarly as in the proof of Lemma 5 one can show that

[TABLE]

and analogously also

[TABLE]

Now (A21), (A22), (A23) and (A24) yield that $B_{n}=-A_{n}+o_{P}(1)$, which was to be proved.

∎

The following lemma will be useful for copula families with ‘nicely bounded’ score functions.

Lemma 4.

Suppose that assumptions $(\mathbf{F}_{j\varepsilon})$ and $\boldsymbol{(ms)}$ hold. Let $\varphi$ be a $\widetilde{\mathcal{J}}^{0,0}$-function such that $\mathsf{E}\,\{\varphi(\mathbf{U})\}=0$ and $\varphi^{(j)}$ is bounded for each $j\in\{1,\dotsc,d\}$. Then the statement of Lemma 3 holds.

Proof.

By the mean value theorem

[TABLE]

Now take $\lambda>2(1+\frac{1}{r})$ and recall the sets of indices $\mathrm{J}_{n}^{X}$ and $\mathrm{K}_{n}^{X}$ introduced in (A5). Then

[TABLE]

Now, with the help of Lemma 9, one can show that the second term on the right-hand side of (A25) is $o_{P}(1)$, as it can be bounded by

[TABLE]

where the last equation follows from Markov’s inequality and

[TABLE]

Finally the first term on the right-hand side of (A25) can be handled analogously as in the proof of Lemma 3. ∎

Corollary 2.

Suppose that the assumptions of Lemma 3 or Lemma 4 are satisfied. Then

[TABLE]

where

[TABLE]

Proof.

With the help of (A12) it is sufficient to show that

[TABLE]

But this can be proved component-wise by mimicking the proof of Lemma 2 of Gijbels et al. (2017), where the case $d=2$ with a more general $\varphi$, possibly depending also on $\boldsymbol{X}_{i}$, is considered. ∎

A.2. Proofs of Theorems 1 and 2

Proof of Theorem 1.

With the help of Lemmas 2 and 3, the proof can closely follow the proof of Lemma 3 in Gijbels et al. (2017). To this end, define

[TABLE]

In what follows we show that the assumptions of Theorem A.10.2 of Bickel et al. (1993) are satisfied for $\boldsymbol{W}_{n}$ and $\boldsymbol{W}$ given by (A26).

By standard maximum likelihood theory, Assumption (GM0) is satisfied thanks to Assumption C1. Moreover, Assumptions C4 and C5 imply Assumption (GM3). Assumption (GM2) is also satisfied since, thanks to assumption C3, one can for each $k\in\{1,\dotsc,p\}$ apply Corollary 2 to $\varphi(\mathbf{u})=\psi_{k}(\mathbf{u};\boldsymbol{\alpha})$ and get

[TABLE]

where $\widetilde{\boldsymbol{\psi}}(\mathbf{u})$ was introduced in Corollary 1.

Thus, it remains to check Assumption (U) of Theorem A.10.2. To this end, for each $\varepsilon>0$ and each $k,\ell\in\{1,\dotsc,p\}$, it is sufficient to find a neighborhood $\mathcal{U}_{\varepsilon}=\{\mathbf{a}\in\mathcal{U}:\,\|\mathbf{a}-\boldsymbol{\alpha}\|<\varepsilon\}$ such that

[TABLE]

where $I^{(k,\ell)}(\mathbf{a})$ stands for the $(k,\ell)$ element of $I(\mathbf{a})$.

For simplicity of notation, let us put $g_{k,\ell}(\mathbf{u};\mathbf{a})=\partial\psi_{k}(\mathbf{u};\mathbf{a})/\partial a_{\ell}$. Assumption C4 allows one to adapt Lemma 2, which gives

[TABLE]

Hence, it remains to show

[TABLE]

For a given $\delta\in(0,1/4)$ (that will be specified later on), let us introduce the sets $\mathbf{I}_{\delta}$ and $\mathrm{J}_{\delta}$ as in (A9) and (A10). Then the left-hand side of (A27) can be bounded by

[TABLE]

where $\mathrm{J}_{n}^{X}$ was introduced in (A5) and $h$ in Assumption C4. Now, with probability going to one, for each sufficiently large $n$: if $\mathbf{U}_{i}\in\mathbf{I}_{\delta}$, then $\widehat{\mathbf{U}}_{i}\in\mathbf{I}_{\delta/2}$. Thus, for each $\delta\in(0,1/4)$, the term on the right-hand side of (A28) can be made arbitrarily small (Assumption C4), up to an $o_{P}(1)$ term, by considering a sufficiently small neighborhood $\mathcal{U}_{\varepsilon}$.

Finally, analogously as in the proof of Lemma 2, one can show that

[TABLE]

where $r(\delta)\to 0$ as $\delta\to 0_{+}$ .

Thus we have verified the assumptions of Theorem A.10.2 of Bickel et al. (1993), which yields that there exists a consistent root (say $\widehat{\boldsymbol{\alpha}}_{n}$) of the estimating equation (5) which has the following asymptotic representation

[TABLE]

where the elements of the vector function $\widetilde{\boldsymbol{\psi}}$ are given in (11). Note that completely analogously one can show that there exists a consistent root (say $\widetilde{\boldsymbol{\alpha}}_{n}$ ) of the estimating equation (6) which has the same asymptotic representation. This finally implies the statement of the theorem. ∎

Proof of Theorem 2.

The proof is completely analogous to the proof of Theorem 1. The only difference is that one uses Lemma 4 instead of Lemma 3. In fact, the proof is even simpler, as thanks to assumption C6 one can take a finite constant instead of the function $h$. ∎

Appendix B Some results on $\widehat{F}_{j\widehat{\varepsilon}}$ and $\widehat{U}_{ji}$

In what follows let $x_{+}=\max\{x,0\}$ .

Lemma 5.

Suppose that assumptions $(\mathbf{F}_{j\varepsilon})$ and $\boldsymbol{(ms)}$ hold. Then for $\delta_{n}=n^{-1/\lambda}$ where $\lambda>2(1-\beta+\tfrac{1}{r-1})$ it holds uniformly in $u\in[\delta_{n}/2,1-\delta_{n}/2]$

[TABLE]

for each $\gamma>0$ and $j\in\{1,\dotsc,d\}$ .

Proof.

We will show the statement for $u\in[\frac{\delta_{n}}{2},\tfrac{1}{2}]$. The proof is completely analogous for $u\in[\tfrac{1}{2},1-\frac{\delta_{n}}{2}]$.

Note that

[TABLE]

In what follows we need to take care of the fact that the majorant $M_{j}(\mathbf{x})$ from assumption $\boldsymbol{(ms)}$ can be unbounded. Let $a_{n}=n^{1/(\lambda_{x}r)}$ , where $\lambda_{x}$ will be specified later. Then similarly as in the proof of Lemma 1 one can use Markov’s inequality to bound

[TABLE]

Note that thanks to the assumption $\lambda>2(1-\beta+\tfrac{1}{r-1})$ it is straightforward to verify that $\tfrac{1}{2}+\tfrac{\beta}{\lambda}<r\big{(}\tfrac{1}{2}-\tfrac{1-\beta}{\lambda}\big{)}.$ In the following we will take $\lambda_{x}$ such that

[TABLE]
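The inequality $\tfrac{1}{2}+\tfrac{\beta}{\lambda}<r\big(\tfrac{1}{2}-\tfrac{1-\beta}{\lambda}\big)$, which justifies the above choice of $\lambda_{x}$, can be checked in one line. For $r>1$ it is in fact equivalent to the assumption on $\lambda$:

$$
r\Big(\frac{1}{2}-\frac{1-\beta}{\lambda}\Big)-\Big(\frac{1}{2}+\frac{\beta}{\lambda}\Big)
=\frac{r-1}{2}-\frac{r(1-\beta)+\beta}{\lambda}
=\frac{r-1}{2}-\frac{(r-1)(1-\beta)+1}{\lambda},
$$

which is positive if and only if

$$
\lambda>\frac{2\{(r-1)(1-\beta)+1\}}{r-1}=2\Big(1-\beta+\frac{1}{r-1}\Big).
$$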

Now with the help of (B2) one can conclude that

[TABLE]

for $u\in[\delta_{n}/2,1/2]$ .

Now for simplicity of notation introduce

[TABLE]

Further for $u\in(0,1]$ and $\mathbf{t}\in\mathbb{R}^{p_{j}}$ put

[TABLE]

where $\eta>0$ is sufficiently small. Note that the function $w$ is increasing on $(0,\frac{1}{2})$ and decreasing on $(\frac{1}{2},1)$ for $\beta-\gamma>0$ . Finally let

[TABLE]

and for $i\in\{1,\dotsc,n\}$ introduce the processes

[TABLE]

that are indexed by the set $\mathcal{F}=T_{1}\times(0,1/2]$ , where $T_{1}=\big{\{}\mathbf{t}\in\mathbb{R}^{p_{j}}:\|\mathbf{t}\|\leq 1\big{\}}$ .

Note that assumption $\boldsymbol{(ms)}$ guarantees that $n^{1/2-\eta}(\widehat{\boldsymbol{\theta}}_{j}-\boldsymbol{\theta}_{j})=n^{-\eta}\sqrt{n}\,(\widehat{\boldsymbol{\theta}}_{j}-\boldsymbol{\theta}_{j})\xrightarrow[n\rightarrow\infty]{\mathrm{P}}0$ for each $\eta\in(0,\frac{1}{2})$, as $\sqrt{n}\,(\widehat{\boldsymbol{\theta}}_{j}-\boldsymbol{\theta}_{j})=O_{P}(1)$. This further implies that $\mathsf{P}(\|n^{1/2-\eta}(\widehat{\boldsymbol{\theta}}_{j}-\boldsymbol{\theta}_{j})\|\leq 1)\xrightarrow[n\rightarrow\infty]{}1$. Put

[TABLE]

Then with the help of (B3) one can (with probability going to one) write that for $u\in[\delta_{n}/2,1/2]$

[TABLE]

Now equip the space $\mathcal{F}$ with the semimetric $\rho$ given by

[TABLE]

where $K$ is a finite constant that will be specified afterwards.

Below we show that the assumptions of Theorem 2.11.11 of van der Vaart and Wellner (1996) are satisfied for the empirical process indexed by $\mathcal{F}$, which implies that this process is asymptotically tight. Further, as $\sup_{u\in(0,\frac{1}{2}]}\rho\big{(}(\widehat{\boldsymbol{\vartheta}}_{n},u),(\boldsymbol{0},u)\big{)}=o_{P}(1)$, one gets that uniformly in $u\in(0,1/2]$

[TABLE]

where $\mathsf{E}_{U,\boldsymbol{X}}$ stands for the expectation with respect to the $U_{ji}$’s and $\boldsymbol{X}_{i}$’s (with $\widehat{\boldsymbol{\vartheta}}_{n}$ treated as fixed).

In what follows we concentrate on $u\in[\delta_{n}/2,1/2]$. Unless stated otherwise, all the following results hold uniformly for $u$ from this interval.

Similarly as in (B5), one can argue that

[TABLE]

This together with (B3) and (B) implies

[TABLE]

Thus, to finish the proof, it remains to deal with the second term on the right-hand side of (B8). As $\sqrt{n}\,(\widehat{\boldsymbol{\theta}}_{j}-\boldsymbol{\theta}_{j})=O_{P}(1)$, one can use the mean value theorem, which guarantees that (with probability going to one) there exists $\mathbf{t}_{*}\in T_{1}$ such that

[TABLE]

Note that for $\mathbf{x}$ such that $M_{j}(\mathbf{x})\leq a_{n}$ one has

[TABLE]

and also

[TABLE]

where both inequalities hold uniformly in $\mathbf{t}\in T_{1}$ and $\mathbf{x}\in\{\tilde{\mathbf{x}}:M_{j}(\tilde{\mathbf{x}})\leq a_{n}\}$. Thus, with the help of Lemma 11,

[TABLE]

and also

[TABLE]

Combining the above findings with assumption $\boldsymbol{(ms)}$ shows that (B9) can be simplified to

[TABLE]

which together with (B8) implies (B1).

Verifying the assumptions of Theorem 2.11.11 of van der Vaart and Wellner (1996)

First, we need to show that the semimetric $\rho$ defined in (B6) is Gaussian-dominated. For this it is sufficient to show that (see p. 212 of van der Vaart and Wellner, 1996)

[TABLE]

where $N(\epsilon,\mathcal{F},\rho)$ is the covering number of $\mathcal{F}$, i.e. the minimal number of $\rho$-balls of radius $\epsilon$ needed to cover $\mathcal{F}$.

It is known (see Example 2.11.15 of van der Vaart and Wellner, 1996) that (B14) holds true if $\mathcal{F}$ is replaced with $(0,1/2]$ and $\rho$ with

[TABLE]

since $\rho_{0}$ is Gaussian. Moreover, from the definition of $\rho$ in (B6) it follows that one can bound

[TABLE]

and thus $(\mathcal{F},\rho)$ also satisfies (B14).
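The semimetric $\rho$ in (B6) involves the Euclidean norm on $T_{1}$ and the semimetric $\rho_{0}$ on $(0,1/2]$; for $\rho_{0}$ the entropy condition (B14) is covered by Example 2.11.15 of van der Vaart and Wellner (1996). As a minimal numerical sketch of the Euclidean part only, the Python code below approximates $\int_{0}^{2}\sqrt{\log N(\epsilon,T_{1},\|\cdot\|)}\,d\epsilon$ for the unit ball in $\mathbb{R}^{p}$, replacing $N$ by the standard volumetric bound $N(\epsilon)\leq(1+2/\epsilon)^{p}$ (an assumption of this sketch, not a quantity from the paper). For $\epsilon\geq 2$, the diameter of $T_{1}$, a single ball suffices, so nothing is lost by stopping the integral there. The exact form of $\rho$, including the constant $K$, is not used.

```python
import math

def entropy_integral_unit_ball(p, s_max=60.0, steps=200_000):
    """Approximate the integral of sqrt(log N(eps)) over eps in (0, 2] for the
    Euclidean unit ball in R^p, with N(eps) replaced by the volumetric bound
    (1 + 2/eps)^p.

    The substitution eps = 2 * exp(-s) (so d eps = -eps ds) turns the integrable
    singularity at eps = 0 into an exponentially decaying tail in s, which is
    truncated at s_max and integrated with the midpoint rule.
    """
    h = s_max / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        eps = 2.0 * math.exp(-s)
        log_cover = p * math.log1p(2.0 / eps)  # log of the covering-number bound
        total += math.sqrt(log_cover) * eps * h
    return total

for p in (1, 2, 4):
    print(p, round(entropy_integral_unit_ball(p), 3))
```

The integral is finite for every fixed $p$ and scales like $\sqrt{p}$, because $\sqrt{\log(1/\epsilon)}$ is integrable at zero.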

Next we need to check the three assumptions of Theorem 2.11.11 of van der Vaart and Wellner (1996). As in our situation the processes $Z_{n1},\dotsc,Z_{nn}$ are identically distributed, these assumptions can be rewritten as follows.

(I) For each $\zeta>0$

[TABLE]

(II) For each $(\mathbf{t}_{1},u_{1}),(\mathbf{t}_{2},u_{2})\in\mathcal{F}$

[TABLE]

(III) For every $\rho$-ball $B(\epsilon)\subset\mathcal{F}$ of radius less than $\epsilon$

[TABLE]

The first assumption (B16) is easy to check, as

[TABLE]

To verify the second assumption (B17), fix $\mathbf{t}_{1},\mathbf{t}_{2}$ and $u_{1},u_{2}$ (without loss of generality $u_{1}\leq u_{2}$) and calculate

[TABLE]

Consider first the first term on the right-hand side of (B19). For a given $u\in[\delta_{n}/2,1/2]$, by the mean value theorem there exists $\mathbf{t}_{*}$ between $\mathbf{t}_{1}$ and $\mathbf{t}_{2}$ such that

[TABLE]

With the help of (B4), (B10), (B11) and Lemma 10, one can conclude that with probability going to one

[TABLE]

which together with (B) implies that

[TABLE]

uniformly in $u$ .

Now fix $\mathbf{t}$ and $\mathbf{x}$. Then, by the mean value theorem, there exists $\tilde{u}$ between $u_{1}^{(n)}$ and $u_{2}^{(n)}$ such that

[TABLE]

which together with

[TABLE]

assumption $(\mathbf{F}_{j\varepsilon})$ and (B23) implies that

[TABLE]

uniformly in $\mathbf{t}$ and $\mathbf{x}$.

Combining the inequalities (B21), (B22) and (B24) yields

[TABLE]

We now turn our attention to the second term on the right-hand side of (B19). Analogously as above, one can bound

[TABLE]

Combining this with (B19) and (B25), one gets

[TABLE]

where the last inequality follows by Lemma 13(iii) in Appendix D.

Finally, we show that the third assumption (B18) is also satisfied. Let $B(\epsilon)$ be a fixed $\epsilon$-ball. Then, from the properties of the Euclidean norm and of the function $\rho_{0}$ (see Lemma 13(iv) in Appendix D), there exist $\mathbf{t}_{0}\in T_{1}$ and $u_{L},u_{U}\in(0,\tfrac{1}{2}]$ such that

[TABLE]

Then one can bound

[TABLE]

To deal with the last probability, introduce

[TABLE]

Then one can bound

[TABLE]

where $V_{n1}$ and $V_{n2}$ stand for the first and second terms on the right-hand side of (B27), respectively.

Similarly as in (B25), one can bound the second moment of $V_{n1}$ as

[TABLE]

provided that $K$ in the definition of the semimetric (B6) is taken sufficiently large.

Thus, by Markov’s inequality,

[TABLE]

Now we concentrate on the second term in (B27). To this end, note that from the definition of the semimetric $\rho_{0}$ in (B15) it follows that for each $u\in[u_{L},u_{U}]$

[TABLE]

which further implies that

[TABLE]

Using the above inequality one can bound (with probability going to one)

[TABLE]

where we have used that thanks to (B21)

[TABLE]

and for each $\mathbf{t},u$

[TABLE]

Thus we can bound

[TABLE]

for a sufficiently large $K$. Now combining (B28) and (B29) yields that

[TABLE]

which together with (B26) implies that also (B18) is satisfied.

∎

While $\lambda_{x}$ is only a suitably chosen auxiliary constant in Lemma 5 that does not appear in its statement, the following lemmas are formulated in terms of $\mathrm{J}_{jn}^{X}$, which depends on $\lambda_{x}$; thus we need to be more specific about $\lambda_{x}$. In what follows we therefore often assume that

[TABLE]

Lemma 6.

Suppose that the assumptions of Lemma 5 are satisfied and that $\lambda_{x}$ satisfies (B30). Then, uniformly in $k\in\mathrm{J}_{jn}^{X}$,

[TABLE]

for each $\gamma>0$ and $j\in\{1,\dotsc,d\}$.

Proof.

The lemma is shown by substituting $u=F_{j\varepsilon}(\widehat{\varepsilon}_{jk})$ into the approximation (B1) stated in Lemma 5. All the following statements hold uniformly in $k\in\mathrm{J}_{jn}^{X}$.

The proof is divided into four steps. First, we show that with probability going to one

[TABLE]

which justifies the substitution into (B1). Second, we show that

[TABLE]

Next we show that

[TABLE]

and finally we derive

[TABLE]

and observe that $\widehat{U}_{jk}=\widehat{F}_{j\widehat{\varepsilon}}(\widehat{\varepsilon}_{jk})$ and $\widetilde{U}_{jk}=\widehat{F}_{j\varepsilon}(\varepsilon_{jk})$.

Showing (B31).

Analogously as in (B22), for $k\in\mathrm{J}_{jn}^{X}$

[TABLE]

This further implies that

[TABLE]

where we have used that $\lambda_{x}$ satisfies (B30). Thus, for sufficiently large $n$, one gets that

[TABLE]

and analogously also

[TABLE]

Showing (B32).

With the help of (B36) one can conclude that

[TABLE]

which implies (B32).

Showing (B33) and (B34).

Both follow from (B31), (B12) and (B13).

Showing (B35).

Without loss of generality, consider only those $k\in\mathrm{J}_{jn}^{X}$ for which $U_{jk}\leq\tfrac{1}{2}$. Now for $\eta\in(0,\tfrac{1}{2}-\tfrac{\beta}{\lambda}-\tfrac{1}{\lambda_{x}r})$ introduce

[TABLE]

Similarly as in the proof of Lemma 5, define for $i\in\{1,\dotsc,n\}$ the processes

[TABLE]

that are indexed by the set $\mathcal{F}=[-1,1]^{2}\times(0,\tfrac{1}{2}]$. Now one can write $\widehat{F}_{j\varepsilon}(\widehat{\varepsilon}_{jk})$ as

[TABLE]

where

[TABLE]

Note that for $k\in\mathrm{J}_{jn}^{X}$

[TABLE]

Now equip the space $\mathcal{F}$ with the semimetric $\rho$ given by

[TABLE]

where $K$ is a sufficiently large but finite constant. Then, completely analogously as in the proof of Lemma 5, one can verify the assumptions of Theorem 2.11.11 of van der Vaart and Wellner (1996). Thus, as $\sup_{u\in(0,\frac{1}{2}]}\rho\big{(}(\hat{\mathbf{t}}_{n},u),(\boldsymbol{0},u)\big{)}=o_{P}(1)$, one gets that

[TABLE]

which together with (B37) implies that

[TABLE]

With the help of (B12) and (B13), the right-hand side of the above equation can be rewritten as

[TABLE]

which combined with (B38) implies (B35). ∎

Lemma 7.

Suppose that the assumptions of Lemma 6 are satisfied and $\beta>0$. Then for each $\epsilon>0$ there exists $L_{\epsilon}>0$ such that, for each $j\in\{1,\dotsc,d\}$ and all sufficiently large $n$,

[TABLE]

Proof.

We concentrate on the lower inequality $L_{\epsilon}\,U_{jk}\leq\widehat{U}_{jk}$; the upper inequality for $\widehat{U}_{jk}$ can be shown analogously.

By Lemma 6 one gets $\widehat{U}_{jk}\geq\widetilde{U}_{jk}-|R_{jk}|$ , where

[TABLE]

and $\gamma>0$ can be taken arbitrarily small.

By Lemma A3 of Shorack (1972), for each $\epsilon>0$ there exists $\widetilde{L}\in(0,1)$ such that

[TABLE]

Thus one can take $L_{\epsilon}=\widetilde{L}/2$ provided we show that

[TABLE]

To this end, one can consider each of the summands on the right-hand side of (B39) separately. For instance, uniformly in $k\in\mathrm{J}_{jn}^{X}$ one has

[TABLE]

as $\lambda_{x}$ satisfies (B30). The other summands on the right-hand side of (B39) can be handled analogously. ∎
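The role of Lemma A3 of Shorack (1972) above is to bound $\widetilde{U}_{jk}$ from below by a fixed multiple $\widetilde{L}$ of $U_{jk}$, with probability at least $1-\epsilon$ and uniformly in $n$; we read the display above as being of this form. A minimal Monte Carlo sketch in Python, assuming that $\widetilde{U}_{jk}$ equals rank$/(n+1)$ computed from i.i.d. uniform $U_{jk}$ (the normalisation of the empirical distribution function in the paper may differ), estimates the 5% quantile of $\min_{k}\widetilde{U}_{jk}/U_{jk}$ for several $n$. The helper name is hypothetical and NumPy is required.

```python
import numpy as np

def min_ratio_samples(n, reps=200, seed=1):
    """Monte Carlo draws of min_k (R_k / (n + 1)) / U_k for U_1, ..., U_n iid U(0, 1),
    where R_k is the rank of U_k. The rank-based ratio R_k / (n + 1) stands in
    for tilde U_k (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    # After sorting, the k-th column holds the order statistic U_(k), whose rank is k.
    u_sorted = np.sort(rng.random((reps, n)), axis=1)
    scaled_ranks = np.arange(1, n + 1) / (n + 1)
    return (scaled_ranks / u_sorted).min(axis=1)

for n in (100, 1000, 10000):
    q05 = float(np.quantile(min_ratio_samples(n), 0.05))
    print(n, round(q05, 3))
```

The 5% quantile does not drift towards zero as $n$ grows, which is the kind of behaviour that makes a fixed choice such as $L_{\epsilon}=\widetilde{L}/2$ possible.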

Some results useful when $(\mathbf{F}_{j\varepsilon})$ holds with $\beta=0$

Lemma 8.

Suppose that assumptions $(\mathbf{F}_{j\varepsilon})$ and $\boldsymbol{(ms)}$ hold. Then for each $j\in\{1,\dotsc,d\}$

[TABLE]

Proof.

Let $U(\boldsymbol{\theta}_{j})$ be the neighborhood of $\boldsymbol{\theta}_{j}$ introduced in $\boldsymbol{(ms)}$. Now consider the set of functions

[TABLE]

and denote its elements by $f_{\mathbf{t},u}$. Then one can write

[TABLE]

Similarly as in the proof of Theorem 4 of Gijbels et al. (2015), one can argue that the set $\mathcal{F}$ is $P$-Donsker. Further, similarly as in the proof of Lemma 5, one can show that

[TABLE]

which further implies that uniformly in $u\in(0,1)$

[TABLE]

Now, by the mean value theorem, there exists $\mathbf{t}_{*}$ between $\widehat{\boldsymbol{\theta}}_{j}$ and $\boldsymbol{\theta}_{j}$ such that

[TABLE]

which together with (B41) implies (B40). ∎

Lemma 9.

Suppose that the assumptions of Lemma 8 are satisfied. Then for each $j\in\{1,\dotsc,d\}$

[TABLE]

Proof.

The proof follows by substituting $u=F_{j\varepsilon}(\widehat{\varepsilon}_{jk})$ into (B40) and proceeding as in the proof of Lemma 6. ∎

Appendix C Further auxiliary results

Lemma 10.

Suppose that assumption $(\mathbf{F}_{j\varepsilon})$ holds. Let $\lambda$ satisfy $\lambda>2(1-\beta+\tfrac{1}{r-1})$ and let $\lambda_{x}$ satisfy (B30). Further, for $\eta>0$ introduce $b_{n}=n^{\frac{1}{\lambda_{x}r}-\frac{1}{2}-\eta}$. Then there exists $\eta>0$ such that for all sufficiently large $n$, all $u\in[\tfrac{\delta_{n}}{2},\tfrac{1}{2}]$ and each $j\in\{1,\dotsc,d\}$

[TABLE]

and

[TABLE]

Proof.

We show only that

[TABLE]

as the remaining inequalities can be proved analogously. Thus we need to show that

[TABLE]

Now by the mean value theorem

[TABLE]

where $\tilde{u}$ is between $u$ and $2u$. Thus, with the help of (C1), it remains to show that

[TABLE]

Now, by assumption $(\mathbf{F}_{j\varepsilon})$ and using the fact that $u\geq\delta_{n}/2$,

[TABLE]

where we have used that $\lambda_{x}$ satisfies (B30), which guarantees that one can find $\eta>0$ sufficiently small so that $\frac{1}{2}-\frac{1}{\lambda_{x}r}-\eta>\frac{1-\beta}{\lambda}$ holds.

∎

Lemma 11.

Suppose that the assumptions of Lemma 10 are satisfied. Then there exists $\eta>0$ such that for all $\gamma>0$ and each $j\in\{1,\dotsc,d\}$

[TABLE]

and also

[TABLE]

as $n\to\infty$ .

Proof.

We will prove only that

[TABLE]

as the remaining cases can be shown analogously.

First, suppose that $\lim_{u\to 0_{+}}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}>0$. Then from Remark 2 one can conclude that $\lim_{u\to 0_{+}}F_{j\varepsilon}^{-1}(u)>-\infty$ and $\beta=0$. Thus also $(\beta-\gamma)_{+}=0$, and the statement follows from the continuity of $f_{j\varepsilon}$.

Now suppose that $\lim_{u\to 0_{+}}f_{j\varepsilon}\big{(}F_{j\varepsilon}^{-1}(u)\big{)}=0$. Note that for a given $u_{U}\in(0,\frac{1}{2})$

[TABLE]

which follows from the continuity of the function $f_{j\varepsilon}$ .

Now let $\epsilon>0$ be given and $\gamma>0$ fixed. Thanks to assumption $(\mathbf{F}_{j\varepsilon})$ one can choose $u_{U}$ so that

[TABLE]

where $M=\sup_{u\in(0,1/2)}\frac{f_{j\varepsilon}(F_{j\varepsilon}^{-1}(2u))}{f_{j\varepsilon}(F_{j\varepsilon}^{-1}(u))}.$ Now thanks to Lemma 10 one can conclude that also

[TABLE]

which finishes the proof of the lemma. ∎

Lemma 12.

Suppose that the density $f_{j\varepsilon}$ satisfies assumption $(\mathbf{F}_{j\varepsilon})$ . Then

[TABLE]

Proof.

We consider only $x\to\infty$; the case $x\to-\infty$ can be handled analogously.

First, note that one can assume that $\lim_{u\to 1_{-}}F_{j\varepsilon}^{-1}(u)=\infty$, since otherwise the proof is trivial. Now suppose that

[TABLE]

Then one can find a positive constant $a$ and a sequence $\{z_{n}\}_{n=1}^{\infty}$ monotonically going to infinity such that

[TABLE]

Note that by assumption $(\mathbf{F}_{j\varepsilon})$ the function $f_{j\varepsilon}(x)$ is non-increasing for $x>F_{j\varepsilon}^{-1}(u_{2})$. In what follows we assume that $z_{1}>F_{j\varepsilon}^{-1}(u_{2})$ and that $z_{n+1}\geq 2z_{n}$ (otherwise one can take an appropriate subsequence of $\{z_{n}\}$). Now one can bound

[TABLE]

which contradicts the fact that $f_{j\varepsilon}$ is a density. ∎

Appendix D Some properties of $\rho_{0}$ function

Recall the definition of $\rho_{0}$ in (B15) and, for simplicity of notation, put $b=(\beta-\gamma)_{+}$. Then for each $u_{1},u_{2}$ satisfying $0<u_{1}\leq u_{2}\leq\frac{1}{2}$ one has $\rho_{0}(u_{1},u_{2})=r_{0}(u_{1},u_{2})$, where

[TABLE]

Lemma 13.

Let $u_{0}\in(0,\frac{1}{2})$ and $b\in[0,\tfrac{1}{2})$ be fixed. Then the following statements hold.

(i) The function $g_{R}(u)=r_{0}^{2}(u_{0},u)$ is increasing for $u\in(u_{0},\frac{1}{2})$.

(ii) For $b>0$ the function $g_{L}(u)=r_{0}^{2}(u,u_{0})$ is increasing on $(0,u_{*})$ and decreasing on $(u_{*},u_{0})$, where $u_{*}=u_{0}\big{(}\frac{1-2b}{2(1-b)}\big{)}^{1/b}$.

(iii) For each $0\leq u_{1}<u_{2}<u_{0}\leq\tfrac{1}{2}$ it holds that $r_{0}^{2}(u_{2},u_{0})\leq 2\,r_{0}^{2}(u_{1},u_{0})$.

(iv) For each $\epsilon>0$ the set $U(u_{0},\epsilon)=\big{\{}u\in[0,\tfrac{1}{2}]:\rho_{0}(u,u_{0})\leq\epsilon\big{\}}$ is contained in a set $[u_{L},u_{U}]$ such that $r_{0}(u_{L},u_{U})\leq 2\epsilon$.

Proof.

The proof of (i) follows directly from the definition of the function $g_{R}$, as

[TABLE]

which is evidently an increasing function on $(u_{0},\frac{1}{2}]$.

For the proof of (ii) rewrite

[TABLE]

It is now straightforward to verify that the function $g_{L}$ has exactly one local maximum, attained at the point $u_{*}$, and has the claimed monotonicity properties.

Now we show (iii). By (ii) the function $g_{L}(u)$ is decreasing on $(u_{*},u_{0})$; thus the inequality trivially holds if $u_{*}\leq u_{1}<u_{2}$.

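Thus suppose that $u_{1}<u_{*}$. From (ii) we further know that $u_{*}=u_{0}\,a$ with $a=\big{(}\frac{1-2b}{2(1-b)}\big{)}^{1/b}<1$, since $0<\frac{1-2b}{2(1-b)}<\frac{1}{2}$ for $b\in(0,\tfrac{1}{2})$. Thus we can bound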

[TABLE]

which was to be proved.
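The factor $a$ in $u_{*}=u_{0}\,a$ from (ii) can be evaluated directly. The following Python sketch (the helper name `u_star` is hypothetical, floating-point arithmetic) computes $u_{*}$ from the formula in (ii) and checks on a grid of $b\in(0,\tfrac{1}{2})$ that $0<a<2^{-1/b}<1$, which is where the bound $a<1$ used in the proof of (iii) comes from.

```python
def u_star(u0, b):
    """Maximiser of g_L from Lemma 13(ii): u_* = u0 * a with
    a = ((1 - 2b) / (2(1 - b)))**(1/b), valid for 0 < b < 1/2."""
    if not 0.0 < b < 0.5:
        raise ValueError("Lemma 13(ii) needs 0 < b < 1/2")
    a = ((1.0 - 2.0 * b) / (2.0 * (1.0 - b))) ** (1.0 / b)
    return u0 * a, a

# Check 0 < a < 2**(-1/b) < 1 on a grid of b values in (0, 1/2).
for i in range(1, 50):
    b = i / 100
    _, a = u_star(0.25, b)
    assert 0.0 < a < 2.0 ** (-1.0 / b) < 1.0, b
print("a < 2**(-1/b) < 1 on the whole grid")
```

For example, $b=\tfrac{1}{4}$ gives $a=(1/3)^{4}=1/81$, so $u_{*}$ lies well to the left of $u_{0}$.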

To prove (iv), first note that from (i) there exists $u_{U}$ such that

[TABLE]

When searching for $u_{L}$ one has to be more careful, as the function $g_{L}$ is not necessarily decreasing on $(0,u_{0})$. We distinguish two cases. First, let $\epsilon<r_{0}(0,u_{0})$. Then one can find $u_{L}$ in a similar way as $u_{U}$ was found. Second, suppose that $\epsilon\geq r_{0}(0,u_{0})$. Then we simply take $u_{L}=0$.

It remains to check that $r_{0}(u_{L},u_{U})\leq 2\epsilon$. To this end, bound

[TABLE]

∎

Bibliography

Anděl, J. (1989). Non-negative autoregressive processes. J. Time Series Anal., 10(1):1–11.
Anděl, J. (1992). Nonnegative multivariate AR(1) processes. Kybernetika, 28(3):213–226.
Berghaus, B., Bücher, A., and Volgushev, S. (2017). Weak convergence of the empirical copula process with respect to weighted metrics. Bernoulli, 23(1):743–772.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.
Brahimi, B. and Necir, A. (2012). A semiparametric estimation of copula models based on the method of moments. Stat. Methodol., 9(4):467–477.
Bücher, A., Jäschke, S., and Wied, D. (2015). Nonparametric tests for constant tail dependence with an application to energy and finance. J. Econometrics, 187(1):154–168.
Chan, N.-H., Chen, J., Chen, X., Fan, Y., and Peng, L. (2009). Statistical inference for multivariate residual copula of GARCH models. Statist. Sinica, 19:53–70.
Chen, X. and Fan, Y. (2006a). Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. J. Econometrics, 135:125–154.
