Indirect Inference for Time Series Using the Empirical Characteristic Function and Control Variates

Richard A. Davis; Thiago do Rêgo Sousa; Claudia Klüppelberg

arXiv:1904.08276·math.ST·February 3, 2021

TL;DR

This paper introduces a novel indirect inference method for stationary time series using the empirical characteristic function and control variates, improving estimation accuracy through variance reduction techniques.

Contribution

It develops new estimators based on Monte Carlo approximation and control variates, with proven consistency and asymptotic normality for a broad class of time series.

Findings

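01. The estimators are consistent and asymptotically normal under strong mixing, moment, and smoothness conditions.

02. Control variates reduce the variance of the Monte Carlo approximation of the characteristic function, most markedly for small values of its argument.

03. In simulations, the simulation based estimator performs comparably to the oracle estimator and close to the MLE for a Gaussian ARFIMA model; for Poisson driven time series of counts, the control variates based estimator improves on the simulation based one.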

Abstract

We estimate the parameter of a stationary time series process by minimizing the integrated weighted mean squared error between the empirical and simulated characteristic function, when the true characteristic functions cannot be explicitly computed. Motivated by Indirect Inference, we use a Monte Carlo approximation of the characteristic function based on iid simulated blocks. As a classical variance reduction technique, we propose the use of control variates for reducing the variance of this Monte Carlo approximation. These two approximations yield two new estimators that are applicable to a large class of time series processes. We show consistency and asymptotic normality of the parameter estimators under strong mixing, moment conditions, and smoothness of the simulated blocks with respect to its parameter. In a simulation study we show the good performance of these new simulation…

Tables

Table 1.1: Comparison of the simulation based estimator $\hat{\theta}_{n,H}$ for $H=3\,000$, the oracle estimator $\hat{\theta}_{n}$, and the MLE for sample size $n=400$. For all estimators we have taken $p=3$ with $w$ the Gaussian density as in (5.3). Reported results are based on 500 replications.

	$d = 0.05$			$d = 0.10$			$d = 0.15$
	Bias	Std	RMSE	Bias	Std	RMSE	Bias	Std	RMSE
${\hat{θ}}_{n, H}$	0.000	0.056	0.056	0.002	0.054	0.054	0.004	0.049	0.049
${\hat{θ}}_{n}$	-0.005	0.050	0.050	-0.004	0.047	0.047	-0.004	0.044	0.045
MLE	-0.015	0.040	0.043	-0.015	0.040	0.043	-0.016	0.040	0.043
	$d = 0.20$			$d = 0.25$			$d = 0.30$
	Bias	Std	RMSE	Bias	Std	RMSE	Bias	Std	RMSE
${\hat{θ}}_{n, H}$	0.003	0.047	0.047	0.000	0.046	0.046	-0.003	0.048	0.048
${\hat{θ}}_{n}$	-0.004	0.045	0.045	-0.006	0.044	0.044	-0.007	0.046	0.047
MLE	-0.016	0.040	0.043	-0.017	0.039	0.043	-0.017	0.039	0.043
	$d = 0.35$			$d = 0.40$			$d = 0.45$
	Bias	Std	RMSE	Bias	Std	RMSE	Bias	Std	RMSE
${\hat{θ}}_{n, H}$	-0.006	0.050	0.051	-0.013	0.051	0.052	-0.022	0.047	0.052
${\hat{θ}}_{n}$	-0.009	0.049	0.050	-0.013	0.051	0.052	-0.021	0.048	0.052
MLE	-0.019	0.039	0.043	-0.021	0.037	0.043	-0.027	0.034	0.043

Table 1.2: Comparison of the simulation based estimator $\hat{\theta}_{n,H}$ for $H=3\,000$, the quasi-oracle estimator $\hat{\theta}_{n}$, and the QMLE for sample size $n=400$. For all estimators we have taken $p=3$ with $w$ the Gaussian density as in (5.3). Reported results are based on 500 replications.

	$d = 0.05$			$d = 0.10$			$d = 0.15$
	Bias	Std	RMSE	Bias	Std	RMSE	Bias	Std	RMSE
${\hat{θ}}_{n, H}$	-0.004	0.062	0.062	-0.003	0.060	0.060	0.004	0.054	0.054
${\hat{θ}}_{n}$	-0.005	0.051	0.051	-0.003	0.049	0.049	0.001	0.048	0.047
QMLE	-0.012	0.043	0.045	-0.013	0.043	0.045	-0.013	0.043	0.045
	$d = 0.20$			$d = 0.25$			$d = 0.30$
${\hat{θ}}_{n, H}$	0.008	0.049	0.050	0.012	0.051	0.053	0.012	0.049	0.051
${\hat{θ}}_{n}$	0.005	0.047	0.047	0.009	0.046	0.047	0.010	0.046	0.047
QMLE	-0.014	0.042	0.044	-0.014	0.042	0.044	-0.015	0.042	0.044
	$d = 0.35$			$d = 0.40$			$d = 0.45$
${\hat{θ}}_{n, H}$	0.009	0.045	0.046	-0.004	0.042	0.042	-0.022	0.037	0.043
${\hat{θ}}_{n}$	0.006	0.044	0.044	-0.004	0.040	0.040	-0.023	0.035	0.042
QMLE	-0.016	0.041	0.044	-0.019	0.039	0.044	-0.025	0.035	0.043

Table 1.3: Comparison of the simulation based estimator $\hat{\theta}_{n,H}$ for $H=3\,000$, the quasi-oracle estimator $\hat{\theta}_{n}$, and the QMLE for sample size $n=400$. For all estimators we have taken $p=3$ with $w$ the Gaussian density as in (5.3). Reported results are based on 500 replications.

	$d = 0.05$			$d = 0.10$			$d = 0.15$
	Bias	Std	RMSE	Bias	Std	RMSE	Bias	Std	RMSE
${\hat{θ}}_{n, H}$	-0.002	0.063	0.063	0.005	0.059	0.060	0.008	0.053	0.054
${\hat{θ}}_{n}$	-0.002	0.052	0.052	-0.001	0.050	0.050	0.000	0.048	0.048
QMLE	-0.012	0.039	0.041	-0.012	0.039	0.041	-0.013	0.039	0.041
	$d = 0.20$			$d = 0.25$			$d = 0.30$
${\hat{θ}}_{n, H}$	0.008	0.049	0.050	0.004	0.050	0.050	0.008	0.050	0.050
${\hat{θ}}_{n}$	0.001	0.047	0.047	0.002	0.047	0.047	0.001	0.047	0.047
QMLE	-0.013	0.039	0.041	-0.014	0.039	0.041	-0.014	0.039	0.041
	$d = 0.35$			$d = 0.40$			$d = 0.45$
${\hat{θ}}_{n, H}$	0.002	0.049	0.049	-0.009	0.043	0.044	-0.029	0.038	0.048
${\hat{θ}}_{n}$	-0.004	0.046	0.046	-0.014	0.043	0.045	-0.031	0.038	0.049
QMLE	-0.016	0.038	0.041	-0.018	0.037	0.041	-0.024	0.033	0.041

Table 1.4: Comparison of the simulation based estimator $\hat{\theta}_{n,H}$ of (2.8) and the control variates based estimator $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ of (2.19) with $k=1$ for sample size $n=400$. For all estimators we have taken $H=3\,000$, $p=3$ with $w$ the Laplace density as in (5.2). Reported results are based on $500$ replications. The models are classified by the index of dispersion $D=e^{\beta+\alpha_{1}}$. For each setting, the smallest RMSEs are shaded.

$D = 10$
	$β$	$ϕ$	$σ$	$β$	$ϕ$	$σ$	$β$	$ϕ$	$σ$
TRUE	-0.613	-0.500	1.236	-0.613	0.500	1.236	-0.613	0.900	0.622
Bias( ${\hat{θ}}_{n, H}$ )	-0.015	0.025	0.002	-0.012	0.014	-0.032	-0.016	-0.010	0.002
RMSE( ${\hat{θ}}_{n, H}$ )	0.096	0.101	0.119	0.148	0.107	0.120	0.298	0.054	0.128
Bias( ${\hat{θ}}_{n, H, k}^{(cv)}$ )	0.023	0.031	-0.007	0.006	0.002	-0.018	0.061	-0.007	-0.036
RMSE( ${\hat{θ}}_{n, H, k}^{(cv)}$ )	0.102	0.129	0.122	0.138	0.098	0.098	0.285	0.049	0.132
$D = 1$
TRUE	0.150	-0.500	0.619	0.150	0.500	0.619	0.150	0.900	0.312
Bias( ${\hat{θ}}_{n, H}$ )	-0.004	0.024	-0.016	-0.006	0.005	-0.023	-0.016	-0.033	0.028
RMSE( ${\hat{θ}}_{n, H}$ )	0.057	0.144	0.088	0.074	0.141	0.081	0.147	0.084	0.095
Bias( ${\hat{θ}}_{n, H, k}^{(cv)}$ )	0.003	-0.011	-0.017	0.001	0.023	-0.019	0.003	-0.009	-0.012
RMSE( ${\hat{θ}}_{n, H, k}^{(cv)}$ )	0.055	0.124	0.085	0.071	0.102	0.069	0.145	0.062	0.087
$D = 0.1$
TRUE	0.373	-0.500	0.220	0.373	0.500	0.220	0.373	0.900	0.111
Bias( ${\hat{θ}}_{n, H}$ )	-0.011	0.032	-0.045	-0.015	-0.322	-0.036	-0.019	-0.517	0.044
RMSE( ${\hat{θ}}_{n, H}$ )	0.043	0.408	0.098	0.047	0.657	0.102	0.066	0.801	0.099
Bias( ${\hat{θ}}_{n, H, k}^{(cv)}$ )	-0.002	0.056	-0.044	-0.003	-0.120	-0.038	-0.004	-0.310	0.031
RMSE( ${\hat{θ}}_{n, H, k}^{(cv)}$ )	0.042	0.482	0.112	0.045	0.504	0.108	0.062	0.555	0.090

Equations

$${\bm{X}}_{j}=(X_{j},\dots,X_{j+p-1}),\quad j=1,\dots,n,$$

$$\tilde{\bm{X}}_{j}(\theta)=(\tilde{X}_{1}^{(j)}(\theta),\dots,\tilde{X}_{p}^{(j)}(\theta)),\quad j=1,\dots,H,$$

$${\bm{X}}_{j}(\theta)=(X_{j}(\theta),\dots,X_{j+p-1}(\theta)),\quad j=1,\dots,n,$$

$$\varphi_{n}(t)=\frac{1}{n}\sum_{j=1}^{n}e^{i\langle t,{\bm{X}}_{j}\rangle},\quad t\in{\mathbb{R}}^{p}.$$

$$\hat{\theta}_{n}=\operatorname*{argmin}_{\theta\in\Theta}Q_{n}(\theta),$$

$$Q_{n}(\theta)=\int_{{\mathbb{R}}^{p}}|\varphi_{n}(t)-\varphi(t,\theta)|^{2}w(t)\,{\rm d}t,\quad\theta\in\Theta,$$

$$\varphi(t,\theta)={\mathbb{E}}e^{i\langle t,{\bm{X}}_{1}(\theta)\rangle},\quad t\in{\mathbb{R}}^{p}.$$

$$\varphi_{H}(t,\theta)=\frac{1}{H}\sum_{j=1}^{H}e^{i\langle t,\tilde{\bm{X}}_{j}(\theta)\rangle},\quad t\in{\mathbb{R}}^{p}.$$

$$\hat{\theta}_{n,H}=\operatorname*{argmin}_{\theta\in\Theta}Q_{n,H}(\theta),$$

$$Q_{n,H}(\theta)=\int_{{\mathbb{R}}^{p}}|\varphi_{n}(t)-\varphi_{H}(t,\theta)|^{2}w(t)\,{\rm d}t,$$

$$\xi_{H}(t,\theta)=|\varphi_{H}(t,\theta)-\varphi(t,\theta)|,\quad t\in{\mathbb{R}}^{p},\ \theta\in\Theta,$$

$$h_{\nu,t,\theta}(x)=\langle t,x\rangle^{\nu}-{\mathbb{E}}\langle t,{\bm{X}}_{1}(\theta)\rangle^{\nu},\quad t\in{\mathbb{R}}^{p},$$

$$P_{H,\theta}(f_{t})=\frac{1}{H}\sum_{j=1}^{H}f_{t}(\tilde{\bm{X}}_{j}(\theta))=\frac{1}{H}\sum_{j=1}^{H}e^{i\langle t,\tilde{\bm{X}}_{j}(\theta)\rangle}=\varphi_{H}(t,\theta).$$

$${\mathbb{V}{\rm ar}}[P_{H,\theta}(f_{t})]=H^{-1}\sigma_{\theta}^{2}(f_{t})\quad\text{with}\quad\sigma_{\theta}^{2}(f_{t})=P_{\theta}(\{f_{t}-P_{\theta}(f_{t})\}^{2}).$$

$$\beta_{\theta,f_{t}}^{\text{(opt)}}(h_{t,\theta})=\{P_{\theta}(h_{t,\theta}h_{t,\theta}^{T})\}^{-1}P_{\theta}(h_{t,\theta}f_{t}),$$

$$\varphi_{H}^{\text{(cvopt)}}(t,\theta)=P_{H,\theta}(f_{t})-(\beta_{\theta,f_{t}}^{\text{(opt)}}(h_{t,\theta}))^{T}P_{H,\theta}(h_{t,\theta})$$

$${\mathbb{V}{\rm ar}}[\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle]\,{\mathbb{V}{\rm ar}}[\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle^{2}]-\{{\mathbb{C}{\rm ov}}[\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle,\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle^{2}]\}^{2}.$$

$$\{{\mathbb{C}{\rm ov}}[\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle,\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle^{2}]\}^{2}\leq{\mathbb{V}{\rm ar}}[\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle]\,{\mathbb{V}{\rm ar}}[\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle^{2}],$$

$$\det(P_{\theta}(h_{t,\theta}h_{t,\theta}^{T}))=0\iff a\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle+b\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle^{2}+c\stackrel{\text{a.s.}}{=}0,$$

$$\hat{\beta}_{H,\theta,f_{t}}(h_{t,\theta})=\{P_{H,\theta}(h_{t,\theta}h_{t,\theta}^{T})-P_{H,\theta}(h_{t,\theta})P_{H,\theta}(h_{t,\theta}^{T})\}^{-1}\times\{P_{H,\theta}(h_{t,\theta}f_{t})-P_{H,\theta}(h_{t,\theta})P_{H,\theta}(f_{t})\}.$$

$$\varphi_{H}^{\text{(cv)}}(t,\theta)=P_{H,\theta}(f_{t})-\kappa_{H}(t,\theta),\quad t\in{\mathbb{R}}^{p},$$

$$\kappa_{H}(t,\theta)=(\hat{\beta}_{H,\theta,f_{t}}(h_{t,\theta}))^{T}P_{H,\theta}(h_{t,\theta}).$$

$$\hat{\theta}_{n,H,k}^{\text{(cv)}}=\operatorname*{argmin}_{\theta\in\Theta}Q_{n,H,k}^{\text{(cv)}}(\theta),$$

$$Q_{n,H,k}^{\text{(cv)}}(\theta)=\int_{{\mathbb{R}}^{p}}\Big|\varphi_{n}(t)-\Big(\varphi^{\text{(cv)}}_{H}(t,\theta)1_{\{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)<k\}}+\varphi_{H}(t,\theta)1_{\{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)\geq k\}}\Big)\Big|^{2}\bar{w}(t)\,{\rm d}t,$$

$$\hat{\gamma}_{p}(h)=\frac{1}{n-h}\sum_{j=1}^{n-h}(X_{j}-\hat{\mu}_{n})(X_{j+h}-\hat{\mu}_{n}),\quad h=1,\dots,p,$$

$$K_{j}(\theta)=\int_{{\mathbb{R}}^{p}}\Big(\frac{\partial}{\partial\theta}\Re(\varphi(t,\theta)),\frac{\partial}{\partial\theta}\Im(\varphi(t,\theta))\Big)\begin{pmatrix}\cos(\langle t,{\bm{X}}_{j}\rangle)-\Re(\varphi(t,\theta))\\ \sin(\langle t,{\bm{X}}_{j}\rangle)-\Im(\varphi(t,\theta))\end{pmatrix}w(t)\,{\rm d}t,\quad j\in{\mathbb{N}}$$

$$Q=\int_{{\mathbb{R}}^{p}}\Big(\frac{\partial}{\partial\theta}\Re(\varphi(t,\theta_{0})),\frac{\partial}{\partial\theta}\Im(\varphi(t,\theta_{0}))\Big)\Big(\frac{\partial}{\partial\theta}\Re(\varphi(t,\theta_{0})),\frac{\partial}{\partial\theta}\Im(\varphi(t,\theta_{0}))\Big)^{T}w(t)\,{\rm d}t.$$

$$\sqrt{n}\,(\hat{\theta}_{n,H}-\theta_{0})\stackrel{d}{\to}N(0,Q^{-1}WQ^{-1}),\quad n\to\infty,$$

$$W={\mathbb{V}{\rm ar}}[K_{1}(\theta_{0})]+2\sum_{j=2}^{\infty}{\mathbb{C}{\rm ov}}[K_{1}(\theta_{0}),K_{j}(\theta_{0})].$$

Full text

Indirect Inference for Time Series Using the Empirical Characteristic Function and Control Variates

Richard A. Davis, Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY 10027, USA

Thiago do Rêgo Sousa, Center for Mathematical Sciences, Technical University of Munich, 85748 Garching, Boltzmannstr. 3, Germany

Claudia Klüppelberg

Abstract

We estimate the parameter of a stationary time series process by minimizing the integrated weighted mean squared error between the empirical and simulated characteristic function, when the true characteristic functions cannot be explicitly computed. Motivated by Indirect Inference, we use a Monte Carlo approximation of the characteristic function based on iid simulated blocks. As a classical variance reduction technique, we propose the use of control variates for reducing the variance of this Monte Carlo approximation. These two approximations yield two new estimators that are applicable to a large class of time series processes. We show consistency and asymptotic normality of the parameter estimators under strong mixing, moment conditions, and smoothness of the simulated blocks with respect to its parameter. In a simulation study we show the good performance of these new simulation based estimators, and the superiority of the control variates based estimator for Poisson driven time series of counts.

AMS 2010 Subject Classifications: 62F12, 62G20, 62M10, 65C05, 91G70

Keywords: Asymptotic normality, Characteristic function, Control variates, Indirect Inference estimation, Time series of counts, SLLN, Variance reduction

1 Introduction

Let $(X_{j})_{j\in{\mathbb{Z}}}$ be a stationary time series, whose distribution depends on $\theta\in\Theta\subset{\mathbb{R}}^{q}$ for some $q\in{\mathbb{N}}$. Denote by $\theta_{0}\in\Theta$ the true parameter, which we want to estimate from observations $X_{1},\dots,X_{T}$ of the time series. Maximum likelihood estimation (MLE) has been extensively used for parameter estimation, since under weak regularity conditions it is known to be asymptotically efficient. For many models, however, MLE is not feasible, because the likelihood is intractable to compute, because its maximization is difficult, or because the likelihood function is unbounded on $\Theta$. To overcome such problems, alternative methods have been developed, for instance, the generalized method of moments (GMM) in Hansen (1982), quasi-maximum likelihood estimation (QMLE) in White (1982), and composite likelihood methods in Lindsay (1988).

In a similar vein, Feuerverger (1990) proposed an estimator based on matching the empirical characteristic function (chf) computed from blocks of the observed time series and the true chf. More specifically, given a fixed $p\in{\mathbb{N}}$ , the observed blocks of $X_{1},\dots,X_{T}$ are

$${\bm{X}}_{j}=(X_{j},\dots,X_{j+p-1}),\quad j=1,\dots,n,\tag{1.1}$$

where $n=T-p+1$. In that paper, a finite set of points in ${\mathbb{R}}^{p}$ needs to be chosen as arguments at which the true and the empirical chf are compared. However, the practical choice of this set depends on the problem at hand, and the asymptotic results derived in Feuerverger (1990) do not offer practical guidance for choosing these points. To overcome this limitation, Yu (1998) and Knight and Yu (2002) considered an integrated weighted squared distance between the empirical and the true chfs.

This method has been used in a variety of applications; an interesting review paper, Yu (2004) contains a wealth of examples and references. More recent publications, where the method has been successfully applied to discrete-time models include Knight, Satchell, and Yu (2002), Meintanis and Taufer (2012), Kotchoni (2012), Milovanovic, Popovic, and Stojanovic (2014), Francq and Meintanis (2016), and Ndongo et al. (2016). The method also applies to continuous-time processes after discretization and has been used prominently for Lévy-driven models. The book Belomestny et al. (2015) provides additional insight and references in this field.

The principal goal of this paper is to extend the ideas of these papers to a more general setting. For example, we do not assume the idealized situation in which the chf has an explicit expression as a function of $\theta\in\Theta$. We propose two new estimators of $\theta$, which replace the true chf with a functional approximation constructed from simulated sample paths of $(X_{j}(\theta))_{j\in{\mathbb{Z}}}$.

While much attention has been given to the choice of the integrated distance used when computing such estimators, which under some regularity conditions can achieve the Cramér-Rao efficiency bound (see eq. (2.3) of Knight and Yu (2002) and Proposition 4.2 of Carrasco et al. (2007)), the focus of our paper is on the practical and theoretical aspects that emerge when it is required to approximate the theoretical chf for parameter estimation. For more details on the search for efficient estimators we refer to Carrasco et al. (2007); Carrasco and Florens (2014); Carrasco and Kotchoni (2017).

Our first estimator is computed from a simple Monte Carlo approximation that replaces the true, but unknown, chf. This is similar to the simulated method of moments of McFadden (1989) and to the indirect inference method (Smith (1993) and Gourieroux et al. (1993)). In particular, indirect inference has been successfully applied in a variety of situations: parameter estimation of continuous time models with stochastic volatility (Bianchi and Cleur (1996), Jiang (1998), Raknerud and Skare (2012), Laurini and Hotta (2013) and Wahlberg, Welsh, and Ljung (2015)), robust estimation (de Luna and Genton (2001) and Fasen-Hartmann and Kimmig (2020)), and finite sample bias reduction (Gourieroux, Renault, and Touzi (2000, 2010) and Do Rêgo Sousa, Haug, and Klüppelberg (2019)).

More precisely, for many different $\theta\in\Theta$ , we simulate an iid sample of blocks denoted by

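$$\tilde{\bm{X}}_{j}(\theta)=(\tilde{X}_{1}^{(j)}(\theta),\dots,\tilde{X}_{p}^{(j)}(\theta)),\quad j=1,\dots,H,\tag{1.2}$$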

for $H\in{\mathbb{N}}$, and define a simulation based parameter estimator that minimizes the integrated weighted mean squared error (the integrated distance we use) between the empirical chf computed from the blocks (1.1) of the observed time series and its simulated version computed from a large number of simulated paths of the time series.

This is in contrast to the simulation based estimator defined in Section 5.2 of Carrasco et al. (2007), which is computed from one long time series path instead of the iid sample of blocks in (1.2) (a similar method has been applied by Forneron (2018) to estimate the structural parameters and the distribution of shocks in dynamic models). Since we compute the Monte Carlo approximation of the chf from independent blocks, it should have smaller variance than the corresponding one for dependent blocks. Our method gives a chf approximation which yields strongly consistent and asymptotically normal parameter estimators. We also report their small sample properties for different models.

Furthermore, as the Monte Carlo approximation of the chf is computed from iid blocks of a time series, control variates techniques (see Glynn and Szechtman (2002) and Robert and Casella (2004)) provide an even more accurate approximation for the chf. Control variates techniques are classical variance reduction methods in simulation. The idea is to use a set of control variates, which are correlated with the chf. The method then approximates the joint covariance matrix of the control variates and the chf, and uses it to construct a new Monte Carlo approximation of the chf. We choose the first two terms in the Taylor expansion of the complex exponential $e^{i\langle t,{\bm{X}}_{1}(\theta)\rangle}$ , $\langle t,{\bm{X}}_{1}(\theta)\rangle$ and $\langle t,{\bm{X}}_{1}(\theta)\rangle^{2}$ for $\theta\in\Theta$ as control variates, where $\langle\cdot,\cdot\rangle$ denotes the usual Euclidean inner product in ${\mathbb{R}}^{d}$ . This requires knowing the mean and covariance matrix of ${\bm{X}}_{1}(\theta)$ for $\theta\in\Theta$ .
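To see why these two terms are natural controls, it may help to recall the second order expansion of the complex exponential. This is a standard calculus fact rather than a formula quoted from the paper, and $y$ is shorthand introduced here for $\langle t,{\bm{X}}_{1}(\theta)\rangle$:

$$e^{iy}=1+iy-\frac{y^{2}}{2}+R(y),\qquad|R(y)|\leq\frac{|y|^{3}}{6},\quad y\in{\mathbb{R}}.$$

Taking real and imaginary parts gives $\cos(y)\approx 1-y^{2}/2$ and $\sin(y)\approx y$ for small $|y|$, so the quadratic term tracks the real part of $e^{iy}$ and the linear term tracks the imaginary part. The remainder bound suggests that this correlation is strongest when $|t|$ is small.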

In assessing the performance of both the Monte Carlo approximation and the control variates approximation of the chf, two trends emerge. First, both the Monte Carlo and the control variates approximations work better for small values of the argument. Second, the control variates approximation performs much better than the Monte Carlo approximation, in particular, for small values of the argument. As a consequence, we propose a control variates based parameter estimator whose integrated mean squared error distinguishes between small and large values of the argument.

Under regularity conditions we prove strong consistency of the proposed parameter estimators and asymptotic normality of the simulation based parameter estimator. We find that the simulation based parameter estimator is asymptotically normal with asymptotic covariance matrix equal to the one of the oracle estimator as derived in Knight and Yu (2002). From this we conclude that there cannot be any improvement in the limit law for the asymptotic normality of the control variates based estimator. However, we prove that it is computed from a better approximation of the chf. Thus, the control variates estimator improves the finite sample performance compared to the simulation based parameter estimator.

It is assumed throughout that $(X_{j})_{j\in\mathbb{Z}}$ is a stationary time series. This ensures that the blocks of random variables in (1.1) are stationary, from which we obtain convergence of the empirical chf to the joint chf. Now in some restricted cases, our method can be adapted to special types of nonstationarity. For example, if $(X_{j})_{j\in\mathbb{Z}}$ is nonstationary, but the differenced process $\nabla X_{j}=X_{j}-X_{j-1}$ is stationary, then our methodology can be applied directly to $\nabla X_{j}$ . Similarly, if $X_{j}=Y_{j}+\mu_{j}$ , where $Y_{j}$ is stationary and $\mu_{j}$ is a mean function that can be estimated consistently say by $\hat{\mu}_{j}$ , then the methodology can be applied to $X_{j}-\hat{\mu}_{j}$ . We do not pursue this line of investigation here.

The finite sample performance of the estimators is investigated for two important models. We begin with a stationary Gaussian ARFIMA model, whose chf is explicitly known so that we can use the oracle estimator and compare its performance with the simulation based estimator. Their performance is comparable and also very close to that of the MLE, so in this model there is no need to use control variates. The second example is a nonlinear model for time series of counts, which was originally proposed in Zeger (1988) and has been applied, for instance, to modeling disease counts (see also Campbell (1994), Chan and Ledolter (1995) and Davis, Dunsmuir, and Wang (1999)).

In the second example, the oracle estimator does not apply, since the chf of a Poisson-AR process cannot be computed in closed form. For this model and different parameter sets, both the simulation based and the control variates based estimators perform satisfactorily, and the control variates based estimator improves the performance of the simulation based estimator considerably. When compared with the composite pairwise likelihood estimator in Davis and Yau (2011), the control variates based estimator has comparable or even smaller bias.

Our paper is organized as follows. In Section 2 we present the oracle estimator, and the estimators computed from a Monte Carlo approximation and from a control variates approximation of the chf in detail. Here we also motivate the choice of the control variates used. The asymptotic properties of the two new estimators are established in Section 3. As all estimators are computed from true or approximated chf’s we assess their performance in Section 4, first for a Gaussian AR(1) process and then for the Poisson-AR process. Practical aspects of calculating the weighted least squares function are discussed in Section 5, as well as the estimation results for finite samples. In Section 5.1 we compare the oracle estimator, the simulation based parameter estimator and the MLE for a Gaussian ARFIMA model, whereas in Section 5.2 we compare the simulation based parameter estimator and the control variates based estimator for the Poisson-AR process. The proofs of the main results in Section 3, of Lemma 1 of Section 5, and the Tables discussed in Sections 5.1 and 5.2 are provided in the Appendix.

2 Parameter estimation based on the empirical characteristic function

Throughout we use the following notation. For $z\in{\mathbb{C}}$ we use the $L^{2}$ -norm: $|z|=\sqrt{z\,\overline{z}}$ , where $\overline{z}$ is the complex conjugate of $z$ . For $x\in{\mathbb{R}}^{d}$ and $d\in{\mathbb{N}}$ we denote by $|x|$ the $L^{2}$ -norm, but recall that in ${\mathbb{R}}^{d}$ all norms are equivalent. For $z\in{\mathbb{C}}$ the symbols $\Re(z)$ and $\Im(z)$ denote its real and imaginary part. For a function $f:{\mathbb{R}}^{q}\to{\mathbb{R}}^{p}$ its Jacobi matrix is given by $\nabla_{\theta}f(\theta)=\frac{\partial f(\theta)}{\partial\theta^{T}}\in{\mathbb{R}}^{p\times q}$ and $\nabla_{\theta}^{2}f(\theta)=\frac{\partial\text{vec}(\nabla_{\theta}f(\theta))}{\partial\theta^{T}}\in{\mathbb{R}}^{pq\times q}$ .

2.1 The oracle estimator

Let $(X_{j}(\theta))_{j\in{\mathbb{Z}}}$ be a stationary time series process, whose distribution depends on $\theta\in\Theta\subset{\mathbb{R}}^{q}$ for some $q\in{\mathbb{N}}$ . Denote by $\theta_{0}\in\Theta$ the true parameter, which we want to estimate, and suppose that we observe $X_{1},\dots,X_{T}$ . Given a fixed $p\in{\mathbb{N}}$ , define for $\theta\in\Theta$ the $p$ -dimensional blocks

$${\bm{X}}_{j}(\theta)=(X_{j}(\theta),\dots,X_{j+p-1}(\theta)),\quad j=1,\dots,n,\tag{2.1}$$

where $n=T-p+1$ . For $j=1,\dots,n,$ the observed blocks correspond to ${\bm{X}}_{j}=(X_{j},\dots,X_{j+p-1})$ , which can be used to calculate the empirical characteristic function (chf), defined as

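$$\varphi_{n}(t)=\frac{1}{n}\sum_{j=1}^{n}e^{i\langle t,{\bm{X}}_{j}\rangle},\quad t\in{\mathbb{R}}^{p}.\tag{2.2}$$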

Under mild conditions such as ergodicity, $\varphi_{n}(t)$ converges a.s. pointwise to the true chf $\varphi(t)={\mathbb{E}}e^{i\langle t,{\bm{X}}_{1}\rangle}$ for all $t\in{\mathbb{R}}^{p}$ . We assume that $p$ is chosen in such a way that $\varphi(\cdot)$ uniquely identifies the parameter of interest $\theta$ . The idea of estimating $\theta_{0}$ from a single time series observation by matching the empirical chf of blocks of the observed time series and the true one has been proposed in Yu (1998) and Knight and Yu (2002), and we use the one in Knight and Yu (2002), where the oracle estimator of $\theta_{0}$ is defined as

$$\hat{\theta}_{n}=\operatorname*{argmin}_{\theta\in\Theta}Q_{n}(\theta),\tag{2.3}$$

where

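$$Q_{n}(\theta)=\int_{{\mathbb{R}}^{p}}|\varphi_{n}(t)-\varphi(t,\theta)|^{2}w(t)\,{\rm d}t,\quad\theta\in\Theta,\tag{2.4}$$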

with suitable weight function $w$ such that the integral is well-defined, and chf

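$$\varphi(t,\theta)={\mathbb{E}}e^{i\langle t,{\bm{X}}_{1}(\theta)\rangle},\quad t\in{\mathbb{R}}^{p}.\tag{2.5}$$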

In an ideal situation, $\varphi(\cdot,\theta)$ has an explicit expression, which is known for all $\theta\in\Theta$ .
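As a rough illustration of this ideal situation, the sketch below computes the empirical chf of overlapping blocks and minimizes a numerical version of the oracle objective for a Gaussian AR(1) process, whose block chf is known in closed form. The model, the grid search, the Gaussian weight, and the approximation of the integral by an average over points drawn from the weight are all assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def blocks(x, p):
    """Overlapping blocks (x_j, ..., x_{j+p-1}), j = 1..n with n = T - p + 1."""
    return np.lib.stride_tricks.sliding_window_view(x, p)

def empirical_chf(t, xb):
    """Empirical chf at each row of t (M x p), computed from blocks xb (n x p)."""
    return np.exp(1j * (t @ xb.T)).mean(axis=1)

def ar1_chf(t, phi):
    """Exact block chf of a zero-mean Gaussian AR(1) with unit innovation variance."""
    p = t.shape[1]
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    sigma = phi ** lags / (1.0 - phi ** 2)  # autocovariance matrix of the block
    return np.exp(-0.5 * np.einsum("mi,ij,mj->m", t, sigma, t))

def oracle_objective(phi, t, chf_n):
    """Average of |phi_n(t) - phi(t, theta)|^2 over points t drawn from the weight."""
    return np.mean(np.abs(chf_n - ar1_chf(t, phi)) ** 2)

rng = np.random.default_rng(0)
T, p, phi0 = 2000, 3, 0.5

# Simulate an "observed" stationary AR(1) path (illustration only).
x = np.empty(T)
x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi0 ** 2))
z = rng.normal(size=T)
for j in range(1, T):
    x[j] = phi0 * x[j - 1] + z[j]

t = rng.normal(size=(500, p))  # integration points from a standard Gaussian weight
chf_n = empirical_chf(t, blocks(x, p))

grid = np.linspace(-0.9, 0.9, 181)
phi_hat = grid[np.argmin([oracle_objective(g, t, chf_n) for g in grid])]
print(phi_hat)
```

A grid search keeps the sketch dependency-free; in practice a numerical optimizer and a deterministic quadrature rule could be substituted.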

2.2 Estimator based on a Monte Carlo approximation of $\varphi(\cdot,\theta)$

Unfortunately, a closed form expression of the chf $\varphi(\cdot,\theta)$ is not available for many time series processes. However, it can be approximated by Monte Carlo simulation. An idea borrowed from the simulated method of moments (McFadden (1989); see also Smith (1993) and Gourieroux, Monfort, and Renault (1993) for a similar idea in the context of indirect inference) is to replace $\varphi(\cdot,\theta)$ by its functional approximation constructed from simulated sample paths of $(X_{j}(\theta))_{j\in{\mathbb{Z}}}$. For many different $\theta\in\Theta$, we simulate, independently of the observed time series, an iid sample of the blocks in (2.1) denoted by

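$$\tilde{\bm{X}}_{j}(\theta)=(\tilde{X}_{1}^{(j)}(\theta),\dots,\tilde{X}_{p}^{(j)}(\theta)),\quad j=1,\dots,H,\tag{2.6}$$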

for $H\in{\mathbb{N}}$ , and define the Monte Carlo approximation of $\varphi(\cdot,\theta)$ based on these simulations as

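$$\varphi_{H}(t,\theta)=\frac{1}{H}\sum_{j=1}^{H}e^{i\langle t,\tilde{\bm{X}}_{j}(\theta)\rangle},\quad t\in{\mathbb{R}}^{p}.\tag{2.7}$$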

If we replace $\varphi(\cdot,\theta)$ in (2.4) by $\varphi_{H}(\cdot,\theta)$ , we obtain the simulation based parameter estimator

$$\hat{\theta}_{n,H}=\operatorname*{argmin}_{\theta\in\Theta}Q_{n,H}(\theta),\tag{2.8}$$

where

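$$Q_{n,H}(\theta)=\int_{{\mathbb{R}}^{p}}|\varphi_{n}(t)-\varphi_{H}(t,\theta)|^{2}w(t)\,{\rm d}t,\tag{2.9}$$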

with suitable weight function $w$ such that the integral is well-defined.
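The construction above can be sketched in a few lines. The example below is a minimal illustration under several assumptions that are not taken from the paper: a Gaussian AR(1) model stands in for a process whose chf is treated as unknown, the integral is approximated by an average over points drawn from a Gaussian weight, the same random draws `eps` are reused for every candidate parameter so that the objective varies smoothly in the parameter, and a grid search replaces a proper optimizer. The function names are hypothetical.

```python
import numpy as np

def simulate_blocks(theta, eps):
    """Map fixed N(0,1) draws eps (H x p) to H iid stationary AR(1) blocks."""
    xb = np.empty_like(eps)
    xb[:, 0] = eps[:, 0] / np.sqrt(1.0 - theta ** 2)  # stationary starting value
    for k in range(1, eps.shape[1]):
        xb[:, k] = theta * xb[:, k - 1] + eps[:, k]
    return xb

def chf(t, xb):
    """Average of exp(i<t, x>) over the rows x of xb, for each row of t."""
    return np.exp(1j * (t @ xb.T)).mean(axis=1)

def q_nh(theta, t, chf_n, eps):
    """Numerical version of Q_{n,H}: average over points t drawn from the weight."""
    return np.mean(np.abs(chf_n - chf(t, simulate_blocks(theta, eps))) ** 2)

rng = np.random.default_rng(0)
T, p, H, theta0 = 2000, 3, 3000, 0.5

# "Observed" series, simulated here only for the demonstration.
x = simulate_blocks(theta0, rng.normal(size=(1, T)))[0]
obs_blocks = np.lib.stride_tricks.sliding_window_view(x, p)  # n = T - p + 1 rows

t = rng.normal(size=(200, p))  # integration points from a standard Gaussian weight
eps = rng.normal(size=(H, p))  # reused for every theta
chf_n = chf(t, obs_blocks)

grid = np.linspace(-0.9, 0.9, 91)
theta_hat = grid[np.argmin([q_nh(g, t, chf_n, eps) for g in grid])]
print(theta_hat)
```

Only the ability to simulate blocks is used here; no closed form of the chf enters the objective.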

Remark 2.1.

An alternative approximation of the chf to (2.7) is based on generating one long time series path and using the empirical chf of the consecutive blocks of $p$-dimensional random variables constructed as in (2.1) (see Carrasco et al. (2007)). While unbiased, this approximation will generally have larger variance than the approximation (2.7). Nevertheless, when it is expensive to generate realizations even of dimension $p$, for instance, when a long burn-in time is required to achieve stationarity, it may be computationally more efficient to generate one long time series. We do not pursue this approach here. The technical aspects of working with one long time series are not much different from those of the estimate based on independent replicates as in (2.7), but a much larger sample size than desired might be required to control the variance of the estimate. This is especially true for long-memory time series.
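The variance comparison in this remark can be checked numerically in a simple case. The sketch below is an illustration under assumptions chosen here (a strongly autocorrelated Gaussian AR(1), a single argument `t`, and the same number of blocks for both schemes); it estimates the variance of the chf approximation across repeated simulations, once from iid blocks and once from overlapping blocks of one long path.

```python
import numpy as np

rng = np.random.default_rng(2)
phi, p, H, reps = 0.9, 3, 300, 300
t = np.full(p, 0.3)
sd0 = 1.0 / np.sqrt(1.0 - phi ** 2)  # stationary standard deviation

def ar1_paths(n_paths, length):
    """Independent stationary Gaussian AR(1) paths, one per row."""
    x = np.empty((n_paths, length))
    x[:, 0] = rng.normal(scale=sd0, size=n_paths)
    for k in range(1, length):
        x[:, k] = phi * x[:, k - 1] + rng.normal(size=n_paths)
    return x

# Scheme 1: H iid blocks of length p per replication.
iid_blocks = ar1_paths(reps * H, p).reshape(reps, H, p)
chf_iid = np.exp(1j * (iid_blocks @ t)).mean(axis=1)

# Scheme 2: H overlapping blocks cut from one path of length H + p - 1.
paths = ar1_paths(reps, H + p - 1)
windows = np.lib.stride_tricks.sliding_window_view(paths, p, axis=1)  # reps x H x p
chf_path = np.exp(1j * (windows @ t)).mean(axis=1)

def complex_var(z):
    """Variance of a complex sample: mean squared modulus of deviations."""
    return np.mean(np.abs(z - z.mean()) ** 2)

var_iid, var_path = complex_var(chf_iid), complex_var(chf_path)
print(var_iid, var_path)
```

The comparison holds the number of blocks fixed, not the simulation cost: the long path uses far fewer random draws, which is the trade-off the remark describes.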

Since $\varphi_{H}(\cdot,\theta)$ is based on $H$ iid time series blocks, we can reduce its variance further using control variates to produce an even more accurate approximation for the chf. This will result in an improved version of $\hat{\theta}_{n,H}$ .

2.3 Estimator based on a control variates approximation of $\varphi(\cdot,\theta)$

The estimator $\hat{\theta}_{n,H}$ in (2.8) requires only that the stationary time series process can be simulated, and is therefore easily applicable to a large class of models. When computing $Q_{n,H}(\theta)$ of (2.9), it is very important that the error

[TABLE]

in approximating the true chf is small, since it propagates to $\hat{\theta}_{n,H}$ . In order to reduce the variance of the empirical chf $\varphi_{H}(\cdot,\theta)$ , we use the method of control variates, an often used variance reduction technique in the context of Monte Carlo integration (Glynn and Szechtman (2002), Oates, Girolami, and Chopin (2017), Portier and Segers (2019)).

We construct a control variates approximation of $\varphi(\cdot,\theta)$ from the iid sample $\tilde{\bm{X}}_{j}(\theta)$ , $j=1,\dots,H$ , as in (2.6). We also require explicit expressions for the moments ${\mathbb{E}}\langle t,{\bm{X}}_{1}(\theta)\rangle^{\nu}$ for $\nu=1,2$ and $\theta\in\Theta$ .

Recall that $\tilde{\bm{X}}_{1}(\theta)\stackrel{{\scriptstyle\mathrm{d}}}{{=}}{\bm{X}}_{1}(\theta)$ for all $\theta\in\Theta$ , so that both random variables have the same moments. As in Portier and Segers (2019), we denote by $P_{\theta}$ the distribution of the block ${\bm{X}}_{1}(\theta)$ and by $P_{H,\theta}$ its empirical version. For example, if $f_{t}(x)=e^{i\langle t,x\rangle}$ for $t,x\in{\mathbb{R}}^{p}$ , we want to provide a good approximation for $\varphi(t,\theta)={\mathbb{E}}f_{t}({\bm{X}}_{1}(\theta))=:P_{\theta}(f_{t})$ for $\theta\in\Theta$ . To apply the control variates technique, we need control functions that are correlated with $f_{t}({\bm{X}}_{1}(\theta))$ and whose expectations are known. In the time series context, we often know the first and second order structure of the process in closed form. These expressions are available even for complicated models, e.g., models defined in terms of stochastic integrals (see e.g. Brockwell (2001); Klüppelberg et al. (2004); Brockwell et al. (2006); Stelzer (2010)). The first and second order moments of ${\bm{X}}_{1}(\theta)$ appear in the Taylor series of $f_{t}({\bm{X}}_{1}(\theta))$ and are therefore natural choices of control functions. If the time series process also allows additional moment expressions to be computed in closed form, and these are correlated with $f_{t}({\bm{X}}_{1}(\theta))$ , then we encourage using them as control functions when approximating the chf. We now describe the construction of the control variates approximation in detail.

We use the first two terms in the Taylor series of the complex function $f_{t}(x)$ , which suggests the vector of control functions $h_{t,\theta}=(h_{1,t,\theta},h_{2,t,\theta})^{T}$ , where for $\nu=1,2$ ,

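$$h_{\nu,t,\theta}(x)=\langle t,x\rangle^{\nu}-{\mathbb{E}}\langle t,{\bm{X}}_{1}(\theta)\rangle^{\nu},\quad x\in{\mathbb{R}}^{p},$$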

so that $P_{\theta}(h_{t,\theta})=0$ , the zero vector in ${\mathbb{R}}^{2}$ . The Monte Carlo approximation of $\varphi(\cdot,\theta)$ based on the iid sample $\tilde{\bm{X}}_{j}(\theta)$ , $j=1,\dots,H$ , is then

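$$\varphi_{H}(t,\theta)=P_{H,\theta}(f_{t})=\frac{1}{H}\sum_{j=1}^{H}f_{t}(\tilde{\bm{X}}_{j}(\theta)). \qquad (2.11)$$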

Since ${\mathbb{E}}P_{H,\theta}(f_{t})={\mathbb{E}}f_{t}({\bm{X}}_{1}(\theta))$ , the Monte Carlo approximation $\varphi_{H}(t,\theta)$ is unbiased and has variance

[TABLE]

Then for every vector $\beta\in{\mathbb{C}}^{2}$ , we have that $P_{H,\theta}(f_{t})-\beta^{T}P_{H,\theta}(h_{t,\theta})$ is also an unbiased estimator of $\varphi(t,\theta)$ . Since $\tilde{\bm{X}}_{j}(\theta)$ , $j=1,\dots,H$ , is an independent sample, $\mathbb{V}{\rm ar}[P_{H,\theta}(f_{t})-\beta^{T}P_{H,\theta}(h_{t,\theta})]=H^{-1}\sigma_{\theta}^{2}(f_{t}-\beta^{T}h_{t,\theta})$ and, if we differentiate the map $\beta\mapsto\sigma_{\theta}^{2}(f_{t}-\beta^{T}h_{t,\theta})$ with respect to $\beta$ and set it equal to zero, we obtain (cf. Approach 1 in Glynn and Szechtman (2002)) the theoretical optimum

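$$\beta^{(\text{opt})}_{\theta,f_{t}}(h_{t,\theta})=\big(P_{\theta}(h_{t,\theta}h_{t,\theta}^{T})\big)^{-1}P_{\theta}(h_{t,\theta}f_{t}), \qquad (2.13)$$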

provided the inverse exists. In this case, the estimator

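$$\varphi^{\text{(cvopt)}}_{H}(t,\theta)=P_{H,\theta}(f_{t})-\big(\beta^{(\text{opt})}_{\theta,f_{t}}(h_{t,\theta})\big)^{T}P_{H,\theta}(h_{t,\theta}) \qquad (2.14)$$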

has minimal asymptotic variance. In order to investigate the existence of the above inverse note that for each fixed $t\in{\mathbb{R}}^{p}$ and $\theta\in\Theta$ , the determinant of $P_{\theta}(h_{t,\theta}h_{t,\theta}^{T})$ is

[TABLE]

Since by the Cauchy-Schwarz inequality,

[TABLE]

it follows (see e.g. Klenke (2013), Theorem 5.8) that

[TABLE]

for some $a,b,c\in{\mathbb{R}}$ with $|a|+|b|+|c|>0$ . As the scalar product is random, universal coefficients to satisfy the right-hand side of (2.15) exist only in degenerate cases, which we do not consider.

Since $\beta^{(\text{opt})}_{\theta,f_{t}}(h_{t,\theta})$ is unknown, it needs to be estimated, e.g. by one of the methods in Glynn and Szechtman (2002). We use the estimator described in eqs. (6) and (7) in Portier and Segers (2019):

[TABLE]

For the iid sample $\tilde{\bm{X}}_{j}(\theta),j=1,\dots,H$ , as in (2.6) we obtain the control variates approximation of $\varphi(\cdot,\theta)$ given by

[TABLE]

where

[TABLE]
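The construction above can be illustrated with a short sketch. It assumes that $\beta$ is estimated from sample covariances of the control functions and of $f_{t}$ (a least squares type estimate); the exact form of the estimator in (2.16) is not reproduced here, and the function name `cv_chf` and the Gaussian toy example are hypothetical.

```python
import numpy as np

def cv_chf(t, blocks, m1, m2):
    """Control variates approximation of the chf at a single point t.

    t      : array (p,)
    blocks : array (H, p) of iid simulated blocks
    m1, m2 : E<t,X> and E<t,X>^2, assumed to be known in closed form
    """
    s = blocks @ t                              # <t, X_j>, shape (H,)
    f = np.exp(1j * s)                          # f_t(X_j)
    h = np.column_stack([s - m1, s**2 - m2])    # control functions with mean zero
    hc = h - h.mean(axis=0)                     # empirical centring
    G = hc.T @ hc / len(s)                      # sample covariance of h
    c = hc.T @ (f - f.mean()) / len(s)          # sample covariance of h and f_t
    beta = np.linalg.solve(G, c)                # assumed estimate of the optimal beta
    return f.mean() - beta @ h.mean(axis=0)

# Toy check with blocks of iid N(0,1) variables, where the chf is exp(-|t|^2/2).
rng = np.random.default_rng(1)
t = np.array([0.3, 0.2, -0.1])
true_chf = np.exp(-0.5 * t @ t)
err_mc, err_cv = [], []
for _ in range(50):
    X = rng.standard_normal((500, 3))
    err_mc.append(abs(np.exp(1j * X @ t).mean() - true_chf))
    err_cv.append(abs(cv_chf(t, X, m1=0.0, m2=t @ t) - true_chf))
```

In this toy setting $\mathbb{V}{\rm ar}(\langle t,{\bm{X}}_{1}\rangle)=|t|^{2}$ is small, which is the regime in which the first two Taylor terms explain most of the variation of $f_{t}$; comparing `np.mean(err_cv)` with `np.mean(err_mc)` shows the size of the reduction.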

Recall from (2.11) that $P_{H,\theta}(f_{t})=\varphi_{H}(t,\theta)$ , so we could simply replace $\varphi_{H}(t,\theta)$ in (2.9) by $\varphi^{\text{(cv)}}_{H}(t,\theta)$ as given in (2.17). However, as we shall see in Section 4, the control variates approximation $\varphi^{\text{(cv)}}_{H}(t,\theta)$ provides superior approximations of $\varphi(t,\theta)$ only for values of $t$ , for which $\mathbb{V}{\rm ar}(\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle)$ is small. Thus, we replace $\varphi_{H}(t,\theta)$ in (2.9) by a combination of $\varphi_{H}(t,\theta)$ and $\varphi^{\text{(cv)}}_{H}(t,\theta)$ . More precisely, we propose the following control variates based estimator:

[TABLE]

where for appropriate $k>0$ ,

[TABLE]

$\bar{w}(t)=\frac{w(t)}{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)}$ , with suitable weight function $w$ such that the integral is well-defined.

It is worth mentioning that, for a fixed weight function $w(\cdot)$ , the weight function $\bar{w}(\cdot)$ can always be computed since $\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)$ depends only on the time series data. The downside of using the control variates based estimator (2.19) is that one needs to resort to numerical integration. However, the procedure is feasible for moderate dimension $p$ . As illustrated in the Poisson-AR example of Section 4.2, the control variates based estimator considerably improves on the performance of the simulation based estimator (2.8).

Note that $\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)=t^{T}\hat{\Gamma}_{p}t$ where $\hat{\Gamma}_{p}=(\hat{\gamma}_{p}(i-j))_{i,j=1}^{p}$ with

[TABLE]

and $\hat{\mu}_{n}=\frac{1}{n}\sum_{j=1}^{n}X_{j}$ . The choice of the indicator function $1_{\{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)<k\}}$ is justified by the fact that, when estimating the parameter $\theta_{0}$ , we focus on approximations of $\varphi(t,\theta)$ for $\theta$ close to $\theta_{0}$ .
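As a rough sketch of how the quadratic form $t^{T}\hat{\Gamma}_{p}t$ and the indicator can be computed from data, the following assumes the usual sample autocovariance normalised by $n$; the exact normalisation used in (2.21) is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def sample_acov_matrix(x, p):
    """p x p Toeplitz matrix of sample autocovariances (normalised by n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.array([xc[: n - h] @ xc[h:] / n for h in range(p)])
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return gamma[idx]

def use_control_variates(t, Gamma_hat, k):
    """Indicator 1{ t' Gamma_hat t < k }, evaluated for each row of t."""
    t = np.atleast_2d(t)
    var_t = np.einsum("ij,jk,ik->i", t, Gamma_hat, t)
    return var_t < k

x = np.random.default_rng(0).standard_normal(400)   # stand-in for observed data
Gamma_hat = sample_acov_matrix(x, p=3)
flags = use_control_variates(
    np.array([[0.1, 0.1, 0.1], [5.0, 5.0, 5.0]]), Gamma_hat, k=1.0
)
```

Here `flags` marks the points $t$ at which the control variates approximation would be used; at the remaining points the plain Monte Carlo approximation is kept.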

3 Asymptotic behavior of the parameter estimators

Before performing the parameter estimation we need to make sure that the parameters are identifiable from the model.

In the following we assume that the model parameters are identifiable from the chf. In our examples, the dimension $p$ must be at least 2. For a specific choice of $p$ , the minimum in (2.19) may not be unique giving an identifiability problem of the estimated model. This may be remedied by increasing the dimension $p$ .

In the sequel, we will make various assumptions on different aspects of the underlying process, smoothness of the model, moments of the process, and properties of the weight function. We group these assumptions into the following categories.

Assumptions A (Parameter space and time series process).

$(a.1)$ $\Theta$ is a compact subset of ${\mathbb{R}}^{q}$ and $\theta_{0}\in\Theta^{\mathrm{o}}$ , the interior of $\Theta$ .

$(a.2)$ $(X_{j})_{j\in{\mathbb{Z}}}$ is a stationary and ergodic sequence.

$(a.3)$ $(X_{j})_{j\in{\mathbb{Z}}}$ is $\alpha$ -mixing with rate function $(\alpha_{j})_{j\in{\mathbb{N}}}$ satisfying $\sum_{j=1}^{\infty}(\alpha_{j})^{1/r}<\infty$ for some $r>1$ .

Assumptions B (Continuity and differentiability in $\theta$ ).

$(b.1)$ For each $j\in{{\mathbb{N}}}$ , the map $\theta\mapsto{\tilde{\bm{X}}}_{j}(\theta)$ is continuous on $\Theta$ .

$(b.2)$ For each $j\in{{\mathbb{N}}}$ , the map $\theta\mapsto{\tilde{\bm{X}}}_{j}(\theta)$ is twice continuously differentiable in an open neighborhood around $\theta_{0}$ .

Assumptions C (Moments).

$(c.1)$ ${\mathbb{E}}|X_{1}|^{u}<\infty$ , where $u=2r/(r-1)$ with $r>1$ being such that $(a.3)$ holds.

$(c.2)$ ${\mathbb{E}}\prod_{j=1}^{p}|X_{j}|^{\alpha}<\infty$ for some $\alpha\in(u/2,u]$ where $u=2r/(r-1)$ with $r>1$ being such that $(a.3)$ holds.

$(c.3)$ ${\mathbb{E}}\sup_{\theta\in\Theta}|X_{1}(\theta)|^{4}<\infty$ .

$(c.4)$ For each $\theta\in\Theta$ , ${\mathbb{E}}|\nabla_{\theta}X_{1}(\theta)|<\infty$ .

$(c.5)$ ${\mathbb{E}}\sup_{\theta\in\Theta}|\nabla_{\theta}X_{1}(\theta)|^{2(1+\varepsilon)}<\infty$ and ${\mathbb{E}}\sup_{\theta\in\Theta}|\nabla_{\theta}^{2}X_{1}(\theta)|^{1+\varepsilon}<\infty$ for some $\varepsilon>0$ .

Assumptions D (Weight function).

$(d.1)$ $\int_{{\mathbb{R}}^{p}}w(t){\rm d}t<\infty$ .

$(d.2)$ $\int_{{\mathbb{R}}^{p}}|t|w(t){\rm d}t<\infty$ .

$(d.3)$ $\int_{{\mathbb{R}}^{p}}|t|^{2(1+\varepsilon)}w(t){\rm d}t<\infty$ for some $\varepsilon>0$ .

$(d.4)$ $\int_{{\mathbb{R}}^{p}}\frac{w(t)}{|t|^{2}}{\rm d}t<\infty$ .

Assumption B is indeed satisfied by many linear and non-linear time series processes, in particular, when they have a representation $X_{j}(\theta)=f(Z_{j},Z_{j-1},\cdots;\theta)$ or $X_{j}(\theta)=f(Z_{j},X_{j-1}(\theta),X_{j-2}(\theta),\cdots;\theta)$ for iid noise variables $(Z_{j})_{j\in{\mathbb{Z}}}$ , and $f:{\mathbb{R}}^{\infty}\times\Theta\mapsto{\mathbb{R}}$ is a measurable function. Prominent examples are the MA $(\infty)$ and AR $(\infty)$ representations of a causal or invertible ARMA $(p,q)$ model (see e.g. eqs. (3.1.15) and (3.1.18) in Brockwell and Davis (2013)) or the ARCH $(\infty)$ representation of a GARCH $(p,q)$ model (see e.g. Francq and Zakoïan (2011), Theorem 2.8). In this case, assumptions $(b.1)$ and $(b.2)$ will hold whenever the map $f$ is continuously differentiable for $\theta\in\Theta$ . For example, if $f$ is Lipschitz-continuous for $\theta\in\Theta$ , then the continuity assumption $(b.1)$ holds.

The key asymptotic properties of our estimators, consistency and asymptotic normality, are stated in the following theorems. The proofs of these results are presented in the Appendix.

We first formulate the strong consistency results for the parameter estimators.

Theorem 3.1 (Consistency of $\hat{\theta}_{n,H}$ ).

Assume that $(a.1)$ , $(a.2)$ , $(b.1)$ , and $(d.1)$ hold. Let $H=H(n)\rightarrow\infty$ as ${n\to\infty}$ . Then $\hat{\theta}_{n,H}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\theta_{0}$ as ${n\to\infty}.$

Theorem 3.2 (Consistency of $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ ).

Assume that the conditions of Theorem 3.1 hold, and additionally $(c.1)$ , $(c.3)$ , and $(d.4)$ . Let $H=H(n)\rightarrow\infty$ as ${n\to\infty}$ . Then $\hat{\theta}_{n,H,k}^{\text{(cv)}}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\theta_{0}$ as ${n\to\infty}.$

The asymptotic normality of the simulation based parameter estimator reads as follows.

Theorem 3.3 (Asymptotic normality of $\hat{\theta}_{n,H}$ ).

Assume that Assumptions A and B, and the moment conditions $(c.2)$ , $(c.4)$ , and $(c.5)$ hold. Furthermore, assume that the weight function satisfies $(d.1)$ , $(d.2)$ and $(d.3)$ . Set $H=H(n):=\bar{H}(n)n$ and $\bar{H}(n)\to\infty$ as ${n\to\infty}$ and define

[TABLE]

and

[TABLE]

If $Q$ is a non-singular matrix, then

[TABLE]

where

[TABLE]

Theorem 3.3 shows that $\hat{\theta}_{n,H}$ is asymptotically normal and achieves the same asymptotic efficiency as the oracle estimator from (2.3) (see Theorem 2.1 in Knight and Yu (2002)). Therefore, there cannot be any improvement in the limit law for the asymptotic normality of $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ . However, as we show in Section 4, $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ is based on a better approximation of the chf $\varphi(\cdot,\theta)$ than that used for $\hat{\theta}_{n,H}$ . Thus, the control variates estimator $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ improves the finite sample performance compared to the simulation based estimator $\hat{\theta}_{n,H}$ .

Remark 3.4.

As pointed out in (Knight and Yu, 2002, Remark 2.3), the asymptotic variance of $\hat{\theta}_{n,H}$ in (3.3) can be approximated by replacing $\theta_{0}$ by $\hat{\theta}_{n,H}$ in (3.2) and (3.4) and by replacing the infinite sum in (3.4) by an approximating sum with a kernel and a convenient bandwidth using the methods suggested in Andrews (1991) and Newey and West (1994).
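To illustrate the kind of kernel approximation referred to in Remark 3.4, the following sketch estimates a scalar long-run variance $\sum_{h\in{\mathbb{Z}}}\mathbb{C}{\rm ov}(u_{0},u_{h})$ with Bartlett weights, in the spirit of Newey and West. It is a simplified stand-in: the quantity in (3.4) is not reproduced here, the bandwidth is fixed by hand rather than chosen by the data-driven rules of Andrews (1991) or Newey and West (1994), and the function name is hypothetical.

```python
import numpy as np

def bartlett_long_run_variance(u, bandwidth):
    """Kernel estimate of sum_h Cov(u_0, u_h) with weights 1 - |h|/(bandwidth+1)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    uc = u - u.mean()
    lrv = uc @ uc / n                               # lag-0 term
    for h in range(1, bandwidth + 1):
        weight = 1.0 - h / (bandwidth + 1.0)
        lrv += 2.0 * weight * (uc[:-h] @ uc[h:]) / n
    return lrv

# Toy AR(1) series with coefficient 0.5 and unit innovation variance,
# for which the long-run variance is 1 / (1 - 0.5)^2 = 4.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
u = np.empty_like(e)
u[0] = e[0]
for i in range(1, len(e)):
    u[i] = 0.5 * u[i - 1] + e[i]
lrv = bartlett_long_run_variance(u, bandwidth=30)
```

With bandwidth 0 the function reduces to the sample variance; larger bandwidths trade bias against variance.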

4 Assessing the quality of the estimated chf

In this section we compare the performance of both the Monte Carlo approximation $\varphi_{H}(\cdot,\theta)$ and the control variates approximation $\varphi^{\text{(cv)}}_{H}(\cdot,\theta)$ of the chf as defined in (2.7) and (2.17), respectively. We start with the following comparison of the two chf approximations.

Remark 4.1.

[Comparison of $\varphi^{\text{(cv)}}_{H}(\cdot,\theta)$ and $\varphi_{H}(\cdot,\theta)$ ] Assume that $(c.3)$ holds, and let $\varphi^{\text{(cvopt)}}_{H}$ and $\varphi^{\text{(cv)}}_{H}$ be as defined in (2.14) and (2.17), respectively. We use that $\hat{\beta}_{H,\theta,f_{t}}(h_{t,\theta})\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\beta^{(\text{opt})}_{\theta,f_{t}}(h_{t,\theta})$ as ${n\to\infty}$ with limit given in (2.13). This follows from the representation of $\hat{\beta}_{H,\theta,f_{t}}(h_{t,\theta})$ as

[TABLE]

and the almost sure convergence of both terms. The quantities needed to compute the estimator in (2.16) are, for each $\nu,\kappa=1,2$ :

[TABLE]

Hence, strong consistency of $\hat{\beta}_{H,\theta,f_{t}}(h_{t,\theta})$ follows from the SLLN. This together with $P_{\theta}(h_{t,\theta})=0$ implies by Theorem 1 in Glynn and Szechtman (2002) that, as $H\rightarrow\infty$ ,

[TABLE]

with

[TABLE]

with $\sigma^{2}_{\theta}(\cdot)$ as defined in (2.12). Therefore, $\varphi^{\text{(cv)}}_{H}(\cdot,\theta)$ provides an approximation of the integral $Q_{n}(\theta)$ in (2.4) with smaller variance than $\varphi_{H}(\cdot,\theta)$ . As a consequence, this favors the control variates estimator $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ over the simulation based estimator $\hat{\theta}_{n,H}$ for large sample sizes $n\in{\mathbb{N}}$ .

For all forthcoming examples we choose $p=3$ and $H=3\,000$ . We begin with a stationary Gaussian AR(1) process, where we know the chf $\varphi(\cdot)$ explicitly, and then proceed to the Poisson-AR process, where we approximate the true unknown chf by a precise simulated version.

4.1 The Gaussian AR(1) process

We start with a stationary Gaussian AR(1) process to show how the method of control variates improves the Monte Carlo approximation of its chf. Let $(X_{j}(\theta))_{j\in{\mathbb{Z}}}$ be the AR(1) process

[TABLE]

with parameter space $\Theta$ being a compact subset of $\{\theta=(\phi,\sigma):|\phi|<1,\sigma>0\}$ . Then the true chf of ${\bm{X}}_{1}(\theta)=(X_{1}(\theta),X_{2}(\theta),X_{3}(\theta))$ is given by $\varphi(t,\theta)=e^{-\frac{1}{2}t^{T}\Gamma_{3}(\theta)t}$ for $t\in{\mathbb{R}}^{3},$ where the covariance matrix $\Gamma_{3}(\theta)$ is explicitly known and identifies the parameter $\theta$ uniquely; see e.g. Brockwell and Davis (2013), Example 3.1.2. For a fixed $\theta\in\Theta$ and many $t\in{\mathbb{R}}^{3}$ we compute the absolute errors

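$$\xi_{H}(t,\theta)=|\varphi_{H}(t,\theta)-\varphi(t,\theta)|\quad\text{and}\quad\xi_{H}^{\text{(cv)}}(t,\theta)=|\varphi_{H}^{\text{(cv)}}(t,\theta)-\varphi(t,\theta)|, \qquad (4.4)$$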

where $\varphi_{H}(\cdot,\theta)$ is the Monte Carlo approximation of the chf of ${\bm{X}}_{1}(\theta)=(X_{1}(\theta),X_{2}(\theta),X_{3}(\theta))$ and $\varphi_{H}^{\text{(cv)}}(\cdot,\theta)$ its control variates approximation. To understand how well we can approximate $\varphi(\cdot,\theta)$ , we plot in Figure 1, $\xi_{H}(t,\theta)$ and $\xi_{H}^{\text{(cv)}}(t,\theta)$ against $\sqrt{\mathbb{V}{\rm ar}[\langle t,{\bm{X}}_{1}(\theta)\rangle]}$ for different parameters $\theta$ . These quantities are computed from an iid sample ${\bm{X}}_{j}(\theta),j=1,\dots,H$ as in (2.6). To simulate iid observations from the model (4.3), we use the fact that the one-dimensional stationary distribution is $X_{1}(\theta)\sim N(0,\sigma^{2}/(1-\phi^{2}))$ , and then use the recursion in (4.3) to simulate $X_{2}(\theta)$ and $X_{3}(\theta)$ . We chose $500$ randomly generated values of $t$ from the $3$ -dimensional Laplace distribution with chf given in (5.2).
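The simulation scheme just described can be sketched as follows. The sketch assumes the recursion $X_{j}=\phi X_{j-1}+\sigma Z_{j}$ with iid standard normal $Z_{j}$, which is consistent with the stated stationary distribution $N(0,\sigma^{2}/(1-\phi^{2}))$ but may be parametrised differently from (4.3). The points $t$ are drawn with iid univariate Laplace components purely as a stand-in for the multivariate Laplace distribution used in the paper, and all function names are hypothetical.

```python
import numpy as np

def simulate_ar1_blocks(phi, sigma, H, p=3, seed=0):
    """H iid blocks (X_1, ..., X_p) of a stationary Gaussian AR(1) process."""
    rng = np.random.default_rng(seed)
    X = np.empty((H, p))
    # Start each block from the stationary distribution N(0, sigma^2/(1-phi^2)).
    X[:, 0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2), size=H)
    for j in range(1, p):
        X[:, j] = phi * X[:, j - 1] + sigma * rng.standard_normal(H)
    return X

def ar1_cov(phi, sigma, p=3):
    """Covariance matrix with entries sigma^2 phi^|i-j| / (1 - phi^2)."""
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return sigma**2 * phi**lags / (1.0 - phi**2)

phi, sigma, H = 0.5, 1.0, 3000
blocks = simulate_ar1_blocks(phi, sigma, H)
Gamma3 = ar1_cov(phi, sigma)
t = np.random.default_rng(1).laplace(size=(500, 3))    # stand-in draws of t
quad = np.einsum("ij,jk,ik->i", t, Gamma3, t)           # Var<t, X_1(theta)>
true_chf = np.exp(-0.5 * quad)                          # Gaussian chf
xi_H = np.abs(np.exp(1j * t @ blocks.T).mean(axis=1) - true_chf)
```

Plotting `xi_H` against `np.sqrt(quad)` gives a picture of the same type as the Monte Carlo errors described for Figure 1.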

It is clear from Figure 1 that both the Monte Carlo and the control variates approximations work better when $\sqrt{\mathbb{V}{\rm ar}[\langle t,{\bm{X}}_{1}(\theta)\rangle]}$ is small, and also that the control variates approximations are best for small values of $\sqrt{\mathbb{V}{\rm ar}[\langle t,{\bm{X}}_{1}(\theta)\rangle]}$ . The superiority of the control variates approximation for all $t$ and all parameter settings is clearly visible, and already expected from Remark 4.1.

4.2 The Poisson-AR model

We consider a nonlinear time series process for time series of counts, which was originally proposed in Zeger (1988). A prototypical Poisson-AR(1) model suggested in Davis and Rodriguez-Yam (2005) assumes that the observations $(X_{j}(\theta))_{j\in{\mathbb{Z}}}$ are, conditionally on the latent process, independent and Poisson-distributed with means $e^{\beta+\alpha_{j}(\theta)}$ , where the process $(\alpha_{j}(\theta))_{j\in{\mathbb{Z}}}$ is a latent stationary Gaussian AR(1) process, given by the equations

[TABLE]

with parameter space $\Theta$ being a compact subset of $\{\theta=(\beta,\phi,\sigma):|\phi|<1,\beta\in{\mathbb{R}},\sigma>0\}$ . The parameter $\theta$ is uniquely identifiable from the second order structure, which has been computed in Section 2.1 of Davis, Dunsmuir, and Wang (2000).

For this model, the true chf of ${\bm{X}}_{1}(\theta)=(X_{1}(\theta),X_{2}(\theta),X_{3}(\theta))$ cannot be computed in closed form. To mimic the assessment of the errors in eq. (4.4), we simulate $1\,000\,000$ iid observations from ${\bm{X}}_{1}(\theta)$ by first simulating a Gaussian AR(1) process $(\alpha_{1}(\theta),\alpha_{2}(\theta),\alpha_{3}(\theta))$ (as described in Section 4.1) and then simulating independent Poisson random variables with means $e^{\beta+\alpha_{1}(\theta)}$ , $e^{\beta+\alpha_{2}(\theta)}$ and $e^{\beta+\alpha_{3}(\theta)}$ , respectively. From this we compute the empirical characteristic function and take it as $\varphi(\cdot,\theta)$ in the absolute error terms (4.4).
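The two-stage simulation described above (latent Gaussian AR(1) block, then conditionally independent Poisson counts) might be sketched as follows. The latent recursion $\alpha_{j}=\phi\alpha_{j-1}+\sigma Z_{j}$ with iid standard normal $Z_{j}$ is an assumption about the parametrisation of (4.5), the function name is hypothetical, and a smaller number of replications than $1\,000\,000$ is used to keep the example light.

```python
import numpy as np

def simulate_poisson_ar_blocks(beta, phi, sigma, H, p=3, seed=0):
    """H iid blocks of counts with Poisson means exp(beta + alpha_j)."""
    rng = np.random.default_rng(seed)
    alpha = np.empty((H, p))
    # Latent stationary Gaussian AR(1) block, started from N(0, sigma^2/(1-phi^2)).
    alpha[:, 0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2), size=H)
    for j in range(1, p):
        alpha[:, j] = phi * alpha[:, j - 1] + sigma * rng.standard_normal(H)
    # Given the latent block, the counts are independent Poisson variables.
    return rng.poisson(np.exp(beta + alpha))

beta, phi, sigma = 0.373, 0.5, 0.22
counts = simulate_poisson_ar_blocks(beta, phi, sigma, H=200_000)

# Sanity check via the lognormal mean: E X_1 = exp(beta + s2/2), s2 = Var(alpha_1).
s2 = sigma**2 / (1.0 - phi**2)
expected_mean = np.exp(beta + 0.5 * s2)
```

The empirical chf of `counts` can then play the role of the reference chf in the absolute error terms.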

We compare the performance of both the Monte Carlo approximation and the control variates approximation of the chf. Figure 2 presents the results. The plots in Figure 2 are also in favor of the control variates approximation, when compared to the Monte Carlo approximation.

5 Practical aspects and simulation results

Our objective is to obtain a simple expression of the integrated mean squared error $Q_{n,H}(\theta)$ in (2.9), which is needed to compute the estimator in (2.8). For a weight function $w$ in (2.9), we write

[TABLE]

for its Fourier transform. We prefer weight functions for which (5.1) is known explicitly.

Example 5.1.

[Weight functions and their characteristic functions]

(i) Laplace: $w$ is a multivariate Laplace density with chf

[TABLE]

(ii) Cauchy: $w$ is a multivariate Cauchy density with chf

[TABLE]

(iii) Gaussian: $w$ is a standard multivariate Gaussian density with chf

[TABLE]

$\Box$

Lemma 5.2.

Let $Q_{n,H}(\theta)$ be as in (2.9) and $w$ a weight function with Fourier transform $\tilde{w}$ . Then

[TABLE]

Formula (5.4) is very useful, since it avoids the computation of a $p$ -dimensional integral. Additionally, since the first double sum on the right-hand side of (5.4) does not depend on the argument $\theta$ , for the optimization it can be ignored.
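The idea behind avoiding the $p$ -dimensional integral can be sketched as follows. Expanding $|\cdot|^{2}$ and integrating term by term turns the weighted integral of the squared difference of two empirical chfs into three double averages of $\tilde{w}$ evaluated at pairwise differences. The code below implements this expansion for a standard Gaussian weight, for which $\tilde{w}(x)=e^{-|x|^{2}/2}$; the function names are hypothetical, the arrays are stand-ins for the observed and simulated blocks, and the exact index ranges of (5.4) are not reproduced.

```python
import numpy as np

def w_tilde_gauss(x):
    """Fourier transform of the standard Gaussian density: exp(-|x|^2 / 2)."""
    return np.exp(-0.5 * np.sum(x**2, axis=-1))

def Q_closed_form(obs_blocks, sim_blocks, w_tilde=w_tilde_gauss):
    """Weighted integrated squared distance between two empirical chfs."""
    def mean_pairs(a, b):
        # Average of w_tilde over all pairwise differences (memory: len(a)*len(b)*p).
        return w_tilde(a[:, None, :] - b[None, :, :]).mean()
    return (mean_pairs(obs_blocks, obs_blocks)
            - 2.0 * mean_pairs(obs_blocks, sim_blocks)
            + mean_pairs(sim_blocks, sim_blocks))

rng = np.random.default_rng(0)
obs = rng.standard_normal((200, 3))          # stand-in for observed blocks
sim = 2.0 * rng.standard_normal((300, 3))    # stand-in for simulated blocks
Q_exact = Q_closed_form(obs, sim)

# Numerical check: average the squared chf distance over draws t from w.
t = rng.standard_normal((5000, 3))
diff = np.exp(1j * t @ obs.T).mean(axis=1) - np.exp(1j * t @ sim.T).mean(axis=1)
Q_numeric = np.mean(np.abs(diff) ** 2)
```

The first call to `mean_pairs` involves only the observed blocks, which is why that term can be dropped when minimising over $\theta$.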

Remark 5.3.

When evaluating the integrated weighted mean squared errors (2.9), (2.20), or (5.4) in practice, they need to be deterministic functions of $\theta$ . This is enforced by taking a fixed seed for every $j=1,\dots,H$ , when simulating $\tilde{\bm{X}}_{j}(\theta)$ for different values of $\theta\in\Theta$ .
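A small sketch of the fixed-seed device in Remark 5.3: if the generator is re-created from the same seed at every call, the same underlying noise is reused for every $\theta$, and the objective becomes a deterministic function of $\theta$. The AR(1) toy simulator, the simplified objective (an average over a fixed grid of $t$ values instead of a weighted integral), and all names below are assumptions made for the illustration.

```python
import numpy as np

def simulate_ar1_crn(theta, H, p=3, seed=12345):
    """AR(1) blocks for theta = (phi, sigma), driven by the same normal draws
    for every theta because the generator is re-seeded at each call."""
    phi, sigma = theta
    z = np.random.default_rng(seed).standard_normal((H, p))
    X = np.empty((H, p))
    X[:, 0] = sigma / np.sqrt(1.0 - phi**2) * z[:, 0]
    for j in range(1, p):
        X[:, j] = phi * X[:, j - 1] + sigma * z[:, j]
    return X

def objective(theta, t_grid, target):
    """Mean squared distance between the simulated chf and a target chf."""
    chf = np.exp(1j * t_grid @ simulate_ar1_crn(theta, H=1000).T).mean(axis=1)
    return float(np.mean(np.abs(chf - target) ** 2))

theta0 = (0.5, 1.0)
lags = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
Gamma0 = theta0[1] ** 2 * theta0[0] ** lags / (1.0 - theta0[0] ** 2)
t_grid = np.random.default_rng(7).standard_normal((50, 3))
target = np.exp(-0.5 * np.einsum("ij,jk,ik->i", t_grid, Gamma0, t_grid))

value_a = objective(theta0, t_grid, target)
value_b = objective(theta0, t_grid, target)      # identical to value_a
value_far = objective((-0.5, 2.0), t_grid, target)
```

Without the fixed seed, two evaluations at the same $\theta$ would differ, which makes numerical optimisation unreliable.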

In the following two examples we study the finite sample behavior of the estimators $\hat{\theta}_{n,H}$ and $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ . We begin with a stationary Gaussian ARFIMA model, whose chf is explicitly known so that we can use the oracle estimator from Section 2.1. Afterwards we come back to the Poisson-AR process. We choose $p=3$ , since the 3-dimensional chf contains sufficient information to identify the parameter of interest. We also choose $H=3\,000$ .

5.1 The ARFIMA model

Let $(X_{j}(\theta))_{j\in{\mathbb{Z}}}$ be the stationary Gaussian ARFIMA $(0,d,0)$ model

[TABLE]

where $B$ is the backshift operator, with parameter space $\Theta$ being a compact subset of $\{\theta=(d,\sigma):d\in(-0.5,0.5),\sigma>0\}$ . Then the true chf of ${\bm{X}}_{1}(\theta)=(X_{1}(\theta),X_{2}(\theta),X_{3}(\theta))$ is given by $\varphi(t,\theta)=e^{-\frac{1}{2}t^{T}\Gamma_{3}(\theta)t}$ for $t\in{\mathbb{R}}^{3},\theta\in\Theta,$ where the covariance matrix $\Gamma_{3}(\theta)$ is explicitly known and identifies the parameter $\theta$ uniquely; see e.g. Pipiras and Taqqu (2017), Corollary 2.4.4.

For the long-memory case, for each value of $d\in\{0.05,\dots,0.45\}$ we compare the new estimators with the MLE method as implemented in the R package arfima. Thus, for many $\theta\in\Theta$ , we generate iid Gaussian random vectors with mean zero and covariance $\Gamma_{3}(\theta)$ and use them to construct the simulation based estimator $\hat{\theta}_{n,H}$ .
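Generating the iid Gaussian blocks requires $\Gamma_{3}(\theta)$ . A minimal sketch, assuming innovations with variance $\sigma^{2}$ and using the standard autocovariance of ARFIMA $(0,d,0)$ , $\gamma(0)=\sigma^{2}\Gamma(1-2d)/\Gamma(1-d)^{2}$ and $\gamma(h)=\gamma(h-1)(h-1+d)/(h-d)$ , is given below; the function names are hypothetical.

```python
import numpy as np
from math import gamma

def arfima_cov(d, sigma, p=3):
    """p x p covariance matrix of a stationary Gaussian ARFIMA(0,d,0) block."""
    g = np.empty(p)
    g[0] = sigma**2 * gamma(1.0 - 2.0 * d) / gamma(1.0 - d) ** 2
    for h in range(1, p):
        g[h] = g[h - 1] * (h - 1.0 + d) / (h - d)
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return g[lags]

def simulate_arfima_blocks(d, sigma, H, p=3, seed=0):
    """H iid mean-zero Gaussian vectors with covariance arfima_cov(d, sigma, p)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(p), arfima_cov(d, sigma, p), size=H)

G = arfima_cov(0.25, 1.0)
blocks_arfima = simulate_arfima_blocks(0.25, 1.0, H=3000)
```

For $d=0$ the matrix reduces to $\sigma^{2}$ times the identity, and the lag-one autocorrelation equals $d/(1-d)$ , which gives two quick checks of the recursion.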

Since the chf $\varphi(\cdot,\theta)$ is known in closed form, we are able to compute the oracle estimator $\hat{\theta}_{n}$ from (2.4). In order to compute the integral appearing in (2.4) in closed form, we choose the weight function $w(t)=(2\pi)^{-3/2}e^{-\frac{1}{2}t^{T}t},t\in{\mathbb{R}}^{3}$ .

Then the integral in (2.4), which needs to be minimized with respect to the parameter $\theta$ , can be evaluated similarly as in (5.4). Since the chf is known, $Q_{n}(\theta)$ can be written as

[TABLE]

We compare in Table 1.1 the performance of the simulation based estimator $\hat{\theta}_{n,H}$ , the oracle estimator $\hat{\theta}_{n}$ in (2.3) based on the minimization of (5.1), and the MLE. We fixed $\sigma=1$ for all simulated sample paths used in the simulation study. For both $\hat{\theta}_{n}$ and $\hat{\theta}_{n,H}$ , we also estimate $\sigma$ but report only the performance for the estimator of $d$ which is the key parameter of interest in long-range dependence models. We notice that $\hat{\theta}_{n,H}$ is comparable to the oracle estimator, so in this model there is no need to use control variates. When comparing both simulation based estimators, the RMSEs are almost the same for all $d\geq 0.20$ . The MLE has a smaller RMSE, but both $\hat{\theta}_{n}$ and $\hat{\theta}_{n,H}$ have a smaller bias than the MLE. In the simulations, the density plots for the estimates of $d$ with $d\in\{0.25,0.3\}$ look reasonably normal. On the other hand, the estimates when $d$ is closer to $0.5$ are rather skewed, which is expected due to the constraint $d<0.5$ . In this case a larger sample is needed in order to obtain more normal looking densities.

Remark 5.4.

We also investigate the feasibility of our new estimation procedures for misspecified models. We take a Gaussian ARFIMA as the true model, but for the data we modify the distribution of its innovations. Specifically, we consider the two cases of ARFIMA models driven by noise with a Laplace distribution and with a Student- $t$ distribution with 6 degrees of freedom. The estimation results under the two misspecification scenarios are shown in Tables 1.2 and 1.3 of the Appendix. The quasi-oracle estimator is based on the Gaussian chf, and the quasi-MLE (QMLE) is found by maximizing the Gaussian likelihood, even though the data are in fact non-Gaussian. For both noise distributions, we see very little difference in the performance of the three estimators (QMLE compared with MLE) from the Gaussian ARFIMA scenario in Table 1.1. In particular, our estimator continues to have small bias and RMSE that is comparable to the oracle estimator and only slightly larger than that of the QMLE. Of course, it is known that the QMLE behaves asymptotically the same as the MLE when the data are Gaussian.

5.2 The Poisson-AR process

The Poisson-AR model has been defined in Section 4.2. We conduct a simulation experiment in the same setting as in Table 5 in Davis and Rodriguez-Yam (2005) and Table 3 in Davis and Yau (2011). The results are shown in Table 1.4 of the Appendix for $n=400$ and nine different parameter settings, where we also classify the models by the corresponding index of dispersion $D$ of the random variable $e^{\beta+\alpha_{1}}$ , which assumes values in $\{0.1,1,10\}$ as shown in Davis and Rodriguez-Yam (2005).

We compare both the simulation based estimator $\hat{\theta}_{n,H}$ and control variates based estimator $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ . We fix $H=3\,000$ , $p=3$ and the $3$ -dimensional Laplace density as in (5.2) for $w$ . To simulate iid observations of $(X_{1}(\theta),X_{2}(\theta),X_{3}(\theta))$ we proceed as explained in Section 4.2. The simulation based estimator $\hat{\theta}_{n,H}$ in (2.8) is computed via (5.4). Unfortunately, such a formula cannot be obtained for the control variates based estimator $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ , since the introduction of the correction $\kappa_{H}$ in (2.18) introduces additional polynomial terms into $Q_{n,H,k}^{\text{(cv)}}$ in (2.20). Thus, we resort to numerical integration to evaluate $\hat{\theta}_{n,H,k}^{\text{(cv)}}$ .

Our findings are as follows. For $D\in\{1,0.1\}$ , the control variates based estimator $\hat{\theta}^{\text{(cv)}}_{n,H,k}$ for $k=1$ presents smaller bias and RMSE than the simulation based estimator $\hat{\theta}_{n,H}$ in most cases; in all others it is comparable. The smallest RMSE values are shaded in Table 4. Additionally, a significant improvement in the bias for estimating $\phi$ is noticeable for $\theta=(0.373,0.500,0.220)$ and $\theta=(0.373,0.900,0.111)$ . This example shows the advantage of using control variates to improve the estimation of the model parameters. This is not surprising in view of the improved performance of estimating the characteristic function as seen in all three panels of Figure 2.

We now compare the control variates based estimator $\hat{\theta}^{\text{(cv)}}_{n,H,k}$ in Table 2 of the Appendix with the results for the consecutive pairwise likelihood (CPL) from Table 3 in Davis and Yau (2011), which is referred to as CPL1 in that paper. The bias of $\hat{\theta}^{\text{(cv)}}_{n,H,k}$ is smaller than that of CPL1 for the estimated $\beta$ and $\sigma$ for almost all cases; in all others it is comparable. For $\phi$ the biases of $\hat{\theta}^{\text{(cv)}}_{n,H,k}$ and CPL1 are comparable, except that $\hat{\theta}^{\text{(cv)}}_{n,H,k}$ shows poor performance for estimating $\phi$ for the true parameter $(\beta,\phi,\sigma)=(0.373,0.9,0.111)$ . This is due to the fact that the simulated sample paths contain a large number of zeros, giving very little information for the parameter estimation. The estimated values for $\beta$ look normal for all parameter choices. The sampling distributions of the other parameter estimates look close to normal, except near the boundary. In particular, the density for the estimates of $\phi$ when $\phi=0.9$ or $\sigma\in\{0.22,0.111\}$ and estimates of $\sigma$ when $\sigma\in\{0.22,0.111\}$ show some asymmetry, deviating from normality. This is not unexpected because they are close to the boundary.

Appendix A Appendix

Here we present the proofs of the main theorems, as well as tables of results from the simulation study. In Section A.1 we provide the proofs of Theorems 3.1, 3.2, and 3.3. In Section A.2 we present the tables summarizing the finite sample behavior of the simulation based estimators for ARFIMA models driven by noise from Gaussian, Laplace, and Student- $t$ distributions, and the Poisson-AR(1) model discussed in Section 5.

A.1 Proofs of the main results

In the following we define $H=H(n)$ and $\bar{H}=\bar{H}(n)=H(n)/n$ , but omit the argument $n$ for notational simplicity. Throughout the letter $c$ stands for any positive constant independent of the respective argument. Its value may change from line to line, but is not of particular interest. For a matrix with only real eigenvalues $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue.

We often use the uniform SLLN, which guarantees for a continuous stochastic process $(Z(t))_{t\in{\mathbb{R}}^{p}}$ satisfying ${\mathbb{E}}\sup_{t\in K}|Z(t)|<\infty$ that $\sup_{t\in K}|Z(t)-{\mathbb{E}}Z(t)|\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}0$ as ${n\to\infty}$ for every compact set $K\subset{\mathbb{R}}^{p}$ . More precisely, we use the SLLN on the separable Banach space $C(K)$ , the space of continuous functions on the compact set $K\subset{\mathbb{R}}^{p}$ , endowed with the sup norm (see e.g. Theorem 16(a) in Ferguson (1996) or Theorem 9.4 in Parthasarathy (1967)).

Proof of Theorem 3.1: Let

[TABLE]

be the candidate limiting function of $Q_{n,H}(\theta)$ . For $\delta>0$ define the set

[TABLE]

Since $|e^{i\langle t,\tilde{\bm{X}}_{1}(\theta)\rangle}|=1$ for all $\theta$ and $t$ , and the random elements $(\tilde{\bm{X}}_{j}(\theta),\theta\in\Theta)_{j=1}^{\infty}$ are iid, the uniform SLLN holds giving

[TABLE]

In particular, for $\theta=\theta_{0}$ we also have

[TABLE]

Applying the inequality $||a|^{2}-|b|^{2}|\leq 2|a-b|$ for $a,b\in{\mathbb{C}},|a|,|b|\leq 1$ gives

[TABLE]

Applying $\sup_{\theta\in\Theta}$ on both sides of (A.4), using (A.2) combined with $(d.1)$ , and taking the limit for $\delta\downarrow 0$ gives

[TABLE]

Now we prove that $Q(\theta)=0$ if and only if $\theta=\theta_{0}$ . Obviously $Q(\theta_{0})=0$ . If $\theta\not=\theta_{0}$ , then the distributions of ${\bm{X}}_{1}$ and $\tilde{\bm{X}}_{1}(\theta)$ are different and thus also their characteristic functions are different. Since characteristic functions are continuous, it follows that they are different at least on an interval with positive Lebesgue measure; hence $Q(\theta)>0$ . Therefore, $Q(\theta)$ is uniquely minimized at $\theta_{0}$ and this fact together with (A.5) gives strong consistency of $\hat{\theta}_{n,H}$ .

Proof of Theorem 3.2: We have that $\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)=t^{T}\hat{\Gamma}_{p}t$ , with $\hat{\Gamma}_{p}$ being the $p$ -dimensional empirical covariance matrix of the observed time series $(X_{1},\dots,X_{n})$ as in (2.21). Let $k>0$ be fixed and

[TABLE]

be the candidate limiting function of $Q^{\text{(cv)}}_{n,H,k}(\theta)$ in (2.20), where $\Gamma_{p}$ is the theoretical $p$ -dimensional covariance matrix of the time series process $(X_{j})_{j\in{\mathbb{Z}}}$ .

Based on the definition of $Q^{\text{(cv)}}_{n,H,k}(\theta)$ in (2.20), we divide the domain of integration in the integrated mean squared error $|Q^{\text{(cv)}}_{n,H,k}(\theta)-Q^{\text{(cv)}}(\theta)|$ into $\{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)<k\}$ and $\{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)\geq k\}$ , equivalently into $L_{n}=\{t\in{\mathbb{R}}^{p}:t^{T}\hat{\Gamma}_{p}t<k\}$ and its complement $L_{n}^{c}$ .

Recall also (2.17) and (2.18). Using $|e^{ix}|=1$ for all $x\in{\mathbb{R}}$ , together with $|ab-cd|\leq|b||a-c|+|c||b-d|$ for $a,b,c,d\in{\mathbb{C}}$ gives for the integral on $L_{n}^{c}$ :

[TABLE]

By $(a.3)$ and $(c.1)$ it follows from Theorem 3(a) in Section 1.2.2 of Doukhan (1994) that

[TABLE]

Since $\mathbb{V}{\rm ar}(X_{1})>0$ , it follows from (A.7) combined with Proposition 5.1.1 in Brockwell and Davis (2013) that $\det(\Gamma_{p})>0$ , and therefore, the minimum eigenvalue $\lambda_{\min}(\Gamma_{p})$ of $\Gamma_{p}$ is positive. Thus, for all $t\in{\mathbb{R}}^{p}$ ,

[TABLE]

By $(a.2)$ and the ergodic theorem $\hat{\Gamma}_{p}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\Gamma_{p}$ and, since the eigenvalues of a matrix are continuous functions of its entries (cf. Bernstein (2009), Fact 10.11.2), also $\lambda_{\min}(\hat{\Gamma}_{p})\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\lambda_{\min}(\Gamma_{p})>0$ . It follows from (A.8) and from the a.s. convergence of the eigenvalues that there exists $N>0$ such that

[TABLE]
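The eigenvalue argument can be checked numerically. For a symmetric matrix $A$ the standard bound $t^{T}At\geq\lambda_{\min}(A)|t|^{2}$ holds, so a.s. convergence of $\lambda_{\min}(\hat{\Gamma}_{p})$ to a positive limit keeps the quadratic form $t^{T}\hat{\Gamma}_{p}t$ bounded below by a positive multiple of $|t|^{2}$ for large $n$. The Python sketch below illustrates this for a simulated Gaussian AR(1) series. The model, the parameter values and the helper `empirical_block_cov` are assumptions made for this illustration; in particular, the normalization of $\hat{\Gamma}_{p}$ in (2.21) may differ from the one used here.

```python
import numpy as np

def empirical_block_cov(x, p):
    """Empirical p x p covariance matrix of the overlapping blocks (x_j, ..., x_{j+p-1})."""
    blocks = np.lib.stride_tricks.sliding_window_view(x, p)
    return np.cov(blocks, rowvar=False, bias=True)

rng = np.random.default_rng(0)
phi, n, p = 0.5, 20000, 3   # illustrative values, not taken from the paper

# Stationary Gaussian AR(1) path with unit noise variance
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi**2)
for j in range(1, n):
    x[j] = phi * x[j - 1] + eps[j]

gamma_hat = empirical_block_cov(x, p)
lam_hat = np.linalg.eigvalsh(gamma_hat)[0]      # smallest eigenvalue of the estimate

# Theoretical Gamma_p of this AR(1): gamma(h) = phi^|h| / (1 - phi^2)
idx = np.arange(p)
gamma = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)
lam = np.linalg.eigvalsh(gamma)[0]              # smallest eigenvalue of Gamma_p

# Quadratic forms t^T gamma_hat t for random directions t
t = rng.standard_normal((1000, p))
quad = np.einsum("ij,jk,ik->i", t, gamma_hat, t)
```

Comparing `lam_hat` with `lam`, and `quad` with `lam_hat * |t|^2`, shows the two ingredients of the argument: the smallest eigenvalue stabilizes, and the quadratic form inherits the lower bound.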

Thus, for $t\in L_{n}^{c}$ we obtain

[TABLE]

This together with (A.10) gives the following upper bound for the right-hand side of (A.6):

[TABLE]

The first integral can be estimated in the same way as $|Q_{n,H}(\theta)-Q(\theta)|$ in (A.4), which tends to 0 uniformly for $\theta\in\Theta$ provided that $(d.1)$ holds. Since $\hat{\Gamma}_{p}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\Gamma_{p}$, the second integral in (A.11) also tends to 0 a.s. as $n\to\infty$.

We turn to the integrated mean squared error $|Q^{\text{(cv)}}_{n,H,k}(\theta)-Q^{\text{(cv)}}(\theta)|$ on $L_{n}$. Let $L=\{t\in{\mathbb{R}}^{p}:|t|\leq\sqrt{\frac{2k}{\lambda_{\min}(\Gamma_{p})}}\}$. The control variates correction used in (2.20) can be regarded as a continuous function $g:{\mathbb{R}}^{9}\to{\mathbb{R}}^{2}$ whose arguments are the arithmetic means defined in (4.1)-(4.2). By $(c.3)$ and the uniform SLLN, each of these arithmetic means converges a.s. uniformly on $L\times\Theta$ as ${n\to\infty}$ and $H\rightarrow\infty$. Thus, it follows from the continuity of $g$ and the continuous mapping theorem that

[TABLE]

For $n\geq N$ it follows from (A.9) that $L_{n}\subseteq L$ and thus using the inequality

[TABLE]

valid for $a,b,c,d,e\in{\mathbb{C}}$ with $|d|\leq 2$ gives

[TABLE]

From (A.8), (A.12), (A.2), and (A.3) (with $K_{\delta}=L$ for $\delta=\sqrt{2k/\lambda_{\min}(\Gamma_{p})}$), and $(d.4)$ it follows that $\sup_{\theta\in\Theta}I_{1,n}(\theta)\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}0$ as ${n\to\infty}$. Finally, $\sup_{\theta\in\Theta}I_{2,n}(\theta)\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}0$ by arguments similar to those used in (A.10) and (A.11), since for $t\in L$, also applying $(d.4)$,

[TABLE]

and

[TABLE]
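As a generic illustration of the variance reduction that motivates $Q^{\text{(cv)}}_{n,H,k}$, the Python sketch below applies a textbook control variate to the Monte Carlo estimate of ${\mathbb{E}}\cos(Z)$. It is not the correction defined in (2.17)-(2.20): the Gaussian sample, the control $Z^{2}$ with known mean $\sigma^{2}$, and the helper `cv_mean` are assumptions chosen for this example only.

```python
import numpy as np

def cv_mean(f_vals, c_vals, c_mean):
    """Control-variate estimate of E[f] using a control c whose mean c_mean is known."""
    cov = np.cov(f_vals, c_vals)
    beta = cov[0, 1] / cov[1, 1]                 # estimated optimal coefficient
    return f_vals.mean() - beta * (c_vals.mean() - c_mean)

rng = np.random.default_rng(1)
sigma, reps, H = 0.5, 500, 200                   # illustrative values

plain = np.empty(reps)                           # plain Monte Carlo estimates
cv = np.empty(reps)                              # control-variate estimates
for r in range(reps):
    z = sigma * rng.standard_normal(H)
    f = np.cos(z)
    plain[r] = f.mean()
    cv[r] = cv_mean(f, z**2, sigma**2)           # E[Z^2] = sigma^2 is known

truth = np.exp(-sigma**2 / 2)                    # E cos(Z) for Z ~ N(0, sigma^2)
```

The control works well in this example because $\cos(z)\approx 1-z^{2}/2$ when $z$ has small variance. This may be read as the intuition behind restricting the correction to the region $\{\widehat{\mathbb{V}{\rm ar}}(\langle t,{\bm{X}}_{1}\rangle)<k\}$, although the sketch makes no claim about the exact form of the paper's control variates.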

Proof of Theorem 3.3: By the definition of $\hat{\theta}_{n,H}$ in (2.8) and under assumptions $(a.1)$ and $(b.2)$ we have

[TABLE]

A Taylor expansion of order 1 of $\nabla_{\theta}Q_{n,H}$ around $\theta_{0}$ gives

[TABLE]

where $\theta_{n}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\theta_{0}$ as ${n\to\infty}$ . Therefore, asymptotic normality of $\sqrt{n}(\hat{\theta}_{n,H}-\theta_{0})$ will follow by the delta method, if we prove that as ${n\to\infty}$ :

$(1)$ $\sqrt{n}\nabla_{\theta}Q_{n,H}(\theta_{0})$ converges weakly to a multivariate normal random variable, and

$(2)$ $\nabla_{\theta}^{2}Q_{n,H}(\theta_{n})$ converges in probability to a non-singular matrix.
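To make the object of the expansion concrete, the following Python sketch evaluates a Monte Carlo version of the integrated weighted squared distance between the empirical chf of the observed blocks and the chf of simulated blocks, and minimizes it over a grid. It is a minimal illustration under stated assumptions: a Gaussian AR(1) model, a standard Gaussian weight $w$, the $t$-integral approximated by an average over draws from $w$, and a grid search in place of a numerical optimizer. The function names `ar1_blocks`, `ecf` and `objective` are hypothetical.

```python
import numpy as np

def ar1_blocks(theta, eps):
    """Stationary AR(1) blocks of length p driven by fixed noise eps of shape (H, p)."""
    X = np.empty_like(eps)
    X[:, 0] = eps[:, 0] / np.sqrt(1.0 - theta**2)
    for j in range(1, eps.shape[1]):
        X[:, j] = theta * X[:, j - 1] + eps[:, j]
    return X

def ecf(T, X):
    """Empirical chf of the rows of X, evaluated at each row of T (complex vector)."""
    return np.exp(1j * (T @ X.T)).mean(axis=1)

def objective(theta, phi_n, T, eps):
    """Average of |phi_n(t) - phi_H(t, theta)|^2 over points t drawn from the weight w."""
    return np.mean(np.abs(phi_n - ecf(T, ar1_blocks(theta, eps))) ** 2)

rng = np.random.default_rng(2)
theta0, n, p, H, M = 0.5, 4000, 3, 4000, 200     # illustrative values

# Observed series (one long stationary path) and its n overlapping blocks
x = ar1_blocks(theta0, rng.standard_normal((1, n + p - 1)))[0]
obs = np.lib.stride_tricks.sliding_window_view(x, p)

T = rng.standard_normal((M, p))                  # integration points t_m ~ N(0, I_p)
eps = rng.standard_normal((H, p))                # noise reused for every theta
phi_n = ecf(T, obs)

grid = np.linspace(-0.9, 0.9, 37)
theta_hat = grid[np.argmin([objective(th, phi_n, T, eps) for th in grid])]
```

Reusing the same noise `eps` for every `theta` makes the simulated blocks a smooth function of the parameter, which is the kind of regularity in $\theta$ that the differentiation steps of the proof rely on.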

We start with the first point and compute the partial derivatives of $Q_{n,H}$ :

[TABLE]

Recall that $\varphi_{n}(t)$ and $\varphi_{H}(t,\theta)$ denote the empirical characteristic functions of the observed blocks $({\bm{X}}_{1},\dots,{\bm{X}}_{n})$ as in (2.2) and of its Monte Carlo approximation $(\tilde{\bm{X}}_{1}(\theta),\dots,\tilde{\bm{X}}_{H}(\theta))$ as in (2.7), respectively. Define the partial derivatives of the real and imaginary part of $\varphi_{H}(t,\theta)$ :

[TABLE]

and summarize them into

[TABLE]

Then consider

[TABLE]

Abbreviate $b_{H}(t):=b_{H}(t,\theta_{0})$ and $\tilde{g}_{H}(t):=\tilde{g}_{H}(t,\theta_{0})$ . Then it follows from (A.13), (A.15) and (A.16) that

[TABLE]
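The partial derivatives of the real and imaginary parts of $\varphi_{H}(t,\theta)$ follow from the chain rule applied to $\cos\langle t,\tilde{\bm{X}}_{j}(\theta)\rangle$ and $\sin\langle t,\tilde{\bm{X}}_{j}(\theta)\rangle$. The Python sketch below checks this against central finite differences for simulated Gaussian AR(1) blocks with fixed noise. The model, the function names and the evaluation point are assumptions for illustration, and the code does not attempt to reproduce the exact arrangement of (A.14)-(A.16).

```python
import numpy as np

def simulate_blocks(theta, eps):
    """AR(1) blocks X and their derivative dX with respect to theta (noise eps held fixed)."""
    H, p = eps.shape
    X = np.empty((H, p))
    dX = np.empty((H, p))
    s = 1.0 / np.sqrt(1.0 - theta**2)
    X[:, 0] = s * eps[:, 0]
    dX[:, 0] = theta * s**3 * eps[:, 0]           # d/dtheta of (1 - theta^2)^(-1/2)
    for j in range(1, p):
        X[:, j] = theta * X[:, j - 1] + eps[:, j]
        dX[:, j] = X[:, j - 1] + theta * dX[:, j - 1]
    return X, dX

def ecf_parts(t, X):
    """Real and imaginary parts of the empirical chf at a single point t."""
    u = X @ t
    return np.mean(np.cos(u)), np.mean(np.sin(u))

def ecf_grad(t, X, dX):
    """Chain-rule derivatives of the real and imaginary parts with respect to theta."""
    u, du = X @ t, dX @ t
    return np.mean(-np.sin(u) * du), np.mean(np.cos(u) * du)

rng = np.random.default_rng(3)
eps = rng.standard_normal((500, 3))
t = np.array([0.7, -0.3, 1.1])
theta, h = 0.4, 1e-6

X, dX = simulate_blocks(theta, eps)
grad = np.array(ecf_grad(t, X, dX))

# Central finite difference with the same noise
Xp, _ = simulate_blocks(theta + h, eps)
Xm, _ = simulate_blocks(theta - h, eps)
fd = (np.array(ecf_parts(t, Xp)) - np.array(ecf_parts(t, Xm))) / (2 * h)
```

Agreement of `grad` and `fd` confirms the chain-rule expressions for this toy model; each derivative carries the factor $\langle t,\nabla_{\theta}\tilde{\bm{X}}_{j}(\theta)\rangle$, which is why moment bounds on $\nabla_{\theta}X_{1}(\theta)$ enter the subsequent lemmas.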

We analyze the asymptotic behavior of the first term in (A.17) in Lemma A.3. More precisely, we show there that $\int_{K_{\delta}}b_{H}(t)g_{n}(t)w(t){\rm d}t$ for $K_{\delta}$ as in (A.1) converges in distribution to a $q$-dimensional Gaussian vector. Afterwards, Lemmas A.4 and A.5 show that as $\delta\rightarrow\infty$, componentwise in ${\mathbb{R}}^{q}$,

[TABLE]

and

[TABLE]

where $G$ is a zero mean ${\mathbb{R}}^{2}$ -valued Gaussian field. The formula given in (A.17) tells us that the term ${\mathbb{E}}[b_{1}(t,\theta)]$ will appear in the asymptotic covariance formula of the limiting distribution of the estimator. Therefore it is worth writing it in terms of the chf (2.5).

Remark A.1.

For each $i\in\{1,\cdots,q\}$ and $\theta\in\Theta$ , it follows from (A.14) that

[TABLE]

Since both $\sin$ and $\cos$ are bounded by $1$ we can use $(c.5)$ to interchange expectation and differentiation in (A.18). This combined with (A.15) gives

[TABLE]

This remark will be used later in the proof of Theorem 3.3.

We show by a standard Chebyshev argument that the second term in (A.17) converges in probability componentwise to 0 in (A.48). The convergence of the second derivatives $\nabla_{\theta}^{2}Q_{n}(\theta_{n})$ will be the topic of Lemma A.6. For the scalar products above we use the following bounds several times below.

Lemma A.2.

Let $\nu\geq 1$, $t\in{\mathbb{R}}^{p}$, $k,i\in\{1,\dots,q\}$ and $j\in{\mathbb{Z}}$ be fixed and assume that $(b.2)$ holds. Then the following bounds hold true.

(a)

If ${\mathbb{E}}|\nabla_{\theta}X_{1}(\theta)|^{\nu}<\infty$ for $\theta\in\Theta$ , then there exists a constant $c>0$ such that

[TABLE]

(b)

If ${\mathbb{E}}|\nabla_{\theta}^{2}X_{1}(\theta)|^{\nu}<\infty$ for $\theta\in\Theta$ , then there exists a constant $c>0$ such that

[TABLE]

The same bounds hold uniformly, taking expectations over $\sup_{\theta\in\Theta}$ or over $\sup_{t\in K}$ for some compact $K\subset{\mathbb{R}}^{p}$ on both sides of (A.20) and (A.21), provided the corresponding expectations exist.

Proof.

(a) Applying the Cauchy-Schwarz inequality for the inner product, using the fact that $(\tilde{\bm{X}}_{j}(\theta),\theta\in\Theta)\stackrel{{\scriptstyle\mathrm{d}}}{{=}}(\tilde{\bm{X}}_{1}(\theta),\theta\in\Theta)\stackrel{{\scriptstyle\mathrm{d}}}{{=}}({\bm{X}}_{1}(\theta),\theta\in\Theta)$, bounding the $L^{2}$-norm by the $L^{1}$-norm, and employing the inequality $|\sum_{j=1}^{p}\beta_{j}|^{\nu}\leq p^{\nu-1}\sum_{j=1}^{p}|\beta_{j}|^{\nu}$, valid for $\beta_{1},\dots,\beta_{p}\in{\mathbb{R}}$ and $\nu\geq 1$, gives

[TABLE]

Part (b) follows by analogous calculations. ∎
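For completeness, the elementary inequality used in part (a) follows from the convexity of $x\mapsto|x|^{\nu}$ for $\nu\geq 1$, that is, from Jensen's inequality applied to the uniform average over $j=1,\dots,p$. A short derivation, written out here as a sketch:

```latex
\[
\Big|\sum_{j=1}^{p}\beta_{j}\Big|^{\nu}
  = p^{\nu}\,\Big|\frac{1}{p}\sum_{j=1}^{p}\beta_{j}\Big|^{\nu}
  \leq p^{\nu}\,\frac{1}{p}\sum_{j=1}^{p}|\beta_{j}|^{\nu}
  = p^{\nu-1}\sum_{j=1}^{p}|\beta_{j}|^{\nu}.
\]
```

Dividing both sides by $p^{\nu}$ gives the averaged form $|\frac{1}{p}\sum_{j=1}^{p}\beta_{j}|^{\nu}\leq\frac{1}{p}\sum_{j=1}^{p}|\beta_{j}|^{\nu}$, which is the same convexity bound with the normalization kept inside the absolute value.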

Lemma A.3.

Under assumptions $(a.2)$ , $(b.2)$ , $(a.3)$ , $(c.2)$ and $(c.4)$ we have on the Borel sets of ${\mathbb{R}}^{q}$ ,

[TABLE]

where $G$ is an ${\mathbb{R}}^{2}$ -valued Gaussian field.

Proof.

Under assumptions $(a.3)$ and $(c.2)$, it follows from Lemma 4.1(2) in Davis et al. (2018) that $\sqrt{n}(\varphi_{n}(\cdot)-\varphi(\cdot,\theta_{0}))$ converges in distribution on compact subsets of ${\mathbb{R}}^{p}$ to a complex-valued Gaussian field $\tilde{G}$; equivalently, the vector of real and imaginary parts converges to a bivariate Gaussian field $G$. Since the random elements $(\tilde{\bm{X}}_{j}(\theta),\theta\in\Theta)_{j\in{\mathbb{N}}}$ are iid and the partial derivatives exist by $(b.2)$, also $(\tilde{\bm{X}}_{j}(\theta_{0}),\nabla_{\theta}\tilde{\bm{X}}_{j}(\theta_{0}))_{j\in{\mathbb{N}}}$ are iid. Then it follows from the definitions (A.14), (A.15), and Lemma A.2 (with $K=K_{\delta}$) in combination with $(c.4)$ that

[TABLE]

Hence, the uniform SLLN guarantees that

[TABLE]

Slutsky’s theorem then gives that $b_{H}(\cdot)\sqrt{n}g_{n}(\cdot,\theta_{0})$ converges in distribution on compact subsets of ${\mathbb{R}}^{p}$ to ${\mathbb{E}}[b_{1}(\cdot)]G(\cdot)$ as ${n\to\infty}$. The result in (A.23) follows from the continuity of the integral by another application of the continuous mapping theorem on $C(K_{\delta})$. ∎

Lemma A.4.

Under assumptions $(b.2)$ , $(c.4)$ and $(d.2)$ we have componentwise in ${\mathbb{R}}^{q}$ ,

[TABLE]

Proof.

Since $b_{H}(\cdot)$ and $g_{n}(\cdot)$ are independent and ${\mathbb{E}}g_{n}(t)=0$, we have ${\mathbb{E}}[b_{H}(t)g_{n}(t)]=0$ for all $t\in{\mathbb{R}}^{p}$. An application of the Cauchy-Schwarz inequality for integrals gives

[TABLE]

We first obtain a bound for the product between the first component $g_{n,1}(\cdot)$ of $g_{n}(\cdot)$ and the first component $b_{H,1}^{(i)}(\cdot)$ of $b_{H}^{(i)}(\cdot)$ . Define for $t\in{\mathbb{R}}^{p}$

[TABLE]

Then,

[TABLE]

Under $(a.3)$ it follows from Theorem 3(a) in Section 1.2.2 of Doukhan (1994) that for fixed $t$ ,

[TABLE]

where $u=\frac{2r}{(r-1)}$ and, thus, it follows from the stationarity of $(U_{j}(t))_{j\in{\mathbb{N}}}$ combined with (A.28) and the fact that $|U_{0}(t)|\leq 2$ that

[TABLE]

where the bound is independent of $t$. Recall that $H=H(n)=\bar{H}(n)n$. Under $(c.4)$, it follows from the iid property of $(V_{j}(t))_{j\in{\mathbb{N}}}$ that

[TABLE]

Using the fact that $\Big{|}\frac{1}{n}\sum_{j=1}^{n}U_{j}(t)\Big{|}\leq 2$ , adding and subtracting ${\mathbb{E}}V_{0}(t)$ with the inequality $|a+b|^{2}\leq 2(|a|^{2}+|b|^{2})$ , and (A.30) gives

[TABLE]

The calculations in (A.29), (A.30), and (A.31) can now be applied to show that for all $n\in{\mathbb{N}}$ ,

[TABLE]

and, thus, it follows from (A.26) together with $(d.1)$ and $(d.3)$ that

[TABLE]

∎

Lemma A.5.

Under assumptions $(b.2)$ , $(d.2)$ and $(c.4)$

[TABLE]

Proof.

It follows from (A.14), (A.15), $(c.4)$, and (A.24) that ${\mathbb{E}}|b_{1}(t)|\leq c|t|{\mathbb{E}}|\nabla_{\theta}X_{1}(\theta_{0})|<\infty$. Now we find an upper bound for the variance of each component of $G(t)$ for a fixed $t$. Let $U_{j}(t)$ be as defined on the left-hand side of (A.27) and notice that the first component of $G(t)$ is the distributional limit of $\frac{1}{\sqrt{n}}\sum_{j=1}^{n}U_{j}(t)$. Since $(U_{j}(t))_{j\in{\mathbb{N}}}$ is $\alpha$-mixing by $(a.3)$, we can apply the CLT in Ibragimov and Linnik (1971) (Theorem 18.5.3 with $\delta=2/(r-1)$) and find that the variance of the first component of $G(t)$ is given by

[TABLE]

This combined with Theorem 3(a) in Section 1.2.2 of Doukhan (1994) and the fact that ${\mathbb{E}}U_{j}(t)=0$ and $|U_{j}(t)|\leq 2$ for all $j\in{\mathbb{N}}$ gives by $(a.3)$ and (A.28)

[TABLE]

A similar calculation shows that the variance of the second component of $G(t)$ is also bounded by a finite constant, which does not depend on $t$ . Therefore, ${\mathbb{E}}|G(t)|\leq c$ . This combined with (A.24) and assumption $(d.2)$ gives

[TABLE]

Since $L^{1}$ -convergence implies convergence in probability the result follows. ∎
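The limiting variance in a CLT for a stationary $\alpha$-mixing sequence is the long-run variance, that is, the variance of a single term plus twice the sum of the autocovariances over all positive lags. The Python sketch below estimates a truncated version of this sum for a simulated Gaussian AR(1) series, where the exact value is known, and for a bounded transform of the form $\cos(tX_{j})$, which mimics the summands $U_{j}(t)$ when $p=1$. The truncation lag, the model and the function `long_run_variance` are illustrative assumptions; the estimator is a plain truncated sum, not a method taken from the paper.

```python
import numpy as np

def long_run_variance(u, max_lag):
    """Truncated estimate of Var(U_0) + 2 * sum_{h=1}^{max_lag} Cov(U_0, U_h)."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()                              # center, as U_j(t) has mean zero
    n = len(u)
    acov = [np.dot(u[: n - h], u[h:]) / n for h in range(max_lag + 1)]
    return acov[0] + 2.0 * sum(acov[1:])

rng = np.random.default_rng(4)
phi, n = 0.5, 200_000                             # illustrative values

# Stationary Gaussian AR(1) path with unit noise variance
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi**2)
for j in range(1, n):
    x[j] = phi * x[j - 1] + eps[j]

lrv_x = long_run_variance(x, 20)                  # exact value is 1 / (1 - phi)^2 = 4
lrv_u = long_run_variance(np.cos(0.8 * x), 20)    # bounded summands, as for U_j(t)
```

For the bounded transform the autocovariances inherit the geometric decay of the underlying series, so the sum stays finite; the proof obtains the analogous bound, uniformly in $t$, from the covariance inequality for $\alpha$-mixing sequences.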

This proves part (1) of the delta method. We now turn to part (2). In order to calculate the second derivatives of $Q_{n,H}(\theta)$ , which exist by $(b.2)$ , we rewrite (A.13) as

[TABLE]

For the second derivatives we calculate for every $i,k\in\{1,\dots,q\}$ ,

[TABLE]

where we summarize all quantities used in the following list:

[TABLE]

Lemma A.6.

If the assumptions $(a.2)$, $(b.1)$, $(b.2)$, $(c.5)$, $(d.3)$ hold and $(\theta_{n})_{n\in{\mathbb{N}}}\subset\Theta$ satisfies $\theta_{n}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\theta_{0}$, then for every $k,i\in\{1,\dots,q\}$, as ${n\to\infty}$

[TABLE]

Proof.

We first prove that as ${n\to\infty}$

[TABLE]

Step 1: Uniform convergence on $\Theta$: It follows from the iid property of the random elements $(\tilde{\bm{X}}_{j}(\theta),\theta\in\Theta)_{j\in{\mathbb{N}}}$ that the sequence $(\tilde{\bm{X}}_{j}(\theta),\nabla_{\theta}\tilde{\bm{X}}_{j}(\theta),\nabla_{\theta}^{2}\tilde{\bm{X}}_{j}(\theta),\theta\in\Theta)_{j\in{\mathbb{N}}}$ is iid. Lemma A.2 together with $(c.5)$ gives the uniform bound

[TABLE]

and it follows from the uniform SLLN that for every fixed $t\in{\mathbb{R}}^{p}$

[TABLE]

Similarly,

[TABLE]

Because of $(a.2)$ the ergodic theorem gives

[TABLE]

Therefore, (A.37) combined with (A.38) and the triangle inequality imply

[TABLE]

Step 2: Pointwise convergence of ${i_{n,H}}(t,\theta_{n}){g_{H,k,i}}(t,\theta_{n})$ : The triangle inequality implies

[TABLE]

Since $\theta_{n}\stackrel{{\scriptstyle\rm a.s.}}{{\rightarrow}}\theta_{0}$ and the map $\theta\mapsto{\mathbb{E}}{i_{1,1}}(t,\theta){\mathbb{E}}g_{1,k,i}(t,\theta)$ is continuous on $\Theta$ (by $(b.2)$ and $(c.5)$), it follows that the second term on the right-hand side of (A.40) converges a.s. to zero. Additionally, since the uniform convergence in (A.36) and (A.39) implies the uniform convergence of the product ${i_{n,H}}(t,\theta){g_{H,k,i}}(t,\theta)$ on $\Theta$, it follows that the first term on the right-hand side of (A.40) also converges a.s. to zero.

Step 3: $L^{1}$-convergence: Since we have already shown a.s. convergence, it follows from Theorems 6.25(iii) and 6.19 in Klenke (2013) (with $H(x)=|x|^{1+\varepsilon}$) that $L^{1}$-convergence holds provided that

[TABLE]

for some $\varepsilon>0$ . Using the fact that $|{i_{n,H}}(t,\theta_{n})|\leq 2$ and the inequality $|\frac{1}{n}\sum_{j=1}^{n}{\beta}_{j}|^{1+\varepsilon}\leq\frac{1}{n}\sum_{j=1}^{n}|{\beta}_{j}|^{1+\varepsilon}$ , ${\beta_{1},\dots,\beta_{n}\in{\mathbb{R}}}$ , we obtain

[TABLE]

since $|\cos(\cdot)|,|\sin(\cdot)|\leq 1$ . Now we use the inequality $|a+b|^{1+\varepsilon}\leq 2^{\varepsilon}(|a|^{1+\varepsilon}+|b|^{1+\varepsilon})$ for $a,b\in{\mathbb{R}}$ , assumption $(c.5)$ for the uniform bound in Lemma A.2 and the fact that the sequence $(\tilde{\bm{X}}_{j}(\theta),\nabla_{\theta}\tilde{\bm{X}}_{j}(\theta),\nabla_{\theta}^{2}\tilde{\bm{X}}_{j}(\theta),\theta\in\Theta)_{j\in{\mathbb{N}}}$ is iid to continue

[TABLE]

Step 4: Convergence of the random integrals: Define the sequence of functions

[TABLE]

and recall that, by the $L^{1}$-convergence shown in Step 3, for every $t\in{\mathbb{R}}^{p}$ we have $v_{n}(t)\rightarrow 0$ as ${n\to\infty}$. From the definition of the function $v$ in the last line of (A.42) it follows that $\sup_{n\in{\mathbb{N}}}v_{n}(t)\leq 2v(t)$. Additionally, assumption $(d.3)$ implies that

[TABLE]

Therefore, it follows from Fubini’s Theorem and dominated convergence that

[TABLE]

and therefore the convergence in probability of (A.35) follows from the $L^{1}$ -convergence in (A.43).

The proofs for the other three remaining integrals on the right-hand side of (A.33) follow along the same lines. The result in (A.34) is then a consequence of the fact that for all $t\in{\mathbb{R}}^{p}$ , ${\mathbb{E}}i_{1,1}(t,\theta_{0})={\mathbb{E}}k_{1,1}(t,\theta_{0})=0$ . ∎

Proof of Theorem 3.3: We handle each term in (A.17) separately. As a direct consequence of Theorem 3.1 and Lemmas A.3-A.6,

[TABLE]

where $Q=(Q_{k,i})_{k,i=1}^{q}$ with

[TABLE]

and $G$ being the ${\mathbb{R}}^{2}$ -valued Gaussian field from Lemma A.3. For arbitrary $k,r\in\{1,\dots,q\}$ we have

[TABLE]

Since $(X_{j})_{j\in{\mathbb{N}}}$ is $\alpha$ -mixing by $(a.3)$ , we can apply the CLT in Ibragimov and Linnik (1971) (Theorem 18.5.3 with $\delta=2/(r-1)$ ) and find that

[TABLE]

where

[TABLE]

Substituting (A.46) and (A.47) into (A.45) gives with Fubini’s Theorem

[TABLE]

which combined with Remark A.1 gives (3.4). By the same arguments of interchanging expectation and differentiation from Remark A.1 we obtain

[TABLE]

This together with (A.44) gives

[TABLE]

leading to (3.2).

The second term in (A.17) is, up to a constant,

[TABLE]

It follows from the fact that $(\tilde{\bm{X}}_{j}(\theta_{0}))_{j\in{\mathbb{N}}}\stackrel{{\scriptstyle\mathrm{d}}}{{=}}({\bm{X}}_{j})_{j\in{\mathbb{N}}}$ combined with (A.32) that

[TABLE]

as ${n\to\infty}$ . Thus (3.3) follows from Chebyshev’s inequality.

A.2 Finite sample behavior of the estimators

A.2.1 ARFIMA models driven by noise from Gaussian, Laplace, and Student- $t$ distributions

A.2.2 Poisson-AR model

Data sharing: Data sharing is not applicable to this article as no datasets were analyzed or used in this study.

Acknowledgement

Thiago do Rêgo Sousa gratefully acknowledges support from the National Council for Scientific and Technological Development (CNPq - Brazil) and the TUM Graduate School. He also thanks the Statistics Department at Columbia University for its hospitality during his visit and takes pleasure in thanking Viet Son Pham and Thibaut Vatter for helpful discussions. Davis’ research was partially supported by NSF grant DMS 2015379 to Columbia University.

Bibliography


Andrews (1991) D. W. Andrews. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica: Journal of the Econometric Society, pages 817–858, 1991.
Belomestny et al. (2015) D. Belomestny, F. Comte, V. Genon-Catalot, H. Masuda, and M. Reiß. Estimation for Discretely Observed Lévy Processes. Lévy Matters IV (LN in Mathematics, Vol. 2128). Springer, Cham, 2015.
Bernstein (2009) D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, Princeton, 2009.
Bianchi and Cleur (1996) C. Bianchi and E. M. Cleur. Indirect estimation of stochastic differential equation models: some computational experiments. Computational Economics, 9(3):257–274, 1996.
Brockwell et al. (2006) P. Brockwell, E. Chadraa, A. Lindner, et al. Continuous-time GARCH processes. The Annals of Applied Probability, 16(2):790–826, 2006.
Brockwell (2001) P. J. Brockwell. Lévy-driven CARMA processes. Annals of the Institute of Statistical Mathematics, 53(1):113–124, 2001.
Brockwell and Davis (2013) P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods. Springer, New York, 2013.
Campbell (1994) M. Campbell. Time series regression for counts: an investigation into the relationship between sudden infant death syndrome and environmental temperature. Journal of the Royal Statistical Society. Series A, pages 191–208, 1994.
