A Sieve-SMM Estimator for Dynamic Models

Jean-Jacques Forneron

arXiv:1902.01456·econ.EM·January 19, 2023


TL;DR

This paper introduces a flexible Sieve-SMM estimator for nonlinear dynamic models that accurately estimates parameters and shock distributions without requiring parametric assumptions, improving robustness against misspecification.

Contribution

It develops a novel Sieve-SMM approach that approximates shock distributions with a Gaussian and tails mixture sieve, extending asymptotic theory to complex models with latent variables.

Findings

01. Estimator achieves consistency, rate of convergence, and asymptotic normality.

02. Application reveals significant reduction in estimated relative risk-aversion.

03. Method improves robustness in asset pricing models.

Abstract

This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator for the parameters and the distribution of the shocks in nonlinear dynamic models where the likelihood and the moments are not tractable. An important concern with SMM, which matches sample with simulated moments, is that a parametric distribution is required. However, economic quantities that depend on this distribution, such as welfare and asset-prices, can be sensitive to misspecification. The Sieve-SMM estimator addresses this issue by flexibly approximating the distribution of the shocks with a Gaussian and tails mixture sieve. The asymptotic framework provides consistency, rate of convergence and asymptotic normality results, extending existing results to a new framework with more general dynamics and latent variables. An application to asset pricing in a production economy shows a large decline in the…


Tables

Table 1: Models (9)-(10) – Bias, Standard Deviation and Size for $\rho_{y}$

		Sieve-SMM				Bayesian		GMM
	$k$	2		3		2	3	-
	$S$	1	5	1	5	-	-	-
AR(1)	bias	-0.014	-0.010	-0.017	-0.016	-0.011	-0.010	-0.015
	std	0.082	0.064	0.077	0.062	0.048	0.049	0.056
	size	0.044	0.033	0.026	0.018	0.051	0.054	0.019
SV	bias	-0.003	-0.006	-0.002	-0.006	-	-	-0.006
	std	0.014	0.012	0.015	0.012	-	-	0.027
	size	0.200	0.170	0.190	0.140	-	-	0.060

Table 2: Production Economy: Parameter Estimates

$k$	$β$	$γ$	$ψ$	$τ$	$ρ$	$ν_{π}$	$ν_{z, 1}$	$ν_{z, 2}$	$\log (\bar{π})$	${\hat{Q}}_{n}^{S} ({\hat{β}}_{n})$
$1$	0.994	34.629	0.145	1000.000	0.985	-0.990	-0.100	0.001	0.005	3.58
$1$	(0.001)	(15.373)	(0.007)	(3465.036)	(0.004)	(0.051)	(0.012)	(0.011)	(0.003)	3.58
$2$	0.972	20.153	0.192	50.526	0.766	-0.152	-0.127	-0.015	0.007	2.73
$2$	(0.014)	(4.939)	(0.010)	(19.425)	(0.036)	(0.049)	(0.014)	(0.014)	(0.002)	2.73
$3$	0.988	12.754	0.189	54.559	0.774	-0.059	-0.099	-0.010	0.007	2.70
$3$	(0.007)	(3.339)	(0.011)	(18.263)	(0.029)	(0.046)	(0.015)	(0.014)	(0.002)	2.70
$4$	0.992	10.476	0.165	51.906	0.711	0.005	-0.112	-0.004	0.007	2.58
$4$	(0.006)	(2.692)	(0.012)	(13.748)	(0.039)	(0.048)	(0.014)	(0.014)	(0.002)	2.58
$5$	0.991	11.970	0.170	66.961	0.694	0.139	-0.085	-0.016	0.007	2.46
$5$	(0.007)	(3.147)	(0.011)	(24.847)	(0.034)	(0.045)	(0.012)	(0.012)	(0.002)	2.46
lb	0.965	0.5	0.05	0.01	0.2	-0.99	-0.25	-0.25	0.005	-
ub	0.999	70	110	1000	0.995	0.2	0.25	0.25	0.0085	-

Table 3: Production Economy: Sample and Simulated Moments

	Average Yield				Standard Deviations						Corr( $Δ c_{t}$ ,Yield)
	$3 m$	$6 m$	$1 y$	$2 y$	$3 m$	$6 m$	$1 y$	$2 y$	$Δ c_{t}$	$Δ i_{t}$	$3 m$	$6 m$	$1 y$	$2 y$
Sample	4.57	4.71	5.03	5.24	3.19	3.18	3.31	3.25	0.57	1.96	0.07	0.07	0.08	0.09
Gaussian	2.80	2.87	2.99	3.05	2.44	2.35	2.31	2.26	0.53	1.66	-0.25	-0.25	-0.25	-0.25
$k = 2$	3.59	3.64	3.73	3.85	1.66	1.43	1.11	0.73	0.83	1.23	-0.11	-0.07	-0.04	-0.02
$k = 3$	3.74	3.76	3.80	3.85	1.96	1.72	1.37	0.92	0.68	1.48	-0.06	-0.03	-0.01	0.01
$k = 4$	3.88	3.88	3.89	3.90	1.90	1.63	1.22	0.76	0.69	1.42	0.01	0.03	0.05	0.07
$k = 5$	3.81	3.83	3.86	3.89	1.94	1.70	1.28	0.79	0.66	1.41	-0.02	0.02	0.04	0.05
$IES = 1.5$	5.08	5.08	5.08	5.08	1.90	1.63	1.23	0.77	0.34	2.04	-0.16	-0.17	-0.15	-0.08

Table D4: Estimates, Standard Errors, Confidence Intervals without the Delta-Method

	$1 / {\hat{τ}}_{n}$	$se (1 / {\hat{τ}}_{n})$	95% CI for $τ$	$1 / {\hat{γ}}_{n}$	$se (1 / {\hat{γ}}_{n})$	95% CI for $γ$
$k = 1$	0.001	0.004	$[128.35, + \infty)$	0.029	0.013	$[18.52, 266.65]$
$k = 2$	0.020	0.008	$[28.81, 204.99]$	0.050	0.012	$[13.61, 38.78]$
$k = 3$	0.018	0.006	$[32.95, 158.60]$	0.079	0.021	$[8.43, 26.19]$
$k = 4$	0.019	0.005	$[34.17, 107.94]$	0.096	0.025	$[6.97, 21.11]$
$k = 5$	0.015	0.005	$[38.77, 245.53]$	0.084	0.022	$[7.90, 24.69]$

Table F5: Bias, Standard Deviation and Size

		Sieve-SMM				Bayesian		GMM
	$k$	2		3		2	3	-
	$S$	1	5	1	5	-	-	-
AR(1)	bias	-0.023	-0.025	-0.030	-0.032	-0.024	-0.023	-0.031
	std	0.119	0.092	0.112	0.090	0.073	0.072	0.083
	size	0.047	0.037	0.033	0.027	0.055	0.053	0.016

Table F6: Bias, Standard Deviations and Size for the SV Model (10)

		$k = 2$					$k = 4$
S		$μ_{y}$	$ρ_{y}$	$ϑ_{y}$	$ρ_{σ}$	$κ_{σ}$	$μ_{y}$	$ρ_{y}$	$ϑ_{y}$	$ρ_{σ}$	$κ_{σ}$
1	bias	0.000	-0.003	0.005	-0.167	-0.092	-0.000	-0.001	0.010	-0.078	-0.105
	std	0.014	0.014	0.082	0.276	0.280	0.012	0.014	0.066	0.182	0.201
	size	0.315	0.200	0.100	0.100	0.155	0.215	0.140	0.060	0.020	0.085
5	bias	0.001	-0.006	0.020	-0.077	-0.133	0.000	-0.006	0.011	-0.051	-0.083
	std	0.009	0.012	0.062	0.216	0.210	0.008	0.012	0.053	0.126	0.138
	size	0.535	0.170	0.090	0.005	0.055	0.335	0.125	0.050	0.000	0.005
20	bias	0.000	-0.005	0.012	-0.041	-0.116	-0.000	-0.005	0.008	-0.020	-0.066
	std	0.008	0.011	0.060	0.177	0.193	0.007	0.011	0.056	0.071	0.113
	size	0.505	0.155	0.045	0.000	0.000	0.425	0.105	0.050	0.000	0.000

Table F7: Sensitivity to Estimation Inputs in the AR(1) Model (9)

	Baseline	$L = 2$	$L = 6$	${\underline{σ}}_{k (n)} = 1.4$	${\underline{σ}}_{k (n)} = 2.2$	$B = 100$	$B = 250$
${\hat{ρ}}_{n}$	0.630	0.638	0.623	0.607	0.623	0.629	0.627

Table F8: Asset Pricing Model: Estimates of $\theta$

		$β$	$γ$	$ϕ$	$ρ_{z}$	$ρ_{π}$	$\log \bar{π}$
	true	0.99	4	20	0.95	0.9	0.009
skl	mean	0.990	4.893	20.441	0.946	0.894	0.009
	median	0.990	4.174	20.154	0.951	0.897	0.009
	std	0.003	3.470	6.034	0.026	0.025	0.002
	size	0.036	0.216	0.076	0.152	0.536	0.200
$k = 3$	mean	0.990	4.233	20.456	0.950	0.901	0.009
	median	0.990	4.100	20.198	0.950	0.899	0.009
	std	0.002	1.017	2.410	0.006	0.018	0.001
	size	0.112	0.100	0.172	0.084	0.008	0.004
$k = 5$	mean	0.990	4.183	20.351	0.950	0.902	0.009
	median	0.990	4.077	20.203	0.950	0.901	0.009
	std	0.002	0.696	1.540	0.004	0.013	0.001
	size	0.064	0.104	0.176	0.072	0.000	0.000
ub		0.9999	20	60	0.99	0.99	0.02
lb		0.95	0.01	0.01	0.7	0.6	0.003

Equations

y_{t}=g_{obs}(y_{t-1},x_{t},\theta,f,u_{t})

u_{t}=g_{latent}(u_{t-1},\theta,f,e_{t}),\quad e_{t}\sim f.

r_{t}=\text{const}-\log\left(\mathbb{E}_{t}\left[\exp(-\gamma e_{t})\right]\right).

y_{t}=\rho y_{t-1}+e_{t},\quad e_{t}\overset{iid}{\sim}f,

f_{\omega,\mu,\sigma}(\cdot)=\sum_{j=1}^{k}\frac{\omega_{j}}{\sigma_{j}}\phi\left(\frac{\cdot-\mu_{j}}{\sigma_{j}}\right),

\hat{\psi}_{n}(\tau)=\frac{1}{n}\sum_{t=1}^{n}e^{i\tau^{\prime}y_{t}},\quad\hat{\psi}_{n}^{S}(\tau,\theta,f)=\frac{1}{nS}\sum_{s=1}^{S}\sum_{t=1}^{n}e^{i\tau^{\prime}y_{t}^{s}(\theta,f)},

\displaystyle\hat{Q}_{n}^{S}(\theta,f)=\int_{\mathbb{R}^{L+1}}\Big{|}\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\theta,f)\Big{|}^{2}\pi(\tau)d\tau,

\hat{Q}_{n}^{S}(\hat{\beta}_{n})\leq\inf_{\beta\in\mathcal{B}_{k(n)}}\hat{Q}_{n}^{S}(\beta)+O_{p}(\hat{\eta}_{n}),

\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau)=i\tau^{\prime}\Sigma_{n}^{-1/2}(\bar{y}_{n}^{S}-y_{n})+\frac{1}{2nS}\sum_{s=1}^{S}\sum_{t=1}^{n}(y_{t}^{s}-y_{n})^{\prime}\Sigma_{n}^{-1/2}\tau\tau^{\prime}\Sigma_{n}^{-1/2}(y_{t}^{s}-y_{n})-\frac{1}{2}\tau\tau^{\prime}+\dots

f_{k}(\cdot)=\sum_{j=1}^{k}\frac{\omega_{j}}{\sigma_{j}}\phi\left(\frac{\cdot-\mu_{j}}{\sigma_{j}}\right)+\frac{\omega_{k+1}}{\sigma_{k+1}}\mathbbm{1}_{\cdot\leq\mu_{k+1}}f_{L}\left(\frac{\cdot-\mu_{k+1}}{\sigma_{k+1}}\right)+\frac{\omega_{k+2}}{\sigma_{k+2}}\mathbbm{1}_{\cdot\geq\mu_{k+2}}f_{R}\left(\frac{\cdot-\mu_{k+2}}{\sigma_{k+2}}\right),

\left[\mathbb{E}\left(\sup_{\|f_{\omega,\mu,\sigma}-f_{\tilde{\omega},\tilde{\mu},\tilde{\sigma}}\|_{m}\leq\delta}\Big{|}e_{t}^{s}-\tilde{e}_{t}^{s}\Big{|}^{2}\right)\right]^{1/2}\leq C\left(1+\bar{\mu}_{k(n)}+\bar{\sigma}+k(n)\right)\delta^{1/2},

Q_{n}(\Pi_{k(n)}\beta_{0})\asymp\max\left(\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}^{2}\log(\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}})^{2},\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}^{2\gamma^{2}},1/n^{2}\right),

\mathbb{E}\left[\sup_{\|\beta_{1}-\beta_{2}\|_{m}\leq\delta}\Big{|}e^{i\tau^{\prime}(y_{t}^{s}(\beta_{1}),x_{t})}-e^{i\tau^{\prime}(y_{t}^{s}(\beta_{2}),x_{t})}\Big{|}^{2}\pi(\tau)\right]\leq\overline{C}\max\left(\frac{\delta^{\gamma^{2}}}{\sigma_{k(n)}^{2\gamma^{2}}},[k(n)+\overline{\mu}_{k(n)}+\overline{\sigma}]^{\gamma}\delta^{\gamma^{2}/2}\right)

\max\left(\frac{\log[k(n)]^{4r/b+2}}{k(n)^{2\gamma^{2}r}},\frac{k(n)^{4}\log[k(n)]^{4}}{n},\frac{1}{n^{2}}\right)=o\left(\inf_{\beta\in\mathcal{B}_{k(n)},\,\|\beta-\beta_{0}\|_{\mathcal{B}}\geq\varepsilon}Q_{n}(\beta)\right),

\|\hat{\beta}_{n}-\beta_{0}\|_{\mathcal{B}}=o_{p}(1).

\|\beta_{1}-\beta_{2}\|_{weak}^{2}=\int\Big{|}\frac{d\mathbb{E}\left(\hat{\psi}^{S}_{n}(\tau,\beta_{0})\right)}{d\beta}[\beta_{1}-\beta_{2}]\Big{|}^{2}\pi(\tau)d\tau

\|\hat{\beta}_{n}-\beta_{0}\|_{weak}=O_{p}\left(\max\left(\frac{\log[k(n)]^{r/b+1}}{k(n)^{\gamma^{2}r}},\frac{k(n)^{2}\log[k(n)]^{2}}{n}\right)\right).

\|\hat{\beta}_{n}-\beta_{0}\|_{\mathcal{B}}=O_{p}\left(\frac{\log[k(n)]^{r/b}}{k(n)^{r}}+\tau_{\mathcal{B},n}\max\left(\frac{\log[k(n)]^{r/b+1}}{k(n)^{\gamma^{2}r}},\frac{k(n)^{2}\log[k(n)]^{2}}{n}\right)\right)

\|\hat{\beta}_{n}-\beta_{0}\|_{weak}=O_{p}\left(\max\left(\frac{\log[k(n)]^{r/b+1}}{k(n)^{\gamma^{2}r}},\max\left(\frac{k(n)^{2}\log[n]^{2}}{n\times S},\frac{1}{n}\right)\right)\right).

r_{n}\times(\phi(\hat{\beta}_{n})-\phi(\beta_{0}))\overset{d}{\to}\mathcal{N}(0,1),\quad\text{where }r_{n}=n/\sigma_{n}^{*}\to\infty.

D_{n}=\text{real}\left[\int\partial_{\theta,\omega,\mu,\sigma}^{\prime}\hat{\psi}_{n}^{S}(\tau)\overline{\partial_{\theta,\omega,\mu,\sigma}\hat{\psi}_{n}^{S}(\tau)}\pi(\tau)d\tau\right],

Z_{t}

Z_{t,S}

\hat{\sigma}_{n}^{*2}=\partial_{\theta,\omega,\mu,\sigma}\phi(\hat{\beta}_{n})D_{n}^{-1}(V_{1}+V_{1,S})D_{n}^{-1}\partial_{\theta,\omega,\mu,\sigma}^{\prime}\phi(\hat{\beta}_{n}).

y_{t}

U_{t}=\left[C_{t}^{1-\psi}+\beta\left[\mathbb{E}_{t}(U_{t+1}^{1-\gamma})\right]^{\frac{1-\psi}{1-\gamma}}\right]^{\frac{1}{1-\psi}},

\log Z_{t+1}=\lambda+\log Z_{t}+e_{1,t+1},

C_{t}+I_{t}+\frac{B_{t+1}}{P_{t}R_{t}}=r_{t}K_{t}+w_{t}l_{t}+\frac{B_{t}}{P_{t}},\quad Y_{t}=C_{t}+I_{t}.

\log\pi_{t+1}=\bar{\pi}+\rho(\log\pi_{t}-\bar{\pi})+e_{2,t+1}+\nu_{\pi}e_{2,t}+\nu_{z,1}e_{1,t+1}+\nu_{z,2}e_{1,t},

Full text

A Sieve-SMM Estimator for Dynamic Models

Jean-Jacques Forneron Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215.

Email: [email protected]. This paper is based on the third chapter of my doctoral dissertation at Columbia University. I am indebted to my advisor Serena Ng for her continuous guidance and support. I would like to thank the co-editor and three anonymous referees for insightful and helpful comments. I also greatly benefited from comments and discussions with Jushan Bai, Tim Christensen, Benjamin Connault, Gregory Cox, Iván Fernández-Val, Ron Gallant, Eric Gautier, Hiro Kaido, Dennis Kristensen, Sokbae Lee, Kim Long, Nour Meddahi, José Luis Montiel Olea, Zhongjun Qu, Christoph Rothe, Bernard Salanié and the participants of the Columbia Econometrics Colloquium as well as the participants of the econometrics seminar at Boston University, Chicago Booth, UC Berkeley, Bocconi, Georgetown, UPenn and participants at several conferences. All errors are my own.

Abstract

This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator for the parameters and the distribution of the shocks in nonlinear dynamic models where the likelihood and the moments are not tractable. An important concern with SMM, which matches sample with simulated moments, is that a parametric distribution is required. However, economic quantities that depend on this distribution, such as welfare and asset-prices, can be sensitive to misspecification. The Sieve-SMM estimator addresses this issue by flexibly approximating the distribution of the shocks with a Gaussian and tails mixture sieve. The asymptotic framework provides consistency, rate of convergence and asymptotic normality results, extending existing results to a new framework with more general dynamics and latent variables. An application to asset pricing in a production economy shows a large decline in the estimates of relative risk-aversion, highlighting the empirical relevance of misspecification bias.

JEL Classification: C14, C15, C32, C33.

Keywords: Simulated Method of Moments, Mixture Sieve, Asset Pricing.

1 Introduction

Complex nonlinear dynamic models with an intractable likelihood or moments are increasingly common in economics. A popular approach to estimating these models is to match informative sample moments with simulated moments from a fully parameterized model using SMM. However, economic models are rarely fully parametric since theory usually provides little guidance on the distribution of the shocks. The Gaussian distribution is often used in applications but, in practice, different choices of distribution may have different economic implications; this is illustrated below. Yet few results on semiparametric simulation-based estimation are available to address this issue.

This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator for both the structural parameters and the distribution of the shocks and explains how to implement it. The dynamic models considered in this paper have the form:

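$$y_{t}=g_{obs}(y_{t-1},x_{t},\theta,f,u_{t}) \tag{1}$$

$$u_{t}=g_{latent}(u_{t-1},\theta,f,e_{t}),\quad e_{t}\sim f. \tag{2}$$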

The observed outcome variable is $y_{t}$ , $x_{t}$ are exogenous regressors and $u_{t}$ is a vector of unobserved latent variables. The unknown parameters include $\theta$ , a finite dimensional vector, and the distribution $f$ of the shocks $e_{t}$ . The functions $g_{obs},g_{latent}$ are known, or can be computed numerically, up to $\theta$ and $f$ . The Sieve-SMM estimator extends the existing Sieve-GMM literature to more general dynamics with latent variables and the literature on sieve simulation-based estimation of some static models.

The estimator in this paper has two main building blocks: the first one is a sample moment function, such as the empirical characteristic function (CF) or the empirical cumulative distribution function (CDF); infinite dimensional moments are needed to identify the infinite dimensional parameters. As in the finite dimensional case, the estimator simply matches the sample moment function with the simulated moment function. To handle this continuum of moment conditions, this paper adopts the objective function of Carrasco and Florens (2000); Carrasco et al. (2007) in a semi-nonparametric setting.
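The first building block can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes no exogenous regressors, a standard Gaussian weight $\pi$, and approximates the integral over $\tau$ by averaging over random draws from $\pi$. The function names (`lagged_blocks`, `empirical_cf`, `cf_objective`) and the placeholder data are hypothetical.

```python
import numpy as np

def lagged_blocks(y, L):
    """Stack (y_t, y_{t-1}, ..., y_{t-L}) row by row for every t with L lags available."""
    n = len(y)
    return np.column_stack([y[L - j : n - j] for j in range(L + 1)])

def empirical_cf(blocks, taus):
    """Empirical CF at each row of taus: average over t of exp(i * tau' y_t)."""
    return np.exp(1j * blocks @ taus.T).mean(axis=0)

def cf_objective(y_obs, y_sims, L=1, n_tau=500, seed=0):
    """Monte Carlo approximation of the integral of |psi_n - psi_n^S|^2 against pi.

    Assumption: pi is the N(0, I) density on R^(L+1), so the integral is
    approximated by an average over draws tau ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    taus = rng.standard_normal((n_tau, L + 1))
    psi_obs = empirical_cf(lagged_blocks(y_obs, L), taus)
    # Average the simulated CF over the S simulated paths.
    psi_sim = np.mean([empirical_cf(lagged_blocks(ys, L), taus) for ys in y_sims], axis=0)
    return float(np.mean(np.abs(psi_obs - psi_sim) ** 2))

# Placeholder inputs: white noise stands in for both the data and the simulations.
rng = np.random.default_rng(1)
y_obs = rng.standard_normal(500)
y_sims = [rng.standard_normal(500) for _ in range(2)]
print(cf_objective(y_obs, y_sims))
```

Because the CF is bounded by one in modulus, the objective always lies between 0 and 4, which is the boundedness property used later for the empirical process results.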

The second building block is to nonparametrically approximate the distribution of the shocks using the method of sieves, as numerical optimization over an infinite dimensional space is generally not feasible. Typical sieve bases include polynomials and splines which approximate smooth regression functions. Mixtures are particularly attractive to approximate densities for three reasons: they are computationally cheap to simulate from, they are known to have good approximation properties for smooth densities, and draws from the mixture sieve are shown in this paper to satisfy the $L^{2}$-smoothness regularity conditions required for the asymptotic results. Restrictions on the number of mixture components, the tails and the smoothness of the true density ensure that the bias is small relative to the variance so that valid inferences can be made in large samples. To handle potentially fat tails, this paper also introduces a Gaussian and tails mixture. The tail densities in the mixture are constructed to be easy to simulate from and also satisfy $L^{2}$-smoothness properties. The algorithm gives an overview of the steps required to compute the estimator; more details are given in Section 2.

To give intuition for why the mixture sieve can be useful in economic analyses, consider an endowment economy where consumption growth $\Delta c_{t}=\log(C_{t}/C_{t-1})$ follows a simple AR(1) process with mean zero innovations $e_{t}\sim f$. The parameters consist of the AR(1) coefficient and $f$. This simple specification provides a lot of flexibility to model asset prices; for a CRRA utility with risk aversion $\gamma$, the risk-free rate is:

$$r_{t}=\text{const}-\log\left(\mathbb{E}_{t}\left[\exp(-\gamma e_{t})\right]\right).$$

The first const term involves the AR(1) parameters, $\Delta c_{t-1}$, time-preference $\delta$, and risk-aversion $\gamma$. The last term is the log of the moment generating function (MGF) for $f$ evaluated at $\gamma$. Heavy tails imply an infinite MGF so the paper will focus on Gaussian mixtures for which the MGF is finite and asset prices are well defined. Different distributions $f$ are associated with different MGFs which, for a given $\gamma$, lead to different values of the risk-free rate above. Here, flexibly approximating $f$ would make it possible to better match features of $\Delta c_{t}$ but also of the risk-free rate. In comparison, with Gaussian shocks, the risk-free rate becomes $r_{t}=\text{const}-\gamma^{2}\sigma_{e}^{2}/2$, which requires a relatively large value of $\gamma$ to match the data (Weil, 1989). The empirical applications look at the asset pricing implications of using a flexible mixture specification for $f$; first in a very simple endowment model and then in the production economy of Van Binsbergen et al. (2012). Both applications illustrate the discussion above. For the second application in particular, using quarterly US data between 1961 and 2019, baseline Gaussian estimates confirm their conclusion that macro and financial data are difficult to match. Here, with mixture instead of Gaussian shocks, estimates of risk aversion decline from $35$ to $10$, with a 95% confidence interval of $[5,16]$.
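The dependence of the risk-free rate on the MGF can be illustrated with a small calculation. For a Gaussian mixture the MGF is available in closed form, $\mathbb{E}[\exp(se)]=\sum_{j}\omega_{j}\exp(s\mu_{j}+s^{2}\sigma_{j}^{2}/2)$. The sketch below compares the term $\log\mathbb{E}[\exp(-\gamma e_{t})]$ under a mean-zero two-component mixture with the Gaussian value $\gamma^{2}\sigma_{e}^{2}/2$ at the same variance. The mixture parameters and the value of $\gamma$ are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import numpy as np

def log_mgf_mixture(s, omega, mu, sigma):
    """log E[exp(s * e)] when e follows a Gaussian mixture with the given components."""
    omega, mu, sigma = (np.asarray(a, dtype=float) for a in (omega, mu, sigma))
    return float(np.log(np.sum(omega * np.exp(s * mu + 0.5 * (s * sigma) ** 2))))

gamma = 10.0
# Hypothetical mean-zero mixture with a rare, more volatile left-tail component.
omega = np.array([0.95, 0.05])
mu = np.array([0.002, -0.038])
sigma = np.array([0.005, 0.02])

mean_e = np.sum(omega * mu)
var_e = np.sum(omega * (sigma**2 + mu**2)) - mean_e**2

gauss_term = 0.5 * gamma**2 * var_e                    # Gaussian shocks, same variance
mix_term = log_mgf_mixture(-gamma, omega, mu, sigma)   # mixture shocks

print(f"Gaussian term: {gauss_term:.5f}, mixture term: {mix_term:.5f}")
```

In this particular example the mixture term is the larger of the two, so a given level of the risk-free rate could be matched with a smaller $\gamma$ than under Gaussian shocks with the same variance.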

As usual in the sieve literature, this paper provides a consistency result and derives the rate of convergence of the structural and infinite dimensional parameters, as well as asymptotic normality results for finite dimensional functionals of these parameters. While the results apply to both static and dynamic models alike, two important differences arise in dynamic models compared to the existing literature on sieve estimation: proving uniform convergence of the objective function and controlling the dynamic accumulation of the nonparametric approximation bias.

The first challenge is to establish the rate of convergence of the objective function for dynamic models. To allow for the general dynamics (1)-(2) with latent variables, this paper adapts results from Andrews and Pollard (1994) and Ben Hariz (2005) to construct an inequality for uniformly bounded empirical processes which may be of independent interest. It holds under the geometric ergodicity condition found in Duffie and Singleton (1993). The boundedness condition is satisfied by the CF and the CDF for instance. Also, the inequality implies a larger variance than typically found in the literature for iid or strictly stationary data with limited dependence induced by the moments.

The second challenge is that in the model (1)-(2) the nonparametric bias accumulates dynamically. At each time period the bias appears because draws are taken from a mixture approximation instead of the true $f_{0}$; this bias is also transmitted from one period to the next since $(y_{t}^{s},u_{t}^{s})$ depends on $(y_{t-1}^{s},u_{t-1}^{s})$. To ensure that this bias does not accumulate too much, a decay condition is imposed on the data generating process (DGP). For an AR(1) process with coefficient $\rho$, this condition holds if $|\rho|<1$. The resulting bias is generally larger than in static models and usual sieve estimation problems. Together, the increased variance and bias imply a slower rate of convergence for the Sieve-SMM estimates. Hence, in order to achieve the rate of convergence required for asymptotic normality, the Sieve-SMM requires additional smoothness of the true density $f_{0}$. Bias accumulation seems to be generic to sieve estimation of dynamic models: if the computation of the moments or likelihood involves a filtering step then the bias accumulates inside the prediction error of the filtered values. Monte-Carlo simulations illustrate the finite sample properties of the estimator and the effect of dynamics on the bias and the variance properties of the estimator.
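The role of the decay condition can be seen in the AR(1) case with a short calculation. The sketch below compares two simulated paths that share $\rho$ and the same initial value but are driven by different shocks, $e_{t}^{s}$ (mixture draws) and $\tilde{e}_{t}^{s}$ (hypothetical draws from $f_{0}$). It is an illustration under these assumptions and is not taken from the paper's proofs.

```latex
\begin{aligned}
y_{t}^{s}-\tilde{y}_{t}^{s}
  &= \rho\,(y_{t-1}^{s}-\tilde{y}_{t-1}^{s}) + (e_{t}^{s}-\tilde{e}_{t}^{s})
   = \sum_{j=0}^{t-1}\rho^{j}\,(e_{t-j}^{s}-\tilde{e}_{t-j}^{s}), \\
|y_{t}^{s}-\tilde{y}_{t}^{s}|
  &\leq \sum_{j=0}^{t-1}|\rho|^{j}\,|e_{t-j}^{s}-\tilde{e}_{t-j}^{s}|
   \leq \frac{1}{1-|\rho|}\,\max_{1\leq \ell\leq t}|e_{\ell}^{s}-\tilde{e}_{\ell}^{s}|
   \quad\text{if } |\rho|<1 .
\end{aligned}
```

Past approximation errors are discounted geometrically, so the accumulated error stays bounded uniformly in $t$; with $|\rho|\geq 1$ the sum would not be controlled in this way.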

While the paper proposes to relax certain parametric assumptions in the estimation, the model can still be misspecified along other dimensions and remain unable to match certain features of the data. This contrasts with more parsimonious choices of moments in indirect inference which could be more robust to misspecification in certain dimensions. This is illustrated in the second application (Table 3): using a sieve improves the fit in some dimensions but not all. To interpret the parameters being estimated here under misspecification, note that the objective function $Q=\lim_{n\to\infty}\hat{Q}_{n}^{S}$ can be re-written using Fubini’s Theorem and direct calculations (under Gaussian weighting with mean $0$ and variance $\Sigma$) as $\int\exp(-\frac{1}{2}[\textbf{y}_{t}-\tilde{\textbf{y}_{t}}]^{\prime}\Sigma[\textbf{y}_{t}-\tilde{\textbf{y}_{t}}])[f(\textbf{y}_{t})-f_{0}(\textbf{y}_{t})][f(\tilde{\textbf{y}_{t}})-f_{0}(\tilde{\textbf{y}_{t}})]d\textbf{y}_{t}d\tilde{\textbf{y}_{t}}$ where $f$ and $f_{0}$ are the distributions of the simulated data $\textbf{y}_{t}=(y_{t},\dots,y_{t-L},x_{t},\dots,x_{t-L})$ and the data, respectively. This implies that the parameters minimize this distance between the joint distributions $f$ and $f_{0}$ weighted by a Gaussian kernel with variance $\Sigma^{-1}$. Notice that it is well defined even if the two distributions have different supports. In light of this, extending the theory to cover estimation and inference under misspecification, as in Ai and Chen (2007), and understanding the impact of using infinite rather than finite dimensional moments and objective function on the pseudo-true parameters are of interest for future research.
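The kernel representation of the objective can be checked numerically in a scalar setting. The sketch below is an illustration under simplifying assumptions: one-dimensional observations, a mean-zero Gaussian weight with variance $s^{2}$ in the role of $\Sigma$, and two arbitrary samples standing in for the data and the simulations. It evaluates the CF-based distance by a Riemann sum over $\tau$ and compares it with the double sum over the kernel $\exp(-s^{2}(u-v)^{2}/2)$. The function names are hypothetical.

```python
import numpy as np

def cf_distance_quadrature(a, b, s=1.0, half_width=12.0, n_grid=20001):
    """Riemann sum of |psi_a(tau) - psi_b(tau)|^2 against the N(0, s^2) density."""
    tau = np.linspace(-half_width * s, half_width * s, n_grid)
    d_tau = tau[1] - tau[0]
    weight = np.exp(-0.5 * (tau / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    psi_a = np.exp(1j * np.outer(tau, a)).mean(axis=1)   # empirical CF of sample a
    psi_b = np.exp(1j * np.outer(tau, b)).mean(axis=1)   # empirical CF of sample b
    return float(np.sum(np.abs(psi_a - psi_b) ** 2 * weight) * d_tau)

def cf_distance_kernel(a, b, s=1.0):
    """Same quantity written with the Gaussian kernel exp(-s^2 (u - v)^2 / 2)."""
    def k(u, v):
        return np.exp(-0.5 * s**2 * (u[:, None] - v[None, :]) ** 2).mean()
    return float(k(a, a) - 2.0 * k(a, b) + k(b, b))

rng = np.random.default_rng(0)
a = rng.standard_normal(40)            # stands in for the data
b = 0.5 + rng.standard_normal(40)      # stands in for simulated data
print(cf_distance_quadrature(a, b), cf_distance_kernel(a, b))
```

The two numbers agree up to quadrature error, and the kernel form makes clear why the distance remains well defined when the two samples have different supports.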

Related Literature

The Sieve-SMM estimator presented in this paper combines two literatures: sieve estimation and the Simulated Method of Moments (SMM). This section provides a non-exhaustive review of the existing methods and results.

A key aspect of simulation-based estimation is the choice of moments $\hat{\psi}_{n}$. The Simulated Method of Moments (SMM) estimator of McFadden (1989) relies on unconditional moments, the Indirect Inference (IND) estimator of Gouriéroux et al. (1993) uses auxiliary parameters from a simpler, tractable model and the Efficient Method of Moments (EMM) of Gallant and Tauchen (1996) uses the score of the auxiliary model. To achieve parametric efficiency, a number of papers consider using nonparametric moments but assume the distribution $f$ is known (see e.g. Gallant and Tauchen (1996); Fermanian and Salanié (2004)). To avoid dealing with nonparametric moments, Carrasco et al. (2007) use the empirical CF (ECF). This paper uses a similar approach in a semi-nonparametric setting.

General asymptotic results are given by Pakes and Pollard (1989) for SMM with iid data and Duffie and Singleton (1993) for time-series. The models considered in this paper are generative: they are fully specified so that one can generate a full dataset by simulation. A related but different class of problems relies on simulations to compute moment conditions for non-linear IV estimation, but these models are not fully parametrically specified. They cannot be used to simulate artificial datasets without additional modelling assumptions.

Few papers are concerned with sieve simulation-based estimation; Bierens and Song (2012) and Newey (2001) consider specific static models. Blasques (2011) considers generic semi-nonparametric indirect inference. The results rely on generic uniform convergence results which do not apply in the present setting because of the non-standard dependence. His assumptions imply $\sqrt{n}$ -convergence of $(\hat{\theta}_{n},\hat{f}_{n})$ which is restrictive. Dridi and Renault (2000) propose a partial encompassing principle where parameters of interest are consistently estimable even if nuisance parameters are inconsistent because of misspecification.

An alternative to using sieves is to model several moments of the distribution with a parametric distribution. Ruge-Murcia (2017) uses the skew Normal and Generalized Extreme Value distributions to model skewness in an asset pricing model. Gospodinov and Ng (2015) use the Generalized Lambda family to estimate a non-invertible Moving Average (MA) model. In applications where the full unknown distribution matters for outcomes, estimates may be sensitive to the choice of distribution. As discussed in the introduction, asset prices depend on the full distribution via the MGF.

Another related literature is the sieve estimation of models defined by moment conditions. These models can be estimated using either Sieve-GMM, Sieve Empirical Likelihood or Sieve Minimum Distance (see Chen, 2007, for a review). Applications include nonparametric estimation of IV and quantile IV regressions, and the semi-nonparametric estimation of asset pricing models, for instance (see e.g. Hansen and Richard (1987); Chen and Ludvigson (2009); Chen et al. (2013); Christensen (2017)). Existing results cover the consistency and the rate of convergence of the estimator as well as asymptotic normality of functionals of the parameters for both iid and dependent data. See e.g. Chen and Pouzo (2012, 2015a) and Chen and Liao (2015) for recent results with iid data and dependent data.

In the empirical Sieve-GMM literature, an application closely related to the dynamics encountered in this paper appears in Chen et al. (2013). They estimate an Euler equation with recursive preferences where the value function is approximated using sieves. Norets and Tang (2014) consider semiparametric Gaussian mixture estimation of dynamic discrete choice models. More generally, there is a large literature on Bayesian nonparametric estimation using mixtures. For non-linear state-space models where the likelihood is often intractable, simulations are also used to compute the objective function. This is implemented with the particle filter. Bayesian inference starts with a prior on both the finite dimensional and the non-parametric components; a common choice of prior for mixtures is used in Section 4. Monte-Carlo Markov-Chain methods are then used to sample from the posterior.

To summarize, this paper extends existing results on Sieve and SMM estimation to a framework with non-linear dynamics, latent variables and flexible semi-nonparametric estimation.

Notation

The following notation and assumptions will be used throughout the paper: the parameter of interest is $\beta=(\theta,f)\in\Theta\times\mathcal{F}=\mathcal{B}$. The finite dimensional parameter space $\Theta$ is compact and the infinite dimensional set of densities $\mathcal{F}$ is possibly non-compact. The sets of mixtures satisfy $\mathcal{B}_{k}\subseteq\mathcal{B}_{k+1}\subseteq\mathcal{B}$, where $k$ is the dimension of the sieve set $\mathcal{B}_{k}$. The dimension $k$ increases with the sample size: $k(n)\to\infty$ as $n\to\infty$. $\Pi_{k(n)}f$ is the mixture approximation of $f$. The vector of shocks $e\sim f$ has dimension $d_{e}\geq 1$. The total variation (TV) distance between two densities is $\|f_{1}-f_{2}\|_{TV}=1/2\int|f_{1}(e)-f_{2}(e)|de$ and the supremum (or sup) norm is $\|f_{1}-f_{2}\|_{\infty}=\sup_{e\in\mathbb{R}^{d_{e}}}|f_{1}(e)-f_{2}(e)|$. Let $\|\beta_{1}-\beta_{2}\|_{TV}=\|\theta_{1}-\theta_{2}\|+\|f_{1}-f_{2}\|_{TV}$ and $\|\beta_{1}-\beta_{2}\|_{\infty}=\|\theta_{1}-\theta_{2}\|+\|f_{1}-f_{2}\|_{\infty}$, where $\|\theta\|$ and $\|e\|$ correspond to the Euclidean norm of $\theta$ and $e$ respectively. $\|\beta_{1}\|_{m}$ is a norm on the mixture components: $\|\beta_{1}\|_{m}=\|\theta\|+\|(\omega,\mu,\sigma)\|$ where $\|\cdot\|$ is the Euclidean norm and $(\omega,\mu,\sigma)$ are the mixture parameters. For a functional $\phi$, its pathwise, or Gâteaux, derivative at $\beta_{1}$ in the direction $\beta_{2}$ is $\frac{d\phi(\beta_{1})}{d\beta}[\beta_{2}]=\frac{d\phi\left(\beta_{1}+\varepsilon\beta_{2}\right)}{d\varepsilon}\Big{|}_{\varepsilon=0}$. For two sequences $a_{n}$ and $b_{n}$, $a_{n}\asymp b_{n}$ means that there exist $0<c_{1}\leq c_{2}<\infty$ such that $c_{1}a_{n}\leq b_{n}\leq c_{2}a_{n}$ for all $n\geq 1$.
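As a concrete reading of the two density norms, the sketch below approximates the TV distance and the sup norm between two scalar densities on a fine grid. The grid, the helper names and the pair of Gaussian densities are assumptions made for illustration only.

```python
import numpy as np

def normal_pdf(e, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at e."""
    return np.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def tv_distance(f1, f2, step):
    """Riemann-sum approximation of (1/2) * integral of |f1 - f2| on an even grid."""
    return float(0.5 * np.sum(np.abs(f1 - f2)) * step)

def sup_distance(f1, f2):
    """Maximum absolute difference between the two densities over the grid."""
    return float(np.max(np.abs(f1 - f2)))

grid = np.linspace(-12.0, 12.0, 48001)
step = grid[1] - grid[0]
f1 = normal_pdf(grid)              # N(0, 1)
f2 = normal_pdf(grid, mu=1.0)      # N(1, 1)

print(tv_distance(f1, f2, step), sup_distance(f1, f2))
```

The TV distance always lies in $[0,1]$ for densities, whereas the sup norm depends on the scale of the densities and is not bounded by one in general.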

Structure of the Paper

The paper is organized as follows: Section 2 provides an overview of the Sieve-SMM estimator and its implementation. Section 3 gives the main asymptotic results. Section 4 illustrates the finite sample properties of the estimator using Bayesian nonparametric estimation as a benchmark. Section 5 applies the methodology to asset pricing in a production economy. Section 6 concludes. Appendices A, B consist of preliminary lemmas and the proofs for the main results. The Supplement provides several additional appendices. Appendices A, B and C consist of the proofs for the preliminary lemmas, intermediate results and their proofs. Appendix D provides additional material for the empirical applications and additional results.

2 A Sieve-SMM Estimator

This section describes the estimator and its implementation, including practical aspects such as tuning parameters and optimization, using a simple illustrative AR(1) example:

[TABLE]

where $t=1,\dots,n$ ; $n$ is the sample size. The parameters of interest are $\theta=\rho$ and the distribution $f$ . The latter is approximated by a mixture of $k$ Gaussian densities:

[TABLE]

where $\phi$ is the normal pdf. The weights $\omega_{j}$ are positive and sum to one. The location and scale parameters are also restricted as discussed below. The sieve dimension $k$ increases with $n$ to reduce the approximation bias as sampling uncertainty declines.

Simulation-based estimation requires sampling from $f_{\omega,\mu,\sigma}$ and then generating data from (3). For a given value of the mixture coefficients $(\omega,\mu,\sigma)$ , $S\geq 1$ samples of $n$ Gaussian mixtures are simulated as follows. First, let $\omega_{0}=0$ , compute the cumulative $\overline{\omega}_{j}=\sum_{\ell=0}^{j}\omega_{\ell}$ , draw a uniform and a Gaussian random variable $u_{t}^{s}\overset{iid}{\sim}\mathcal{U}_{[0,1]},Z_{t}^{s}\overset{iid}{\sim}\mathcal{N}(0,1)$ , $t=1,\dots,n$ ; $s=1,\dots,S$ to generate $e_{t}^{s}=\sum_{j=1}^{k}\mathbbm{1}_{u_{t}^{s}\in[\overline{\omega}_{j-1},\overline{\omega}_{j}]}(\mu_{j}+\sigma_{j}Z_{t}^{s})$ . By construction $e_{t}^{s}\overset{iid}{\sim}f_{\omega,\mu,\sigma}$ . The pair $(u_{t}^{s},Z_{t}^{s})$ is only drawn once so that the optimization problem is well behaved and stochastic equicontinuity conditions hold. Then, to simulate from (3), set $y_{0}^{s}=y_{0}$ fixed and compute recursively $y_{t}^{s}=\rho y_{t-1}^{s}+e_{t}^{s}$ for $t=1,\dots,n$ and $s=1,\dots,S$ . In DSGE models, $y_{0}$ is typically set at the steady-state; another common choice is $y_{0}=0$ .
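The simulation steps above can be sketched in a few lines. This is a minimal illustration, not the paper's R implementation: the function names `draw_shocks` and `simulate_ar1` and all numerical values are hypothetical. The key point it shows is that $(u_{t}^{s},Z_{t}^{s})$ are drawn once and then mapped deterministically to shocks for any value of $(\omega,\mu,\sigma)$.

```python
import numpy as np

def draw_shocks(u, z, omega, mu, sigma):
    """Map fixed draws (u, z) to mixture shocks e = mu_j + sigma_j * z."""
    cum = np.cumsum(omega)  # cumulative weights omega_bar_j
    # index j of the interval [omega_bar_{j-1}, omega_bar_j] containing u;
    # np.minimum guards against rounding when the weights sum to slightly below 1
    j = np.minimum(np.searchsorted(cum, u), len(omega) - 1)
    return mu[j] + sigma[j] * z

def simulate_ar1(rho, e, y0=0.0):
    """Recursion y_t = rho * y_{t-1} + e_t for each of the S columns of e."""
    y = np.empty_like(e)
    prev = np.full(e.shape[1], y0)
    for t in range(e.shape[0]):
        prev = rho * prev + e[t]
        y[t] = prev
    return y

rng = np.random.default_rng(0)
n, S = 200, 5
u = rng.uniform(size=(n, S))      # drawn once, reused for every parameter value
z = rng.standard_normal((n, S))   # drawn once, reused for every parameter value

# Illustrative (hypothetical) mixture coefficients with k = 2 components
omega = np.array([0.3, 0.7])
mu = np.array([-1.0, 0.5])
sigma = np.array([0.8, 1.0])

e = draw_shocks(u, z, omega, mu, sigma)
y = simulate_ar1(0.6, e)          # S simulated paths, one per column
```

Because `u` and `z` are held fixed, calling `draw_shocks` again with the same coefficients returns exactly the same shocks, so the simulated objective is a deterministic function of the parameters.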

Estimation then requires comparing the sample with the simulated data. In particular, identifying both $\rho$ and $f$ requires information about the persistence of $y_{t}$ and the marginal distribution of $y_{t}-\rho y_{t-1}$ . In (3), all of this information is contained in the joint distribution of $\mathbf{y}_{t}=(y_{t},\dots,y_{t-L})$ for any $L\geq 1$ . Following Carrasco et al. (2007), this joint distribution is summarized by the ECF of the sample and simulated data:

[TABLE]

where $\tau\in\mathbb{R}^{\text{dim}(\mathbf{y})}$ and $i$ is the imaginary unit, $i^{2}=-1$ . In the general setting (1)-(2), the joint ECF of $(\mathbf{y}_{t},\mathbf{x}_{t})$ and $(\mathbf{y}_{t}^{s},\mathbf{x}_{t})$ will be used. Matching the two ECFs over $\tau\in\mathbb{R}^{\text{dim}(\mathbf{y})}$ implies a continuum of moment conditions $\mathbb{E}[\hat{\psi}_{n}(\tau)-\hat{\psi}^{S}_{n}(\tau,\theta,f)]=0,\forall\tau$ . The objective function is computed as a weighted distance of the moment functions (Carrasco and Florens, 2000; Carrasco et al., 2007):

[TABLE]

where $\pi$ is a continuous density with full support. In practice, the Gaussian density is used and the integral is computed over a fine grid as discussed below. The estimated parameter $\hat{\beta}_{n}=(\hat{\rho}_{n},\hat{f}_{n})$ is an approximate minimizer of this weighted distance:

[TABLE]

where $\hat{\eta}_{n}\geq 0$ , $\hat{\eta}_{n}=O_{p}(\eta_{n})$ , $\eta_{n}=o(1)$ corresponds to numerical optimization and integration errors, assumed negligible. The following provides the detailed steps to implement the estimation and suggestions for the tuning parameters.
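As a rough sketch of how the ECF distance can be evaluated, the code below averages the squared modulus of the ECF difference over grid points $\tau$ drawn from $\pi$. The helper names (`lagged`, `ecf`, `objective`) and the toy data are assumptions made for illustration; the grid here is plain Monte-Carlo and the paper's exact normalization may differ.

```python
import numpy as np

def lagged(y, L):
    """Stack (y_t, y_{t-1}, ..., y_{t-L}) row by row for a 1-d series y."""
    n = len(y)
    return np.column_stack([y[L - l : n - l] for l in range(L + 1)])

def ecf(Y, tau):
    """Empirical characteristic function of the rows of Y at each row of tau."""
    return np.exp(1j * (tau @ Y.T)).mean(axis=1)

def objective(y, y_sim, tau, L=1):
    """Average of |ECF difference|^2 over grid points tau drawn from pi."""
    psi = ecf(lagged(y, L), tau)
    # average the ECFs of the S simulated paths (columns of y_sim)
    psi_sim = np.mean(
        [ecf(lagged(y_sim[:, s], L), tau) for s in range(y_sim.shape[1])],
        axis=0,
    )
    return np.mean(np.abs(psi - psi_sim) ** 2)

rng = np.random.default_rng(1)
y_obs = rng.standard_normal(300)                  # toy "sample"
y_same = np.column_stack([y_obs, y_obs])          # simulated data equal to the sample
y_other = 3.0 * rng.standard_normal((300, 2))     # simulated data with a different scale
tau = rng.standard_normal((200, 2))               # grid drawn from a Gaussian pi, dim = L + 1

q_same = objective(y_obs, y_same, tau, L=1)
q_other = objective(y_obs, y_other, tau, L=1)
```

In this toy setup `q_same` is zero up to rounding, while `q_other` is strictly larger because the two marginal distributions differ.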

Inputs for the mixture:

The sieve dimension $k(n)$ and bounds on location/scale parameters $(\mu_{j},\sigma_{j})_{j=1,\dots,k}$ should be chosen jointly. Bounds complying with theoretical requirements are $|\mu_{j}-\mu|\leq\sigma C_{\mu}\log(k)$ and $\sigma_{j}\geq\sigma C_{\sigma}\log(k+1)/(k+1)$ where $\mu=\sum_{j=1}^{k}\omega_{j}\mu_{j}$ and $\sigma^{2}=\sum_{j=1}^{k}\omega_{j}(\mu_{j}^{2}+\sigma_{j}^{2})-\mu^{2}$ are the expected value and variance of the mixture. Both bounds adapt to the density’s mean/variance and are easily handled by the preferred optimizer below. $C_{\mu}$ should be large enough to fit the tails of $f$ , $C_{\mu}=7$ performs well in the simulations and the applications. For a given $k$ , $C_{\sigma}$ plays a similar role to a bandwidth in kernel density estimation. In particular, the local measure of ill-posedness increases with the inverse of the lower bound $\underline{\sigma}_{k(n)}$ on $\sigma_{j}$ which implies a slower rate of convergence for the estimator. The simulations and the application use $C_{\sigma}=1.8$ and vary $k=2,\dots,5$ .
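The bounds above depend on the mixture's own mean and variance, so they are easiest to read as a feasibility check on a candidate $(\omega,\mu,\sigma)$. The sketch below is one way to write that check; the function names and the example coefficients are hypothetical, and the defaults simply reuse $C_{\mu}=7$ and $C_{\sigma}=1.8$ from the text.

```python
import numpy as np

def mixture_moments(omega, mu, sigma):
    """Mean and variance of a Gaussian mixture with weights omega."""
    m = np.sum(omega * mu)
    v = np.sum(omega * (mu**2 + sigma**2)) - m**2
    return m, v

def within_bounds(omega, mu, sigma, C_mu=7.0, C_sigma=1.8):
    """Check |mu_j - mu| <= sd*C_mu*log(k) and sigma_j >= sd*C_sigma*log(k+1)/(k+1)."""
    k = len(omega)
    m, v = mixture_moments(omega, mu, sigma)
    sd = np.sqrt(v)
    loc_ok = np.all(np.abs(mu - m) <= sd * C_mu * np.log(k))
    scale_ok = np.all(sigma >= sd * C_sigma * np.log(k + 1) / (k + 1))
    return bool(loc_ok and scale_ok)

omega = np.array([0.3, 0.7])
mu = np.array([-1.0, 0.5])

ok_wide = within_bounds(omega, mu, np.array([0.8, 1.0]))    # scales above the lower bound
ok_narrow = within_bounds(omega, mu, np.array([0.5, 1.0]))  # first scale too small
```

With $k=2$ the lower bound on the scales is roughly $0.66$ times the mixture standard deviation, so the second candidate is rejected because its first component is too concentrated.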

Inputs for $\hat{Q}_{n}^{S}$ :

Three inputs are required: $L,\pi$ and an integration grid. If $y_{t}$ is markovian of order $\ell$ , the first $\ell$ lags contain all of the information on the dependence; a natural choice is then $L=\ell$ (Carrasco et al., 2007). For non-markovian $y_{t}$ , finding $L$ such that the first $L$ (non)linear autocorrelations capture the dependence is necessary. For instance, $L\geq\ell$ for $\text{MA}(\ell)$ models and $L\geq 1$ for a canonical stochastic volatility model with AR(1) volatility. With respect to $\pi$ , using the ECF of $\Sigma_{n}^{-1/2}(\mathbf{y}_{t}-\bar{\mathbf{y}}_{n})$ and $\Sigma_{n}^{-1/2}(\mathbf{y}^{s}_{t}-\bar{\mathbf{y}}_{n})$ , where $\bar{\mathbf{y}}_{n},\Sigma_{n}$ are the sample mean and variance of $\mathbf{y}_{t}$ , with Gaussian density weights corresponds (by a change of variable argument) to a choice of $\pi$ which has appealing features. Indeed, expanding the difference in ECF around $\tau=0$ :

[TABLE]

reveals that a density which puts more weight around $\tau=0$ gives more weight to lower-order moments. Akin to a GMM weighting scheme, the researcher can put more (or less) weight on lower-order moments (means, co-variances) vs. higher-order moments (skewness, kurtosis) in the estimation by choosing a smaller (or larger) variance for the Gaussian weights. To compute the integral in (4) a finite grid of scrambled Sobol points is used. These can be more accurate than a Monte-Carlo approximation even for relatively large dimensions, unlike quadrature rules. The grid will be assumed to be large enough so that the integration error is negligible. In the second empirical application the integral is computed for $\text{dim}(\mathbf{y}_{t})=28$ based on $7$ variables with $L=3$ lags.
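One way to build such a grid is to push scrambled Sobol points through the normal quantile function, which yields points that integrate against a Gaussian $\pi$. The sketch below uses SciPy's `scipy.stats.qmc.Sobol`; the function name `gaussian_sobol_grid`, the clipping constant and the `scale` argument (the standard deviation of the Gaussian weights) are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.stats import norm, qmc

def gaussian_sobol_grid(dim, m=8, scale=1.0, seed=0):
    """2**m scrambled Sobol points mapped to N(0, scale^2 I) via the normal quantile."""
    sobol = qmc.Sobol(d=dim, scramble=True, seed=seed)
    u = sobol.random_base2(m=m)               # low-discrepancy points in [0, 1)^dim
    u = np.clip(u, 1e-12, 1.0 - 1e-12)        # keep the quantile transform finite
    return scale * norm.ppf(u)

tau = gaussian_sobol_grid(dim=2, m=8)

# Sanity check on a known integral: E||tau||^2 = dim when tau ~ N(0, I)
approx = np.mean(np.sum(tau**2, axis=1))
```

A smaller `scale` concentrates the grid near $\tau=0$ and so emphasizes lower-order moments, in line with the discussion above.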

Choice of optimizer:

Numerical optimization is required to find a $\hat{\beta}_{n}$ satisfying (5). Since $\hat{Q}_{n}^{S}$ is typically non-convex, has intractable derivatives and the number of coefficients is moderately large (between $11$ and $35$ in the application), a derivative-free global optimizer should be used. The simulations and application rely on particle swarm optimization, a stochastic search algorithm which converges fairly quickly. Matlab’s implementation can evaluate $\hat{Q}_{n}^{S}$ in parallel, speeding up estimation significantly in the application where the policy function is very time-consuming to compute. After terminating the search, run several iterations of a local optimizer to check convergence.
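For readers unfamiliar with particle swarm optimization, the following is a textbook global-best variant written from scratch. It is not the R or Matlab implementation used in the paper; the function `pso`, its tuning constants and the toy objective are all assumptions chosen for illustration.

```python
import numpy as np

def pso(f, lower, upper, n_particles=40, n_iter=200, seed=0,
        inertia=0.7, c_self=1.5, c_swarm=1.5):
    """Basic global-best particle swarm search over a box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, len(lower)))
    v = np.zeros_like(x)
    best_x = x.copy()                              # each particle's best position
    best_f = np.array([f(p) for p in x])
    g = best_x[np.argmin(best_f)].copy()           # best position found by the swarm
    for _ in range(n_iter):
        r1, r2 = rng.uniform(size=(2,) + x.shape)
        v = inertia * v + c_self * r1 * (best_x - x) + c_swarm * r2 * (g - x)
        x = np.clip(x + v, lower, upper)           # keep particles inside the box
        fx = np.array([f(p) for p in x])
        improved = fx < best_f
        best_x[improved], best_f[improved] = x[improved], fx[improved]
        g = best_x[np.argmin(best_f)].copy()
    return g, best_f.min()

# Toy non-convex objective with global minimum 0 at `target` (hypothetical)
target = np.array([0.6, -0.3])

def toy(p):
    d = p - target
    return float(np.sum(d**2 + 1.0 - np.cos(4.0 * d)))

best, val = pso(toy, lower=[-3, -3], upper=[3, 3])
```

In an estimation context `f` would be the simulated objective $\hat{Q}_{n}^{S}$ evaluated at a candidate coefficient vector, and the box would encode the parameter bounds; a local optimizer started at `best` can then be used to check convergence, as suggested above.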

Modelling fat tails:

Gaussian mixtures can only approximate smooth densities $f$ sufficiently fast under a thin tail condition (Kruijer et al., 2010). Similar to Gallant and Nychka (1987), adding a parametric tail component to form a Gaussian and tails mixture makes it possible to model asymmetric excess tail behaviour:

[TABLE]

where $f_{L}(e,\xi_{L})=(2+\xi_{L})\frac{|e|^{1+\xi_{L}}}{[1+|e|^{2+\xi_{L}}]^{2}}$ for $e\leq 0$ and $f_{R}(e,\xi_{R})=(2+\xi_{R})\frac{e^{1+\xi_{R}}}{[1+e^{2+\xi_{R}}]^{2}}$ for $e\geq 0$ are the left and right tail components. They have finite variance if $\xi_{L},\xi_{R}\geq 1$ , which makes it possible to prove $L^{2}$ -smoothness of the tail draws. To sample from $f_{L},f_{R}$ , draw $u_{L},u_{R}\sim\mathcal{U}_{[0,1]}$ and compute $Z_{L}=-(1/u_{L}-1)^{\frac{1}{2+\xi_{L}}},Z_{R}=(1/u_{R}-1)^{\frac{1}{2+\xi_{R}}}$ .
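The sampling rule is an inverse-CDF transform: integrating $f_{R}$ from $0$ to $e$ gives $F_{R}(e)=1-1/(1+e^{2+\xi_{R}})$, and $Z_{R}$ solves $F_{R}(Z_{R})=1-u_{R}$. A short numerical check, with hypothetical function names and an illustrative $\xi=1$:

```python
import numpy as np

def draw_tails(u_left, u_right, xi_left, xi_right):
    """Inverse-CDF draws from the left and right tail components."""
    z_left = -(1.0 / u_left - 1.0) ** (1.0 / (2.0 + xi_left))
    z_right = (1.0 / u_right - 1.0) ** (1.0 / (2.0 + xi_right))
    return z_left, z_right

def cdf_right(e, xi):
    """CDF of f_R on e >= 0: 1 - 1 / (1 + e^(2 + xi))."""
    return 1.0 - 1.0 / (1.0 + e ** (2.0 + xi))

rng = np.random.default_rng(0)
n = 20000
# 1 - uniform[0, 1) lies in (0, 1], which avoids dividing by zero
u_left = 1.0 - rng.uniform(size=n)
u_right = 1.0 - rng.uniform(size=n)
z_left, z_right = draw_tails(u_left, u_right, xi_left=1.0, xi_right=1.0)

# Compare the empirical CDF of the right-tail draws with the analytic CDF
z_sorted = np.sort(z_right)
ecdf = np.arange(1, n + 1) / n
max_gap = np.max(np.abs(ecdf - cdf_right(z_sorted, 1.0)))
```

Since $F_{R}(1)=1/2$ for any $\xi_{R}$, the median of the right-tail draws should be close to one, which gives a quick sanity check on an implementation.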

3 Asymptotic Properties

3.1 Consistency

Let $Q_{n}(\beta)=\int\big{|}\mathbb{E}\big{(}\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\beta)\big{)}\big{|}^{2}\pi(\tau)d\tau$ be the population analog of the sample objective $\hat{Q}_{n}^{S}$ . The dependence on $n$ arises from $(y_{t}^{s},x_{t})$ not being covariance stationary since $y_{0}^{s}$ is usually not drawn from the stationary distribution. Since the CF is bounded, under geometric ergodicity, the dominated convergence theorem implies that it has a well-defined limit $Q(\beta)=\int\big{|}\lim_{n\to\infty}\mathbb{E}\big{(}\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\beta)\big{)}\big{|}^{2}\pi(\tau)d\tau.$ For both $Q_{n}$ and $Q$ , expectations are taken over the data $(\mathbf{y}_{t},\mathbf{x}_{t})$ and the simulated $(\mathbf{y}_{t}^{s},\mathbf{x}_{t})$ .

The space of true densities satisfying the assumptions will be denoted as $\mathcal{F}$ and $\mathcal{F}_{k}$ is the corresponding space of Gaussian and tails mixtures $\Pi_{k}f$ .

Assumption 1 (Sieve, Identification, Dependence).

Suppose the following conditions hold. i) Sieve Space: the true density admits the decomposition $f=f_{1}\times\dots\times f_{d_{e}}$ where for each $j=1,\dots,d_{e}$ $f_{j}=(1-\omega_{j,1}-\omega_{j,2})f_{j,S}+\omega_{j,1}f_{L}+\omega_{j,2}f_{R}$ . $f_{j,S}$ is a smooth density with thin tails and the mixture space $\mathcal{F}_{k(n)}$ satisfies the assumptions of Lemma A1 with $k(n)^{4}\log[k(n)]^{4}/n\to 0$ as $k(n)$ and $n\to\infty.$ $\Theta$ is compact and $1\leq\xi_{L},\xi_{R}\leq\overline{\xi}<\infty$ . ii) Identification: $\lim_{n\to\infty}\mathbb{E}\left(\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{s}(\tau,\beta)\right)=0,\pi\text{ a.s. }\Leftrightarrow\|\beta-\beta_{0}\|_{\mathcal{B}}=0$ . $\sup_{\tau}\|\tau\|_{\infty}\pi(\tau)^{1/4}$ is bounded and $\sqrt{\pi}$ is integrable. For any $n,k\geq 1$ and for all $\varepsilon>0$ , $\inf_{\beta\in\mathcal{B}_{k},\,\|\beta-\beta_{0}\|_{\mathcal{B}}\geq\varepsilon}Q_{n}(\beta)$ is strictly positive and weakly decreasing in both $n$ and $k$ . iii) Dependence: $(y_{t},x_{t})$ is strictly stationary and $\beta$ -mixing with exponential decay, the simulated $(y_{t}^{s}(\beta),x_{t})$ are uniformly geometrically ergodic in $\beta\in\mathcal{B}$ .

Condition i. makes it possible to use the approximation results in Kruijer et al. (2010). Here the shocks are independent of one another; this is a common assumption for structural shocks but could be restrictive in some settings. (Footnote 3: The independence condition can be relaxed by using the results in De Jonge and Van Zanten (2010).) The requirement on $k(n)$ is stronger than usual. First, the $\log[k(n)]$ term is due to simulating from the mixture. Second, the fourth-power is due to the non-standard dependence. The dependence properties of $y_{t}^{s}$ vary with $\beta$ so that, even though it is strongly mixing, results from Doukhan et al. (1995); Chen and Shen (1998) do not apply. Lemma B11 provides a more conservative bound for the supremum of the empirical process, of order $\sqrt{k(n)^{4}\log[k(n)]^{4}/n}$ compared to $\sqrt{k(n)\log[k(n)]/n}$ with iid or strictly stationary data with fixed dependence.

Condition ii. requires $L$ large enough and $g_{\text{obs}},g_{\text{latent}}$ to uniquely identify $\beta=(\theta,f)$ as discussed for the AR(1) earlier. Condition iii. is common in SMM (Duffie and Singleton, 1993). It implies $(y_{t}^{s},x_{t})$ is strongly-mixing (Liebscher, 2005) and the initial condition bias is negligible, i.e. $Q_{n}(\beta_{0})=O(1/n^{2})$ as shown in Lemma B12.

Further restrictions on the data generating process are required for establishing uniform convergence of the simulated empirical process. Lemma 1 below shows that mixture draws satisfy an $L^{2}$ -smoothness property. Then, using restrictions on the DGP, Lemma 2 extends this property to the moments. Combined with Assumption 1, these results deliver consistency and the rate of convergence of $\hat{\beta}_{n}$ .

Lemma 1 ( $L^{2}$ -Smoothness of the Mixture Draws).

Let $e_{t}^{s}=\sum_{j=1}^{k(n)}\mathbbm{1}_{\nu^{s}_{t}\in[\sum_{l=0}^{j-1}\omega_{l},\sum_{l=0}^{j}\omega_{l}]}(\mu_{j}+\sigma_{j}Z_{t,j}^{s})$ and $\tilde{e}_{t}^{s}=\sum_{j=1}^{k(n)}\mathbbm{1}_{\nu^{s}_{t}\in[\sum_{l=0}^{j-1}\tilde{\omega}_{l},\sum_{l=0}^{j}\tilde{\omega}_{l}]}(\tilde{\mu}_{j}+\tilde{\sigma}_{j}Z_{t,j}^{s})$ with bounds $|\mu_{j}|,|\tilde{\mu}_{j}|\leq\bar{\mu}_{k(n)}$ and $|\sigma_{j}|,|\tilde{\sigma}_{j}|\leq\bar{\sigma}$ as in Lemma A1. If $\mathbb{E}(|Z_{t,j}^{s}|^{2})\leq C_{Z}^{2}<\infty$ then there exists a finite constant $C$ which only depends on $C_{Z}$ such that:

[TABLE]

where $\|f_{\omega,\mu,\sigma}-f_{\tilde{\omega},\tilde{\mu},\tilde{\sigma}}\|_{m}=\|(\omega,\mu,\sigma)-(\tilde{\omega},\tilde{\mu},\tilde{\sigma})\|_{1}$ .

The $L^{2}$ -smoothness constant depends on the upper bound $\overline{\mu}_{k(n)}=O(\log[k(n)])$ and the sieve dimension $k(n)$ in the pseudo-norm $\|\cdot\|_{m}$ . As shown in Kruijer et al. (2010), $\|\cdot\|_{\text{TV}}$ and $\|\cdot\|_{\infty}$ are bounded above by $\|\cdot\|_{m}$ up to a multiplicative factor which depends on the scales’ lower bound $\underline{\sigma}_{k(n)}$ . This implies $L^{2}$ -smoothness also holds in these norms.

Assumption 2 (Data Generating Process).

$y_{t}^{s}$ is simulated according to (1)-(2) where $g_{obs}$ and $g_{latent}$ satisfy the following Hölder conditions for some $\gamma\in(0,1]$ :

y(i). $\|g_{obs}(y_{1},x,\beta_{1},u)-g_{obs}(y_{2},x,\beta_{1},u)\|\leq C_{1}(x,u)\|y_{1}-y_{2}\|$ ; $\mathbb{E}\left(C_{1}(x_{t},u_{t}^{s})^{2}|y_{t-1}^{s}\right)\leq\bar{C}_{1}<1$

y(ii). $\|g_{obs}(y,x,\beta_{1},u)-g_{obs}(y,x,\beta_{2},u)\|\leq C_{2}(y,x,u)\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}^{\gamma}$ ; $\mathbb{E}\left(C_{2}(y_{t}^{s},x_{t},u_{t}^{s})^{2}\right)\leq\bar{C}_{2}<\infty$

y(iii). $\|g_{obs}(y,x,\beta_{1},u_{1})-g_{obs}(y,x,\beta_{1},u_{2})\|\leq C_{3}(y,x)\|u_{1}-u_{2}\|^{\gamma}$ ; $\mathbb{E}\left(C_{3}(y_{t}^{s},x_{t})^{2}|u_{t}^{s}\right)\leq\bar{C}_{3}<\infty$

u(i). $\|g_{latent}(u_{1},\beta_{1},e)-g_{latent}(u_{2},\beta_{1},e)\|\leq C_{4}(e)\|u_{1}-u_{2}\|$ ; $\mathbb{E}\left(C_{4}(e_{t}^{s})^{2}\right)\leq\bar{C}_{4}<1$

u(ii). $\|g_{latent}(u,\beta_{1},e)-g_{latent}(u,\beta_{2},e)\|\leq C_{5}(u,e)\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}^{\gamma}$ ; $\mathbb{E}\left(C_{5}(u_{t-1}^{s},e_{t}^{s})^{2}\right)\leq\bar{C}_{5}<\infty$

u(iii). $\|g_{latent}(u,\beta_{1},e_{1})-g_{latent}(u,\beta_{1},e_{2})\|\leq C_{6}(u)\|e_{1}-e_{2}\|$ ; $\mathbb{E}\left(C_{6}(u_{t-1}^{s})^{2}\right)\leq\bar{C}_{6}<\infty$

for any $(\beta_{1},\beta_{2})\in\mathcal{B}$ , $(y_{1},y_{2})\in\mathbb{R}^{\text{dim}(y)}$ , $(u_{1},u_{2})\in\mathbb{R}^{\text{dim}(u)}$ and $(e_{1},e_{2})\in\mathbb{R}^{\text{dim}(e)}$ . $\|\cdot\|_{\mathcal{B}}$ is either the TV or supremum norm.

Assumption 2 requires a contraction property y(i),u(i) comparable to the $L^{2}$ unit circle condition in Duffie and Singleton (1993). For an AR(1) model (3) this implies $|\rho|\leq\bar{C}_{1}<1$ . While restrictive, it is enforced in SMM and particle-filter likelihood estimation of non-stationary DSGE models via de-trending and pruning of the state-space model. De-trending transforms deterministic or stochastic trends into stationary variables; pruning further guarantees stability by essentially enforcing y(i),u(i). For non-smooth models, Assumption 2′ in the Supplement substitutes Hölder with $L^{2}$ -smoothness conditions. These assumptions make it possible to derive explicitly the effect of the approximation bias on $Q_{n}$ , as shown in Lemma A4:

[TABLE]

where $\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}^{2}\log\left(\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}\right)$ is due to approximation bias and its propagation via y(i),u(i), $1/n^{2}$ is due to nonstationarity, and $\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}^{2\gamma^{2}}$ comes from the Hölder conditions. The main difference with the literature is the propagation and accumulation of the approximation bias due to the dynamics of the model.

Lemma 2 (Assumption 2/2′ implies $L^{2}$ -Smoothness of the Moments).

Suppose Assumption 2 or 2′ and the conditions of Lemma 1 hold. If $\|\tau\|_{\infty}\pi(\tau)^{1/4}$ is bounded, then there exists $\overline{C}>0$ such that for all $\delta>0$ , uniformly in $t\geq 1$ , $(\beta_{1},\beta_{2})\in\mathcal{B}_{k(n)}$ and $\tau\in\mathbb{R}^{d_{\tau}}$ :

[TABLE]

where $\|\beta\|_{m}=\|\theta\|+\|(\omega,\mu,\sigma)\|_{1}$ .

The key to establishing $L^{2}$ -smoothness of the continuum of moments with unbounded support is to involve $\pi$ . By Lipschitz-continuity of the sine and cosine functions: $|e^{i\tau^{\prime}(\mathbf{y}^{s}_{t}(\beta_{1}),\mathbf{x}_{t})}-e^{i\tau^{\prime}(\mathbf{y}_{t}^{s}(\beta_{2}),\mathbf{x}_{t})}|\pi(\tau)\leq 2\|\mathbf{y}^{s}_{t}(\beta_{1})-\mathbf{y}^{s}_{t}(\beta_{2})\|\times\|\tau\|_{\infty}\pi(\tau)$ . The simulated data are shown to be $L^{2}$ -smooth under Lemma 1 and Assumption 2 or 2′. With $\|\tau\|_{\infty}\pi(\tau)^{1/4}$ bounded this property holds for the ECF, uniformly in $\tau$ . Further, $\sqrt{\pi}$ integrable implies $L^{2}$ -smoothness of the weighted ECF distance used in $\hat{Q}_{n}^{S}$ . These two conditions are also needed to handle the empirical process over the growing sieve space $\mathcal{B}_{k(n)}$ and the unbounded index $\tau$ . Density weights $\pi$ with fat tails, such as the Cauchy density, do not satisfy the last condition.

Theorem 1 (Consistency).

Suppose Assumptions 1 and 2 (or 2′) hold, $Q_{n}(\cdot)$ is continuous on $(\mathcal{B}_{k(n)},\|\cdot\|_{\mathcal{B}})$ and the numerical optimization and integration errors are negligible, i.e. $\eta_{n}=o(1/n)$ . If for all $\varepsilon>0$ ,

[TABLE]

where $r$ is the smoothness of the thin-tail component $f_{S}$ and $b$ its exponential tail index, then:

[TABLE]

Theorem 1 is a consequence of the high-level consistency Lemma in Chen and Pouzo (2012). It is not a direct implication of their theorems due to non-standard dependence, continuum of moments, simulation process and bias propagation.

3.2 Rate of Convergence

Following Ai and Chen (2003), the rate of convergence is derived in a weak-norm given below.

Assumption 3 (Weak Norm, Local Properties).

Let $\mathcal{B}_{osn}=\mathcal{B}_{k(n)}\cap\{\|\beta-\beta_{0}\|_{\mathcal{B}}\leq\varepsilon\}$ be a neighborhood of $\beta_{0}$ with $\varepsilon>0$ small. For any $(\beta_{1},\beta_{2})\in\mathcal{B}_{osn}$ :

[TABLE]

is the weak norm of $\beta_{1}-\beta_{2}$ . The derivative $(\beta_{1},\beta_{2})\to\frac{d\mathbb{E}\left(\hat{\psi}^{S}_{n}(\tau,\beta_{1})\right)}{d\beta}[\beta_{2}]$ is continuous in $\beta_{1}$ , linear in $\beta_{2}$ . Suppose there exists $\underline{C}_{w}>0$ such that for all $\beta\in\mathcal{B}_{osn}$ : $\underline{C}_{w}\|\beta-\beta_{0}\|^{2}_{weak}\leq\int\big{|}\mathbb{E}\left(\hat{\psi}^{S}_{n}(\tau,\beta_{0})-\hat{\psi}^{S}_{n}(\tau,\beta)\right)\big{|}^{2}\pi(\tau)d\tau.$

Assumption 3, together with the rate for $Q_{n}(\Pi_{k(n)}\beta_{0})$ discussed above, makes it possible to bound the approximation error $\|\Pi_{k(n)}\beta_{0}-\beta_{0}\|_{weak}$ in the weak norm. Then, standard arguments combined with the results on the empirical process derived for consistency imply the result below.

Theorem 2 (Rate of Convergence).

Suppose that the assumptions for Theorem 1 hold and Assumption 3 also holds. The convergence rate in weak norm is:

[TABLE]

The convergence rate in either the total variation or supremum norm $\|\cdot\|_{\mathcal{B}}$ is:

[TABLE]

where $\tau_{\mathcal{B},n}$ is the local measure of ill-posedness: $\tau_{\mathcal{B},n}=\sup_{\beta\in\mathcal{B}_{osn},\,\|\beta-\Pi_{k(n)}\beta_{0}\|_{weak}\neq 0}\frac{\|\beta-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}\,\,}{\quad\|\beta-\Pi_{k(n)}\beta_{0}\|_{weak}}$ .

As usual, the rate of convergence involves a bias/variance trade-off. Here, the bias is inflated because of the dynamics. The variance is larger than usual because of the conservative empirical process bound. This implies slower convergence compared to the iid case.

There are two sources of ill-posedness in this setting. First, the distance between two CFs is weaker than the TV or supremum distance: the CF characterizes convergence in distribution while the other two do not. Second, the problem may be fundamentally ill-posed in which case convergence is necessarily slower in the strong than in the weak norm. Lemma A5 relates the convergence rate in $\|\cdot\|_{weak}$ to the rate in $\|\cdot\|_{m}$ , which is useful for proving asymptotic normality. A by-product of this Lemma is a simple bound on $\tau_{n,TV/\infty}$ . From Kruijer et al. (2010), $\|\beta-\Pi_{k(n)}\beta_{0}\|_{TV}\leq\underline{\sigma}_{k(n)}^{-1}\|\beta-\Pi_{k(n)}\beta_{0}\|_{m}$ and $\|\beta-\Pi_{k(n)}\beta_{0}\|_{\infty}\leq\underline{\sigma}_{k(n)}^{-2}\|\beta-\Pi_{k(n)}\beta_{0}\|_{m}$ on $\mathcal{B}_{k(n)}$ . Combined with the Lemma, $\tau_{TV,n}\leq\underline{\lambda}_{n}^{-1/2}\underline{\sigma}_{k(n)}^{-1}$ and $\tau_{\infty,n}\leq\underline{\lambda}_{n}^{-1/2}\underline{\sigma}_{k(n)}^{-2}$ where $\underline{\lambda}_{n}$ measures local curvature and can be approximated numerically. Note that decreasing $\underline{\sigma}_{k(n)}$ too fast as $k(n)\to\infty$ deteriorates the rate of convergence. For SMM, a larger $S$ implies a smaller asymptotic variance for the estimates. Here, a refinement of the theorem shows that using $S\to\infty$ as $n\to\infty$ can additionally result in faster convergence.

Corollary 1 (Number of Simulated Samples $S$ and Rate of Convergence).

Suppose a long sample $(y_{1}^{s},\dots,y_{nS}^{s})$ can be simulated. Then given $k(n)$ , (7) becomes:

[TABLE]

The fastest possible rate in weak norm is then $\|\hat{\beta}_{n}-\beta_{0}\|_{weak}=O_{p}\left(\max\left(\frac{\log[k(n)]^{r/b+1}}{k(n)^{\gamma^{2}r}},\frac{1}{\sqrt{n}}\right)\right)$ which is attained with $S(n)\asymp k(n)^{4}\log[k(n)]^{4}$ . The fastest rate in TV or supremum norm is then: $\|\hat{\beta}_{n}-\beta_{0}\|_{\mathcal{B}}=O_{p}\left(\frac{\log[k(n)]^{r/b}}{k(n)^{r}}+\tau_{\mathcal{B},n}\max\left(\frac{\log[k(n)]^{r/b+1}}{k(n)^{\gamma^{2}r}},\frac{1}{\sqrt{n}}\right)\right)$ .

Asymptotic normality requires sufficiently fast convergence which usually implies stronger smoothness restrictions on the unknown $f$ . Corollary 1 implies that smoothness requirements can be replaced with the computation requirement of making $S$ large, allowing $k(n)$ to grow more rapidly. Variance reduction techniques are often used in empirical work to reduce simulation noise without taking $S$ large. Whether they could also enhance convergence rates here could be an interesting avenue for research.

3.3 Asymptotic Normality

The following provides asymptotic normality results for plug-in estimates $\phi(\hat{\beta}_{n})$ where $\phi$ are smooth functionals of the parameters. The main steps to derive the results are fairly standard. As in the finite-dimensional case, stochastic equicontinuity results are needed; they are derived in Lemmas A5, A6 under $\|\cdot\|_{m}$ , the natural norm for handling simulation draws. Let $M_{n}=\log\log(n+1)$ , let $\delta_{n}$ be the rate of convergence in weak norm above, and let $\underline{\lambda}_{n}=\lambda_{\min}(\int\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0}))}{d(\theta,\omega,\mu,\sigma)}^{\prime}\overline{\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0}))}{d(\theta,\omega,\mu,\sigma)}}\pi(\tau)d\tau)$ , assumed strictly positive.

Definition 1 (Sieve Representer, Score and Variance).

Let $\beta_{0,n}$ be such that $\|\beta_{0,n}-\beta_{0}\|_{weak}=\inf_{\beta\in\mathcal{B}_{osn}}\|\beta-\beta_{0}\|_{weak}$ , let $\overline{V}_{k(n)}$ be the closed span of $\mathcal{B}_{osn}-\{\beta_{0,n}\}$ . The inner product $\langle\cdot,\cdot\rangle$ of $(v_{1},v_{2})\in\overline{V}_{k(n)}$ is defined as: $\langle v_{1},v_{2}\rangle=\frac{1}{2}\int[\psi_{\beta}(\tau,v_{1})\overline{\psi_{\beta}(\tau,v_{2})}+\overline{\psi_{\beta}(\tau,v_{1})}\psi_{\beta}(\tau,v_{2})]\pi(\tau)d\tau.$ The sieve representer is the unique $v_{n}^{*}\in\overline{V}_{k(n)}$ such that $\langle v_{n}^{*},v\rangle=\frac{d\phi(\beta_{0})}{d\beta}[v],\forall v\in\overline{V}_{k(n)}$ . The sieve score $S_{n}^{*}$ is: $S_{n}^{*}=\int Real(\psi_{\beta}(\tau,v_{n}^{*})\overline{[\hat{\psi}_{n}^{S}(\tau,\beta_{0})-\hat{\psi}_{n}(\tau)]})\pi(\tau)d\tau$ and the sieve long-run variance $\sigma_{n}^{*2}=n\mathbb{E}(S_{n}^{*2})=n\mathbb{E}([\int Real(\psi_{\beta}(\tau,v_{n}^{*})\overline{[\hat{\psi}_{n}^{S}(\tau,\beta_{0})-\hat{\psi}_{n}(\tau)]})\pi(\tau)d\tau]^{2}).$ The scaled sieve representer $u_{n}^{*}$ is: $u_{n}^{*}=v_{n}^{*}/\sigma_{n}^{*}.$

Assumption 4 (Equivalence Condition).

There exists $\underline{a}>0$ such that for all $n\geq 1$ : $\underline{a}\|v_{n}^{*}\|_{weak}\leq\sigma_{n}^{*}.$ Furthermore, suppose that $\sigma_{n}^{*}$ does not increase too fast: $\sigma_{n}^{*}=o(\sqrt{n}).$

Assumption 5 (Convergence Rate, Smoothness, Bias).

Suppose that the set $\mathcal{B}_{osn}$ is a convex neighborhood of $\beta_{0}$ and: i) Rate of convergence: $M_{n}\delta_{n}=o(n^{-1/4})$ and $M_{n}\delta_{n}=o(\sqrt{\underline{\lambda}_{n}}/\left(k(n)\log(n)\right)^{4/\gamma^{2}})$ . ii) Smoothness: a linear expansion of $\phi$ is locally uniformly valid $\sup_{\|\beta-\beta_{0}\|\leq M_{n}\delta_{n}}\frac{\sqrt{n}}{\sigma_{n}^{*}}\Big{|}\phi(\beta)-\phi(\beta_{0})-\frac{d\phi(\beta_{0})}{d\beta}[\beta-\beta_{0}]\Big{|}=o(1)$ and $(\beta_{1},\beta_{2})\to\frac{d\phi(\beta_{1})}{d\beta}[\beta_{2}]$ is continuous in $\beta_{1}$ , linear in $\beta_{2}$ , as well as for the moments $\sup_{\|\beta-\beta_{0}\|_{weak}\leq M_{n}\delta_{n}}(\int\big{|}\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta))-\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))-\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))}{d\beta}[\beta-\beta_{0}]\big{|}^{2}\pi(\tau)d\tau)^{1/2}=O([M_{n}\delta_{n}]^{2}).$ Bounded second derivative: $\sup_{\|\beta-\beta_{0}\|_{weak}\leq M_{n}\delta_{n}}\int\Big{|}\frac{d^{2}\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))}{d\beta d\beta}[u_{n}^{*},u_{n}^{*}]\Big{|}^{2}\pi(\tau)d\tau=O(1).$ iii) Bias: negligible approximation bias: $\frac{\sqrt{n}}{\sigma_{n}^{*}}\frac{d\phi(\beta_{0})}{d\beta}[\beta_{0,n}-\beta_{0}]=o(1).$

Definition 1 adapts standard quantities to a continuum of complex-valued moments; the sieve variance corresponds to the square of the standard errors used for inference. A simple plug-in estimator is described below. Similarly, Assumptions 4, 5 are commonly used to derive asymptotic linear expansions to then apply a Central Limit Theorem to the leading term. By Corollary 1 above, allowing $S\to\infty$ makes the rate conditions in Assumption 5 i) easier to verify. Condition ii) automatically holds for linear functionals, like reporting the finite dimensional $\theta$ or pointwise evaluation of the density $f$ .

Theorem 3 (Asymptotic Normality).

Suppose the assumptions of Theorems 1, 2 and Lemmas A5, A6 hold as well as Assumptions 4 and 5, then as $n$ goes to infinity:

[TABLE]

Theorem 3 shows that, under the above assumptions, inference on $\phi(\beta_{0})$ can be conducted using the confidence interval $[\phi(\hat{\beta}_{n})\pm 1.96\times\sigma_{n}^{*}/\sqrt{n}]$ . The standard errors $\sigma_{n}^{*}>0$ adjust automatically so that $r_{n}=\sqrt{n}/\sigma_{n}^{*}$ gives the correct rate of convergence. If $\lim_{n\to\infty}\sigma_{n}^{*}<\infty$ , then $\phi(\hat{\beta}_{n})$ is $\sqrt{n}-$ convergent. The dependence conditions are sufficient to apply the Central Limit Theorem of Wooldridge and White (1988) which yields the result.
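As a small worked example of the interval above, the snippet computes $\phi(\hat{\beta}_{n})\pm 1.96\,\sigma_{n}^{*}/\sqrt{n}$ for made-up inputs. The function name and the numbers ($\phi(\hat{\beta}_{n})=0.6$, $\sigma_{n}^{*}=1.2$, $n=200$) are purely hypothetical and are not estimates from the paper.

```python
import math

def confidence_interval(phi_hat, sigma_star, n, z=1.96):
    """Interval phi_hat +/- z * sigma_star / sqrt(n); z = 1.96 gives a nominal 95% level."""
    half_width = z * sigma_star / math.sqrt(n)
    return phi_hat - half_width, phi_hat + half_width

# Hypothetical inputs, for illustration only
lower, upper = confidence_interval(phi_hat=0.6, sigma_star=1.2, n=200)
half_width = (upper - lower) / 2.0
```

The half-width is $1.96\times 1.2/\sqrt{200}\approx 0.166$, so a larger $\sigma_{n}^{*}$ widens the interval at the same sample size, which is how the slower-than-$\sqrt{n}$ rates show up in practice.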

To compute standard errors in practice, two matrices are computed and multiplied in a sandwich form. (Footnote 4: Alternatively, one could build confidence sets by inverting a test based on an Integrated Conditional Moment (ICM) statistic; see Santos (2012) for an application to NPIV.) First, the bread is computed using:

[TABLE]

where $\overline{\partial_{\theta,\omega,\mu,\sigma}\hat{\psi}_{n}^{S}(\tau)}$ is the complex conjugate of $\partial_{\theta,\omega,\mu,\sigma}\hat{\psi}_{n}^{S}(\tau)$ evaluated at the estimates $(\hat{\theta}_{n},\hat{\omega}_{n},\hat{\mu}_{n},\hat{\sigma}_{n})$ . Derivatives are computed by finite differences. With a finite integration grid, this is simply a matrix product. Second, the meat is the sum of HAC variance estimates $V_{1},V_{1,S}$ for the following two vector-valued series:

[TABLE]

where real and im take the real and imaginary part. As in the finite-dimensional case $V_{1,S}=V_{1,1}/S$ : a larger $S$ implies more precise estimates. The plug-in estimate of $\sigma_{n}^{*2}$ is:

[TABLE]
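The HAC step for the meat can be sketched with a Bartlett (Newey-West) kernel. The text does not state which kernel or bandwidth is used, so both are assumptions here, as is the function name `hac_bartlett`; the rows of `X` stand in for the vector-valued series described above.

```python
import numpy as np

def hac_bartlett(X, bandwidth):
    """Long-run variance estimate for the rows of X (n x d), Bartlett weights."""
    X = np.asarray(X, float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # demean each column
    V = Xc.T @ Xc / n                        # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1.0)    # linearly declining weights
        G = Xc[lag:].T @ Xc[:-lag] / n       # lag-`lag` autocovariance
        V += w * (G + G.T)
    return V

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2))           # iid toy series: long-run variance is I
V0 = hac_bartlett(X, bandwidth=0)            # reduces to the sample covariance
V5 = hac_bartlett(X, bandwidth=5)
```

With iid data both estimates are close to the identity matrix; with serially dependent series the lagged terms matter, and the Bartlett weights keep the estimate positive semi-definite.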

Supplemental Appendix E.2 derives the formula and provides the assumptions required for consistency of the standard errors. On efficiency, it can be shown that estimating $f$ typically affects the estimates for $\theta$ . Further, Yu (2004) shows for the location-scale Gaussian model that minimizing the ECF distance is not efficient because the objective puts weight on all moments. To achieve efficiency, Carrasco and Florens (2000) apply a regularized inverse of the covariance operator to the moment function which extends optimal weighting in GMM. The proofs for the results above accommodate a bounded operator $B$ , which implies a non-vanishing regularization in their setting. While this could improve the properties of the estimates in theory, simulations (not reported here) suggest that the estimates are sensitive to the choice of regularization parameter.

4 Monte-Carlo Illustrations

Three simple examples illustrate the properties of the estimator and compare it with a Bayesian nonparametric estimator based on Gaussian mixtures. All examples are conducted in R using the PSO package. Given sample and simulated data, the ECF and ECF distance are computed with RcppArmadillo which is more efficient for standard matrix operations than baseline R. The unknown $f$ is the skewed-logistic distribution. To illustrate Theorem 3, rejection rates of confidence intervals for $\theta$ are reported. The DGPs considered are:

[TABLE]

where $e_{t}\overset{iid}{\sim}f$ in (8)-(9) and $e_{1,t}\overset{iid}{\sim}f$ , $e_{2,t}\overset{iid}{\sim}\chi^{2}_{1}$ in (10). (8) and (9) provide a benchmark to illustrate the effect of the dependence on the estimated $\hat{f}_{n}$ and the effect of $S$ in Corollary 1. In (8), $\beta=(f)$ , and for (9) $\beta=(\rho_{y},f)$ . The stochastic volatility (SV) model (10) illustrates the first empirical example with a DGP similar to those used in estimations of Long-Run Risks (LRR) models. In (10), $f$ is restricted to have mean zero and unit variance. For DGPs (8)-(9), the sample size is $n=200$ ; for (10), $n=750$ similar to the application. The inputs are chosen as described in Section 2, with $L=0,1$ for (8), (9) respectively. In (10), $L=5$ is used; it is sufficiently large to identify $(\theta,f)$ , where $\theta=(\mu_{y},\rho_{y},\vartheta_{y},\mu_{\sigma},\rho_{\sigma},\kappa_{\sigma})$ . The first two examples use $200$ and the third $500$ integration points. $1000$ Monte-Carlo replications are used in (8)-(9), $200$ in (10).

Bayesian estimation is conducted using a Metropolis-Hastings algorithm; the proposal is tuned to target an acceptance rate between $20$ and $40\%$ across simulations. For (8), (9) the likelihood is computed analytically. Bayesian estimates are not reported for (10), due to the computational burden of performing many Monte Carlo replications. The prior is uniform for $\rho$ , Dirichlet(1/2) for $\omega$ , $\mathcal{N}(0,10)$ for $\mu$ and inverse-Gamma( $2.1,1.1$ ) for $\sigma$ . For reference, a semiparametric GMM estimator is also reported for (9), (10): the usual OLS estimator for (9) and moment conditions that identify $(\mu_{y},\rho_{y})$ separately from other parameters in $(\theta,f)$ for (10). Figure 1 illustrates estimates of $f$ in (8)-(10) for $k=3$ , $S=1,5$ and compares with Bayesian estimates in (8)-(9) and an infeasible kernel estimator that directly observes $e_{1,t}$ in (10).

Table 1 summarizes the simulation results for $\rho_{y}=0.6$ in (9) and $\rho_{y}=0.98$ in (10). For the AR(1), biases, standard deviations and sizes are similar across methods. There is size distortion due to small sample bias. There is more size distortion for the SV model (10), but distortion declines with $k$ and $S$ . Here, using a larger $k=4$ leads to rejection rates of $0.11$ and $0.10$ for $S=5,20$ respectively, which is closer to the nominal $5\%$ level. Figure 1 shows the properties of the estimated distribution $f$ . Both bias and variance are slightly larger with dynamics (panel b) compared to the static case (panel a). Bias is larger for the SV model (panel c) where the higher-order moments of $y$ identify both $(\mu_{\sigma},\rho_{\sigma},\kappa_{\sigma})$ and $f$ .

5 Applications to Asset Pricing

5.1 Non-Gaussian Shocks and Long-Run Uncertainty

The first application considers a reduced-form specification of the consumption process in Bansal and Yaron (2004). They model consumption growth as a persistent AR(1) process plus white noise with a shared stochastic volatility component. The reduced form used here is the same ARMA(1,1) with time-varying volatility process used in the simulations above:

[TABLE]

The data consist of real monthly consumption growth, excluding food and energy, from Feb 1959 to Dec 2019. Parameter estimates and standard errors for $S=20$ are reported in Table 3. Gaussian ARMA QMLE estimates (without SV) are reported for reference. The estimate of $\hat{\rho}_{y,n}$ is large, in line with calibrations and estimates in the LRR literature. The large negative $\hat{\vartheta}_{y,n}$ further confirms that the persistent long-run risk component is small. $\hat{\kappa}_{\sigma,n}$ is multiplied by $10^{4}$ in the table for readability. Volatility is less persistent than calibrated in Bansal and Yaron (2004), but its magnitude is comparable. Table 3 shows the effect of uncertainty on the risk-free rate by evaluating the last term in $r_{t}=\text{const}-\log[\mathbb{E}_{t}(\exp(-\gamma\sigma_{t+1}e_{1,t+1}))]$ conditional on $\sigma_{t}=\overline{\sigma}$ , $e_{1,t}=0$ ; i.e. both are set equal to their long-run average. For the Gaussian ARMA model, the effect is small and negative. The Gaussian SV model finds a positive effect. The symmetry in the distribution implies equal probability for large positive and negative outcomes. This can lead to surprising results such as a higher welfare with time-varying uncertainty than in a deterministic economy (Cho et al., 2015). Using a recursive utility with preference for early resolution of uncertainty can negate this positive income effect with an intertemporal substitution effect. Without changing the utility function, mixture estimates with $k=2,3,4$ find a larger, negative term compared to the other two baseline predictions. This simple exercise suggests $f_{e}$ can have interesting asset pricing implications which are further explored in the second application.
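To see why the shape of the shock distribution matters for the term $-\log[\mathbb{E}(\exp(-\gamma\sigma e))]$ , the sketch below evaluates it in closed form when $e$ is a Gaussian mixture, holding $\sigma$ fixed. The parameter values and the two-component mixture are hypothetical and are not the estimates reported in the tables; the tail components of the sieve are left out.

```python
import numpy as np

def uncertainty_term(gamma, sigma, w, mu, s):
    """-log E[exp(-gamma*sigma*e)] for a Gaussian mixture e with weights w,
    means mu and standard deviations s (uses the Gaussian MGF)."""
    c = gamma * sigma
    w, mu, s = map(np.asarray, (w, mu, s))
    return -np.log(np.sum(w * np.exp(-c * mu + 0.5 * (c * s) ** 2)))

def standardize(w, mu, s):
    """Recenter and rescale the mixture to mean zero and unit variance."""
    w, mu, s = map(np.asarray, (w, mu, s))
    m = np.sum(w * mu)
    v = np.sum(w * (s ** 2 + (mu - m) ** 2))
    return w, (mu - m) / np.sqrt(v), s / np.sqrt(v)

gamma, sigma = 10.0, 0.1                       # hypothetical values
gauss = uncertainty_term(gamma, sigma, [1.0], [0.0], [1.0])
# Hypothetical left-skewed mixture with the same mean and variance
w, mu, s = standardize([0.9, 0.1], [0.2, -1.8], [0.7, 1.5])
skewed = uncertainty_term(gamma, sigma, w, mu, s)
```

With the same mean and variance, the left-skewed mixture puts more mass on large negative outcomes, so the term is more negative than in the Gaussian case, where it equals $-\gamma^{2}\sigma^{2}/2$ .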

5.2 Bond Pricing in a Production Economy

This empirical application illustrates the empirical relevance of non-Gaussian shocks for estimates of relative risk-aversion using the model of Van Binsbergen et al. (2012). They estimate a bond pricing model in a production economy with inflation and recursive utility by maximum likelihood using the particle-filter and report large estimates of risk-aversion.

Model:

A representative agent maximizes intertemporal utility over consumption $C_{t}$ :

[TABLE]

where $\beta\in(0,1)$ is the discount factor, $\gamma$ measures relative risk aversion and $1/\psi$ the intertemporal elasticity of substitution (IES). If $\gamma=\psi$ , the utility becomes CRRA. Leisure is omitted here because the calibrated specification in Van Binsbergen et al. (2012) fits the data very poorly. (Footnote 5: See Gourio (2012) Section IV.A and Rudebusch and Swanson (2012) p. 110 for more detailed discussions.) Technology evolves in logs according to a random walk with drift:

[TABLE]

where $e_{1,t+1}\overset{iid}{\sim}f_{1}$ has mean zero. The budget and resource constraints are:

[TABLE]

$I_{t}$ is investment, $K_{t}$ capital, $l_{t}=1$ hours worked, $B_{t}$ number of contingent bonds with price $1/R_{t}$ , $P_{t}$ the price of goods and $Y_{t}=Z_{t}^{1-\alpha}K_{t}^{\alpha}$ is aggregate output. Capital accumulation evolves according to $K_{t+1}=(1-\delta)K_{t}+G(I_{t}/K_{t})K_{t}$ . $\delta\in(0,1)$ is the depreciation rate, $G(I_{t}/K_{t})=a_{1}+\frac{a_{2}}{1-1/\tau}(I_{t}/K_{t})^{1-1/\tau}$ with $\tau>0$ measures adjustment costs as in Jermann (1998). $a_{1},a_{2}$ are set to have no adjustment costs in the steady-state. Inflation $\pi_{t+1}=P_{t+1}/P_{t}$ follows ARMA(1,1) dynamics in logs, where the MA(1) component is the sum of two independent MA(1) processes:
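One common way to read "no adjustment costs in the steady-state" is $G(i^{*})=i^{*}$ and $G^{\prime}(i^{*})=1$ at the steady-state investment-capital ratio $i^{*}$ . The sketch below solves for $a_{1},a_{2}$ under that reading. Both this interpretation and the expression used for $i^{*}$ are assumptions of the example, and the value of $\tau$ is hypothetical.

```python
import math

def G(x, a1, a2, tau):
    """Adjustment cost function G(x) = a1 + a2 / (1 - 1/tau) * x**(1 - 1/tau)."""
    return a1 + a2 / (1.0 - 1.0 / tau) * x ** (1.0 - 1.0 / tau)

def adjustment_constants(i_ss, tau):
    """a1, a2 such that G(i_ss) = i_ss and G'(i_ss) = 1 (requires tau != 1)."""
    a2 = i_ss ** (1.0 / tau)                   # from G'(x) = a2 * x**(-1/tau)
    a1 = i_ss - a2 / (1.0 - 1.0 / tau) * i_ss ** (1.0 - 1.0 / tau)
    return a1, a2

lam, delta = 0.0045, 0.0294                    # calibrated values quoted in the text
tau = 0.5                                      # hypothetical value
i_ss = math.exp(lam) - 1.0 + delta             # assumed: capital grows at rate exp(lam)
a1, a2 = adjustment_constants(i_ss, tau)
```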

[TABLE]

where $e_{2,t+1}\overset{iid}{\sim}f_{2}$ has mean zero. The stochastic discount factor (SDF) is:

[TABLE]

where $V_{t}=\max_{C_{t},I_{t}}U_{t}$ is the value function and $W_{t}=\mathbb{E}_{t}[V_{t+1}^{1-\gamma}]^{\frac{1}{1-\gamma}}$ is certainty-equivalent future utility. With the SDF, the price $Q_{t,\ell}$ of a $\ell\geq 1$ period bond is computed recursively:

[TABLE]

where $Q^{0}_{t+1}=1$ . The $\ell$ -period yield is $i_{t,\ell}=-\log(Q_{t,\ell})/\ell$ . Ruge-Murcia (2017) estimates a similar model with skewness but in a stationary economy.
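The recursive computation of bond prices and yields can be illustrated on a finite-state Markov chain. The sketch assumes the recursion takes the form $Q_{t,\ell}=\mathbb{E}_{t}[M_{t+1}Q_{t+1,\ell-1}/\pi_{t+1}]$ with a zero-period price of one; the two-state transition matrix, SDF values and inflation values are purely illustrative and unrelated to the estimated model.

```python
import numpy as np

def bond_yields(P, M, infl, max_maturity):
    """Yields -log(Q_l)/l from Q_l(s) = sum_s' P[s,s'] * M[s,s'] * Q_{l-1}(s') / infl[s'],
    starting from Q_0 = 1. Returns an array of shape (max_maturity, n_states)."""
    Q = np.ones(P.shape[0])
    yields = []
    for l in range(1, max_maturity + 1):
        Q = (P * M) @ (Q / infl)
        yields.append(-np.log(Q) / l)
    return np.array(yields)

# Hypothetical two-state example
P = np.array([[0.9, 0.1], [0.2, 0.8]])         # transition probabilities
M = np.array([[0.99, 1.02], [0.97, 0.99]])     # SDF between states
infl = np.array([1.005, 1.010])                # gross inflation by state
y = bond_yields(P, M, infl, max_maturity=8)
```

With a constant SDF and no inflation the yield curve is flat, which gives a simple sanity check on the recursion.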

Solution method:

Accurate approximations require using as much information from $f_{1},f_{2}$ as possible. Perturbation methods of order $\ell$ only use the first $\ell$ moments of $f_{1},f_{2}$ . Value function iteration is too computationally costly. Projection methods are not sufficiently stable for estimation. Taylor projection (Levintal, 2018) provides a good compromise between perturbation and projection as shown in Fernández-Villaverde and Levintal (2018). This appears to be the first application of Taylor projection for estimation. Besides solving in logs rather than levels, the equations above should be normalized to ensure the solution is stable and accurate, e.g. $1=\mathbb{E}_{t}\left(M_{t+1}Q_{t+1}^{\ell-1}/[Q_{t}^{\ell}\pi_{t+1}]\right)$ for (15).

The model is non-stationary since all variables, except inflation and yields, are driven by a unit-root. It is solved in terms of de-trended variables $\tilde{C}_{t}=C_{t}/Z_{t-1},\tilde{K}_{t}=K_{t}/Z_{t-1},\tilde{I}_{t}=I_{t}/Z_{t-1},\tilde{Z}_{t}=\exp(\lambda+e_{1,t})$ . Growth rates are computed as $\Delta\log(C_{t+1})=\Delta\log(\tilde{C}_{t+1})+\Delta\log(Z_{t})$ . Assumption 2 y(i),u(i) holds for $\tilde{Z}_{t}$ and $\log\pi_{t}$ if $|\rho|\leq\bar{\rho}<1$ . Pruning is then used to stabilize the remaining variables.

Data:

The data consist of $n=235$ observations for quarterly growth rate of consumption (personal expenditure in services plus durables), investment growth (private non-residential fixed), quarterly inflation (growth rate of GDP deflator) and three/six-month Treasury yields between 1961Q2 and 2019Q4, all taken from the FRED database. One and two-year yields are constructed from the Federal Reserve’s daily nominal yield curve database. Van Binsbergen et al. (2012) also use yields at longer horizons.

Estimation results:

Several parameters are calibrated: $\lambda=0.0045$ , $\delta=0.0294$ , $\alpha=0.3$ . Van Binsbergen et al. (2012) also calibrate $\rho=0.955$ , which is estimated here. Sample and simulated consumption and investment growth are de-meaned to remove the effect of the calibration on the levels. The model is estimated with $S=10$ for $k=1$ (Gaussian), 2, 3, 4 and 5. Preliminary estimates are computed by first-order projection, which are then added to the initial swarm matrix to compute the final estimates with second-order projection.

Table 2 below reports the estimates for $\theta$ . The main pattern of interest is the decline of $\hat{\gamma}_{n}$ with $k$ . This is reminiscent of the long-run risks and rare disasters literatures, which essentially find that a better representation of risk makes it possible to match asset prices with lower levels of risk aversion. Empirically however, Backus et al. (2011) find that rare disasters are not large and frequent enough in the data to solve the equity premium puzzle. Here, the focus is on business cycle frequency risks, with a sample that excludes world wars and the Great Depression but still includes several recessions and inflationary events. The interesting finding is that these risks accommodate much lower levels of relative risk aversion in an estimation setting. Standard errors also decrease with $k$ since the objective has more curvature for smaller $\gamma$ . As in Van Binsbergen et al. (2012), the model is very hard to estimate with Gaussian shocks. Here, several coefficients are close to the optimization bounds (lb, ub). The choice of $k=4$ seems to best balance bias and variance: estimates are similar with $k=5$ but standard errors are greater. In a 12 core cluster environment, estimation takes 14h15m, 8h34m, 7h24m, 7h19m and 6h2m for $k=1,\dots,5$ respectively. The main bottleneck is in solving the model. Taylor projection is initialized with a third-order perturbation, which is more accurate for smaller $\gamma$ . For $\gamma\geq 40$ and some corner values, the default solver may fail to converge. Using exceptions to switch to a slower, more robust solver after a failed convergence works but makes estimation very time-consuming. This mostly affects $k=1$ for which $\hat{\theta}_{n}$ is closer to the bounds.
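The idea of switching to a slower solver after a failed convergence can be sketched generically. The example below is not the model solver used in the paper: it pairs a Newton iteration with a bisection fallback on a toy scalar equation, and all function names are hypothetical.

```python
import math

def newton(f, df, x0, tol=1e-10, max_iter=20):
    """Fast solver: raises RuntimeError when it does not converge."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iterations did not converge")

def bisection(f, lo, hi, tol=1e-10):
    """Slower but robust solver on an interval that brackets the root."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * f(mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, f(mid)
    return 0.5 * (lo + hi)

def solve_with_fallback(f, df, x0, lo, hi):
    """Try the fast solver first; fall back to the robust one on failure."""
    try:
        return newton(f, df, x0), "fast"
    except (RuntimeError, ZeroDivisionError, OverflowError):
        return bisection(f, lo, hi), "robust"

df = lambda x: 1.0 / (1.0 + x * x)             # derivative of atan
root_easy, used_easy = solve_with_fallback(math.atan, df, 0.5, -1.0, 2.0)
root_hard, used_hard = solve_with_fallback(math.atan, df, 3.0, -1.0, 2.0)
```

Newton's method on the arctangent diverges from a distant starting point, so the second call ends up in the fallback branch. In an estimation loop the fallback is triggered only for the parameter values where the default solver fails, which is why it mostly adds time near the bounds.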

The estimated $\widehat{\text{IES}}=1/\hat{\psi}_{n}$ is greater than $5$ in all specifications. The null hypothesis of a CRRA utility, $H_{0}:\gamma=\psi$ , is rejected with t-statistics of 2.2, 4.1, 3.8, 3.8 and 3.7 for $k=1,\dots,5$ respectively. For reference, in their calibration Bansal and Yaron (2004) favour $\text{IES}=1.5$ . Using aggregate consumption, Chen et al. (2013) report a confidence interval ranging from $2$ to $5$ . Van Binsbergen et al. (2012) estimate $\text{IES}=1.7$ with a very large $\hat{\gamma}_{n}=66$ and a small $\hat{\tau}_{n}=0.1$ . On the latter, they exclude investment from the estimation and report a poor fit in that dimension. Here, $k=1$ estimates a small $1/\hat{\tau}_{n}=10^{-3}$ , $\text{se}(1/\hat{\tau}_{n})=0.035$ , not significantly different from zero. (Footnote 6: The model is solved in terms of $1/\tau$ , making these quantities readily available. The delta-method used to produce Table 2 is invalid at $1/\tau=0$ ; applying the continuous mapping theorem to a CI for $1/\tau$ yields a more robust CI for $\tau$ itself: $[128,+\infty)$ .) For $k=2$ , $1/\hat{\tau}_{n}=0.0198$ , $\text{se}(1/\hat{\tau}_{n})=0.0076$ is statistically different from zero at the 1% significance level and less problematic. Table D4 in the Supplement provides additional results for $1/\hat{\tau}_{n}$ as well as $1/\hat{\gamma}_{n}$ .
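The significance statement for $k=2$ and the mapping from a confidence interval for $1/\tau$ to one for $\tau$ can be checked with a few lines. This is a sketch under the assumption of a two-sided normal approximation; the confidence level used in the paper's footnote is not stated, so the interval below is illustrative only.

```python
from statistics import NormalDist

def inverse_ci(est, se, level=0.95):
    """CI for x = 1/tau, mapped to tau through the decreasing map x -> 1/x.
    A non-positive lower bound for x gives an unbounded interval for tau.
    Assumes the upper bound for x is positive."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = est - z * se, est + z * se
    tau_lo = 1.0 / hi
    tau_hi = 1.0 / lo if lo > 0 else float("inf")
    return (lo, hi), (tau_lo, tau_hi)

est, se = 0.0198, 0.0076                       # k = 2 values quoted in the text
t_stat = est / se
p_value = 2 * (1 - NormalDist().cdf(t_stat))   # two-sided normal p-value
ci_x, ci_tau = inverse_ci(est, se)
```

Because the map is monotone on the positive half-line, the interval for $\tau$ inherits the coverage of the interval for $1/\tau$ without relying on the delta-method.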

Table 3 below compares selected sample with simulated moments. The fit is generally better with larger $k$ . To better understand the estimated IES, the last row changes the IES to $1.5$ , keeping the other coefficients at the $k=4$ estimates. The smaller IES increases average yields but reduces the slope of the yield curve and the variance of consumption. The correlation between consumption growth and yields is slightly positive in the data but very negative for $k=1$ and $\text{IES}=1.5$ . For larger $k$ , these correlations are closer to the sample.

Figure 4 compares sample with simulated distributions and shows the estimated densities $f_{1},f_{2}$ . The latter two are normalized to have variance equal to $1$ . For $k=4$ , estimates of $f_{1},f_{2}$ have skewnesses of $-1.45,2.18$ and kurtoses of $6.57,11.48$ which indicate excess downside risks for technology shocks and upwards risks for inflation. While mixtures improve the fit for consumption and inflation, investment and 3m yields are more challenging to match. In the sample, investment has smaller kurtosis than consumption, $5$ and $9$ respectively, while in simulations, the converse is true: investment has larger kurtosis than consumption, $5$ and $4$ with $k=4$ . In the model, investment is the only source of endogenous variation for output, adding labor would provide another. Also, varying capital utilization could provide more realistic fluctuations in output and investment (King and Rebelo, 1999). 3m yields were above 10% only between 1979Q3 and 1984Q3 and below 0.5% only between 2008Q4 and 2016Q3, i.e. both tails are associated with specific monetary policy regimes. This suggests that modelling monetary policy regimes is needed to improve the fit of yields in the tails.
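Skewness and kurtosis of a Gaussian mixture follow directly from its weights, means and standard deviations. The sketch below covers only the Gaussian components (the tail components of the sieve are left out), and the example mixture is hypothetical rather than the estimated $f_{1}$ or $f_{2}$ .

```python
import numpy as np

def mixture_skew_kurt(w, mu, s):
    """Skewness and kurtosis of a Gaussian mixture with weights w,
    component means mu and component standard deviations s."""
    w, mu, s = map(np.asarray, (w, mu, s))
    m = np.sum(w * mu)
    d = mu - m
    var = np.sum(w * (s**2 + d**2))
    m3 = np.sum(w * (d**3 + 3 * d * s**2))
    m4 = np.sum(w * (d**4 + 6 * d**2 * s**2 + 3 * s**4))
    return m3 / var**1.5, m4 / var**2

skew_g, kurt_g = mixture_skew_kurt([1.0], [0.0], [1.0])
# Hypothetical two-component mixture with a fat left tail
skew_m, kurt_m = mixture_skew_kurt([0.9, 0.1], [0.2, -1.8], [0.7, 1.5])
```

Both statistics are invariant to location and scale, so normalizing the density to unit variance does not change them.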

There are two main takeaways from this application. First, allowing for a flexible distribution in the shocks $(e_{1,t},e_{2,t})$ makes it possible to better capture risks and leads to much smaller estimates of relative risk aversion. This highlights the empirical relevance of using a semi-nonparametric approach in this setting. Second, the model is very simple and has limitations that show in the results. It cannot match the variance of consumption without a large IES, as shown in Table 3. Overall, the flexible estimation fits the data better in some dimensions using more reasonable parameter values that also seem to be more accurately estimated. However, the flexible distribution does not improve the fit in all dimensions and issues remain such as the zero lower bound on interest rates, the joint dynamics of consumption and investment, among others. Going forward, estimating the distribution of the shocks in a more realistic model that can capture these features would be of interest.

6 Conclusion

Simulation-based estimation is a powerful approach to estimate intractable models. Using a mixture sieve with the empirical characteristic function, this paper provides an approach to estimate semi-nonparametric models by simulation. Estimation using the ECF can be unstable depending on the choice of the weight function $\pi$ , see e.g. Chen et al. (2019) section 2.1.1 for a discussion. The approach suggested in Section 2 provides a simple way to give more or less weight to lower-order moments and then to check the fit for selected moments as in Table 3. Alternatively, the conditional cdf or pdf can be used as moments. Approximation results in De Jonge and Van Zanten (2010), Norets (2010) can be used to consider joint or conditional densities. Estimating other objects nonparametrically, such as a utility or production function, can also be of interest. Another direction of research would be to develop general theory for sieve indirect inference estimation.

Appendix A Preliminary Results

Lemma A1 (Approximation Properties of the Gaussian and Tails Mixture).

Suppose that the shocks $e=(e_{t,1},\dots,e_{t,d_{e}})$ are independent with density $f=f_{1}\times\dots\times f_{d_{e}}$ . Suppose that each marginal $f_{j}$ can be decomposed into a smooth density $f_{j,S}$ and the two tails density $f_{L},f_{R}$ :

[TABLE]

Let each $f_{j,S}$ satisfy the assumptions of Kruijer et al. (2010): i) Smoothness: $f_{j,S}$ is $r$ -times continuously differentiable with bounded $r$ -th derivative. ii) Tails: $f_{j,S}$ has exponential tails, i.e. there exist $\bar{e},M_{f},a,b>0$ such that $f_{j,S}(e)\leq M_{f}e^{-a|e|^{b}},\,\forall|e|\geq\bar{e}.$ iii) Monotonicity in the Tails: $f_{j,S}$ is strictly positive and there exist $\underline{e}<\overline{e}$ such that $f_{j,S}$ is weakly increasing on $(-\infty,\underline{e}]$ and weakly decreasing on $[\overline{e},\infty)$ and $\|f_{j}\|_{\infty}\leq\overline{f}$ for all $j$ . Then there exists a Gaussian and tails mixture $\Pi_{k}f=\Pi_{k}f_{1}\times\dots\times\Pi_{k}f_{d_{e}}$ satisfying the restrictions of Kruijer et al. (2010): iv) Bandwidth: $\sigma_{j}\geq\underline{\sigma}_{k}=O(\frac{\log[k]^{2/b}}{k})$ . v) Location Parameter Bounds: $\mu_{j}\in[-\bar{\mu}_{k},\bar{\mu}_{k}]$ with $\bar{\mu}_{k}=O\left(\log[k]^{1/b}\right)$ such that as $k\to\infty$ :

[TABLE]

where $\|\cdot\|_{\mathcal{F}}=\|\cdot\|_{TV}$ or $\|\cdot\|_{\infty}$ .

The following Lemma is needed to verify the $L_{2}$ -smoothness condition when using the Gaussian and tails mixture.

Lemma A2 (Properties of the Tails Distributions).

Let $\bar{\xi}\geq\xi_{1},\xi_{2}\geq\underline{\xi}>0$ . Let $\nu_{t,1}^{s}$ and $\nu_{t,2}^{s}$ be uniform $\mathcal{U}_{[0,1]}$ draws and:

[TABLE]

The densities of $e_{t,1}^{s},e_{t,2}^{s}$ satisfy $f_{e_{t,1}^{s}}(e)\sim e^{-3-\xi_{1}}$ as $e\to-\infty$ , $f_{e_{t,2}^{s}}(e)\sim e^{-3-\xi_{2}}$ as $e\to+\infty$ . There exists a finite $C$ bounding the second moments $\mathbb{E}\left(|e_{t,1}^{s}|^{2}\right)\leq C<\infty$ and $\mathbb{E}\left(|e_{t,2}^{s}|^{2}\right)\leq C<\infty$ . Furthermore, the draws $y_{t,1}^{s}$ and $y_{t,2}^{s}$ are $L^{2}$ -smooth in $\xi_{1}$ and $\xi_{2}$ respectively:

[TABLE]

where the constant $C$ only depends on $\underline{\xi}$ and $\bar{\xi}$ .
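The properties in Lemma A2 can be checked numerically for the left tail. The sketch assumes the draw is $e=-(1/\nu-1)^{1/(2+\xi)}$ , which is one quantile-function form consistent with the CDF and tail decay stated in the lemma and its proof; the exact expression used in the paper is not reproduced here, and the values of $\xi$ are hypothetical.

```python
import numpy as np

def left_tail_draw(nu, xi):
    """Quantile-function draw on (-inf, 0] with CDF 1 / (1 + |e|**(2 + xi))."""
    return -((1.0 / nu - 1.0) ** (1.0 / (2.0 + xi)))

rng = np.random.default_rng(1)
nu = rng.uniform(size=100_000)
nu = nu[nu > 0]                                # guard against an exact zero
e_a = left_tail_draw(nu, 1.0)
e_b = left_tail_draw(nu, 1.1)                  # same uniforms, perturbed tail index
cdf_at_minus_one = np.mean(e_a <= -1.0)        # the assumed CDF equals 1/2 at e = -1
second_moment = np.mean(e_a ** 2)
msd = np.mean((e_a - e_b) ** 2)                # small when xi changes little
```

Reusing the same uniform draws for both values of $\xi$ is what makes the simulated shocks vary smoothly with the tail index.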

Lemma A3 (Covering Numbers).

Under the $L^{2}$ -smoothness of the DGP (as in Lemma 2), the bracketing number satisfies for $x\in(0,1)$ and some $\overline{C}$ :

[TABLE]

For $\tau\in\mathbb{R}^{d_{\tau}}$ , let $\Psi_{k(n)}(\tau)$ be the set of functions $\Psi_{k(n)}(\tau)=\left\{\beta\rightarrow e^{i\tau^{\prime}(\mathbf{y}_{t}(\beta),\mathbf{x}_{t})}\pi(\tau)^{1/2},\,\beta\in\mathcal{B}_{k(n)}\right\}$ . The bracketing entropy of each set $\Psi_{k(n)}(\tau)$ satisfies for some $\tilde{C}$ :

[TABLE]

Using the above, for some $\tilde{C}_{2}<\infty$ :

[TABLE]

Lemma A4 (Nonparametric Approximation Bias).

Suppose Assumptions 1 and 2 (or 2′) hold. Furthermore suppose that $\mathbb{E}\left(\|y_{t}^{s}\|^{2}\right)$ and $\mathbb{E}\left(\|u_{t}^{s}\|^{2}\right)$ are bounded for $\beta=\beta_{0}$ and $\beta=\Pi_{k(n)}\beta_{0}$ for all $k(n)\geq 1$ , $t\geq 1$ then:

[TABLE]

where $\Pi_{k(n)}\beta_{0}$ is the mixture approximation of $\beta_{0}$ , $\gamma$ the Hölder coefficient in Assumption 2, $b$ and $r$ are the exponential tail index and the smoothness of the density $f_{S}$ in Lemma A1.

Lemma A5 (Convergence Rate in $\|\cdot\|_{m}$ ).

Let $\delta_{n}=\sqrt{(k(n)\log[k(n)])^{4}/n}$ and $M_{n}=\log\log(n+1)$ . Suppose the following undersmoothing assumptions hold: i) Rate of Convergence: $\|\hat{\beta}_{n}-\beta_{0}\|_{weak}=O_{p}(\delta_{n})$ . ii) Negligible Bias: $\|\Pi_{k(n)}\beta_{0}-\beta_{0}\|_{weak}=o(\delta_{n})$ . Furthermore, suppose that the population CF is smooth in $\beta$ and satisfies: iii) Rate 1: uniformly over $\beta\in\{\beta\in\mathcal{B}_{osn},\|\beta-\beta_{0}\|_{weak}\leq M_{n}\delta_{n}\}$ : $\int\big{|}\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))}{d\beta}[\beta-\beta_{0}]-\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0}))}{d\beta}[\beta-\beta_{0}]\big{|}^{2}\pi(\tau)d\tau=O(\delta_{n}^{2}).$ iv) Rate 2: $\Pi_{k(n)}\beta_{0}$ satisfies $\int\Big{|}\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0}))}{d\beta}[\Pi_{k(n)}\beta_{0}-\beta_{0}]\Big{|}^{2}\pi(\tau)d\tau=O(\delta_{n}^{2}).$ Suppose $\underline{\lambda}_{n}=\lambda_{\min}(\int\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0}))}{d(\theta,\omega,\mu,\sigma)}^{\prime}\overline{\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0}))}{d(\theta,\omega,\mu,\sigma)}}\pi(\tau)d\tau)$ is strictly positive and $\delta_{n}\underline{\lambda}_{n}^{-1/2}=o(1)$ then:

[TABLE]

The following stochastic equicontinuity result, together with a longer version presented in Lemma B13, is needed to prove asymptotic normality (Theorem 3).

Lemma A6 (Stochastic Equicontinuity).

Let $\delta_{mn}=\delta_{n}\underline{\lambda}_{n}^{-1/2}$ , $M_{n}=\log\log(n)$ . If the assumptions in Lemma A5 hold and $(M_{n}\delta_{mn})^{\frac{\gamma^{2}}{2}}\max(\log[k(n)]^{2},|\log[M_{n}\delta_{mn}]|^{2})k(n)^{2}=o(1)$ , then:

[TABLE]

Also, suppose that $\beta\rightarrow\int\mathbb{E}\Big{|}\hat{\psi}_{t}^{s}(\tau,\beta_{0})-\hat{\psi}_{t}^{s}(\tau,\beta)\Big{|}^{2}\pi(\tau)d\tau$ is continuous with respect to $\|\cdot\|_{\mathcal{B}}$ at $\beta=\beta_{0}$ , uniformly in $t\geq 1$ , then a second stochastic equicontinuity result holds:

[TABLE]

Appendix B Proofs for the Main Results

The proofs for the main results allow for a bounded linear operator $B$ , as in Carrasco and Florens (2000), to weight the moments. The operator is assumed to be fixed:

[TABLE]

Since $B$ is bounded linear there exists a $M_{B}>0$ such that for any two CFs:

[TABLE]

As a result, the rate of convergence for the objective function with the weighting $B$ is the same as the rate of convergence without the operator $B$ . (Footnote 7: For results on estimating the optimal $B$ see Carrasco and Florens (2000); Carrasco et al. (2007). Using their method would lead to $M_{\hat{B}}\to\infty$ as $n\to\infty$ , resulting in a slower rate of convergence for $\hat{\beta}_{n}$ . Having $M_{\hat{B}}\to\infty$ sufficiently slowly would not alter the main results besides having a different, possibly more efficient, asymptotic variance.)

B.1 Consistency

Proof of Lemma 1.

The difference between $e_{t}^{s}$ and $\tilde{e}_{t}^{s}$ can be split into two terms:

[TABLE]

To bound the term (B.16) in expectation, combine the fact that $|\mu_{j}|\leq\bar{\mu}_{k(n)},|\sigma_{j}|\leq\bar{\sigma}$ and $\nu_{t}^{s}$ and $Z_{t,j}^{s}$ are independent so that:

[TABLE]

The last term is bounded above by $\bar{\mu}+\bar{\sigma}C_{Z}.$ Next, note that

$\mathbbm{1}_{\nu^{s}_{t}\in[\sum_{l=0}^{j-1}\omega_{l},\sum_{l=0}^{j}\omega_{l}]}-\mathbbm{1}_{\nu^{s}_{t}\in[\sum_{l=0}^{j-1}\tilde{\omega}_{l},\sum_{l=0}^{j}\tilde{\omega}_{l}]}\in\{0,1\}$ so that:

[TABLE]

Also, for any $j$ : $|\sum_{l=0}^{j}\tilde{\omega}_{l}-\sum_{l=0}^{j}\omega_{l}|\leq\sum_{l=0}^{j}|\tilde{\omega}_{l}-\omega_{l}|\leq\left(\sum_{l=0}^{j}|\tilde{\omega}_{l}-\omega_{l}|^{2}\right)^{1/2}\leq\|\tilde{\omega}-\omega\|_{2}\leq\delta.$ Following a similar approach to Chen et al. (2003):

[TABLE]

Overall the term (B.16) is bounded above by $\sqrt{2}(1+C_{Z})\left(\bar{\mu}_{k(n)}+\bar{\sigma}+k(n)\right)\sqrt{\delta}$ . The term (B.17) can be bounded above by using $0\leq\mathbbm{1}_{\nu^{s}_{t}\in[\sum_{l=0}^{j-1}\tilde{\omega}_{l},\sum_{l=0}^{j}\tilde{\omega}_{l}]}\leq 1$ and:

[TABLE]

Without loss of generality assume that $\delta\leq 1$ so that:

[TABLE]

which concludes the proof. ∎
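The construction used in this proof, where a uniform draw selects the mixture component and the Gaussian draws are shared across parameter values, can be illustrated numerically. The sketch below is an assumption-laden toy version (three components, hypothetical parameter values, no tail components): it perturbs the weights by $\delta$ and reports the mean squared change in the draws.

```python
import numpy as np

def mixture_draws(w, mu, s, nu, Z):
    """e_t = mu_j + s_j * Z[t, j], where j is the component whose
    cumulative-weight interval contains the uniform draw nu_t."""
    edges = np.cumsum(w)
    j = np.minimum(np.searchsorted(edges, nu, side="right"), len(w) - 1)
    t = np.arange(len(nu))
    return np.asarray(mu)[j] + np.asarray(s)[j] * Z[t, j]

rng = np.random.default_rng(2)
n, k = 50_000, 3
nu, Z = rng.uniform(size=n), rng.standard_normal((n, k))
w = np.array([0.5, 0.3, 0.2])                  # hypothetical mixture parameters
mu = np.array([-1.0, 0.0, 2.0])
s = np.array([1.0, 0.5, 1.5])
base = mixture_draws(w, mu, s, nu, Z)

msd_by_delta = {}
for delta in (1e-3, 1e-2):
    w_tilde = w + delta * np.array([1.0, -1.0, 0.0])
    shifted = mixture_draws(w_tilde, mu, s, nu, Z)
    msd_by_delta[delta] = np.mean((base - shifted) ** 2)
```

Only the draws whose uniform falls in the sliver between the old and new interval edges change component, so the mean squared difference grows roughly linearly in $\delta$ , in line with the $\sqrt{\delta}$ bound on the $L^{2}$ norm.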

Proof of Lemma 2:.

First note that the cosine and sine functions are uniformly Lipschitz on the real line with Lipschitz coefficient $1$ . This implies for any two $(\mathbf{y}_{1},\mathbf{y}_{2},\mathbf{x})$ and any $\tau\in\mathbb{R}^{d_{\tau}}$ :

[TABLE]

As a result, the moment function is also Lipschitz in $\mathbf{y},\mathbf{x}$ :

[TABLE]

Since $\pi$ is chosen to be the Gaussian density, it satisfies $\sup_{\tau}\|\tau\|_{\infty}\pi(\tau)^{\frac{1}{4}}\leq C_{\pi}<\infty$ and $\pi(\tau)^{\frac{1}{2}}\propto\pi(\tau/\sqrt{2})$ which has finite integral. The Lipschitz properties of the moments combined with the properties of $\pi$ imply that the $L^{2}$ -smoothness of the moments is implied by the $L^{2}$ -smoothness of the simulated data itself. As a result, the remainder of the proof focuses on the $L^{2}$ -smoothness of $\mathbf{y}_{t}^{s}$ . First note that since $\mathbf{y}_{t}=(y_{t},\dots,y_{t-L})$ :

[TABLE]

To bound the term in $\mathbf{y}$ above, it suffices to bound the expression for each term $y_{t}$ with arbitrary $t\geq 1$ . Assumptions 2, 2′ imply that, for some $\gamma\in(0,1]$ :

[TABLE]

The term $\frac{\delta^{\gamma}}{\underline{\sigma}_{k(n)}^{2\gamma}}$ comes from the fact that $\|\beta_{1}-\beta_{2}\|_{\infty}\leq\frac{\|\beta_{1}-\beta_{2}\|_{m}}{\underline{\sigma}^{2}_{k(n)}}$ and $\|\beta_{1}-\beta_{2}\|_{TV}\leq\frac{\|\beta_{1}-\beta_{2}\|_{m}}{\underline{\sigma}_{k(n)}}$ on $\mathcal{B}_{k(n)}$ . Without loss of generality, suppose that $\underline{\sigma}_{k(n)}\leq 1$ . (Footnote 8: Recall that by assumption $\underline{\sigma}_{k(n)}=O(\frac{\log[k(n)]^{2/b}}{k(n)})$ goes to zero.) Applying this inequality recursively, and using the fact that $y_{0}^{s},u_{0}^{s}$ are the same regardless of $\beta$ , yields:

[TABLE]

Using Lemmas 1 and A2 and the same approach as above:

[TABLE]

Again, applying this inequality recursively yields:

[TABLE]

Putting everything together:

[TABLE]

Without loss of generality, suppose that $\delta\leq 1$ . Then, for some positive constant $\overline{C}$ :

[TABLE]

∎

Proof of Theorem 1:.

The main idea is to show that the assumptions of Lemma B8 hold. The proof proceeds in four steps:

1. First, geometric ergodicity and uniform boundedness of $\hat{\psi}_{n}$ imply:

[TABLE]

2. Then Lemma 2 combined with Lemmas A3, B11 imply that uniformly over $\beta\in\mathcal{B}_{k(n)}$ :

[TABLE]

where $C_{n}$ is given below.

3. The triangle inequality and the previous steps imply that, uniformly over $\beta\in\mathcal{B}_{k(n)}$ :

[TABLE]

And, because $B$ is a bounded linear operator:

[TABLE]

4. By the inequality $|a-b|^{2}\geq 1/2|a|^{2}-|b|^{2}$ and the previous step, uniformly over $\beta\in\mathcal{B}_{k(n)}$ :

[TABLE]

and $1/2\int|\mathbb{E}(B\hat{\psi}^{S}_{n}(\tau,\beta)-B\hat{\psi}_{n}(\tau))|^{2}\pi(\tau)d\tau\leq\int|B\hat{\psi}^{S}_{n}(\tau,\beta)-B\hat{\psi}_{n}(\tau)|^{2}\pi(\tau)d\tau+O_{p}(\max(1,C_{n})/n).$

This will help show that condition d) in Lemma B8 holds.

First, consider steps 1. and 2:

Step 1.: For $M>0$ , a convergence rate $r_{n}$ and Markov’s inequality:

[TABLE]

The last two inequalities come from Lemma B9. If the data is iid then the mixing coefficients $\alpha(m)=0$ for all $m\geq 1$ . $C_{\alpha,p}$ is a constant that only depends on the mixing rate $\alpha$ , $p$ and the bound on $|\hat{\psi}_{t}(\tau)-\mathbb{E}(\hat{\psi}_{t}(\tau))|\leq 2$ . For $r_{n}=1/n$ and $M\to\infty$ the probability goes to zero. As a result: $\int|\hat{\psi}_{n}(\tau)-\mathbb{E}(\hat{\psi}_{n}(\tau))|^{2}\pi(\tau)d\tau=O_{p}(1/n)$ .
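The $O_{p}(1/n)$ rate in Step 1 can be illustrated by simulation in the simplest case. The sketch assumes iid standard Gaussian data and a standard Gaussian weight $\pi$ , neither of which is required by the proof. In that case $n\,\mathbb{E}|\hat{\psi}_{n}(\tau)-\psi(\tau)|^{2}=1-e^{-\tau^{2}}$ , so the scaled integrated squared error should not depend on $n$ .

```python
import numpy as np

rng = np.random.default_rng(3)
taus = rng.standard_normal(200)                # integration points drawn from pi
# Theoretical value of n * E|psi_n - psi|^2 averaged over the integration points
target = np.mean(1.0 - np.exp(-taus ** 2))

def scaled_ise(n, reps=200):
    """Average over replications of n * mean_tau |psi_n(tau) - psi(tau)|^2
    for iid N(0,1) data, whose CF is exp(-tau^2 / 2)."""
    psi = np.exp(-0.5 * taus ** 2)
    out = []
    for _ in range(reps):
        y = rng.standard_normal(n)
        psi_n = np.exp(1j * np.outer(y, taus)).mean(axis=0)
        out.append(n * np.mean(np.abs(psi_n - psi) ** 2))
    return np.mean(out)

results = {n: scaled_ise(n) for n in (100, 400)}
```

With dependent data the constant changes through the mixing coefficients, but the $1/n$ scaling is the same.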

Step 2.: The proof is similar to the proof of Lemma C.1 in Chen and Pouzo (2012). It also begins similarly to Step 1, for $M>0$ , a convergence rate $r_{n}$ ; using Markov’s inequality:

[TABLE]

Suppose that there is an upper bound $C_{n}$ such that for all $\tau$ :

[TABLE]

If the following also holds $\int\pi(\tau)^{1-2/(2+\eta)}d\tau=C_{\eta}<\infty$ then:

[TABLE]

Take $r_{n}=C_{n}/n=o(1)$ , then for $M\to\infty$ the probability goes to zero. As a result:

[TABLE]

The bounds $C_{n}$ are now computed, first in the iid case. By theorem 2.14.5 of van der Vaart and Wellner (1996):

[TABLE]

Also, by theorem 2.14.2 of van der Vaart and Wellner (1996) there exists a universal constant $K>0$ such that for each $\tau\in\mathbb{R}^{d_{\tau}}$ :

[TABLE]

with $\Psi_{k(n)}=\big{\{}\psi:\mathcal{B}_{k(n)}\to\mathbb{C},\beta\to\psi_{t}^{S}(\tau,\beta)\pi(\tau)^{1/(2+\eta)}\big{\}}$ , $N_{[\,]}$ is the covering number with bracketing. Because of the $L^{p}$ -smoothness, it is bounded above by:

[TABLE]

Let $\sqrt{C_{n}}=\int_{0}^{1}\sqrt{1+\log N_{[\,]}(x^{1/\gamma},\mathcal{B}_{k(n)},\|\cdot\|)}dx$ ; together with the previous inequality, it implies:

[TABLE]

To conclude, divide by $n$ on both sides to get the bound:

[TABLE]

For the dependent case, Lemma B11 implies that if $\hat{\psi}^{s}_{t}(\tau,\beta)$ is $\alpha$ -mixing at an exponential rate, the moments are bounded and the sieve spaces are compact:

[TABLE]

with, for any $\vartheta\in(0,1)$ such that the integral exists:

[TABLE]

Lemma A3 then derives bounds for $C_{n}$ in terms of $k(n)$ .

Step 3.: follows from the triangle inequality and the assumption that $B$ is a bounded linear operator.

Step 4.: The following two inequalities can be derived from the inequality $|a-b|^{2}\geq 1/2|a|^{2}-|b|^{2}$ , applied once as stated and once with the roles of $a$ and $b$ exchanged since $|a-b|^{2}$ is symmetric in $a$ and $b$ :

[TABLE]

and

[TABLE]

Taking integrals on both sides and given that

[TABLE]

uniformly in $h\in\mathcal{B}_{k(n)}$ , the desired result follows: $1/2\hat{Q}_{n}^{S}(\beta)\leq Q_{n}(\beta)+O_{p}(C_{n}/n)$ and $1/2Q_{n}(\beta)\leq\hat{Q}_{n}^{S}(\beta)+O_{p}(C_{n}/n).$

Lemma A3 implies that $C_{n}=O(k(n)^{4}\log[k(n)]^{4})$ in the dependent case, and $C_{n}=O(k(n)\log[k(n)])$ in the iid case. Combining this, condition (6) in the Theorem, the rate for $Q_{n}(\Pi_{k(n)}\beta_{0})$ which is derived in Lemma A4 together implies the conditions for Lemma B8 hold so that the estimator is consistent. ∎

B.2 Rate of Convergence

Proof of Theorem 2:.

Let $C_{n}$ be as in the proof of Theorem 1, let $\varepsilon>0$ and

[TABLE]

Proving the result amounts to showing that there exists $M>0$ and $N>0$ such that $\forall n\geq N$ :

[TABLE]

First, under the stated assumptions, the following inequalities hold:

1. $\hat{Q}_{n}^{S}(\beta)\leq 2Q_{n}(\beta)+O_{p}(C_{n}/n)$ ,

2. $Q_{n}(\Pi_{k(n)}\beta_{0})\leq O(\max(\frac{\log[k(n)]^{4r/(b+2)}}{k(n)^{2\gamma^{2}r}},1/n^{2}))$ ,

3. $\|\beta-\beta_{0}\|^{2}_{weak}\leq\underline{C}_{w}^{-1}[Q_{n}(\beta)+O(1/n^{2})]$ .

The first was derived in the proof of Theorem 1, the second is due to Lemma A4 and the third comes from Assumption 3 with Lemma B12. Applying them in order to (B.18):

[TABLE]

For $r_{n}$ defined above, this probability becomes: $\mathbb{P}\left(M^{2}\leq O_{p}(1)\right)\to 0\text{ as }M\to\infty.$ This concludes the first part of the proof. By definition of the local measure of ill-posedness:

[TABLE]

Applying Lemma A4 again to $Q_{n}(\Pi_{k(n)}\beta_{0})$ concludes the proof. ∎

Proof of Corollary 1:.

The proof is immediate by taking the size of the simulated sample to be $nS$ instead of $n$ , which implies $\sqrt{C_{n}/n}=k(n)^{2}\log[n]^{2}/\sqrt{nS}$ in the proof of Theorem 2, and noting that $\hat{\psi}_{n}$ converges at a $\sqrt{n}$ -rate so that convergence is no faster than $\min\left(\frac{k(n)^{2}\log[n]^{2}}{\sqrt{n\times S}},\frac{1}{\sqrt{n}}\right)$ . ∎

B.3 Asymptotic Normality

Proof of Theorem 3:.

Assumption 5 ii-iii. allows the following linearization:

[TABLE]

Using Lemma B14 a) and b), replace the term $B\psi_{\beta}(\tau,\hat{\beta}_{n}-\beta_{0})$ under the integral with $B\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})-B\hat{\psi}_{n}^{S}(\tau,\beta_{0})$ so that:

[TABLE]

Now Lemma B14 c) implies that $B\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})$ can be replaced with $B\hat{\psi}_{n}(\tau)$ up to a $o_{p}(1/\sqrt{n})$ so that the above becomes:

[TABLE]

To conclude, apply a Central Limit Theorem to the real-valued random variable:

[TABLE]

Because of $u_{n}^{*}$ and the geometric ergodicity of the simulated data, a CLT for non-stationary mixing triangular arrays is required. The results in Wooldridge and White (1988) can be applied; the following verifies that the sufficient conditions hold. For any $\delta>0$ :

[TABLE]

By definition of $u_{n}^{*}$ and $\|\cdot\|_{weak}$ :

[TABLE]

Because $B$ is bounded linear and $|Z_{t}^{S}(\tau)|\leq 2$ : $\left[\mathbb{E}\left(\int\Big{|}BZ_{t}^{S}(\tau)\Big{|}^{2}\pi(\tau)d\tau\right)\right]^{\frac{2+\delta}{2}}\leq[2M_{B}]^{2+\delta}.$ Eventually, it implies:

[TABLE]

Given the mixing condition and the definition of $\sigma^{*}_{n}$ :

[TABLE]

By geometric ergodicity and because the characteristic function is bounded $\sqrt{n}|\mathbb{E}(Z_{t}^{S}(\tau))|\leq C_{\rho}/\sqrt{n}=o(1)$ , hence:

[TABLE]

This concludes the proof. ∎

Appendix A Proofs for the Preliminary Results

Proof of Lemma A1.

The proof proceeds by recursion. Denote $\Pi_{k(n)}f_{j}\in\mathcal{F}_{k(n)}$ the mixture approximation of $f_{j}$ from Lemma B7. For $d_{e}=1$ , Lemma B7 implies $\|f_{1}-\Pi_{k(n)}f_{1}\|_{TV}=O(\frac{\log[k(n)]^{r/b}}{k(n)^{r}})$ and $\|f_{1}-\Pi_{k(n)}f_{1}\|_{\infty}=O(\frac{\log[k(n)]^{r/b}}{k(n)^{r}}).$ Suppose the result holds for $f_{1}\times\dots\times f_{d_{e}}$ . Let $f=f_{1}\times\dots\times f_{d_{e}}\times f_{d_{e}+1}$ ; let:

[TABLE]

The difference can be re-written recursively:

[TABLE]

Since $\int f_{d_{e}+1}=\int\Pi_{k(n)}f_{1}\times\dots\times\Pi_{k(n)}f_{d_{e}}=1$ , the total variation distance is: $\|d_{t+1}\|_{TV}\leq\|d_{t}\|_{TV}+\|f_{d_{e}+1}-\Pi_{k(n)}f_{d_{e}+1}\|_{TV}=O(\frac{\log[k(n)]^{r/b}}{k(n)^{r}}).$ And the supremum distance is:

[TABLE]

∎
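The recursion step rests on the fact that the distance between two product densities is at most the sum of the distances between their factors. The sketch below checks this on discrete toy distributions, measuring distance by the $L^{1}$ norm (an assumption about normalization); the arrays `f1`, `g1`, `f2`, `g2` and the helper `l1_distance` are illustrative and not taken from the paper.

```python
import numpy as np

def l1_distance(p, q):
    """L1 distance between two probability arrays of the same shape."""
    return float(np.sum(np.abs(p - q)))

# Toy marginals (f) and their approximations (g); each array sums to one.
f1 = np.array([0.20, 0.50, 0.30])
g1 = np.array([0.25, 0.45, 0.30])
f2 = np.array([0.60, 0.40])
g2 = np.array([0.50, 0.50])

# Product distributions f1 x f2 and g1 x g2.
joint_f = np.outer(f1, f2)
joint_g = np.outer(g1, g2)

lhs = l1_distance(joint_f, joint_g)                 # distance between products
rhs = l1_distance(f1, g1) + l1_distance(f2, g2)     # sum of marginal distances
print(lhs, rhs)
```

The inequality follows from writing $f_{1}f_{2}-g_{1}g_{2}=(f_{1}-g_{1})f_{2}+g_{1}(f_{2}-g_{2})$ and using that $f_{2}$ and $g_{1}$ integrate to one.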

Proof of Lemma A2.

To reduce notation, the $t$ and $s$ subscripts will be dropped in the following. The proof is similar for both $e_{1}$ and $e_{2}$ so the proof is only given for $e_{1}$ .

First, the densities of $e_{1}$ and $e_{2}$ are derived; the first two results then follow. Noting that the draws are defined using quantile functions, inverting the formula yields: $\nu_{1}=\frac{1}{1-e_{1}^{2+\xi_{1}}}$ . This is a proper CDF on $(-\infty,0]$ since $e_{1}\rightarrow\frac{1}{1-e_{1}^{2+\xi_{1}}}$ is increasing and has limits $0$ at $-\infty$ and $1$ at $0$. Its derivative is the density function: $(2+\xi_{1})\frac{e_{1}^{1+\xi_{1}}}{(1-e_{1}^{2+\xi_{1}})^{2}}$ . It is continuous on $(-\infty,0]$ and has an asymptote at $-\infty$ : $(2+\xi_{1})\frac{e_{1}^{1+\xi_{1}}}{(1-e_{1}^{2+\xi_{1}})^{2}}\times e_{1}^{3+\xi_{1}}\to(2+\xi_{1})$ as $e_{1}\to-\infty$ . Since $\xi_{1}\in[\underline{\xi},\bar{\xi}]$ with $0<\underline{\xi}$ then $\mathbb{E}|e_{1}|^{2}\leq C<\infty$ for some finite $C>0$ . Similar results hold for $e_{2}$ which has density $(2+\xi_{2})\frac{e_{2}^{1+\xi_{2}}}{(1+e_{2}^{2+\xi_{2}})^{2}}$ on $[0,+\infty)$ .
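As a numerical illustration of the left-tail draws, the sketch below inverts the CDF on $(-\infty,0]$, read here as $\nu_{1}=1/(1+|e_{1}|^{2+\xi_{1}})$, which is an interpretation of the formula for negative $e_{1}$. The function names and the value $\xi_{1}=0.5$ are assumptions made for the example, not the paper's code.

```python
import numpy as np

def left_tail_cdf(e, xi):
    """CDF on (-inf, 0], read as 1 / (1 + |e|^(2 + xi))."""
    return 1.0 / (1.0 + np.abs(e) ** (2.0 + xi))

def left_tail_quantile(nu, xi):
    """Inverse of left_tail_cdf: maps nu in (0, 1] to a draw e <= 0."""
    return -((1.0 / nu - 1.0) ** (1.0 / (2.0 + xi)))

xi = 0.5                                  # illustrative tail index, xi > 0
nu = np.linspace(0.01, 1.0, 100)          # grid of uniform levels
e = left_tail_quantile(nu, xi)

# Round trip: applying the CDF to the quantile recovers nu.
round_trip_error = np.max(np.abs(left_tail_cdf(e, xi) - nu))

# The quantile is increasing in nu and equals 0 at nu = 1.
is_increasing = bool(np.all(np.diff(e) > 0))
print(round_trip_error, is_increasing, e[-1])
```

Feeding uniform draws into `left_tail_quantile` would then produce shocks with the polynomial left tail described above.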

Second, $\xi_{1}\rightarrow e_{1}(\xi_{1})$ is shown to be $L^{2}$ -smooth. Let $|\xi_{1}-\tilde{\xi}_{1}|\leq\delta$ , using the mean value theorem, for each $\nu_{1}$ there exists an intermediate value $\check{\xi_{1}}\in[\xi_{1},\tilde{\xi}_{1}]$ such that:

[TABLE]

The first term is bounded by $1/(2+\underline{\xi})$ , the second is bounded by $\log(\frac{1}{\nu_{1}}+1)\left(\frac{1}{\nu_{1}}+1\right)^{\frac{1}{2+\underline{\xi}}}$ , and the last term is bounded above, in absolute value, by $\delta$ .

Finally, in order to conclude the proof, the integral $\int_{0}^{1}\log(\frac{1}{\nu_{1}}+1)\left(\frac{1}{\nu_{1}}+1\right)^{\frac{2}{2+\underline{\xi}}}d\nu_{1}$ needs to be finite. By a change of variables, it can be re-written as: $\int_{2}^{\infty}\log(\nu)\nu^{\frac{2}{2+\underline{\xi}}-2}d\nu.$ Since $\frac{2}{2+\underline{\xi}}-2<-1$ , the integral is always finite and thus:

[TABLE]
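The finiteness of the integral can be made explicit. Writing $a=2-\frac{2}{2+\underline{\xi}}$ , so that $a-1=\frac{\underline{\xi}}{2+\underline{\xi}}>0$ , integration by parts gives a closed form (a worked step added for illustration):

```latex
\int_{2}^{\infty}\log(\nu)\,\nu^{-a}\,d\nu
=\left[\frac{\nu^{1-a}\log(\nu)}{1-a}-\frac{\nu^{1-a}}{(1-a)^{2}}\right]_{2}^{\infty}
=2^{1-a}\left(\frac{\log 2}{a-1}+\frac{1}{(a-1)^{2}}\right)<\infty .
```

Both boundary terms vanish at infinity because $a>1$ , and the bound deteriorates as $\underline{\xi}\to 0$ .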

∎

Proof of Lemma A3.

Since $\mathcal{B}_{k(n)}$ is contained in a ball of radius $\max(\overline{\mu}_{k(n)},\overline{\sigma},\|\theta\|_{\infty})$ in $\mathbb{R}^{3[k(n)+2]+d_{\theta}}$ under $\|\cdot\|_{m}$ , the covering number for $\mathcal{B}_{k(n)}$ can be computed under the $\|\cdot\|_{m}$ norm using a result from Kolmogorov and Tikhomirov (1959). As a result, the covering number $N(x,\mathcal{B}_{k(n)},\|\cdot\|_{m})$ satisfies: $N(x,\mathcal{B}_{k(n)},\|\cdot\|_{m})\leq 2\left(3[k(n)+2]+d_{\theta}\right)\left(\frac{2\max(\bar{\mu}_{k(n)},\bar{\sigma})}{x}+1\right)^{3[k(n)+2]+d_{\theta}}.$ The rest follows from Lemmas 2 and B11. ∎
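To get a feel for how the covering number bound scales with the sieve dimension, the sketch below evaluates the logarithm of the right-hand side of the bound stated in the proof of Lemma A3. The function name `log_covering_bound` and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def log_covering_bound(x, k, d_theta, mu_bar, sigma_bar):
    """Log of 2 * dim * (2 * max(mu_bar, sigma_bar) / x + 1) ** dim,
    with dim = 3 * (k + 2) + d_theta as in the displayed bound."""
    dim = 3 * (k + 2) + d_theta
    radius = max(mu_bar, sigma_bar)
    return math.log(2 * dim) + dim * math.log(2 * radius / x + 1)

# Illustrative inputs: the log bound grows roughly linearly in k.
for k in (2, 5, 10):
    print(k, log_covering_bound(x=0.1, k=k, d_theta=1, mu_bar=1.0, sigma_bar=1.0))
```

Working with the logarithm avoids overflow when the dimension $3[k(n)+2]+d_{\theta}$ is large.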

Proof of Lemma A4.

First, using the assumption that $B$ is a bounded linear operator:

[TABLE]

Each term can be bounded above individually. Re-write the first term in terms of distributions: $\Big{|}\mathbb{E}\left(\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\beta_{0})\right)\Big{|}=\Big{|}\frac{1}{n}\sum_{t=1}^{n}\int e^{i\tau^{\prime}(\mathbf{y}_{t},\mathbf{x_{t}})}[f_{t}^{*}(\mathbf{y}_{t},\mathbf{x_{t}})-f_{t}(\mathbf{y}_{t},\mathbf{x_{t}})]d\mathbf{y}_{t}d\mathbf{x}_{t}\Big{|}$ , where $f_{t}$ is the distribution of $(\mathbf{y}_{t}(\beta_{0}),\mathbf{x}_{t})$ and $f_{t}^{*}$ the stationary distribution of $(\mathbf{y}_{t}(\beta_{0}),\mathbf{x}_{t})$ . Using the geometric ergodicity assumption, for all $\tau$ :

[TABLE]

for some $\rho\in(0,1)$ and $C_{\rho}>0$ . This yields a first bound:

[TABLE]
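The role of the geometric rate can be seen from an elementary bound. Assuming the ergodicity inequality controls the $t$-th term by $C_{\rho}\rho^{t}$ (an assumption about the displayed inequality, which is not reproduced here), averaging over $t=1,\dots,n$ gives a term of order $1/n$ :

```latex
\frac{1}{n}\sum_{t=1}^{n}C_{\rho}\rho^{t}
=\frac{C_{\rho}}{n}\,\frac{\rho(1-\rho^{n})}{1-\rho}
\leq\frac{C_{\rho}}{n}\,\frac{\rho}{1-\rho}
=O\left(\frac{1}{n}\right).
```

Squaring this quantity yields a contribution of order $1/n^{2}$ .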

The mixture norm $\|\cdot\|_{m}$ is not needed here to bound the second term since it involves population CFs. Some changes to the proof of Lemma 2 allow one to find bounds in terms of $\|\cdot\|_{\mathcal{B}}$ and $\|\cdot\|_{TV}$ for which Lemma A1 gives the approximation rates.

To bound the second term, re-write the simulated data as:

[TABLE]

with $\beta=(\theta,f)$ , $e_{t}^{s}\sim f$ and $\mathbf{x}_{t:1}=(x_{t},\dots,x_{1}),\mathbf{e}_{t:1}^{s}=(e_{t}^{s},\dots,e_{1}^{s})$ . Under Assumption 2 or 2′, using the same sequence of shocks $(e_{t}^{s})$ : $\mathbb{E}\left(\Big{\|}g_{obs,t}(\mathbf{x}_{t:1},\beta_{0},\mathbf{e}_{t:1}^{s})-g_{obs,t}(\mathbf{x}_{t:1},\Pi_{k(n)}\beta_{0},\mathbf{e}_{t:1}^{s})\Big{\|}\right)\leq\overline{C}\|\Pi_{k(n)}f_{0}-f_{0}\|^{\gamma}_{\mathcal{B}}.$ This is similar to the proof of Lemma 2. First, re-write the difference as:

[TABLE]

Using Assumptions 2-2′, the following recursive relationship holds:

[TABLE]

The last term also has a recursive structure:

[TABLE]

Together these inequalities imply:

[TABLE]

Recall that $\|\tau\|_{\infty}\sqrt{\pi(\tau)}$ is bounded above and $\pi(\tau)^{1/4}$ is integrable so that:

[TABLE]

To conclude the proof, the difference due to $e_{t}^{s}$ needs to be bounded. In order to do so, it suffices to bound the following integral:

[TABLE]

A direct bound on this integral yields a term of order $t\|f_{0}-\Pi_{k(n)}f_{0}\|_{TV}$ , which grows with $t$ too fast to generate useful rates. Rather than using a direct bound, consider Assumptions 2-2′. The time-series $y_{t}^{s}$ can be approximated by another time-series which only depends on a fixed and finite number of shocks $(e_{t}^{s},\dots,e_{t-m}^{s})$ for a given integer $m\geq 1$ . Making $m$ grow with $n$ at an appropriate rate makes it possible to balance the bias $m\|f_{0}-\Pi_{k(n)}f_{0}\|_{TV}$ (computed from a direct bound) and the approximation error due to $m<t$ .

The $m$ -approximation rate of $y_{t}$ is now derived. Let $\beta=(\theta,f)\in\mathcal{B}$ , $e_{t}^{s},\dots,e_{1}^{s}\sim f$ and $\tilde{y}_{t}^{s}$ such that $\tilde{y}_{t-m}^{s}=0,\tilde{u}_{t-m}^{s}=0$ and then $\tilde{y}_{j}^{s}=g_{obs}(\tilde{y}_{j-1}^{s},x_{j},\beta,\tilde{u}_{j}^{s}),\tilde{u}_{j}^{s}=g_{latent}(\tilde{u}_{j-1}^{s},\beta,e_{j}^{s})$ for $t-m+1\leq j\leq t$ . Each observation $t$ is approximated by its own time-series. For observation $t-m$ , by construction: $\mathbb{E}\left(\Big{\|}y_{t-m}^{s}-\tilde{y}_{t-m}^{s}\Big{\|}\right)=\mathbb{E}\left(\Big{\|}y_{t-m}^{s}\Big{\|}\right)\leq\left[\mathbb{E}\left(\Big{\|}y_{t-m}^{s}\Big{\|}^{2}\right)\right]^{1/2}$ and $\mathbb{E}\left(\Big{\|}u_{t-m}^{s}-\tilde{u}_{t-m}^{s}\Big{\|}\right)=\mathbb{E}\left(\Big{\|}u_{t-m}^{s}\Big{\|}\right)\leq\left[\mathbb{E}\left(\Big{\|}u_{t-m}^{s}\Big{\|}^{2}\right)\right]^{1/2}.$ Then, for any $t\geq\tilde{t}\geq t-m$ :

[TABLE]

The previous two results and a recursion argument lead to the following inequality:

[TABLE]

For $\beta=\beta_{0},\Pi_{k(n)}\beta_{0}$ since the expectations are finite and bounded by assumption, $\mathbb{E}\left(\Big{\|}y_{t}^{s}-\tilde{y}_{t}^{s}\Big{\|}\right)\leq\overline{C}\max(\overline{C}_{1},\overline{C}_{4})^{\gamma m}$ with $0\leq\max(\overline{C}_{1},\overline{C}_{4})<1$ and some $\overline{C}>0$ . For the first observations $t\leq m$ the data is unchanged, $y_{t}^{s}=\tilde{y}_{t}^{s}$ , so that the bound still holds. The integral can be split and bounded:

[TABLE]

The last inequality is due to the cosine and sine functions being uniformly Lipschitz continuous and equations (A.19)-(A.20). Recall that $\|\Pi_{k(n)}f_{0}-f_{0}\|_{TV}=O(\frac{\log[k(n)]^{2r/b}}{k(n)^{r}})$ . To balance the two terms, pick: $m=-\frac{r}{\gamma\log[\max(\overline{C}_{1},\overline{C}_{4})]}\log[k(n)]>0$ . Then $\max(\overline{C}_{1},\overline{C}_{4})^{\gamma m}=k(n)^{-r}$ and

[TABLE]
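The balancing choice of $m$ can be checked numerically: with $m=-\frac{r}{\gamma\log[\max(\overline{C}_{1},\overline{C}_{4})]}\log[k(n)]$ , the truncation term $\max(\overline{C}_{1},\overline{C}_{4})^{\gamma m}$ equals $k(n)^{-r}$ . In the sketch below, `c_max` stands for $\max(\overline{C}_{1},\overline{C}_{4})$ and all numerical values are illustrative assumptions.

```python
import math

def balancing_lag(k, r, gamma, c_max):
    """m = -r * log(k) / (gamma * log(c_max)); positive because log(c_max) < 0."""
    return -r * math.log(k) / (gamma * math.log(c_max))

k, r, gamma, c_max = 50, 2.0, 0.5, 0.8     # illustrative values only
m = balancing_lag(k, r, gamma, c_max)

decay = c_max ** (gamma * m)               # error from truncating the history at lag m
target = k ** (-r)                         # rate it should match
print(m, decay, target)
```

The formula returns a real number; in practice $m$ would be rounded up to an integer, which only makes the truncation term smaller.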

Combining all the bounds above yields:

[TABLE]

where $\|\cdot\|_{\mathcal{B}}=\|\cdot\|_{\infty}$ or $\|\cdot\|_{TV}$ so that $\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|^{\gamma^{2}}_{\mathcal{B}}=O(\frac{\log[k(n)]^{4\gamma^{2}r/b}}{k(n)^{2\gamma^{2}r}})$ . The term due to the non-stationarity is of order $1/n^{2}=o\left(\max\left[\frac{\log[k(n)]^{4r/b+2}}{k(n)^{2r}},\frac{\log[k(n)]^{4\gamma^{2}r/b}}{k(n)^{2\gamma^{2}r}}\right]\right)$ so it can be ignored. This concludes the proof. ∎

Proof of Lemma A5.

Using the inequality $1/2|a|^{2}\leq|a-b|^{2}+|b|^{2}$ for any $a,b\in\mathbb{R}$ :

[TABLE]

By assumption the term on the left is $O_{p}(\delta_{n}^{2})$ , by condition ii. the middle term is $O_{p}(\delta_{n}^{2})$ and condition i. implies that the term on the right is also $O_{p}(\delta_{n}^{2})$ . It follows that:

[TABLE]

Now note that both $\hat{\beta}_{n}$ and $\Pi_{k(n)}\beta_{0}$ belong to the finite dimensional space $\mathcal{B}_{k(n)}$ parameterized by $(\theta,\omega,\mu,\sigma)$ . To save space, $\hat{\beta}_{n}$ will be represented by $\hat{\varphi}_{n}=(\hat{\theta}_{n},\hat{\omega}_{n},\hat{\mu}_{n},\hat{\sigma}_{n})$ and $\Pi_{k(n)}\beta_{0}$ by $\varphi_{k(n)}=(\theta_{k(n)},\omega_{k(n)},\mu_{k(n)},\sigma_{k(n)})$ . Using this notation, equation (A.21) becomes:

[TABLE]

It follows that $0\leq\underline{\lambda}_{n}\|\hat{\beta}_{n}-\Pi_{k(n)}\beta_{0}\|_{m}^{2}\leq O_{p}(\delta_{n}^{2})$ so that the rate of convergence in mixture norm is: $\|\hat{\beta}_{n}-\Pi_{k(n)}\beta_{0}\|_{m}=O_{p}\left(\delta_{n}\underline{\lambda}_{n}^{-1/2}\right).$ ∎

Proof of Lemma A6.

Using the rate assumptions and Lemma B13 implies the desired result. ∎

Appendix B Intermediate Results

Lemma B7 (Kruijer, Rousseau and van der Vaart, 2010).

Suppose that $f$ is a continuous univariate density satisfying: i) Smoothness: $f$ is $r$ -times continuously differentiable with bounded $r$ -th derivative. ii) Tails: $f$ has exponential tails, i.e. there exists $\bar{e},M_{f},a,b>0$ such that: $f(e)\leq M_{f}e^{-a|e|^{b}},\,\forall|e|\geq\bar{e}.$ iii) Monotonicity in the Tails: $f$ is strictly positive and there exists $\underline{e}<\overline{e}$ such that $f$ is weakly increasing on $(-\infty,\underline{e}]$ and weakly decreasing on $[\overline{e},\infty)$ . Let $\mathcal{F}_{k}$ be the sieve space consisting of Gaussian mixtures with the following restrictions. iv) Bandwidth: $\sigma_{j}\geq\underline{\sigma}_{k}=O(\frac{\log[k]^{2/b}}{k})$ . v) Location Parameter Bounds: $\mu_{j}\in[-\bar{\mu}_{k},\bar{\mu}_{k}]$ . vi) Growth Rate of Bounds: $\bar{\mu}_{k}=O\left(\log[k]^{1/b}\right)$ . Then there exists a mixture sieve approximation of $f$ , $\Pi_{k}f\in\mathcal{F}_{k}$ , such that as $k\to\infty$ : $\|f-\Pi_{k}f\|_{\mathcal{F}}=O\left(\frac{\log[k]^{2r/b}}{k^{r}}\right)$ , where $\|\cdot\|_{\mathcal{F}}=\|\cdot\|_{TV}$ or $\|\cdot\|_{\infty}$ .

Lemma B8 (Chen and Pouzo, 2012).

Let $\hat{\beta}_{n}$ be such that $\hat{Q}_{n}(\hat{\beta}_{n})\leq\inf_{\beta\in\mathcal{B}_{k(n)}}\hat{Q}_{n}(\beta)+O_{p^{*}}(\eta_{n})$ , where $(\eta_{n})_{n\geq 1}$ is a positive real-valued sequence such that $\eta_{n}=o(1)$ . Let $\bar{Q}_{n}:\mathcal{B}\rightarrow[0,+\infty)$ be a sequence of non-random measurable functions and let the following conditions hold: a. i) $0\leq\bar{Q}_{n}(\beta_{0})=o(1)$ ; ii) there is a positive function $g_{0}(n,k,\varepsilon)$ such that: $\inf_{\beta\in\mathcal{B}_{k}:\,\|\beta-\beta_{0}\|_{\mathcal{B}}>\varepsilon}\bar{Q}_{n}(\beta)\geq g_{0}(n,k,\varepsilon)>0\text{ for each }n,k\geq 1,$ and $\lim\inf_{n\to\infty}g_{0}(n,k(n),\varepsilon)\geq 0$ for all $\varepsilon>0$ . b. i) $\mathcal{B}$ is an infinite dimensional, possibly non-compact subset of a Banach space $(B,\|\cdot\|_{\mathcal{B}})$ ; ii) $\mathcal{B}_{k}\subseteq\mathcal{B}_{k+1}\subseteq\mathcal{B}$ for all $k\geq 1$ , and there is a sequence $\{\Pi_{k(n)}\beta_{0}\in\mathcal{B}_{k(n)}\}$ such that $\bar{Q}_{n}(\Pi_{k(n)}\beta_{0})=o(1)$ . c. $\hat{Q}_{n}(\beta)$ is jointly measurable in the data $(y_{t},x_{t})_{t\geq 1}$ and the parameter $\beta\in\mathcal{B}_{k(n)}$ . d. i) $\hat{Q}_{n}(\Pi_{k(n)}\beta_{0})\leq K_{0}\bar{Q}_{n}(\Pi_{k(n)}\beta_{0})+O_{p^{*}}(c_{0,n})$ for some $c_{0,n}=o(1)$ and a finite constant $K_{0}>0$ ; ii) $\hat{Q}_{n}(\beta)\geq K\bar{Q}_{n}(\beta)-O_{p^{*}}(c_{n})$ uniformly over $\beta\in\mathcal{B}_{k(n)}$ for some $c_{n}=o(1)$ and a finite constant $K>0$ ; iii) $\max(c_{0,n},c_{n},\bar{Q}_{n}(\Pi_{k(n)}\beta_{0}),\eta_{n})=o(g_{0}(n,k(n),\varepsilon))$ for all $\varepsilon>0$ . Then for all $\varepsilon>0$ : $\mathbb{P}^{*}\left(\|\hat{\beta}_{n}-\beta_{0}\|_{\mathcal{B}}>\varepsilon\right)\to 0\text{ as }n\to\infty.$

Lemma B9.

Let $(Y_{t})_{t\geq 1}$ be a mean zero, $\alpha$ -mixing sequence with rate $\alpha(m)$ such that $\sum_{m\geq 1}\alpha(m)^{1/p}<\infty$ for some $p>1$ , and $|Y_{t}|\leq 1$ for all $t\geq 1$ . Then we have $\mathbb{E}\left(n|\bar{Y}_{n}|^{2}\right)\leq 1+24\sum_{m\geq 1}\alpha(m)^{1/p}$ .

Lemma B10.

Let $(X_{t})_{t>0}$ be a sequence of real-valued, centered random variables and $(\alpha_{m})_{m\geq 0}$ be the sequence of strong mixing coefficients. Suppose that $X_{t}$ is uniformly bounded and there exists $A,C>0$ such that $\alpha(m)\leq A\exp(-Cm)$ then there exists $K>0$ that depends only on the mixing coefficients such that for any $p\geq 2$ :

[TABLE]

where $Q_{t}$ is the quantile function of $X_{t}$ and $\min(\alpha^{-1}(u),n)=\sum_{k=0}^{n-1}\mathbbm{1}_{u\leq\alpha_{k}}$ .

Lemma B11.

Suppose that $(X_{t}(\beta))_{t>0}$ is a real valued, mean zero random process for any $\beta\in\mathcal{B}$ . Suppose that it is $\alpha$ -mixing with exponential decay: $\alpha(m)\leq A\exp(-Cm)$ for $A,C>0$ and bounded $|X_{t}(\beta)|\leq 1$ . Let $\mathcal{X}=\big{\{}X:\mathcal{B}\to\mathbb{C},\beta\to X_{t}(\beta)\big{\}}$ and suppose that $\int_{0}^{1}\log^{2}N_{[\,]}(x,\mathcal{X},\|\cdot\|)dx<\infty$ then: $\int_{0}^{1}[x^{\vartheta/2-1}\sqrt{\log N_{[\,]}(x,\mathcal{X},\|\cdot\|)}+\log^{2}N_{[\,]}(x,\mathcal{X},\|\cdot\|)]dx<\infty$ for all $\vartheta\in(0,1)$ and:

[TABLE]

Assumption 2′ (Data Generating Process - $L^{2}$ -Smoothness).

$y_{t}^{s}$ is simulated according to the dynamic model (1)-(2) where $g_{obs}$ and $g_{latent}$ satisfy the following $L^{2}$ -smoothness conditions for some $\gamma\in(0,1]$ and any $\delta\in(0,1)$ :

$y(i)^{\prime}$ . For some $0\leq\bar{C}_{1}<1$ : $\big{[}\mathbb{E}\big{(}\sup_{\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}\leq\delta}\|g_{obs}(y_{t}^{s}(\beta_{1}),x_{t},\beta_{1},u_{t}^{s}(\beta_{1}))-g_{obs}(y_{t}^{s}(\beta_{2}),x_{t},\beta_{1},u_{t}^{s}(\beta_{1}))\|^{2}\big{|}y_{t}^{s}(\beta_{1}),y_{t}^{s}(\beta_{2})\big{)}\big{]}^{1/2}\leq\bar{C}_{1}\|y_{t}^{s}(\beta_{1})-y_{t}^{s}(\beta_{2})\|$

$y(ii)^{\prime}$ . For some $0\leq\bar{C}_{2}<\infty$ : $\big{[}\mathbb{E}\big{(}\sup_{\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}\leq\delta}\|g_{obs}(y_{t}^{s}(\beta_{1}),x_{t},\beta_{1},u_{t}^{s}(\beta_{1}))-g_{obs}(y_{t}^{s}(\beta_{1}),x_{t},\beta_{2},u_{t}^{s}(\beta_{1}))\|^{2}\big{)}\big{]}^{1/2}\leq\bar{C}_{2}\delta^{\gamma}$

$y(iii)^{\prime}$ . For some $0\leq\bar{C}_{3}<\infty$ : $\big{[}\mathbb{E}\big{(}\sup_{\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}\leq\delta}\|g_{obs}(y_{t}^{s}(\beta_{1}),x_{t},\beta_{1},u_{t}^{s}(\beta_{1}))-g_{obs}(y_{t}^{s}(\beta_{1}),x_{t},\beta_{1},u_{t}^{s}(\beta_{2}))\|^{2}\big{|}u_{t}^{s}(\beta_{1}),u_{t}^{s}(\beta_{2})\big{)}\big{]}^{1/2}\leq\bar{C}_{3}\|u_{t}^{s}(\beta_{1})-u_{t}^{s}(\beta_{2})\|^{\gamma}$

$u(i)^{\prime}$ . For some $0\leq\bar{C}_{4}<1$ : $\big{[}\mathbb{E}\big{(}\sup_{\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}\leq\delta}\|g_{latent}(u_{t-1}^{s}(\beta_{1}),\beta_{1},e_{t}^{s}(\beta_{1}))-g_{latent}(u_{t-1}^{s}(\beta_{2}),\beta_{1},e_{t}^{s}(\beta_{1}))\|^{2}\big{)}\big{]}^{1/2}\leq\bar{C}_{4}\|u_{t-1}^{s}(\beta_{1})-u_{t-1}^{s}(\beta_{2})\|$

$u(ii)^{\prime}$ . For some $0\leq\bar{C}_{5}<\infty$ : $\big{[}\mathbb{E}\big{(}\sup_{\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}\leq\delta}\|g_{latent}(u_{t-1}^{s}(\beta_{1}),\beta_{1},e_{t}^{s}(\beta_{1}))-g_{latent}(u_{t-1}^{s}(\beta_{1}),\beta_{2},e_{t}^{s}(\beta_{1}))\|^{2}\big{)}\big{]}^{1/2}\leq\bar{C}_{5}\delta^{\gamma}$

$u(iii)^{\prime}$ . For some $0\leq\bar{C}_{6}<\infty$ : $\big{[}\mathbb{E}\big{(}\sup_{\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}\leq\delta}\|g_{latent}(u_{t-1}^{s}(\beta_{1}),\beta_{1},e_{t}^{s}(\beta_{1}))-g_{latent}(u_{t-1}^{s}(\beta_{1}),\beta_{1},e_{t}^{s}(\beta_{2}))\|^{2}\big{|}e_{t}^{s}(\beta_{1}),e_{t}^{s}(\beta_{2})\big{)}\big{]}^{1/2}\leq\bar{C}_{6}\|e_{t}^{s}(\beta_{1})-e_{t}^{s}(\beta_{2})\|$

for $\|\beta_{1}-\beta_{2}\|_{\mathcal{B}}=\|\theta_{1}-\theta_{2}\|+\|f_{1}-f_{2}\|_{\infty}$ or $\|\theta_{1}-\theta_{2}\|+\|f_{1}-f_{2}\|_{TV}$ .

Lemma B12.

Suppose that $(\mathbf{y}_{t}^{s},\mathbf{x}_{t})_{t\geq 1}$ is geometrically ergodic for $\beta=\beta_{0}$ and the moments are bounded $|\hat{\psi}_{t}^{s}(\tau,\beta_{0})|\leq M$ for all $\tau$ then $Q_{n}(\beta_{0})=O(1/n^{2}).$

Lemma B13 (Stochastic Equicontinuity).

Let $M_{n}=\log\log(n+1)$ and $\delta_{mn}=\delta_{n}/\sqrt{\underline{\lambda}_{n}}$ . Let $\Delta_{n}^{S}(\tau,\beta)=\hat{\psi}_{n}^{S}(\tau,\beta)-\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta))$ . Suppose that the assumptions of Lemma A5 and the conditions for Theorem 3 hold then for any $\eta>0$ , uniformly over $\beta\in\mathcal{B}_{k(n)}$ :

[TABLE]

where $I_{m,n}$ is defined as:

[TABLE]

For the mixture sieve the integral is a $O(k(n)\log[k(n)]+k(n)|\log(M_{n}\delta_{mn})|)$ so that:

[TABLE]

Now suppose that $(M_{n}\delta_{mn})^{\frac{\gamma^{2}}{2}}\max(\log[k(n)]^{2},|\log[M_{n}\delta_{mn}]|^{2})k(n)^{2}=o(1)$ . The first stochastic equicontinuity result is:

[TABLE]

Also, suppose that $\beta\rightarrow\int\mathbb{E}\Big{|}\hat{\psi}_{t}^{s}(\tau,\beta_{0})-\hat{\psi}_{t}^{s}(\tau,\beta)\Big{|}^{2}\pi(\tau)d\tau$ is continuous at $\beta=\beta_{0}$ under the norm $\|\cdot\|_{\mathcal{B}}$ , uniformly in $t\geq 1$ . Then, the second stochastic equicontinuity result is:

[TABLE]

Lemma B14.

Suppose that $\|\hat{\beta}_{n}-\beta_{0}\|_{weak}=O_{p}(\delta_{n})$ . Under the Assumptions of Theorem 3:

a) $\int\psi_{\beta}(\tau,u_{n}^{*})\left(\overline{B\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})-\hat{\psi}_{n}^{S}(\tau,\beta_{0}))-B\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))}{d\beta}[\hat{\beta}_{n}-\beta_{0}]}\right)\pi(\tau)d\tau=o(1/\sqrt{n}).$

b) $\int\psi_{\beta}(\tau,u_{n}^{*})\left(\overline{B\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})-\hat{\psi}_{n}^{S}(\tau,\beta_{0}))-B[\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})-\hat{\psi}_{n}^{S}(\tau,\beta_{0})]}\right)\pi(\tau)d\tau=o(1/\sqrt{n}).$

c) $\int\left[\psi_{\beta}(\tau,u_{n}^{*})\left(\overline{B[\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})]}\right)+\overline{\psi_{\beta}(\tau,u_{n}^{*})}\left(B[\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})]\right)\right]\pi(\tau)d\tau=o(1/\sqrt{n}).$

Appendix C Proofs for the Intermediate Results

Proof of Lemma B9.

The proof follows from Davydov (1968)’s inequality: let $p,q,r\geq 0,1/p+1/q+1/r=1$ , for any random variables $X,Y$ : $|cov(X,Y)|\leq 12\alpha(\sigma(X),\sigma(Y))^{1/p}\mathbb{E}(|X|^{q})^{1/q}\mathbb{E}(|Y|^{r})^{1/r}$ , where $\alpha(\sigma(X),\sigma(Y))$ is the mixing coefficient between $X$ and $Y$ . As a result:

[TABLE]

∎

Proof of Lemma B10.

Theorem 6.3 of Rio (2000) implies the following inequality:

[TABLE]

where $a_{p}=p4^{p+1}(p+1)^{p/2}$ and $b_{p}=\frac{p}{p-1}4^{p+1}(p+1)^{p-1}$ , $Q=\sup_{t>0}Q_{t}$ and $s_{n}^{2}=\sum_{t=1}^{n}\sum_{t^{\prime}=1}^{n}|\text{cov}(X_{t},X_{t^{\prime}})|$ . Since $X_{t}$ is uniformly bounded, using the results from Appendix C in Rio (2000): $\int_{0}^{1}\min(\alpha^{-1}(u),n)^{p-1}Q^{p}(u)du\leq 2\left[\sum_{k=0}^{n-1}(k+1)^{p-1}\alpha_{k}\right]\|\sup_{t>0}X_{t}\|_{\infty}.$ Because the strong-mixing coefficients are exponentially decreasing, it implies:

[TABLE]

And Corollary 1.1 of Rio (2000) yields: $s_{n}^{2}\leq 4\int_{0}^{1}\min(\alpha^{-1}(u),n)\sum_{t=1}^{n}Q_{t}^{2}(u)du.$ Altogether:

[TABLE]

with $K_{1}\geq 2^{1/p}p^{1/p}4^{(p+1)/p}$ , $K_{2}\geq(p/[p-1])^{1/p}4^{(p+1)/p}2^{1/p}A\exp(C)\frac{1}{(1-\exp(-C))^{(p-1)/p}}$ . Note that since $p\geq 2$ , $2^{1/p}\leq\sqrt{2},p^{1/p}\leq 3/2,4^{(p+1)/p}\leq 16$ , etc. The constants $K_{1},K_{2}$ can therefore be chosen so that they do not depend on $p$ . $K$ only depends on the constants $A$ and $C$ . ∎

Proof of Lemma B11.

Let $Z_{n}(\beta)=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_{t}(\beta)$ , by Lemma B10:

[TABLE]

The term $\frac{1}{n}\sum_{t=1}^{n}\|X_{t}(\beta)\|^{\vartheta}$ comes from Hölder’s inequality, for any $\vartheta\in(0,1)$ :

[TABLE]

The last inequality follows from assuming $|Q_{t}|\leq 1$ . To simplify notation, use $\frac{1}{n}\sum_{t=1}^{n}\|Q_{t}\|_{1}^{\vartheta}$ rather than $\frac{1}{n}\sum_{t=1}^{n}\|Q_{t}\|_{1}^{\vartheta/2}$ . Also since $\alpha(j)$ has exponential decay, $\sum_{j=1}^{\infty}(1+j)^{1/(1-\vartheta)}\alpha(j)<\infty$ so the first term is a constant which only depends on $(\alpha(j))_{j}$ and $\vartheta$ . To derive the inequality, construct bracketing pairs $(\beta_{j}^{k},\Delta_{j}^{k})_{1\leq j\leq N(k)}$ with $N(k)=N_{[\,]}(2^{-k},\mathcal{X},\|\cdot\|_{2})$ the minimal number of brackets needed to cover $\mathcal{X}$ . By definition of $N(k)$ there exist brackets $(\Delta_{t,j}^{k})_{j=1,\dots,N(k)}$ such that: 1) $\mathbb{E}\left(|\Delta_{t,j}^{k}|^{2}\right)^{1/2}\leq 2^{-k}$ for all $t,j,k$ . 2) For all $\beta\in\mathcal{B}$ and $k\geq 1$ , there exists an index $j$ such that $|X_{t}(\beta)-X_{t}(\beta_{j}^{k})|\leq\Delta_{t,j}^{k}$ . Note that brackets constructed the usual way need not be $\alpha$ -mixing; a construction which preserves the dependence properties is given at the end of the proof.

Assume that, without loss of generality, $|\Delta_{j}^{k}|\leq 1$ for all $j,k$ . Let $(\pi_{k}(\beta),\Delta_{k}(\beta))$ be a bracketing pair for $\beta\in\mathcal{B}$ . Let $q_{0},k,q$ be positive integers such that $q_{0}\leq k\leq q$ and let $T_{k}(\beta)=\pi_{k}\circ\pi_{k+1}\circ\dots\circ\pi_{q}(\beta)$ . Using the following identity:

[TABLE]

and the triangle inequality, decompose the identity into three groups:

[TABLE]

The following inequality is due to Pisier (1983), for any $X_{1},\dots,X_{N}$ random variables:

$\left[\mathbb{E}\left(\max_{1\leq t\leq N}|X_{t}|^{p}\right)\right]^{1/p}\leq N^{1/p}\max_{1\leq t\leq N}\left[\mathbb{E}\left(|X_{t}|^{p}\right)\right]^{1/p}.$ Note that $\{T_{k}(\beta),\beta\in\mathcal{B}\}$ has at most $N(k)$ elements by construction. Some terms can be simplified:

$E_{k}=\mathbb{E}\left(\max_{g\in T_{k}(\mathcal{B})}|Z_{n}(g)-Z_{n}(T_{k-1}(g))|^{2}\right)^{1/2}$ for $q_{0}+1\leq k\leq q.$ For $p\geq 2$ using both Hölder and Pisier’s inequalities:

[TABLE]

By the definition of $\Delta_{j}^{k}$ : $E_{k}\leq N(k)^{1/p}\max_{1\leq j\leq N(k)}\left[\mathbb{E}\left(|\Delta_{j}^{k}(g)|^{p}\right)\right]^{1/p}.$ This is also valid for $E_{q+1}$ . Using Rio’s inequality for $\alpha$ -mixing dependent processes:

[TABLE]

For $p>2$ and $2^{q}/\sqrt{n}\geq 1$ , the inequality becomes:

[TABLE]

Choosing $p=k+\log N(k)$ implies:

[TABLE]

Applying these bounds to the previous inequality:

[TABLE]

Note that $\sum_{k\geq 1}(\sqrt{k}+k^{2})2^{-k}\leq 2\sum_{k\geq 1}k^{2}2^{-k}=12.$ Hence:

[TABLE]
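The numerical constant used in the series bound can be verified directly: $\sum_{k\geq 1}k^{2}2^{-k}=6$ , and since $\sqrt{k}\leq k^{2}$ for $k\geq 1$ the mixed series is at most twice that. The sketch below truncates the series at 200 terms, an arbitrary cut-off chosen for illustration.

```python
# Partial sums of the two series; the tail beyond k = 200 is negligible.
terms = range(1, 200)
sum_k2 = sum(k**2 * 2.0 ** (-k) for k in terms)                   # converges to 6
sum_mixed = sum((k**0.5 + k**2) * 2.0 ** (-k) for k in terms)     # bounded by 2 * sum_k2

print(sum_k2, sum_mixed, 2 * sum_k2)
```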

Pick the smallest integer $q$ such that $q\geq\log(n)/(2\log 2)-1$ so that $4\sqrt{n}\geq 2^{q}\geq\sqrt{n}/2$ and $2^{q}/\sqrt{n}\in[1/2,4]$ . Only $E_{q_{0}}$ remains to be bounded, using Rio’s inequality again:

[TABLE]
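The choice of $q$ above ties the finest bracketing level to the sample size. As a quick check, the sketch below computes the smallest integer $q\geq\log(n)/(2\log 2)-1$ for a few sample sizes and confirms that $2^{q}/\sqrt{n}$ stays in $[1/2,4]$ . The function name `smallest_q` and the sample sizes are illustrative assumptions.

```python
import math

def smallest_q(n):
    """Smallest integer q with q >= log(n) / (2 * log(2)) - 1."""
    return math.ceil(math.log(n) / (2 * math.log(2)) - 1)

# Ratio 2^q / sqrt(n) for a few illustrative sample sizes.
ratios = {n: 2 ** smallest_q(n) / math.sqrt(n) for n in (10, 100, 1000, 10**6)}
print(ratios)
```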

For any $\varepsilon>0$ pick $p=\max\left(2+\varepsilon,q_{0}+\log N(q_{0})\right)$ then: $N(q_{0})^{1/p}\leq\exp(1),\,n^{-1/2+1/p}\leq n^{-1/2+1/(2+\varepsilon)}\leq 1.$ Then conclude that:

[TABLE]

Hence, there exists a constant $K>0$ which only depends on $(\alpha(m))_{m>0}$ such that:

[TABLE]

Let $\sqrt{C_{n}}=K\int_{0}^{1}[x^{\vartheta/2-1}\sqrt{\log N_{[\,]}(x,\mathcal{X},\|\cdot\|)}+\log^{2}N_{[\,]}(x,\mathcal{X},\|\cdot\|)]dx$ , then $\mathbb{E}\left(\sup_{\beta\in\mathcal{B}}|Z_{n}(\beta)|^{2}\right)\leq C_{n}$ for all $n\geq 1$ .

Bracketing:

Because of the dynamics, the dependence of $X_{t}$ can vary with $\beta$ , which is not the case in Ben Hariz (2005) or Andrews and Pollard (1994). The following details the construction of the brackets $(\Delta_{t,j}^{k})$ in the current setting. Suppose that $\beta\rightarrow X_{t}(\beta)$ is $L^{p}$ -smooth. Let $\beta_{1}^{k},\dots,\beta_{N(k)}^{k}$ be such that $\mathcal{B}_{k_{n}}\subseteq\cup_{j=1}^{N(k)}B_{[\delta/C]^{\gamma}}(\beta_{j}^{k})$ then for $j\leq N(k)$ and some $Q\geq 2$ : $\left[\mathbb{E}\left(\sup_{\|\beta-\beta_{j}^{k}\|_{\mathcal{B}}\leq[\delta/C]^{\gamma}}|X_{t}(\beta)-X_{t}(\beta_{j}^{k})|^{Q}\right)\right]^{1/Q}\leq\delta.$ Let $\Delta_{t,j}^{k}=\sup_{\|\beta-\beta_{j}^{k}\|_{\mathcal{B}}\leq[\delta/C]^{\gamma}}|X_{t}(\beta)-X_{t}(\beta_{j}^{k})|$ then $\left[\mathbb{E}\left(|\Delta_{t,j}^{k}|^{2}\right)\right]^{1/2}\leq\left[\mathbb{E}\left(|\Delta_{t,j}^{k}|^{Q}\right)\right]^{1/Q}$ by Hölder’s inequality, and the right-hand side is smaller than $\delta$ by construction. Hence $\left[\mathbb{E}(|\Delta_{t,j}^{k}|^{2})\right]^{1/2}\leq\delta=2^{-k}$ . However, there is no guarantee that $(\Delta_{t,j}^{k})_{t\geq 1}$ as constructed above is $\alpha$ -mixing. Another construction for the bracket which preserves the mixing property is now suggested. Let $B\subseteq\mathcal{B}$ be a non-empty compact set in $\mathcal{B}$ . Note that since the $(\beta_{j}^{k})$ cover $\mathcal{B}$ , they also cover $B$ . Let $\tilde{\Delta}_{t,j}^{k}$ be such that $|\frac{1}{n}\sum_{t=1}^{n}\tilde{\Delta}_{t,j}^{k}|=\sup_{\beta\in B,\,\|\beta-\beta_{j}^{k}\|\leq[\delta/C]^{\gamma}}|\frac{1}{n}\sum_{t=1}^{n}[X_{t}(\beta)-X_{t}(\beta_{j}^{k})]|$ . Because $B$ is compact, the supremum is attained at some $\tilde{\beta}_{j}^{k}\in B$ . For all $t=1,\dots,n$ , take $\tilde{\Delta}_{t,j}^{k}=X_{t}(\tilde{\beta}_{j}^{k})-X_{t}(\beta_{j}^{k})$ . For each $(j,k)$ the sequence $(\tilde{\Delta}_{t,j}^{k})_{t\geq 0}$ is $\alpha$ -mixing by construction. Furthermore, by construction: $|\tilde{\Delta}_{t,j}^{k}|\leq|\Delta_{t,j}^{k}|$ and thus $\left[\mathbb{E}(|\tilde{\Delta}_{t,j}^{k}|^{Q})\right]^{1/Q}\leq 2^{-k}.$ These brackets, built in $B$ rather than $\mathcal{B}$ , preserve the mixing properties. The rest of the proof applied to $B$ implies:

[TABLE]

For an increasing sequence of compact sets $B_{k}\subseteq B_{k+1}\subseteq\mathcal{B}$ dense in $\mathcal{B}$ , there is an increasing and bounded sequence:

[TABLE]

This sequence is thus convergent with a limit less than or equal to the upper bound. Hence, it must be that the supremum over $\mathcal{B}$ is also bounded. It can thus be assumed that $(\Delta_{t,j}^{k})_{t\geq 1}$ are $\alpha$ -mixing. ∎

Proof of Lemma B12.

Since $(\mathbf{y}_{t}^{s},\mathbf{x}_{t})$ is geometrically ergodic, the joint density converges to the stationary distribution at a geometric rate: $\|f_{t}(y,x)-f^{*}_{t}(y,x)\|_{TV}\leq C\rho^{t}$ , $\rho<1$ . Because $B$ is bounded linear and the moments $\hat{\psi}_{n},\hat{\psi}_{n}^{s}$ are bounded above by $M$ , uniformly in $\tau$ :

[TABLE]

∎

Proof of Lemma B13.

Lemma B11 implies that for some $C>0$ :

[TABLE]

Next, apply the inequality of Lemma B11 to generate the bound:

[TABLE]

for some $\overline{C}>0,\vartheta\in(0,1)$ and

[TABLE]

Since $\int\sqrt{\pi(\tau)}d\tau<\infty$ , the term on the left-hand side of the inequality can be squared and multiplied by $\sqrt{\pi(\tau)}$ . Then, taking the integral:

[TABLE]

where $\overline{C}_{\pi}=\overline{C}\int\sqrt{\pi(\tau)}d\tau$ . Note that $J_{m,n}=O(k(n)^{2}\max(\log[k(n)]^{2},\log[M_{n}\delta_{m,n}]^{2}))$ .

To prove the final statement, notation will be shortened using $\Delta\hat{\psi}_{t}^{s}(\tau,\beta)=\hat{\psi}_{t}^{s}(\tau,\beta_{0})-\hat{\psi}_{t}^{s}(\tau,\beta)$ . Note that, by applying Davydov (1968)’s inequality:

[TABLE]

The last inequality is due to $|\Delta\hat{\psi}_{t}^{s}(\tau,\beta)|\leq 2$ . By the continuity assumption the last term is a $o(1)$ when $\|\beta_{0}-\Pi_{k(n)}\beta_{0}\|_{\mathcal{B}}\to 0$ . As a result: $\int\mathbb{E}\Big{|}\Delta\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0})-\mathbb{E}[\Delta\hat{\psi}_{n}^{S}(\tau,\Pi_{k(n)}\beta_{0})]\Big{|}^{2}\pi(\tau)d\tau=o(1/n).$ To conclude the proof, apply a triangle inequality and the results above:

[TABLE]

∎

Proof of Lemma B14.

Let $R_{n}(\beta,\beta_{0})=\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta)-\hat{\psi}_{n}^{S}(\tau,\beta_{0}))-\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))}{d\beta}[\beta-\beta_{0}]$ .

a)

Since $B$ is bounded linear, the Cauchy-Schwarz inequality implies:

[TABLE]

By definition of $M_{n}$ and the inequality above:

[TABLE]

$\mathbb{P}\left(\|\hat{\beta}_{n}-\beta_{0}\|_{\mathcal{B}}>M_{n}\delta_{n}\right)\to 0$ regardless of $\varepsilon$ . Furthermore, Assumption 5 ii. implies:

[TABLE]

Assumption 5 i. implies that $(M_{n}\delta_{n})^{2}=o(\frac{1}{\sqrt{n}})$ , and thus: $\mathbb{P}\big{(}\Big{|}\int\psi_{\beta}(\tau,u_{n}^{*})\big{(}\overline{BR_{n}(\hat{\beta}_{n},\beta_{0})}\big{)}\pi(\tau)d\tau\Big{|}>\frac{\varepsilon}{\sqrt{n}}\big{)}=o(1)$ regardless of $\varepsilon>0$ . Hence: $\int\psi_{\beta}(\tau,u_{n}^{*})\big{(}\overline{BR_{n}(\hat{\beta}_{n},\beta_{0})}\big{)}\pi(\tau)d\tau=o_{p}(1/\sqrt{n}).$

b)

Let $\Delta_{n}^{S}(\tau,\beta)=\hat{\psi}_{n}^{S}(\tau,\beta)-\mathbb{E}[\hat{\psi}_{n}^{S}(\tau,\beta)]$ . By the second stochastic equicontinuity result of Lemma B13 and the Cauchy-Schwarz inequality:

[TABLE]

where the last inequality holds with probability going to $1$ by definition of $M_{n}\delta_{mn}$ .

c)

Let $\varepsilon_{n}=\pm\frac{1}{\sqrt{n}M_{n}}=o(\frac{1}{\sqrt{n}})$ . For $h\in[0,1]$ define $\hat{\beta}(h)=\hat{\beta}_{n}+h\varepsilon_{n}u_{n}^{*}$ , so that $\hat{\beta}_{n}=\hat{\beta}(0)$ . Recall that $\hat{\beta}_{n}$ is the approximate minimizer of $\hat{Q}_{n}^{S}$ so that: $0\leq\hat{Q}_{n}^{S}(\hat{\beta}_{n})\leq\inf_{\beta\in\mathcal{B}_{k(n)}}\hat{Q}_{n}^{S}(\beta)+O_{p}(\eta_{n}).$ Hence the following holds:

[TABLE]

To prove Lemma B14 c), (C.23)-(C.24) are expanded individually and shown to be $o_{p}(1/\sqrt{n})$ and (C.25) is bounded, shown to be negligible under the assumptions.

The first step deals with (C.25):

[TABLE]

By the triangle inequality and the stochastic equicontinuity results from Lemma B13:

[TABLE]

Also, note that $\hat{\beta}(1)=\hat{\beta}(0)+\varepsilon_{n}u_{n}^{*}$ , so that the Mean Value Theorem applies to the last term:

[TABLE]

for some intermediate value $\tilde{h}\in(0,1)$ . Also, by assumption: $\big{(}\int\Big{|}\frac{d\mathbb{E}[\hat{\psi}_{n}^{S}(\tau,\hat{\beta}(\tilde{h}))]}{d\beta}[u_{n}^{*}]\Big{|}^{2}\pi(\tau)d\tau\big{)}^{1/2}=O_{p}(1).$ Together these two imply: $\big{(}\int\Big{|}\mathbb{E}[\hat{\psi}_{n}^{S}(\tau,\hat{\beta}(0))-\hat{\psi}_{n}^{S}(\tau,\hat{\beta}(1))]\Big{|}^{2}\pi(\tau)d\tau\big{)}^{1/2}=O(\varepsilon_{n}).$ This yields the bound for (C.25):

[TABLE]

The remaining terms, (C.23)-(C.24), are conjugates of each other. A bound for (C.23) is also valid for (C.24). Expanding (C.23) yields:

[TABLE]

Applying the Cauchy-Schwarz inequality to (C.26) implies:

[TABLE]

The term (C.28) can be bounded above using the triangle inequality:

[TABLE]

An application of Lemma B9 and the geometric ergodicity of $(\mathbf{y}_{t}^{s},\mathbf{x}_{t})$ yields:

$\left(\int\Big{|}\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\beta_{0})\Big{|}^{2}\pi(\tau)d\tau\right)^{1/2}=O_{p}(1/\sqrt{n}).$ Then, expanding the term in $\hat{\psi}_{n}^{s}$ :

[TABLE]

Note that Assumption 5 ii. implies that:

[TABLE]

By definition of the weak norm: $\left(\int\Big{|}B\frac{d\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\beta_{0}))}{d\beta}[\beta_{0}-\hat{\beta}(0)]\Big{|}^{2}\pi(\tau)d\tau\right)^{1/2}=\|\hat{\beta}_{n}-\beta_{0}\|_{weak}.$ Furthermore, $\|\hat{\beta}_{n}-\beta_{0}\|_{weak}=O_{p}(\delta_{n})$ by assumption. Overall, the following bound holds for (C.27): $\left(\int\Big{|}B\hat{\psi}_{n}(\tau)-B\hat{\psi}_{n}^{S}(\tau,\hat{\beta}(0))\Big{|}^{2}\pi(\tau)d\tau\right)^{1/2}\leq O_{p}\left(\frac{1}{\sqrt{n}}\right)+O_{p}\left(\delta_{n}\right)+O_{p}\left(\frac{(M_{n}\delta_{mn})^{\gamma^{2}/2}I_{m,n}}{\sqrt{n}}\right).$ Re-arranging (C.29) to apply the stochastic equicontinuity result again yields:

[TABLE]

Using the bounds for (C.27) and (C.29) yields the bound for (C.26):

[TABLE]

To bound (C.27), apply the Mean Value theorem up to the second order:

[TABLE]

where the $O_{p}(\varepsilon_{n}^{2})$ term is due to the Cauchy-Schwarz inequality and Assumption 5 ii.:

[TABLE]

It was shown above that:

[TABLE]

Also, by Assumption 5 ii.: $\left(\int\Big{|}B\frac{d^{2}\mathbb{E}(\hat{\psi}_{n}^{S}(\tau,\hat{\beta}(\tilde{t})))}{d\beta d\beta}[u_{n}^{*},u_{n}^{*}]\Big{|}^{2}\pi(\tau)d\tau\right)=O_{p}(1).$

Finally, applying the Cauchy-Schwarz inequality to the last term of the expansion of (C.27) yields:

[TABLE]

Using inequality (C.22) together with the bounds above and the expansions of (C.23) and (C.24) yields:

[TABLE]

Since $\varepsilon_{n}=\pm\frac{1}{\sqrt{n}M_{n}}$ can take either sign, dividing by $\varepsilon_{n}$ preserves the inequality for the positive sign and reverses it for the negative sign, so that:

[TABLE]

By construction, $\varepsilon_{n}=o_{p}(1/\sqrt{n})$ and Assumption 5 i. implies that $(M_{n}\delta_{mn})^{\gamma^{2}/2}I_{m,n}=o(1)$ so that all terms above are $o(1/\sqrt{n})$ . To conclude the proof, note that:

[TABLE]

∎

Appendix D Additional Results for the Applications

D.1 Verifying the Primitive Conditions in the First Application

Recall the data generating process used in Sections 4 and 5:

[TABLE]

The following verifies (1) the identification condition, namely that Assumption 1 ii holds for any $L\geq\underline{L}$, with $\underline{L}$ determined below, provided $f$ has sub-exponential tails as required in Assumption 1 i, and (2) that Assumption 2 is satisfied. Geometric ergodicity can be verified by checking that Assumption 2.1 and the additional condition in Theorem 3.1 of Cline and Pu (1999) hold. Using their notation, $\alpha(\cdot)$ is linear and $\gamma(\cdot)$ is a product, so the required conditions are verified.

Identification:

Assume $e_{1,t}\sim f$ with $\mathbb{E}(e_{1,t})=0$, $\mathbb{E}(e_{1,t}^{2})=1$, and $e_{2,t}\sim f_{2}$, a known non-negative distribution with finite moments of every order $p\geq 1$ and $\mathbb{E}(e_{2,t})=\text{var}(e_{2,t})=1$. Assume $\rho_{\sigma}\in[0,1)$, $\mu_{\sigma}\geq 0$ and $\kappa_{\sigma}>0$. For $L\geq 1$, let $\mathbf{y}_{t}=(y_{t},\dots,y_{t-L})$ and $\psi(\tau,\theta,f)=\int\exp(i\tau^{\prime}\mathbf{y}_{t})f(\mathbf{y}_{t},\theta,f)d\mathbf{y}_{t}$. Note that $\partial_{\tau}\psi(0,\theta,f)=i\mathbb{E}(\mathbf{y}_{t})=i(\mu_{y},\dots,\mu_{y})$, so that $\mu_{y}$ is identified. Similarly, any joint moment of $\mathbf{y}_{t}$ can be recovered from the CF $\psi$. It therefore suffices to show that moments of $\mathbf{y}_{t}$ can be used to identify $(\theta,f)$. The coefficient $\rho_{y}$ is identified by the moment condition $\mathbb{E}([y_{t}-\mu_{y}-\rho_{y}(y_{t-1}-\mu_{y})]y_{t-2})=0$. Taking $\tilde{y}_{t}=y_{t}-\mu_{y}-\rho_{y}(y_{t-1}-\mu_{y})$, we have: $\tilde{y}_{t}=\sigma_{t}[e_{t}+\vartheta_{y}e_{t-1}]$, where $e_{t}$ is shorthand for $e_{1,t}$ and $\vartheta$ for $\vartheta_{y}$ in what follows.

Compute two more moments: $\mathbb{E}(\tilde{y}_{t}^{2})=\mathbb{E}(\sigma_{t}^{2})(1+\vartheta^{2})$ and $\mathbb{E}(\tilde{y}_{t}\tilde{y}_{t-1})=\vartheta\mathbb{E}(\sigma_{t}\sigma_{t-1}).$ Unlike in the MA(1) model with time-invariant volatility, these two moments alone are not sufficient to identify $\vartheta$ because $|\mathbb{E}(\sigma_{t}\sigma_{t-1})|\leq\mathbb{E}(\sigma_{t}^{2})$, with strict inequality when volatility is time-varying.

Consider three additional moments: $\mathbb{E}(\tilde{y}_{t}^{2}\tilde{y}_{t-2}^{2})=\mathbb{E}(\sigma_{t}^{2}\sigma_{t-2}^{2})(1+\vartheta^{2})^{2}$, $\mathbb{E}(\tilde{y}_{t}^{2}\tilde{y}_{t-4}^{2})=\mathbb{E}(\sigma_{t}^{2}\sigma_{t-4}^{2})(1+\vartheta^{2})^{2}$, and $\mathbb{E}(\tilde{y}_{t}^{2}\tilde{y}_{t-2}^{2}\tilde{y}_{t-4}^{2})=\mathbb{E}(\sigma_{t}^{2}\sigma_{t-2}^{2}\sigma_{t-4}^{2})(1+\vartheta^{2})^{3}.$ The main idea is to lag twice each time so as to measure only the dependence in $\sigma_{t}^{2}$; lagging once would pick up autocorrelation due to the MA(1) component. Let $\bar{\sigma}^{2}=\mathbb{E}(\sigma_{t}^{2})$. We have: $\mathbb{E}(\sigma_{t}^{2})=\frac{\mu_{\sigma}+\kappa_{\sigma}}{1-\rho_{\sigma}}$, $\mathbb{E}([\sigma_{t}^{2}-\bar{\sigma}^{2}][\sigma_{t-2}^{2}-\bar{\sigma}^{2}])=\rho_{\sigma}^{2}\frac{\kappa_{\sigma}^{2}\text{var}(e_{2,t})}{1-\rho_{\sigma}^{2}}$, and $\mathbb{E}([\sigma_{t}^{2}-\bar{\sigma}^{2}][\sigma_{t-4}^{2}-\bar{\sigma}^{2}])=\rho_{\sigma}^{4}\frac{\kappa_{\sigma}^{2}\text{var}(e_{2,t})}{1-\rho_{\sigma}^{2}}.$ Taking a ratio identifies $\rho_{\sigma}$, which is non-negative by assumption: $\frac{\mathbb{E}(\tilde{y}_{t}^{2}\tilde{y}_{t-2}^{2})-\mathbb{E}(\tilde{y}_{t}^{2})^{2}}{\mathbb{E}(\tilde{y}_{t}^{2}\tilde{y}_{t-4}^{2})-\mathbb{E}(\tilde{y}_{t}^{2})^{2}}=\frac{\mathbb{E}(\sigma_{t}^{2}\sigma_{t-2}^{2})-\mathbb{E}(\sigma_{t}^{2})^{2}}{\mathbb{E}(\sigma_{t}^{2}\sigma_{t-4}^{2})-\mathbb{E}(\sigma_{t}^{2})^{2}}=\rho_{\sigma}^{-2}.$ We will assume $\rho_{\sigma}>0$ in the following.
Similarly, moments of $\tilde{y}_{t}$ can be used to compute: $\frac{\mathbb{E}(\sigma_{t}^{2})^{2}}{\mathbb{E}(\sigma_{t}^{2}\sigma_{t-2}^{2})-\mathbb{E}(\sigma_{t}^{2})^{2}}=\frac{(\mu_{\sigma}+\kappa_{\sigma})^{2}}{\kappa_{\sigma}^{2}}\frac{1-\rho_{\sigma}^{2}}{(1-\rho_{\sigma})^{2}}\frac{1}{\rho_{\sigma}^{2}\text{var}(e_{2,t})}.$ Since $f_{2}$ is known and $\rho_{\sigma}$ is identified, this identifies the ratio $(\kappa_{\sigma}+\mu_{\sigma})/\kappa_{\sigma}$ because the individual terms are non-negative. Now, $\mathbb{E}(\tilde{y}_{t}^{2})=\frac{\mu_{\sigma}+\kappa_{\sigma}}{\kappa_{\sigma}(1-\rho_{\sigma})}\kappa_{\sigma}(1+\vartheta^{2})$ identifies the product $\kappa_{\sigma}(1+\vartheta^{2})$. The moment $\mathbb{E}(\tilde{y}_{t}\tilde{y}_{t-1})$ does not have a closed-form expression but can be approximated by expanding $\sigma_{t}=\sqrt{\sigma_{t}^{2}}$ to first order around the mean $\bar{\sigma}^{2}=(\mu_{\sigma}+\kappa_{\sigma})/(1-\rho_{\sigma})$ of $\sigma_{t}^{2}$: $\mathbb{E}(\sigma_{t}\sigma_{t-1})\simeq\mathbb{E}\left([\bar{\sigma}+\frac{1}{2\bar{\sigma}}(\sigma_{t}^{2}-\bar{\sigma}^{2})][\bar{\sigma}+\frac{1}{2\bar{\sigma}}(\sigma_{t-1}^{2}-\bar{\sigma}^{2})]\right)=\bar{\sigma}^{2}+\frac{1}{4\bar{\sigma}^{2}}\mathbb{E}([\sigma_{t}^{2}-\bar{\sigma}^{2}][\sigma_{t-1}^{2}-\bar{\sigma}^{2}]).$ The coefficients $\kappa_{\sigma},\vartheta$ are then separately identified using the system of equations: $\mathbb{E}(\tilde{y}_{t}\tilde{y}_{t-1})=\vartheta\left(\bar{\sigma}^{2}+\frac{1}{4\bar{\sigma}^{2}}\mathbb{E}([\sigma_{t}^{2}-\bar{\sigma}^{2}][\sigma_{t-1}^{2}-\bar{\sigma}^{2}])\right)$, $\mathbb{E}(\tilde{y}_{t}^{2})=\bar{\sigma}^{2}(1+\vartheta^{2})$, and $\mathbb{E}(\tilde{y}_{t}^{2}\tilde{y}_{t-2}^{2})-[\mathbb{E}(\tilde{y}_{t}^{2})]^{2}=\rho_{\sigma}(1+\vartheta^{2})^{2}\mathbb{E}([\sigma_{t}^{2}-\bar{\sigma}^{2}][\sigma_{t-1}^{2}-\bar{\sigma}^{2}])$, using the same approach as for identifying the parameters of an MA(1) model with time-invariant volatility. This implies that $\underline{L}=5$ lags are sufficient to identify $\theta=(\mu_{y},\rho_{y},\vartheta_{y},\mu_{\sigma},\rho_{\sigma},\kappa_{\sigma})$.
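
As a rough numerical illustration of the ratio argument, the sketch below evaluates the population autocovariances of $\tilde{y}_{t}^{2}$ at lags 2 and 4 and backs out $\rho_{\sigma}$ from the fact that the lag-4 autocovariance divided by the lag-2 autocovariance equals $\rho_{\sigma}^{2}$. It assumes the volatility recursion $\sigma_{t}^{2}=\mu_{\sigma}+\rho_{\sigma}\sigma_{t-1}^{2}+\kappa_{\sigma}e_{2,t}$, which is inferred from the moment formulas above; the function names are hypothetical and chosen for illustration only.

```python
import math


def sigma2_autocov(rho, kappa, var_e2, lag):
    """Population autocovariance of sigma_t^2 at the given lag, assuming the
    AR(1) recursion sigma_t^2 = mu + rho * sigma_{t-1}^2 + kappa * e2_t."""
    return rho ** lag * kappa ** 2 * var_e2 / (1.0 - rho ** 2)


def ytilde_sq_autocov(rho, kappa, var_e2, theta, lag):
    """Autocovariance of ytilde_t^2 at lag >= 2. At these lags the MA(1)
    shocks entering ytilde_t and ytilde_{t-lag} do not overlap, so only the
    dependence in sigma_t^2 remains, scaled by (1 + theta^2)^2."""
    if lag < 2:
        raise ValueError("lag must be at least 2 to avoid the MA(1) overlap")
    return (1.0 + theta ** 2) ** 2 * sigma2_autocov(rho, kappa, var_e2, lag)


def recover_rho_sigma(cov_lag2, cov_lag4):
    """Back out rho_sigma >= 0 from the lag-2 and lag-4 autocovariances."""
    return math.sqrt(cov_lag4 / cov_lag2)


# Illustrative parameter values (not taken from the paper).
cov2 = ytilde_sq_autocov(rho=0.7, kappa=0.5, var_e2=1.0, theta=0.3, lag=2)
cov4 = ytilde_sq_autocov(rho=0.7, kappa=0.5, var_e2=1.0, theta=0.3, lag=4)
rho_hat = recover_rho_sigma(cov2, cov4)
```

The factor $(1+\vartheta^{2})^{2}$ and the scale $\kappa_{\sigma}^{2}\text{var}(e_{2,t})$ cancel in the ratio, which is why $\rho_{\sigma}$ can be identified before $\vartheta$ and $\kappa_{\sigma}$.
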
If the unknown distribution $f$ has sub-exponential tails, then its moment generating function is analytic on some interval and the distribution is determined by its moments. The idea is to solve for the moments of $e_{1,t}$ recursively from moments of $y_{t}$. We already assume that $\mathbb{E}(e_{1,t})=0$ and $\mathbb{E}(e_{1,t}^{2})=1$. The third moment satisfies $\mathbb{E}(\tilde{y}_{t}^{3})=\mathbb{E}(e_{1,t}^{3})\mathbb{E}(\sigma_{t}^{3})(1+\vartheta^{3})$, where the last two terms can be computed from knowledge of $\theta$. More generally, the Binomial Theorem gives: $\mathbb{E}(\tilde{y}_{t}^{k})=\mathbb{E}(\sigma_{t}^{k})\sum_{j=0}^{k}\binom{k}{j}\mathbb{E}(e_{1,t}^{k-j})\mathbb{E}(e_{1,t}^{j})\vartheta^{j}$. With $k=3$, this pins down the third moment; for $k=4$ the only unknown is the fourth moment, and so on. Hence, once $\theta$ is known, $(\mathbb{E}(\tilde{y}_{t}^{3}),\dots,\mathbb{E}(\tilde{y}_{t}^{k}))$ identifies $(\mathbb{E}(e_{1,t}^{3}),\dots,\mathbb{E}(e_{1,t}^{k}))$ for any $k\geq 3$. Since $f$ is determined by its moments, these moments uniquely determine the distribution itself, so that $(\theta,f)$ is jointly identified. With ergodicity, this implies $\lim_{n\to\infty}\mathbb{E}(\hat{\psi}_{n}(\tau)-\hat{\psi}_{n}^{S}(\tau,\beta))=0$, $\forall\tau$, if and only if $\beta=\beta_{0}$.
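
The recursive argument can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it works with the moments of the MA(1) combination $e_{t}+\vartheta e_{t-1}$, that is $\mathbb{E}(\tilde{y}_{t}^{k})/\mathbb{E}(\sigma_{t}^{k})$ with $\mathbb{E}(\sigma_{t}^{k})$ taken as known, and it requires $1+\vartheta^{k}\neq 0$. The function names are hypothetical.

```python
from math import comb


def ma1_moments(shock_moments, theta):
    """Moments m_k = E[(e_t + theta * e_{t-1})^k], k = 0..K, computed from
    the shock moments mu_0..mu_K using the binomial theorem and the
    independence of e_t and e_{t-1}."""
    top = len(shock_moments) - 1
    return [
        sum(comb(k, j) * shock_moments[k - j] * shock_moments[j] * theta ** j
            for j in range(k + 1))
        for k in range(top + 1)
    ]


def recover_shock_moments(ma_moments, theta):
    """Invert the map above recursively. The normalisations mu_0 = 1,
    mu_1 = 0 and mu_2 = 1 are imposed; each mu_k for k >= 3 is the only
    unknown in the k-th equation once the lower-order moments are known."""
    mu = [1.0, 0.0, 1.0]
    for k in range(3, len(ma_moments)):
        # Cross terms involve only moments of order below k.
        cross = sum(comb(k, j) * mu[k - j] * mu[j] * theta ** j
                    for j in range(1, k))
        denom = 1.0 + theta ** k  # coefficient on mu_k (j = 0 and j = k terms)
        if abs(denom) < 1e-12:
            raise ValueError("1 + theta^k is zero: mu_k is not pinned down")
        mu.append((ma_moments[k] - cross) / denom)
    return mu


# Round trip with standard Gaussian moments of order 0 to 6.
gaussian = [1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 15.0]
recovered = recover_shock_moments(ma1_moments(gaussian, theta=0.5), theta=0.5)
```

The failure case $1+\vartheta^{k}=0$ arises only for $\vartheta=-1$ and odd $k$, which is the boundary of the parameter space for the MA coefficient.
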

Data Generating Process:

Condition y(i): $\|g_{\text{obs}}(y_{1},\beta_{1},\sigma)-g_{\text{obs}}(y_{2},\beta_{1},\sigma)\|=|\rho_{y}|\|y_{1}-y_{2}\|\leq\bar{\rho}_{y}\|y_{1}-y_{2}\|$, which implies the strict contraction property if $|\rho_{y}|\leq\bar{\rho}_{y}<1$. For condition y(ii), $\|g_{\text{obs}}(y_{1},\mu_{1},\rho_{1},\vartheta_{1},\sigma)-g_{\text{obs}}(y_{1},\mu_{2},\rho_{2},\vartheta_{2},\sigma)\|\leq|\mu_{1}-\mu_{2}|+|\rho_{1}-\rho_{2}|\times|y_{1}|+\sigma|\vartheta_{1}-\vartheta_{2}|\times|e_{1}|$, which satisfies the desired bound if $|y_{t-1}|$, $\sigma_{t}$, and $|e_{t-1}|$ have bounded second moments. This is implied by restrictions on the parameters $\theta$ and the distribution $f$. For condition y(iii), note that the $\sqrt{\cdot}$ function is Hölder continuous with exponent $1/2$, so that $\|g_{\text{obs}}(y_{1},\beta,\sigma_{1})-g_{\text{obs}}(y_{1},\beta,\sigma_{2})\|\leq|e_{t}+\vartheta e_{t-1}|\times\sqrt{|\sigma_{1}-\sigma_{2}|}$, and $\mathbb{E}(|e_{t}+\vartheta e_{t-1}|^{2})\leq 3(1+\overline{\vartheta}^{2})$ if $|\vartheta|\leq\overline{\vartheta}$ and $\mathbb{E}(e_{t}^{2})=1$. Hence, the assumptions on the DGP are satisfied.

D.2 Additional Results for the Second Application

Table D4 below reports estimates of $1/\tau$ and $1/\gamma$ instead of the estimates of $\tau$ and $\gamma$ reported in Table 2. CIs for $\tau$ and $\gamma$ are obtained by transforming the interval $[1/\hat{\tau}_{n}\pm 1.96\,\text{se}(1/\hat{\tau}_{n})]$, and analogously for $\gamma$.
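
One way to read this transformation is sketched below: the endpoints of the interval for the inverse parameter are inverted and swapped. This is an assumption about how the transformation is carried out, not a description of the paper's code; it is only valid when the interval for the inverse lies strictly above zero, and the function name is hypothetical.

```python
def ci_from_inverse(inv_estimate, inv_se, z=1.96):
    """Confidence interval for x built from an estimate of 1/x and its
    standard error, by inverting the endpoints of the interval for 1/x.
    Inversion is monotone decreasing on (0, inf), so the endpoints swap."""
    lower_inv = inv_estimate - z * inv_se
    upper_inv = inv_estimate + z * inv_se
    if lower_inv <= 0.0:
        raise ValueError("the interval for 1/x must be strictly positive")
    return 1.0 / upper_inv, 1.0 / lower_inv


# Illustrative numbers only (not estimates from the paper).
low, high = ci_from_inverse(inv_estimate=0.5, inv_se=0.1)
```

The resulting interval is not symmetric around $1/(1/\hat{\tau}_{n})$, which is expected for a nonlinear transformation.
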

Appendix E Additional Results

E.1 Convergence rate in the MA(1) model

The following derives the rate of convergence for the MA(1) process $y_{t}=e_{t}+\vartheta e_{t-1}$, $e_{t}\overset{iid}{\sim}f$, first for the case $S=+\infty$. Here $\beta=(\vartheta,f)\in[-1,1]\times\mathcal{F}$. Take $L\geq 1$; then the joint distribution of $\mathbf{y}_{t}=(y_{t},y_{t-1})$ uniquely identifies $\beta$. Let $h(\tau,e,\vartheta)=e^{i\tau_{1}e_{1}+i\vartheta\tau_{1}e_{2}+i\tau_{2}e_{2}+i\vartheta\tau_{2}e_{3}}$. The CF of $\mathbf{y}_{t}$ is: $\psi(\tau;\beta)=\int h(\tau,e,\vartheta)f(e_{1})f(e_{2})f(e_{3})de_{1}de_{2}de_{3},$ for $L=1$, where $\tau=(\tau_{1},\tau_{2})$. Let $\beta_{k}=(\vartheta_{0},f_{k})$ and let $\hat{\beta}_{n}$ be an exact minimizer of $Q_{n}$. Then, by the triangle inequality in $\mathbb{L}^{2}(\pi)$:

[TABLE]

The last term is $O_{p}(n^{-1/2})$ plus $(\int|\psi(\tau;\beta_{k})-\psi(\tau;\beta_{0})|^{2}\pi(\tau)d\tau)^{1/2}\leq(L+1)\|f_{k}-f_{0}\|_{TV}$ because the exponential has modulus $1$ and the density $f$ appears $L+1$ times in the CF. This is related to the bias accumulation discussed in the main text. From this we deduce the convergence rate under the distance implied by the CF:

[TABLE]

which is $O_{p}(\max[n^{-1/2},\log[k]^{2r/b}k^{-r}])$, since $\|f_{k}-f_{0}\|_{TV}=O(\log[k]^{2r/b}k^{-r})$ under the smoothness and tails assumptions. Because here $S=+\infty$, we can use $k\log[k]^{-2/b}\asymp n^{1/(2r)}$, which gives: $\Big(\int|\psi(\tau;\hat{\beta}_{n})-\psi(\tau;\beta_{0})|^{2}\pi(\tau)d\tau\Big)^{1/2}=O_{p}(n^{-1/2}),$ in line with Corollary 1. For $r=2$, this implies $k\asymp n^{1/4}$, up to log-terms. Asymptotically, $(\int|\psi(\tau;\hat{\beta}_{n})-\psi(\tau;\beta_{0})|^{2}\pi(\tau)d\tau)^{1/2}\asymp\|\hat{\beta}_{n}-\beta_{0}\|_{\text{weak}}$, which implies the convergence rate in weak norm. The weak norm involves the derivative $\psi_{\beta}(\tau,f)[v]$, i.e. $\psi_{f}(\tau,\beta)[v]=\int h(\tau,e,\vartheta)\{v(e_{1})f(e_{2})f(e_{3})+f(e_{1})v(e_{2})f(e_{3})+f(e_{1})f(e_{2})v(e_{3})\}de_{1}de_{2}de_{3}$ and $\psi_{\vartheta}(\tau,\beta)=\int i[\tau_{1}e_{2}+\tau_{2}e_{3}]h(\tau,e,\vartheta)f(e_{1})f(e_{2})f(e_{3})de_{1}de_{2}de_{3}$, for $L=1$. The local measure of ill-posedness $\tau_{n}$ is not available in closed form, which makes the rate in the stronger norm intractable. For $S<+\infty$, the term $\sup_{\beta\in\mathcal{B}_{k(n)}}(\int|\psi(\tau;\beta)-\hat{\psi}_{n}^{S}(\tau;\hat{\beta}_{n})|^{2}\pi(\tau)d\tau)^{1/2}=O_{p}([k(n)\log[k(n)]]^{2}/\sqrt{nS})$ also affects the rate of convergence. Here geometric ergodicity holds automatically, since an MA(1) process is $m$-dependent regardless of the MA coefficient.
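
To make the distance implied by the CF concrete, the following sketch computes the empirical CF of the pair $(y_{t},y_{t-1})$ for an MA(1) sample and approximates the $\mathbb{L}^{2}(\pi)$ distance between two samples by averaging over draws of $\tau$. It is an illustration only: taking $\pi$ to be a standard Gaussian density and integrating by Monte Carlo are assumptions made here, and the function names are hypothetical.

```python
import cmath
import math
import random


def simulate_ma1(shocks, theta):
    """y_t = e_t + theta * e_{t-1}; n + 1 shocks give n observations."""
    return [shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]


def empirical_cf(y, tau):
    """Empirical CF of the pair (y_t, y_{t-1}) at tau = (tau1, tau2)."""
    tau1, tau2 = tau
    terms = [cmath.exp(1j * (tau1 * y[t] + tau2 * y[t - 1]))
             for t in range(1, len(y))]
    return sum(terms) / len(terms)


def cf_distance(y_a, y_b, taus):
    """Monte Carlo approximation of the L2(pi) distance between the two
    empirical CFs, assuming the points in taus are draws from pi."""
    squared = [abs(empirical_cf(y_a, tau) - empirical_cf(y_b, tau)) ** 2
               for tau in taus]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
taus = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
y_data = simulate_ma1([rng.gauss(0.0, 1.0) for _ in range(501)], theta=0.5)
y_sim = simulate_ma1([rng.gauss(0.0, 1.0) for _ in range(501)], theta=0.5)
distance = cf_distance(y_data, y_sim, taus)
```

Because the complex exponential has modulus one, every empirical CF value lies in the unit disc, which is the property used above to bound the CF distance by the total variation distance between the shock densities.
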

E.2 Sieve Long-Run Variance

The following derives the formula for the sieve long-run variance $\sigma_{n}^{\star 2}$ . For brevity of notation, let $Z_{t}(\tau)=\hat{\psi}_{t}^{S}(\tau,\beta_{0})-\hat{\psi}_{t}(\tau)$ and $Z_{n}(\tau)=\frac{1}{n}\sum_{t}Z_{t}(\tau)$ . Let: $S_{t}^{\star}=\frac{1}{2}\int\{\psi_{\beta}(\tau,v_{n}^{\star})\overline{Z_{t}(\tau)}+\overline{\psi_{\beta}(\tau,v_{n}^{\star})}Z_{t}(\tau)\}\pi(\tau)d\tau$ , the sieve score is $S_{n}^{\star}=\frac{1}{n}\sum_{t}S_{t}^{\star}$ , and the sieve long-run variance is: $\sigma_{n}^{\star 2}=n\mathbb{E}(S_{n}^{\star 2})=\mathbb{E}(S_{t}^{\star 2})+2\sum_{j=1}^{n-1}\frac{n-j}{n}\mathbb{E}(S_{t}^{\star}S_{t-j}^{\star})$ . For any $j\geq 0$ , we have:

[TABLE]

Let $K_{j}:\mathbb{L}^{2}(\pi)\to\mathbb{L}^{2}(\pi)$ be a linear operator such that: $K_{j}f(\tau_{1})=\frac{1}{2}\int\Big\{\mathbb{E}[\overline{Z_{t}(\tau_{1})}\overline{Z_{t-j}(\tau_{2})}]f(\tau_{2})+\mathbb{E}[\overline{Z_{t}(\tau_{1})}Z_{t-j}(\tau_{2})]\overline{f(\tau_{2})}\Big\}\pi(\tau_{2})d\tau_{2},$ with the associated inner-product in $\mathbb{L}^{2}(\pi)$: $\langle f_{1},f_{2}\rangle_{\pi}=\frac{1}{2}\int\{f_{1}(\tau)\overline{f_{2}(\tau)}+\overline{f_{1}(\tau)}f_{2}(\tau)\}\pi(\tau)d\tau$. (Notice that $\langle v_{1},v_{2}\rangle=\frac{1}{2}\int\{\psi_{\beta}(\tau,v_{1})\overline{\psi_{\beta}(\tau,v_{2})}+\overline{\psi_{\beta}(\tau,v_{1})}\psi_{\beta}(\tau,v_{2})\}\pi(\tau)d\tau$ is also $\langle\psi_{\beta}(\cdot,v_{1}),\psi_{\beta}(\cdot,v_{2})\rangle_{\pi}$.) Compactly re-write the autocovariance: $\mathbb{E}(S_{t}^{\star}S_{t-j}^{\star})=\langle\psi_{\beta}(\cdot,v_{n}^{\star}),K_{j}\psi_{\beta}(\cdot,v_{n}^{\star})\rangle_{\pi}.$ Then, by linearity: $\sigma_{n}^{\star 2}=\langle\psi_{\beta}(\cdot,v_{n}^{\star}),K_{n}\psi_{\beta}(\cdot,v_{n}^{\star})\rangle_{\pi}$, where $K_{n}=K_{0}+2\sum_{j=1}^{n-1}\frac{n-j}{n}K_{j}$ is the long-run variance operator. Their sample counterparts are: $\hat{\psi}_{\beta}(\tau,v)=d_{\beta}\hat{\psi}_{n}^{S}(\tau,\hat{\beta}_{n})[v]$, $\langle v_{1},v_{2}\rangle_{n}=\frac{1}{2}\int\{\hat{\psi}_{\beta}(\tau,v_{1})\overline{\hat{\psi}_{\beta}(\tau,v_{2})}+\overline{\hat{\psi}_{\beta}(\tau,v_{1})}\hat{\psi}_{\beta}(\tau,v_{2})\}\pi(\tau)d\tau$, and $\hat{v}_{n}^{\star}$ such that $\langle\hat{v}_{n}^{\star},v\rangle_{n}=d_{\beta}\phi(\hat{\beta}_{n})[v]$ for any $v$.
Let $\hat{Z}_{t}(\tau)=\hat{\psi}_{t}^{S}(\tau,\hat{\beta}_{n})-\hat{\psi}_{t}(\tau)$ , $\hat{S}_{t}^{\star}=\frac{1}{2}\int\{\hat{\psi}_{\beta}(\tau,\hat{v}_{n}^{\star})\overline{\hat{Z}_{t}(\tau)}+\overline{\hat{\psi}_{\beta}(\tau,\hat{v}_{n}^{\star})}\hat{Z}_{t}(\tau)\}\pi(\tau)d\tau$ , and $\hat{S}_{n}^{\star}=\frac{1}{n}\sum_{t}\hat{S}_{t}^{\star}$ . Using an estimate $\hat{K}_{n}$ of $K_{n}$ , we have: $\|\hat{v}_{n,sd}^{\star}\|^{2}=\hat{\sigma}_{n}^{\star 2}=\langle\hat{\psi}_{\beta}(\cdot,\hat{v}_{n}^{\star}),\hat{K}_{n}\hat{\psi}_{\beta}(\cdot,\hat{v}_{n}^{\star})\rangle_{\pi}=\langle\hat{v}_{n}^{\star},\hat{v}_{n}^{\star}\rangle_{n,\hat{K}_{n}}$ . Now, to estimate the long-run variance operator $K_{n}$ , take $j\geq 0$ and let $\hat{K}_{j}$ be such that: $\hat{K}_{j}f(\tau_{1})=\frac{1}{2}\int\Big{\{}\frac{1}{n}\big{[}\sum_{t=j+1}^{n}\overline{\hat{Z}_{t}(\tau_{1})}\overline{\hat{Z}_{t-j}(\tau_{2})}\big{]}f(\tau_{2})+\frac{1}{n}\big{[}\sum_{t=j+1}^{n}\overline{\hat{Z}_{t}(\tau_{1})}\hat{Z}_{t-j}(\tau_{2})\big{]}\overline{f(\tau_{2})}\Big{\}}\pi(\tau_{2})d\tau_{2}$ ; then $\hat{K}_{n}=\hat{K}_{0}+2\sum_{j=1}^{n-1}\omega(j/T_{n})\hat{K}_{j}$ , where $\omega$ and $T_{n}$ are the HAC kernel and bandwidth.
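
The kernel-weighted sum above has a simple scalar analogue, sketched below for a series of estimated sieve scores $\hat{S}_{t}^{\star}$ that is assumed to have been computed already. The Bartlett kernel is used purely as an example of a kernel $\omega$; the text does not prescribe a particular kernel, and the function names are hypothetical. As in the formula for $\hat{K}_{j}$, autocovariances are normalised by $n$ and the series is not recentred.

```python
def bartlett(x):
    """Bartlett kernel: weights decline linearly and vanish for |x| >= 1."""
    return max(0.0, 1.0 - abs(x))


def hac_long_run_variance(scores, bandwidth, kernel=bartlett):
    """Kernel-weighted long-run variance of a scalar score series:
    gamma_0 + 2 * sum_j kernel(j / bandwidth) * gamma_j, where
    gamma_j = (1/n) * sum_{t=j+1}^{n} s_t * s_{t-j}."""
    n = len(scores)

    def autocov(j):
        return sum(scores[t] * scores[t - j] for t in range(j, n)) / n

    variance = autocov(0)
    for j in range(1, n):
        weight = kernel(j / bandwidth)
        if weight != 0.0:
            variance += 2.0 * weight * autocov(j)
    return variance


# Small deterministic example with negatively autocorrelated scores.
lrv = hac_long_run_variance([1.0, -1.0, 1.0, -1.0], bandwidth=2.0)
```

Negative autocorrelation lowers the long-run variance relative to the plain second moment, which is why ignoring the weighted autocovariance terms can misstate standard errors in dependent data.
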

Assumption E6.

Suppose i. $\sup_{\beta\in\mathcal{N}_{osn}}\sup_{v\in\overline{V}^{1}_{k(n)}}|d_{\beta}\phi(\beta)[v]-d_{\beta}\phi(\beta_{0})[v]|=o(1)$ , ii. for each $k(n)$ , any $\beta\in\mathcal{N}_{osn}$ , and any $v\in\overline{V}^{1}_{k(n)}$ , $\hat{\psi}_{\beta}(\cdot,v)\in\mathbb{L}^{2}(\pi)$ , $\sup_{v_{1},v_{2}\in\overline{V}^{1}_{k(n)}}|\langle v_{1},v_{2}\rangle_{n}-\langle v_{1},v_{2}\rangle|=o_{p}(1)$ , iii. $\sup_{v\in\overline{V}_{k(n)}^{1}}|\langle v,v\rangle_{n,K_{n}}-\langle v,v\rangle_{K_{n}}|=o_{p}(1)$ , iv. $\|\hat{K}_{n}-K_{n}\|_{op}=o_{p}(1)$ .

Here $\|\cdot\|_{op}$ is the operator norm in $(\mathbb{L}^{2}(\pi),\langle\cdot,\cdot\rangle_{\pi})$. Assumption E6 i-iii is based on Assumption 4.1 in Chen and Pouzo (2015a). Given Assumption 1 iii, Proposition 3.3 in Carrasco et al. (2007) implies that Assumption E6 iv holds under Assumption E7 below.

Assumption E7.

Suppose i. $\omega:\mathbb{R}\to[0,1]$, $\omega(0)=1$, $\omega(-x)=\omega(x)$, $\forall x\in\mathbb{R}$, $\omega\in\mathbb{L}^{2}(\mathbb{R})$, $\omega$ is continuous at $0$ and at all but finitely many other values of $x$; ii. $T_{n}^{2\nu+1}/n\to\gamma\in(0,\infty)$ for some $\nu$ for which $\|\omega^{\nu}\|<\infty$ and $\|f_{Y}^{\nu}\|<\infty$, where $\omega^{\nu}$ and $f_{Y}^{\nu}$ are the $\nu$-th derivatives of $\omega$ and of $f_{Y}$, the spectral density of $(\mathbf{y}_{t},\mathbf{y}_{t}^{s})$, at $0$.

Proposition E1.

Suppose Assumption E6 holds, then $\big{|}\hat{\sigma}_{n}^{\star}/\sigma_{n}^{\star}-1\big{|}=o_{p}(1)$ .

Proposition E1 follows from Theorem 4.2 in Chen and Pouzo (2015a), where Step 2A in their proof (Chen and Pouzo, 2015b, p. 9) now requires $\|\hat{K}_{n}-K_{n}\|_{op}=o_{p}(1)$, as in Assumption E6 iv. The formula used in the main text is equivalent but easier to implement. For each $j\geq 0$: $\int\text{real}\{\psi_{\beta}(\tau_{1},v_{n}^{\star})\mathbb{E}[\overline{Z_{t}(\tau_{1})}\text{real}[Z_{t-j}(\tau_{2})\overline{\psi_{\beta}(\tau_{2},v_{n}^{\star})}]]\}\pi(\tau_{1})\pi(\tau_{2})d\tau_{1}d\tau_{2}=\langle\psi_{\beta}(\cdot,v_{n}^{\star}),K_{j}\psi_{\beta}(\cdot,v_{n}^{\star})\rangle_{\pi}$. Because $\mathbb{E}$, $\int$ and real are linear operators, the terms can be rearranged into:

[TABLE]

Then replace $\text{real}[\psi_{\beta}(\tau_{1},v_{n}^{\star})\overline{Z_{t}(\tau_{1})}]=\text{real}[\psi_{\beta}(\tau_{1},v_{n}^{\star})]\text{real}[Z_{t}(\tau_{1})]+\text{im}[\psi_{\beta}(\tau_{1},v_{n}^{\star})]\text{im}[Z_{t}(\tau_{1})]$. Next, let $\varphi=(\theta,\omega,\mu,\sigma)$ denote the parameter $\beta$ in the sieve basis. For any $v$, $v^{\prime}d_{\varphi}\phi(\beta_{0})=\langle v,v_{n}^{\star}\rangle=v^{\prime}\text{real}[\int\psi_{\varphi^{\prime}}(\tau,\beta_{0})\overline{\psi_{\varphi^{\prime}}(\tau,\beta_{0})}\pi(\tau)d\tau]v_{n}^{\star}$, so $v_{n}^{\star}=\text{real}[\int\psi_{\varphi^{\prime}}(\tau,\beta_{0})\overline{\psi_{\varphi^{\prime}}(\tau,\beta_{0})}\pi(\tau)d\tau]^{-1}d_{\varphi}\phi(\beta_{0})$. Now, substitute $v_{n}^{\star}$ into $\langle\psi_{\beta}(\cdot,v_{n}^{\star}),K_{n}\psi_{\beta}(\cdot,v_{n}^{\star})\rangle_{\pi}$ to get the sandwich formula. The same derivations applied to the sample quantities yield the formula in the main text.

Appendix F Additional Monte-Carlo Results

F.1 Main Examples: $n=100$

F.2 Sensitivity to the Estimation Inputs

F.3 Asset Pricing in a Stationary Production Economy

Model:

The model is a simplified version of Ruge-Murcia (2017) with CRRA preferences. Log-productivity and inflation follow AR(1) processes instead of a VAR(1). The main equations are summarized below. Utility is CRRA, using the notation of the empirical application: $U_{t}=\frac{C_{t}^{1-\gamma}}{1-\gamma}+\beta\mathbb{E}_{t}(U_{t+1}).$ Productivity $Z_{t}$ evolves according to: $\log Z_{t+1}=\rho_{z}\log Z_{t}+e_{1,t+1},$ where $e_{1,t+1}\overset{iid}{\sim}f_{1}$ has mean zero. Inflation $\pi_{t+1}=P_{t+1}/P_{t}$ evolves according to: $\log\pi_{t+1}=\log\bar{\pi}+\rho_{\pi}(\log\pi_{t}-\log\bar{\pi})+e_{2,t+1},$ where $e_{2,t+1}\overset{iid}{\sim}f_{2}$ has mean zero. Production is Cobb-Douglas, $Y_{t}=Z_{t}K_{t}^{\alpha}$, and capital evolves according to $K_{t+1}=(1-\delta)K_{t}+I_{t}$ with quadratic adjustment costs $\Psi_{t}=\frac{\psi}{2}(I_{t}/K_{t}-\delta)^{2}K_{t}$. The bond pricing equation is: $Q_{t}^{\ell}=\beta\mathbb{E}_{t}\left[\left(\frac{C_{t+1}}{C_{t}}\right)^{-\gamma}\frac{Q_{t+1}^{\ell-1}}{\pi_{t+1}}\right],$ where $Q_{t}^{0}=1$. Only the 3-month yield is computed. Two parameters, $(\alpha,\delta)=(0.35,0.025)$, are calibrated as in Ruge-Murcia (2017). Consumption and investment growth are de-meaned to remove the effect of calibration on the levels.

Estimation is performed as in the empirical application but with $S=1$. For reference, estimates with parametric skewed-logistic (skl) shocks are also reported. The settings for the optimizer are the same for all three estimations. The average computation time is 20 minutes for the parametric estimates, 31 minutes for $k=3$ and 49 minutes for $k=5$ on a 12-core cluster. Table F8 reports average and median estimates together with their standard deviations. The skl estimates of $\gamma$ and $\phi$ are strongly skewed upwards, whereas the mixture estimates appear to be more robust and closer to their large-sample approximation. Figure F6 reports the average and interquantile range of the density estimates $\hat{f}_{1,n},\hat{f}_{2,n}$.
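
The exogenous block of the model and the capital accumulation equation can be simulated directly from the laws of motion above. The sketch below is a minimal illustration, not the solution method used for estimation: it takes the shock sequences and the investment choice as given inputs, and all function names and numerical values other than $(\alpha,\delta)$ are hypothetical.

```python
import math


def simulate_exogenous(shocks_z, shocks_pi, rho_z, rho_pi, pi_bar, log_z0=0.0):
    """Paths of log productivity and log inflation from their AR(1) laws of
    motion, starting from log_z0 and from the inflation target pi_bar."""
    log_pi_bar = math.log(pi_bar)
    log_z, log_pi = [log_z0], [log_pi_bar]
    for e1, e2 in zip(shocks_z, shocks_pi):
        log_z.append(rho_z * log_z[-1] + e1)
        log_pi.append(log_pi_bar + rho_pi * (log_pi[-1] - log_pi_bar) + e2)
    return log_z, log_pi


def capital_step(capital, investment, delta, psi):
    """Next-period capital and the quadratic adjustment cost paid today."""
    next_capital = (1.0 - delta) * capital + investment
    cost = 0.5 * psi * (investment / capital - delta) ** 2 * capital
    return next_capital, cost


def output(z, capital, alpha):
    """Cobb-Douglas production Y = Z * K^alpha."""
    return z * capital ** alpha


# With zero shocks, productivity decays geometrically and inflation stays
# at its target; replacing depreciation exactly incurs no adjustment cost.
log_z_path, log_pi_path = simulate_exogenous(
    [0.0, 0.0], [0.0, 0.0], rho_z=0.5, rho_pi=0.9, pi_bar=1.01, log_z0=1.0)
k_next, adj_cost = capital_step(capital=10.0, investment=0.25, delta=0.025, psi=2.0)
```

In an actual estimation the shocks would be drawn from the mixture sieve for $f_{1}$ and $f_{2}$, and investment would come from the model's policy function rather than being supplied by hand.
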

Bibliography

Ai, C. and X. Chen (2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions,” Econometrica, 71, 1795–1843.
Ai, C. and X. Chen (2007): “Estimation of Possibly Misspecified Semiparametric Conditional Moment Restriction Models with Different Conditioning Variables,” Journal of Econometrics, 141, 5–43, Semiparametric Methods in Econometrics.
Andrews, D. W. K. and D. Pollard (1994): “An Introduction to Functional Central Limit Theorems for Dependent Stochastic Processes,” International Statistical Review / Revue Internationale de Statistique, 62, 119.
Backus, D., M. Chernov, and I. Martin (2011): “Disasters Implied by Equity Index Options,” The Journal of Finance, 66, 1969–2012.
Bansal, R. and A. Yaron (2004): “Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles,” The Journal of Finance, 59, 1481–1509.
Ben Hariz, S. (2005): “Uniform CLT for Empirical Process,” Stochastic Processes and their Applications, 115, 339–358.
Bierens, H. J. and H. Song (2012): “Semi-Nonparametric Estimation of Independently and Identically Repeated First-Price Auctions via an Integrated Simulated Moments Method,” Journal of Econometrics, 168, 108–119.
Blasques, F. (2011): “Semi-Nonparametric Indirect Inference,” PhD Thesis, Maastricht University, 1–221.
