Indirect Inference for Locally Stationary Models

David Frazier; Bonsoo Koo

arXiv:1906.01768·econ.EM·December 17, 2020


TL;DR

This paper introduces a novel indirect inference approach for complex locally stationary models, enabling inference with nonparametric convergence rates and validated through simulations and financial data analysis.

Contribution

It develops a local indirect inference algorithm for locally stationary models and establishes the asymptotic properties of the resulting estimator, which converges at nonparametric rates.

Findings

01. Validated methodology with simulation studies

02. Detected non-linear, time-varying volatility in financial data

03. Established asymptotic properties of the estimator

Abstract

We propose the use of indirect inference estimation to conduct inference in complex locally stationary models. We develop a local indirect inference algorithm and establish the asymptotic properties of the proposed estimator. Due to the nonparametric nature of locally stationary models, the resulting indirect inference estimator exhibits nonparametric rates of convergence. We validate our methodology with simulation studies in the confines of a locally stationary moving average model and a new locally stationary multiplicative stochastic volatility model. Using this indirect inference methodology and the new locally stationary volatility model, we obtain evidence of non-linear, time-varying volatility trends for monthly returns on several Fama-French portfolios.

Tables

Table 1: Estimated $\alpha$ and $\beta$ parameters from the modified three-factor Fama-French model. For each column, Est denotes the point estimate and SE the standard error. Standard errors are calculated using the LBB.

		Size-1		Size-2		Size-3		Size-4		Size-5
		Est	SE	Est	SE	Est	SE	Est	SE	Est	SE
	$α$	-0.544	0.181	-0.056	0.108	-0.054	0.075	0.229	0.079	0.285	0.080
	$β_{1}$	1.103	0.030	1.003	0.022	0.929	0.014	0.888	0.014	0.958	0.016
BM-1	$β_{2}$	1.388	0.044	1.298	0.059	1.083	0.024	1.054	0.025	1.086	0.033
	$β_{3}$	-0.220	0.052	0.079	0.044	0.310	0.027	0.460	0.025	0.691	0.028
	$β_{4}$	1.118	0.391	1.044	0.203	1.035	0.174	0.836	0.172	0.558	0.163
	$α$	-0.256	0.100	-0.026	0.085	0.127	0.070	-0.043	0.074	-0.032	0.075
	$β_{1}$	1.126	0.020	1.014	0.016	0.956	0.017	0.964	0.013	1.072	0.014
BM-2	$β_{2}$	1.012	0.035	0.886	0.042	0.769	0.056	0.722	0.036	0.884	0.023
	$β_{3}$	-0.361	0.038	0.112	0.051	0.368	0.055	0.562	0.034	0.773	0.030
	$β_{4}$	1.130	0.215	1.099	0.167	0.850	0.156	1.276	0.176	1.071	0.154
	$α$	-0.109	0.074	0.064	0.081	0.050	0.081	0.163	0.078	-0.075	0.120
	$β_{1}$	1.097	0.018	1.016	0.016	0.983	0.021	0.980	0.017	1.067	0.023
BM-3	$β_{2}$	0.752	0.024	0.553	0.058	0.442	0.065	0.433	0.054	0.573	0.071
	$β_{3}$	-0.426	0.024	0.156	0.060	0.407	0.060	0.597	0.064	0.795	0.054
	$β_{4}$	1.142	0.156	1.035	0.181	0.832	0.159	0.746	0.159	1.223	0.227
	$α$	-0.012	0.068	0.060	0.088	0.049	0.102	0.086	0.074	-0.235	0.117
	$β_{1}$	1.078	0.015	1.042	0.022	1.030	0.020	1.005	0.017	1.152	0.029
BM-4	$β_{2}$	0.403	0.032	0.212	0.061	0.188	0.062	0.223	0.031	0.293	0.061
	$β_{3}$	-0.395	0.027	0.171	0.063	0.405	0.073	0.558	0.051	0.795	0.046
	$β_{4}$	1.253	0.181	0.717	0.204	0.874	0.230	0.865	0.169	1.269	0.251
	$α$	0.124	0.065	-0.023	0.067	0.180	0.094	-0.325	0.128	-0.197	0.172
	$β_{1}$	0.985	0.013	0.983	0.016	0.934	0.021	1.033	0.025	1.121	0.031
BM-5	$β_{2}$	-0.240	0.019	-0.204	0.035	-0.246	0.042	-0.211	0.031	-0.094	0.043
	$β_{3}$	-0.365	0.025	0.074	0.046	0.298	0.045	0.649	0.047	0.839	0.040
	$β_{4}$	1.050	0.176	1.098	0.158	0.615	0.271	1.265	0.255	0.997	0.316

Table 2: Short-term stochastic volatility parameter estimates across the 25 Fama-French portfolios. For each column, Est denotes the point estimate and SE the standard error. Standard errors are calculated using the LBB.

		Size-1		Size-2		Size-3		Size-4		Size-5
		Est	SE	Est	SE	Est	SE	Est	SE	Est	SE
	$ϕ$	0.693	0.075	0.627	0.242	0.347	0.293	0.679	0.341	0.523	0.176
BM-1	$σ_{v}$	1.902	0.327	1.690	0.622	1.709	0.317	1.317	0.655	1.573	0.363
	$γ_{ν}$	0.259	0.154	-0.162	0.372	-0.513	0.285	0.153	0.330	0.235	0.348
	$ϕ$	0.654	0.314	0.543	0.150	0.584	0.202	0.499	0.292	0.565	0.244
BM-2	$σ_{v}$	1.602	0.484	1.742	0.310	1.694	0.480	1.560	0.408	1.656	0.507
	$γ_{ν}$	0.123	0.315	-0.262	0.242	0.007	0.503	-0.011	0.179	0.320	0.507
	$ϕ$	0.603	0.104	0.529	0.299	0.600	0.212	0.619	0.226	0.762	0.172
BM-3	$σ_{v}$	1.879	0.373	1.832	0.465	1.703	0.620	1.598	0.453	1.416	0.364
	$γ_{ν}$	0.270	0.217	-0.018	0.257	-0.127	0.626	0.165	0.360	-0.101	0.214
	$ϕ$	0.473	0.239	0.686	0.109	0.523	0.264	0.596	0.093	0.592	0.123
BM-4	$σ_{v}$	1.668	0.547	1.609	0.432	1.692	0.467	1.580	0.215	1.783	0.139
	$γ_{ν}$	-0.069	0.319	-0.245	0.218	0.242	0.467	-0.096	0.242	-0.149	0.374
	$ϕ$	0.572	0.274	0.427	0.237	0.592	0.303	0.428	0.302	0.530	0.249
BM-5	$σ_{v}$	1.507	0.618	1.879	0.396	1.668	0.584	1.623	0.619	1.747	0.414
	$γ_{ν}$	0.192	0.416	-0.095	0.503	0.428	0.429	-0.452	0.271	-0.157	0.483

Table 3: 99% confidence intervals for the asymmetry parameter $\gamma$. Each entry x, y gives the lower and upper limits of the confidence interval, respectively, calculated using QMLE robust standard errors. Across all 25 portfolios, only a single asymmetry parameter is statistically significant, namely that of the BM-5/Size-1 portfolio, the only interval that excludes zero.

	Size-1	Size-2	Size-3	Size-4	Size-5
BM-1	-0.0549, 0.1610	-0.0622, 0.1516	-0.0919, 0.1859	-0.1203, 0.0712	-0.0023, 0.1900
BM-2	-0.1915, 0.0556	-0.2716, 0.0399	-0.0687, 0.0874	-0.0324, 0.1940	-0.1255, 0.0738
BM-3	-0.2185, 0.0268	-0.0597, 0.1614	-0.0188, 0.2030	-0.0383, 0.2178	-0.1372, 0.0626
BM-4	-0.1331, 0.0865	-0.0403, 0.0981	-0.0742, 0.1751	-0.1258, 0.0755	-0.0718, 0.1619
BM-5	0.0175, 0.1154	-0.0762, 0.1446	-0.0558, 0.1325	-0.0072, 0.1698	-0.0931, 0.1709

Table 4: ARCH test statistics for the 25 Fama-French portfolios, calculated using centered returns and five lags in the auxiliary regression. The corresponding $\chi^{2}_{5}(.01)$ critical value is 15.08. For each portfolio, we reject the null at the 1% significance level. Furthermore, the same conclusion holds at the 0.1% level for all but three of the 25 Fama-French portfolios. Size-j and BM-j, $j=1,\dots,5$, refer to the quintiles of size and book-to-market, respectively.

	Size-1	Size-2	Size-3	Size-4	Size-5
BM-1	25.97	44.31	37.22	53.03	48.77
BM-2	24.49	40.55	57.38	60.51	45.35
BM-3	41.31	22.39	39.51	50.72	35.74
BM-4	17.54	36.94	28.51	35.89	44.28
BM-5	18.72	24.92	41.95	48.33	16.09

Equations

$Y_{t,T}=\xi(t/T)\exp(h_{t}/2)\,\varepsilon_{t}$, where $h_{t}=\omega+\delta h_{t-1}+\sigma v_{t}$, $(\varepsilon_{t},v_{t})^{\prime}\sim N\left(0,\begin{bmatrix}1&0\\0&1\end{bmatrix}\right)$,

$Y_{t,T}=\rho(t/T)\,\sigma_{t}z_{t}$, where $\sigma_{t+1}^{2}$

$P\left(\max_{1\leq t\leq T}|Y_{t,T}-y_{t/T,t}|\leq C_{T}T^{-1}\right)=1$,

$Y_{t,T}$

$\epsilon_{t,T}$

$y_{u,t}$

$\epsilon_{u,t}$

$|Y_{t,T}-y_{u,t}|=O_{p}\left(|t/T-u|+T^{-1}\right)$.

$\tilde{y}_{u,t}(\theta)$

$\tilde{\epsilon}_{u,t}(\theta)$

$M_{T}[\rho;u]:=\frac{1}{Th}\sum_{t=1}^{T}g[Y_{t,T};\rho]\,K\left(\frac{u-t/T}{h}\right)$,

$\hat{\rho}(u;\theta_{0}(u))$

$Y_{t,T}=f(Z_{t,T};\rho(t/T))+\eta_{t}$,

$\mathbf{F}:=\left\{f:|f(x,\rho_{1})-f(x,\rho_{2})|\leq b(x)\|\rho_{1}-\rho_{2}\|_{\infty},\ \rho_{1},\rho_{2}\in\mathcal{H}_{\rho}\right\}$.

$M_{T}[\rho;u]$

$\hat{\rho}(u_{i};\theta)$

$\hat{\theta}(u_{i}):=\arg\max_{\theta\in\Theta}-\|\hat{\rho}(u_{i})-\hat{\rho}(u_{i};\theta)\|_{\Omega}^{2}$.

$\rho_{0}(u;\theta_{0}(u)):=\arg\min_{\rho\in\Gamma}M_{0}[\rho;u]$, where $M_{0}[\rho;u]:=\lim_{T\to\infty}E\,M_{T}[\rho;u]$.

$\rho_{0}(u;\theta):=\arg\min_{\rho\in\Gamma}\tilde{M}_{0}[\rho;u]$, where $\tilde{M}_{0}[\rho;u]:=\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}E\,g[\tilde{y}_{u,t};f(\tilde{z}_{u,t};\rho)]$,

$\phi(k):=\sup_{-T\leq t\leq T}\ \sup_{A\in\mathcal{F}_{-\infty}^{T,t},\,B\in\mathcal{F}_{T,t+k}^{\infty},\,P(A)>0}|P(B\mid A)-P(B)|$,

$\exists\,C<\infty:\ T\phi(m_{T})/m_{T}\leq C,\ \forall T\in\mathbb{N}$.

$\sup_{u\in\mathcal{U}}\sup_{\rho\in\mathcal{E}}\|q[Y_{t,T};f(Z_{t,T},\rho)]\|\leq c_{q}$.

$\rho\mapsto\Psi_{0}(\rho;u):=(\partial/\partial\rho)M_{0}[\rho;u]$,

$\sup_{u_{1}\in\mathcal{U},\,\rho_{1}\in\mathcal{E}}\ \sup_{u_{2}:|u_{2}-u_{1}|\leq a_{1},\ \rho_{2}:\|\rho_{2}-\rho_{1}\|\leq a_{2}}\left|E\,g(y_{u_{1},t};f(z_{u_{1},t},\rho_{1}))-E\,g(y_{u_{2},t};f(z_{u_{2},t},\rho_{2}))\right|\leq\varepsilon$

$\sup_{u\in\mathcal{U}}\|\hat{\rho}(u)-\rho_{0}(u)\|=o_{p}(1)$;

$\sup_{u\in\mathcal{U}}\sup_{\theta\in\Theta}\|\hat{\rho}(u;\theta)-\rho_{0}(u;\theta)\|=o_{p}(1)$.

$Y_{t,T}=f(Z_{t,T};\rho(t/T))+\eta_{t}$,

$\hat{\rho}(u)$

$\sup_{u\in\mathcal{U}}\|\hat{\rho}(u)-\rho_{0}(u)\|=o_{p}(1)$, and $\sup_{u\in\mathcal{U}}\sup_{\theta\in\Theta}\|\hat{\rho}(u;\theta)-\rho_{0}(u;\theta)\|=o_{p}(1)$.

$\hat{\theta}(u):=\arg\max_{\theta\in\Theta}-\|\hat{\rho}(u)-\hat{\rho}(u;\theta)\|_{\Omega}^{2}$,

$\sup_{u\in\mathcal{U}}\|\hat{\theta}(u)-\theta_{0}(u)\|=o_{p}(1)$.


Taxonomy

Topics: Complex Systems and Time Series Analysis · Financial Risk and Volatility Modeling · Stochastic processes and financial applications

Full text

Indirect Inference for Locally Stationary Models

David T. Frazier Bonsoo Koo

Monash University

Department of Econometrics and Business Statistics, Monash University, PO Box 11E, Clayton Campus, VIC 3800, Australia.

Corresponding author. Department of Econometrics and Business Statistics, Monash University, PO Box 11E, Clayton Campus, VIC 3800, Australia.



Key words: semiparametric, locally stationary, indirect inference, state-space models

Journal of Economic Literature Classification: C13, C14, C22

1 Introduction

Time-varying economic and financial variables, and relationships thereof, are staple features of applied econometrics. Notable examples include asset pricing models with time-varying features (Ghysels, 1998; Wang, 2003) and trending macroeconomic models (Stock and Watson, 1998; Phillips, 2001). While classical time series analysis is built on the assumption of stationarity, data studied in finance and economics often exhibit nonstationary features.

Many different modeling and estimation approaches are used to accommodate nonstationary behavior in observed time series. In particular, statistical tools developed for locally stationary processes provide a convenient means of analyzing trending economic and financial models. Heuristically, local stationarity implies that a process behaves in a stationary manner (at least) in the vicinity of a given time point but may be nonstationary over the entire time horizon. For certain widely studied time series models, slowly time-varying parameters ensure local stationarity under some regularity conditions; see, for instance, Dahlhaus (1996) and Dahlhaus (1997) (AR(1)), Dahlhaus and Subba Rao (2006) (ARCH($\infty$)), Dahlhaus and Polonik (2009) (MA($\infty$)), Koo and Linton (2012) (diffusion processes) and Koo and Linton (2015) (GARCH(1,1) with a time-varying unconditional variance), among many other classes of locally stationary processes.

While many classes of well-known time series models can be generalized to locally stationary processes, estimation and inference procedures developed for one class of locally stationary processes often cannot be applied to another. In particular, most estimation methods for locally stationary processes rely on local regression with closed-form estimators, local maximum likelihood estimation (MLE) with a closed-form likelihood function (in the time domain), or spectral density approaches (in the frequency domain), all of which can be intractable or difficult to implement for locally stationary extensions of commonly used structural econometric models; see Vogt (2012), Dahlhaus and Subba Rao (2006) and Dahlhaus and Polonik (2009) for examples. As such, the model specifications compatible with these methods are rather limited and exclude more complicated locally stationary models, such as models with latent variables or unobservable factors.

More importantly, structural models of economic and financial relationships commonly rely on latent variables to represent information that is unavailable to the econometrician. This modeling approach implies, almost by definition, that simple (closed-form) representations of the conditional distributions of the endogenous variables are unavailable, so that straightforward estimation methods are often infeasible. Hence, extending common locally stationary models to include the latent variables needed to structurally model phenomena in economics and finance would render the existing estimation methods for such models infeasible. For instance, this situation arises in state-space models whenever the measurement or state transition densities lack closed forms, as in stochastic volatility models. As a second example, estimation methods for univariate locally stationary diffusion models cannot be straightforwardly extended to versions of these models with stochastic volatility.

To circumvent this issue, and to encourage wider use of locally stationary models and methods in econometrics and finance, we propose a novel nonparametric indirect inference (hereafter, II) method for estimating locally stationary processes. Instead of estimating complex structural locally stationary models directly, we obtain our estimator indirectly by targeting consistent estimators of simpler auxiliary models, and use these estimates to conduct inference on the structural parameters. See Smith (1993), Gourieroux et al. (1993) and Gourieroux and Monfort (1996) for discussions of indirect inference in parametric models.

To illustrate the main idea behind our nonparametric II approach for locally stationary processes, we consider the following motivating example. Suppose that the true data generating process evolves according to

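$$Y_{t,T}=\xi(t/T)\exp(h_{t}/2)\,\varepsilon_{t},\qquad h_{t}=\omega+\delta h_{t-1}+\sigma v_{t},\qquad (\varepsilon_{t},v_{t})^{\prime}\sim N\left(0,\begin{bmatrix}1&0\\0&1\end{bmatrix}\right),\tag{1}$$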

where $\xi(t/T)>0$ for all $t\leq T$. This locally stationary multiplicative stochastic volatility (LS-SV) model decomposes volatility into a short-term latent volatility process, $h_{t}$, and a slowly time-varying component, $\xi(\cdot)$, and can capture a wide range of volatility behaviors. The model allows for non-stationary, but slowly changing, volatility dynamics, which may result from the transitory nature of the business cycle.
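As a minimal sketch of the data-generating process in (1), the NumPy code below simulates the LS-SV model. The curve $\xi(\cdot)$, the default values of $\omega$, $\delta$ and $\sigma$, starting $h_{t}$ at its stationary mean, and the function name `simulate_ls_sv` are all illustrative assumptions, not choices taken from the paper.

```python
import numpy as np


def simulate_ls_sv(T, xi, omega=0.0, delta=0.9, sigma=0.3, seed=0):
    """Simulate Y_{t,T} = xi(t/T) * exp(h_t / 2) * eps_t, where
    h_t = omega + delta * h_{t-1} + sigma * v_t and (eps_t, v_t) are
    independent standard normals (identity covariance matrix)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)        # return shocks eps_t
    v = rng.standard_normal(T)          # log-volatility shocks v_t
    h = np.empty(T)
    h_prev = omega / (1.0 - delta)      # start at the stationary mean of h_t (needs |delta| < 1)
    for t in range(T):
        h[t] = omega + delta * h_prev + sigma * v[t]
        h_prev = h[t]
    u = np.arange(1, T + 1) / T         # re-scaled time points t/T
    return xi(u) * np.exp(h / 2.0) * eps, h


# Hypothetical long-run volatility curve, positive on [0, 1] as (1) requires.
y, h = simulate_ls_sv(1000, lambda u: 1.0 + 0.5 * np.sin(2 * np.pi * u))
```

Setting `sigma=0` and `omega=0` switches off the latent process, so the simulated returns reduce to $\xi(t/T)\varepsilon_{t}$; this is a convenient check that the slowly varying component enters multiplicatively.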

Suppose that we wish to estimate and conduct inference on the unknown volatility function $\xi(\cdot)$ in (1). While (G)ARCH-based versions of the locally stationary volatility model have been analyzed by several researchers (see, e.g., Dahlhaus and Subba Rao, 2006; Engle and Rangel, 2008; Fryzlewicz et al., 2008; and Koo and Linton, 2015), it is not clear how to estimate the parameters in (1), because the latent volatility process $h_{t}$ contaminates the observed data $Y_{t,T}$. Indeed, largely for this reason, locally stationary stochastic volatility models have not previously been explored in the literature, even though their stationary counterparts form the backbone of many empirical studies in finance and financial econometrics.

In this paper, we generalize the II approach of Gourieroux et al. (1993) to obtain a convenient estimator for unknown functions in locally stationary models, such as the LS-SV model. This approach relies on a locally stationary auxiliary model that can be easily estimated from the observed data and that captures the features of interest in the structural model. For example, in the context of the LS-SV model, a reasonable auxiliary model is the locally stationary GARCH model:

$$Y_{t,T}=\rho(t/T)\,\sigma_{t}z_{t},\qquad \sigma_{t+1}^{2}\ \ldots,\tag{2}$$

where $\rho(t/T)>0$ for all $t\leq T$, and where $z_{t}$ is an error process.

The remainder of this paper develops the ideas behind this estimation method in the context of a general locally stationary model and establishes the asymptotic properties of the proposed estimation procedure under regularity conditions. To establish the asymptotic properties of these II estimators, we must first develop conditions under which locally stationary models admit estimators that are consistent for their corresponding limit values. This is itself a novel result, since the vast majority of research on locally stationary models has focused on estimators defined by relatively simple criterion functions, always under the assumption of correct model specification. Indeed, Kristensen and Lee (2019) is the only other study of which the authors are aware that treats genuinely misspecified locally stationary models. These new results for locally stationary estimators of the auxiliary model enable us to deduce the asymptotic properties of our proposed II estimator for the structural model parameters.

The proposed estimation procedure is demonstrated through two Monte Carlo examples and an empirical application. The empirical application uses the LS-SV model to examine the volatility structure of several commonly analyzed Fama-French portfolios. We find that most of these portfolios display time-varying volatility patterns that broadly track the underlying (low-frequency) expansions and contractions of the United States economy.

The rest of the paper is organized as follows. Section 2 introduces the general model and the related framework; in Section 2.3, we present our general approach and define the corresponding local II (L-II) estimators for a general locally stationary model. Section 3 develops asymptotic results that establish the properties of this estimation procedure. Section 4 discusses simulation results for a simple example, a locally stationary moving average model of order one. In Section 5 we analyze the locally stationary stochastic volatility model: we first present a small Monte Carlo study of our estimation method, and then apply the method to the volatility behavior of Fama-French portfolio returns, where we find ample evidence of smoothly time-varying, nonlinear volatility dynamics over the sample period. All proofs are relegated to Appendix A. The tables and figures associated with the application in Section 5 are given in Appendix B. The proof of Corollary 2 and additional details for the LS-SV model are provided in the supplementary appendix.

Throughout this paper, we use the following notation. The symbol $\mathbb{R}$ denotes the real numbers and $\mathbb{N}$ the natural numbers. For $x\in\mathbb{R}^{d}$, $\|x\|$ denotes the Euclidean norm and $|\cdot|$ the absolute value function; for $\Omega$ a $d\times d$ positive-definite matrix, $\|x\|^{2}_{\Omega}:=x^{\prime}\Omega x$ denotes the weighted norm of $x$. For a given function $g:\mathbb{R}^{d}\rightarrow\mathbb{R}$, $\|g\|_{\infty}:=\sup_{x\in\mathbb{R}^{d}}|g(x)|$ denotes the sup-norm. For an unknown parameter $\theta$, the subscript $0$ denotes its true value, i.e., $\theta_{0}$. The quantities $O_{p}(\cdot)$ and $o_{p}(\cdot)$ denote the usual big $O$ and little $o$ in probability. $C$ denotes a generic constant that can take different values in different places.

2 The model

2.1 Structural models

We assume the researcher is interested in conducting inference on a model in the class of locally stationary processes.

Definition 1.

Let $\{Y_{t,T}\}_{t=1,\dots,T;\,T=1,2,\dots}$ denote a triangular array of observations. The process $\{Y_{t,T}\}$ is locally stationary if, for each re-scaled time point $t/T\in[0,1]$, there exists a stationary process $\{y_{t/T,t}\}$ such that, for all $T$,

$$P\left(\max_{1\leq t\leq T}\left|Y_{t,T}-y_{t/T,t}\right|\leq C_{T}T^{-1}\right)=1,$$

where $\{C_{T}\}$ is a measurable process satisfying, for some $\eta>0$, $\sup_{T}E\left(|C_{T}|^{\eta}\right)<\infty$.

The magnitude of $\eta$ governs how well $y_{t/T,t}$ approximates $Y_{t,T}$ and reflects the characteristics of the underlying processes of interest: the larger $\eta$, the better the approximation. To maintain generality, and so represent various types of processes, we do not fix the magnitude of $\eta$ but instead allow it to vary from model to model. See, for instance, Dahlhaus and Subba Rao (2006) for ARCH($\infty$), Koo and Linton (2012) for diffusion processes, Vogt (2012) for AR processes and Dahlhaus and Polonik (2009) for MA processes, among many other processes.

We assume that the process $\{Y_{t,T}\}$ is generated by the following locally stationary structural model:

[TABLE]

where $r(\cdot)$ and $\varphi(\cdot)$ are real-valued functions that are known up to the unknown function $\theta_{0}$. The function of interest is $\theta_{0}\in\mathcal{H}_{\theta}$, where $(\mathcal{H}_{\theta},\|\cdot\|)$ denotes a normed vector space of functions. The structural model and $\theta_{0}$ satisfy the following regularity conditions.

Assumption 1.

(i) For a positive $\delta=o(1)$ and $u=t/T\in[\delta,1-\delta]$, the function $\theta_{0}(u)$ has uniformly bounded second derivatives with respect to $u$. (ii) The functions $r(\cdot)$ and $\varphi(\cdot)$, known up to $\theta_{0}(\cdot)$, are twice continuously differentiable with respect to $\theta$, with uniformly bounded second derivatives. (iii) The error term $\{\nu_{t}\}_{t\geq 1}$ is a white noise process with known distribution.

The structural model in (3) is quite general and can accommodate many interesting processes, including models with complex time-varying features such as time-varying autoregressive conditional heteroskedasticity (ARCH). In addition, the structural model in (3) can always be augmented with exogenous regressors at the cost of additional notation; such regressors may be used, for instance, to capture some conditionally heteroskedastic features of the data. Critically for our purposes, under Assumption 1, if $\theta_{0}(\cdot)$ were known, simulated realizations of $\{Y_{t,T}\}$ could easily be generated from the model in equation (3). (Footnote 1: We note here that Assumption 1(iii) is standard in the II literature. Indeed, Gourieroux et al. (1993) argue that this is not a real assumption since the error term “can always be considered as a function of a white noise with a known distribution and of a parameter which can be incorporated” into the unknown parameters.) If the process in (3) is locally stationary, inference on $\theta_{0}(\cdot)$ can be carried out through an approximate structural model defining a stationary process indexed by $u\in\mathcal{U}$, where $\mathcal{U}=[\delta,1-\delta]$, with a positive $\delta=o(1)$, denotes the domain of the re-scaled time points $u=t/T$:

[TABLE]

Lemma 1.

Suppose that $\{Y_{t,T}\}$ in (3) is locally stationary as in Definition 1. Under Assumption 1, as $T\rightarrow\infty$, the process $\{y_{u,t}\}$ in (4) satisfies

$$|Y_{t,T}-y_{u,t}|=O_{p}\left(|t/T-u|+T^{-1}\right).$$

Lemma 1 is consistent with Proposition 3.1 of Dahlhaus et al. (2019) and implies that, in the neighborhood of a re-scaled time point $u=t/T$, the local behavior of $\{Y_{t,T}\}$ can be approximated by that of $\{y_{u,t}\}$. Consequently, statistical analysis of $\{Y_{t,T}\}$ can be based on the collection of stationary processes $\{y_{u,t}:u\in\mathcal{U}\}$.
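To make Lemma 1 concrete, the sketch below assumes a time-varying AR(1), one of the locally stationary models cited in the introduction, with a hypothetical coefficient curve $a(\cdot)$. The observed process and its stationary approximation at $u$ are driven by the same shocks, so their gap should be small for $t/T$ near $u$ and grow with $|t/T-u|$, in line with the $O_{p}(|t/T-u|+T^{-1})$ rate. The function name and the curve are illustrative only.

```python
import numpy as np


def tvar1_with_approximation(T, a, u, seed=0):
    """Simulate Y_{t,T} = a(t/T) Y_{t-1,T} + e_t together with its stationary
    approximation at re-scaled time u, y_{u,t} = a(u) y_{u,t-1} + e_t,
    both driven by the same shocks e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T)
    Y = np.zeros(T)
    y = np.zeros(T)
    Y[0] = y[0] = e[0]
    for i in range(1, T):
        t = i + 1                      # 1-based time index: Y[i] holds Y_{t,T}
        Y[i] = a(t / T) * Y[i - 1] + e[i]
        y[i] = a(u) * y[i - 1] + e[i]
    return Y, y


def a(s):
    """Hypothetical coefficient curve with |a(s)| < 1 on [0, 1]."""
    return 0.2 + 0.6 * s


T, u = 2000, 0.5
Y, y = tvar1_with_approximation(T, a, u)
gap = np.abs(Y - y)
dist = np.abs(np.arange(1, T + 1) / T - u)
print(gap[dist < 0.01].mean(), gap[dist > 0.3].mean())  # small near u, larger far from u
```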

Under local stationarity, we will demonstrate that estimation of the unknown (vector) function $\theta_{0}(\cdot)$ in (3) can proceed through a local version of II (L-II) conducted at the time points $u=t/T$. This approach relies on the fact that, for any $u\in\mathcal{U}$, $\theta_{0}(u)$ in (4) satisfies $\theta_{0}(u)\in\Theta\subset\mathbb{R}^{d_{\theta}}$; i.e., in the locally stationary structural model we view the function of interest as a map $\theta_{0}(\cdot):\mathcal{U}\to\Theta$. Treating $\theta_{0}(\cdot)$ as our only parameter of interest is without loss of generality, since we may always redefine $\theta_{0}(\cdot)$ to include any unknown elements (time-varying or otherwise) of the error distribution. This paper is particularly concerned with estimation and inference when the structural model (3) rules out the direct estimation approaches developed in the existing literature, for instance because latent variables make the likelihood function intractable.

Suppose our goal is to estimate the unknown map $\theta_{0}:\mathcal{U}\to\Theta$ at a given point $u\in\mathcal{U}$. Since $\theta_{0}(u)\in\Theta\subset\mathbb{R}^{d_{\theta}}$, we associate with this unknown function (evaluated at the point $u$) a vector $\theta\in\Theta$. Even if $\theta$ cannot be estimated by direct means, since $\{y_{u,t}\}$ is stationary (at the fixed value $u$), we can easily simulate a realization of this series by replacing $\theta_{0}(u)$ in equation (3) by $\theta$. For fixed $u\in\mathcal{U}$ and some $\theta\in\Theta$, a simulated series $\{\tilde{y}_{u,t}(\theta)\}_{t\leq T}$ can be generated according to

[TABLE]

where $\tilde{\nu}_{t}$ denotes a simulated realization of the random variable $\nu_{t}$. (Footnote 2: The use of slightly misspecified simulators in II is not uncommon; see, e.g., Dridi et al. (2007), Altonji et al. (2013), Bruins et al. (2018) and Frazier et al. (2019) for examples of misspecified simulators in the context of II estimation. In this sense, we follow these papers in that the version of the structural model used to simulate data is a (locally) misspecified version of the true DGP.) Throughout the remainder, a tilde over a variable denotes that the variable is simulated and, when no confusion arises, we suppress the dependence of simulated series on $\theta$; e.g., we take $\tilde{y}_{u,t}$ to mean $\tilde{y}_{u,t}(\theta)$.

Given the simulated series $\{\tilde{y}_{u,t}\}_{t\leq T}$, II estimation of $\theta_{0}(u)$ can then proceed by minimizing the difference between statistics calculated from the observed data, $\{Y_{t,T}\}_{t\leq T}$, and from the simulated data, $\{\tilde{y}_{u,t}\}_{t\leq T}$. Repeating this procedure at a collection of points $u_{1},\dots,u_{m}$ then yields an estimate of the unknown function $\theta_{0}(\cdot)$.

2.2 Auxiliary models and direct estimation

To employ our L-II estimation method, we specify an auxiliary model defined by the unknown (vector) function $\rho(\cdot)\in\mathcal{H}_{\rho}$, with $(\mathcal{H}_{\rho},\|\cdot\|)$ a normed vector space of functions, where $\rho(\cdot):\mathcal{U}\to\Gamma\subset\mathbb{R}^{d_{\rho}}$ with $d_{\rho}\geq d_{\theta}$. As with the structural function of interest, for any given $u\in\mathcal{U}$ we associate with the unknown function value $\rho(u)$ a vector $\rho\in\Gamma\subset\mathbb{R}^{d_{\rho}}$. In general, we emphasize the parameters’ dependence on the point $u$ only when necessary.

Reflecting the features of the true structural model, the auxiliary model is chosen so that it allows direct estimation of $\rho(\cdot)$. We estimate $\rho(\cdot)$ at the point $u$, i.e., $\rho=\rho(u)$, by minimizing a local criterion function: for a kernel function $K(\cdot)$ and bandwidth parameter $h$, define

$$M_{T}[\rho;u]:=\frac{1}{Th}\sum_{t=1}^{T}g[Y_{t,T};\rho]\,K\left(\frac{u-t/T}{h}\right),$$

where $g(\cdot)$ is a known function whose properties we specify later. Note that, technically, $M_{T}[\rho;u]$ depends on the array $\{Y_{t,T}\}_{t\leq T}$; however, we suppress this dependence to keep notation as simple as possible. Given $M_{T}[\rho;u]$, an estimator for $\rho(u)$ can be defined as

$$\hat{\rho}(u;\theta_{0}(u)):=\arg\min_{\rho\in\Gamma}M_{T}[\rho;u].\tag{8}$$

The explicit dependence of $\hat{\rho}(u;\theta_{0}(u))$ on $\theta_{0}(u)$ clarifies that the auxiliary estimator depends on the unknown $\theta_{0}(\cdot)$ at the point $u$. However, to simplify notation, throughout the remainder we suppress this explicit dependence and simply define $\hat{\rho}(u):=\hat{\rho}(u;\theta_{0}(u))$.
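The local criterion $M_{T}[\rho;u]$ and the auxiliary estimator $\hat{\rho}(u)$ can be sketched as follows, assuming an Epanechnikov kernel and a finite grid standing in for $\Gamma$; both are illustrative choices, since the text leaves $K$, $h$ and $g$ general, and the function names are hypothetical. With $g[Y;\rho]=(Y-\rho)^{2}$ the minimizer is simply a kernel-weighted local mean.

```python
import numpy as np


def epanechnikov(x):
    """Epanechnikov kernel K(x) = 0.75 (1 - x^2) on |x| <= 1 (an illustrative choice of K)."""
    return 0.75 * np.clip(1.0 - x ** 2, 0.0, None)


def local_criterion(rho, u, Y, g, h, K=epanechnikov):
    """M_T[rho; u] = (1 / (T h)) * sum_t g[Y_{t,T}; rho] * K((u - t/T) / h)."""
    T = len(Y)
    t_over_T = np.arange(1, T + 1) / T
    return np.sum(g(Y, rho) * K((u - t_over_T) / h)) / (T * h)


def local_argmin(u, Y, g, h, grid):
    """Auxiliary estimator rho_hat(u): minimize M_T[.; u] over a finite grid approximating Gamma."""
    values = [local_criterion(rho, u, Y, g, h) for rho in grid]
    return grid[int(np.argmin(values))]


def squared_loss(Y, rho):
    """g[Y; rho] = (Y - rho)^2, which makes rho_hat(u) a kernel-weighted local mean."""
    return (Y - rho) ** 2


T = 1000
Y = np.arange(1, T + 1) / T            # deterministic linear trend Y_t = t/T
rho_hat = local_argmin(0.5, Y, squared_loss, h=0.1, grid=np.linspace(0.0, 1.0, 101))
```

Because the kernel is symmetric and the trend is linear, the local mean at $u=0.5$ recovers $0.5$; in practice a numerical optimizer would replace the grid search.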

It is natural to consider an auxiliary model that allows for simple estimation of the auxiliary parameters. One useful class of auxiliary models is nonlinear regression models of the type considered in Robinson (1991) and Zhang et al. (2015): for $Z_{t,T}$ a triangular array of variables that are measurable at time $t$ and exogenous with respect to the error term $\eta_{t}$, the auxiliary model is given by

$$Y_{t,T}=f(Z_{t,T};\rho(t/T))+\eta_{t},$$

where $f(\cdot)\in\mathbf{F}$ is known up to the unknown $\rho(\cdot)$, and where

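$$\mathbf{F}:=\left\{f:|f(x,\rho_{1})-f(x,\rho_{2})|\leq b(x)\|\rho_{1}-\rho_{2}\|_{\infty},\ \rho_{1},\rho_{2}\in\mathcal{H}_{\rho}\right\}.$$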

The set $\mathbf{F}$ restricts $f(\cdot)$ to be locally (in $x$) Lipschitz (in $\rho$), a restriction satisfied by many regression functions. Under this nonlinear regression model, $M_{T}[\cdot;u]$ could be the local least squares criterion

[TABLE]
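As an illustration, consider the linear special case $f(z;\rho)=\rho z$ with scalar regressor $Z_{t,T}=Y_{t-1,T}$, i.e., a local AR(1) regression like the auxiliary model used in Section 4. Assuming the local least squares criterion takes the kernel-weighted form $(Th)^{-1}\sum_{t}K\left(\frac{u-t/T}{h}\right)(Y_{t,T}-\rho Z_{t,T})^{2}$, its minimizer has the closed form $\hat{\rho}(u)=\sum_{t}K_{t}Z_{t,T}Y_{t,T}\big/\sum_{t}K_{t}Z_{t,T}^{2}$. The Python sketch below assumes a Gaussian kernel; the function names are hypothetical.

```python
import math


def gaussian_kernel(x):
    """Standard normal density, used here as the kernel K(.)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_least_squares_ar1(y, u, h):
    """Kernel-weighted least-squares estimate of rho(u) in Y_t = rho(u) * Y_{t-1} + eta_t.

    Minimizes sum_t K((u - t/T)/h) * (Y_t - rho * Y_{t-1})**2 over rho; the
    first-order condition gives the weighted ratio returned below. The 1/(T h)
    factor does not change the minimizer and is omitted.
    """
    T = len(y)
    num = 0.0
    den = 0.0
    for i in range(1, T):
        w = gaussian_kernel((u - (i + 1) / T) / h)  # list index i <-> time t = i + 1
        num += w * y[i] * y[i - 1]
        den += w * y[i - 1] ** 2
    return num / den
```

For a noise-free series with $Y_{t}=0.5\,Y_{t-1}$, the estimate equals 0.5 at every $u$, whatever the bandwidth.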

While nonlinear regression models are a useful class of auxiliary models, we do not wish to restrict our analysis solely to this class, and we therefore allow the criterion function $M_{T}[\rho;u]$ to be general. However, so that our theory easily accommodates this case, we further specialize the structure of the auxiliary criterion function $M_{T}$: for a kernel function $K(\cdot)$, a bandwidth parameter $h$, a known function $f(\cdot)\in\mathbf{F}$, and observable exogenous variables $Z_{t,T}$, we assume that

[TABLE]

2.3 Estimation of structural parameters

For $\{Y_{t,T}\}_{t\leq T}$ denoting a set of observations from the locally stationary structural model (3), satisfying Definition 1, the auxiliary estimator $\hat{\rho}(u)$ in (8) approximates the behavior of $\rho(\cdot)$ at the point $u$ . Given $\hat{\rho}(u)$ , an estimator of $\theta_{0}(u)$ can then be obtained by matching $\hat{\rho}(u)$ against a version that is calculated based on data simulated from the model under a given $\theta\in\Theta$ , and a given $u\in\mathcal{U}$ . However, we note that it is unclear in general how to simulate from the non-stationary structural model defined by (3).

Therefore, instead of attempting to simulate from the model (3), we invoke the local stationarity of $\{Y_{t,T}\}$ and generate simulated realizations from the stationary process $\{y_{u,t}:u\in\mathcal{U}\}$, defined by (4), which approximates $\{Y_{t,T}\}$ in the sense of Definition 1. Such an II estimation approach is, by construction, “local”: at a given point $u$, all we can recover is $\theta_{0}(u)$. An estimate of $\theta_{0}(\cdot)$ can be obtained by repeatedly applying this local II (L-II) approach at a given set of time points $\{u_{i}\}_{i=1}^{m}$, where $\Delta u_{i}:=u_{i}-u_{i-1}$ and $\max_{i}\Delta u_{i}=O(T^{-1})$.

More specifically, for some fixed $u_{i}\in\mathcal{U}$ and a corresponding candidate for $\theta_{0}(u_{i})$ , say, ${\theta}={\theta}(u_{i})\in\Theta$ , L-II then simulates data $\{\tilde{y}_{u_{i},t}\}_{t\leq T}$ from (4) using simulated errors $\{\tilde{\nu}_{t}\}_{t\leq T}$ . Given $\{\tilde{y}_{u_{i},t}\}_{t\leq T}$ , we estimate the auxiliary parameters using

[TABLE]

which corresponds to a simulated version of the local criterion function $M_{T}[\rho;u]$ in the vicinity of time point $u_{i}$ . Note that, similar to the notation we employ for $\hat{\rho}(u_{i})$ , the notation $\hat{\rho}(u_{i};\theta)$ is an abbreviation for $\hat{\rho}(u_{i};\theta(u_{i}))$ .

Using $\hat{\rho}(u_{i})$ and $\hat{\rho}(u_{i};{\theta})$ , the L-II estimator of $\theta_{0}(u_{i})$ can then be calculated, for positive-definite weighting matrix $\Omega$ , as

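$$\hat{\theta}(u_{i}):=\arg\min_{\theta\in\Theta}\left[\hat{\rho}(u_{i})-\hat{\rho}(u_{i};\theta)\right]^{\prime}\Omega\left[\hat{\rho}(u_{i})-\hat{\rho}(u_{i};\theta)\right].$$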

Using the same simulated errors $\{\tilde{\nu}_{t}\}_{t=1}^{T}$ , we may repeat the above procedure for $\{u_{i}\}_{i=1}^{m}$ , with $0<u_{1}<u_{2}<\cdots<u_{m}<1$ , and $\max_{i}\Delta u_{i}=O(T^{-1})$ , to obtain an estimator of $\theta_{0}(\cdot)$ .

The key feature of the above L-II procedure is that, due to the locally-stationary nature of (4), the simulated series $\{\tilde{y}_{u_{i},t}\}_{t\leq T}$ is stationary for each $u_{i}$ , $i=1,...,m$ . In this way, at each time point $u_{i}$ , L-II matches a nonparametric estimator against a parametric estimator. As the following section illustrates, a consequence of this estimation approach is that the estimator $\hat{\theta}(\cdot)$ will inherit the asymptotic properties of the nonparametric estimator $\hat{\rho}(\cdot)$ .
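The mechanics of this procedure can be illustrated with a toy Python sketch for scalar $\theta$. It is a hedged illustration rather than the authors' implementation: the structural model $y_{u,t}=\theta(u)+\nu_{t}$ with standard normal $\nu_{t}$, the two-dimensional auxiliary vector (local mean and local second moment, so $d_{\rho}=2>d_{\theta}=1$), the finite candidate grid used in place of a numerical optimizer, and all function names are assumptions. As in the procedure above, the simulated errors are drawn once and reused for every $u_{i}$ and every candidate $\theta$, and at each $u_{i}$ a kernel-weighted (nonparametric) auxiliary estimate is matched against an unweighted (parametric) one under the weighting matrix $\Omega$.

```python
import math
import random


def gaussian_kernel(x):
    """Standard normal density used as the kernel K(.)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_aux(y, u, h):
    """Nonparametric auxiliary estimate at u: kernel-weighted mean and second moment."""
    T = len(y)
    w = [gaussian_kernel((u - (i + 1) / T) / h) for i in range(T)]  # index i <-> t = i + 1
    sw = sum(w)
    m1 = sum(wi * yi for wi, yi in zip(w, y)) / sw
    m2 = sum(wi * yi * yi for wi, yi in zip(w, y)) / sw
    return [m1, m2]


def sim_aux(theta, nu):
    """Parametric auxiliary estimate from the stationary simulated series theta + nu."""
    sims = [theta + v for v in nu]
    n = len(sims)
    return [sum(sims) / n, sum(s * s for s in sims) / n]


def quad_form(d, omega):
    """d' Omega d for a vector d and a matrix Omega given as nested lists."""
    k = len(d)
    return sum(d[i] * omega[i][j] * d[j] for i in range(k) for j in range(k))


def l_ii(y, u_grid, h, candidates, omega, H=2, seed=0):
    """Toy L-II: match local auxiliary estimates against simulated ones at each u."""
    rng = random.Random(seed)
    # Simulated errors are drawn once (H paths of length T) and reused for every u and theta.
    nu = [rng.gauss(0.0, 1.0) for _ in range(H * len(y))]
    # In this toy model the stationary approximation depends on u only through theta,
    # so the simulated auxiliary map theta -> rho_hat(u; theta) is tabulated once.
    table = {th: sim_aux(th, nu) for th in candidates}
    estimates = {}
    for u in u_grid:
        rho_u = local_aux(y, u, h)
        estimates[u] = min(
            candidates,
            key=lambda th: quad_form([a - b for a, b in zip(rho_u, table[th])], omega),
        )
    return estimates
```

Reusing one set of simulated errors keeps the simulated criterion a deterministic, smooth function of $\theta$, which is what makes the minimization at each $u_{i}$ well behaved.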

3 Asymptotic behavior of L-II

This section establishes the asymptotic properties of the L-II estimator. We establish the convergence (in probability) of $\hat{\theta}(\cdot)$ to $\theta_{0}(\cdot)$ and provide the asymptotic distribution of $\hat{\theta}(\cdot)$ under a fairly general setup.

Before presenting the details, we introduce the limit quantities needed for our results. Consider the limit objective function and its minimizer corresponding to the sample quantities (7) and (8): for $u\in\mathcal{U}=[\delta,1-\delta]$ and a small, positive $\delta=o(1)$,

[TABLE]

When no confusion will result, we denote $\rho_{0}(u;\theta_{0}(u))$ by $\rho_{0}(u)$ . The value $\rho_{0}(u)$ is the minimizer of the limit map $\rho\mapsto\mathbb{M}_{0}[\rho;u]$ and depends on the features of the true distribution and the true value of the unknown function, $\theta_{0}(\cdot)$ , in the structural model.

Likewise, we require that the simulated auxiliary estimator has a well-defined probability limit. Recalling the stationary nature of the simulated data, $\tilde{y}_{u,t}$ , such a requirement boils down to standard results for the consistency of quasi-maximum likelihood estimators for the pseudo-true value; see, e.g., White, (1982) and White, (1996). The simulated counterpart to the pseudo-true parameter $\rho_{0}(u)$ is the map $\theta\mapsto\rho_{0}(u;\theta)$ , which we define as

[TABLE]

where we recall that the dependence of the simulated series $\tilde{y}_{u,t}$ on $\theta$ has been suppressed for notational simplicity.

3.1 Consistency

To demonstrate the asymptotic properties of our proposed L-II approach, we employ the following regularity conditions.

Assumption 2.

(i) $\left\{(Y_{t,T},Z_{t,T});t=1,...,T;T=1,2,...\right\}$ are triangular arrays of locally stationary processes satisfying Definition 1 and are $\phi$-mixing, with mixing coefficients $\phi(k)$ such that, for all integers $0<t<\infty$ and $k>0$,

[TABLE]

where $1\geq\phi(0)\geq\phi(1)\geq...$ and $\mathcal{F}_{-\infty}^{T,t}$ and $\mathcal{F}_{T,t+k}^{\infty}$ are $\sigma$ -fields generated by $\{(Y_{i,T},Z_{i,T});i\leq t\}$ and $\{(Y_{i,T},Z_{i,T});i\geq t+k\}$ respectively. The mixing coefficients $\phi(k)$ converge to zero as $k\rightarrow\infty$ , and are such that, for some sequence $m_{T}$ , with $1\leq m_{T}\leq T$ ,

[TABLE]

(ii) For all $u\in\mathcal{U}$ and $\theta\in\Theta$ , the approximate structural process, $\{y_{u,t}\}$ , defined by (4), satisfies (a) $E|y_{u,t}(\theta)|^{2}<\infty$ , (b) $Ey_{u,t}(\theta)=C$ , and (c) $Cov(y_{u,t}(\theta),y_{u,s}(\theta))=Cov(y_{u,t+m}(\theta),y_{u,s+m}(\theta))$ for all integers $t,s,m$ .

For an arbitrary point $u\in\mathcal{U}$ , define a local neighborhood of $\rho_{0}(u)$ as $\mathcal{E}:=\{\rho\in\Gamma:\|\rho-\rho_{0}(u)\|\leq\varepsilon\}$ and $\mathcal{E}^{c}:=\{\rho\in\Gamma:\|\rho-\rho_{0}(u)\|>\varepsilon\}$ .

Assumption 3.

(i) For $f\in\mathbf{F}$, $g[Y_{t,T};f(Z_{t,T},\rho)]$ is twice continuously and boundedly differentiable in all arguments. For all $u\in\mathcal{U}$, $\sup_{\rho\in\mathcal{E}}E|g(Y_{t,T};f(Z_{t,T},\rho))|<\infty$.

(ii) For all $u\in\mathcal{U}$ and any $\rho\in\mathcal{E}$, $\rho\mapsto f(Z_{t,T};\rho)$ is measurable and twice differentiable at $\rho$ and satisfies $\sup_{\rho\in\mathcal{E}}E|f(Z_{t,T};\rho)|<\infty$. In addition, there exists a function $\bar{f}(\cdot)$ such that $\sup_{\rho\in\mathcal{E}}f(z;\rho)\leq\bar{f}(z)$ with $E|\bar{f}(Z_{t,T})|_{\infty}<\infty$.

(iii) Let $q[Y_{t,T};f(Z_{t,T},\rho)]=(\partial/\partial\rho)g[Y_{t,T};f(Z_{t,T},\rho)]$. The function $q(\cdot)$ is differentiable in $\rho$ for all $\rho\in\mathcal{E}$ and is strictly monotonic in $\rho$ in a neighborhood of $\rho_{0}$. Moreover, there exists a constant $c_{q}$ such that, up to an $O_{p}(T^{-1})$ term,

[TABLE]

(iv) For any given $u\in\mathcal{U}$ , the map

[TABLE]

exists and $\rho_{0}(u)$ is the unique zero of $\Psi_{0}(\rho;u)$; i.e., for an arbitrarily small $\varepsilon>0$, there exists $\eta>0$ such that $\inf_{\rho\in\mathcal{E}^{c}}\mathbb{M}_{0}\left[\rho;u\right]-\mathbb{M}_{0}\left[\rho_{0};u\right]\geq\eta.$

(v) For all $\varepsilon>0$, there exist $a_{1},a_{2}>0$ such that

[TABLE]

is satisfied.

(vi) For any given $u\in\mathcal{U}$, the map $\theta\mapsto\rho_{0}(u;\theta)$ is continuous and injective on $\Theta$.

(vii) The parameter spaces $\Gamma\subset\mathbb{R}^{d_{\rho}}$ and $\Theta\subset\mathbb{R}^{d_{\theta}}$, with $d_{\rho}\geq d_{\theta}$, are compact.

Assumption 4.

(i) The kernel function $K(\cdot)$ is positive, symmetric around zero, and bounded. In addition: (i.a) $K(\cdot)$ is $r$-times continuously differentiable on $\mathbb{R}$, with $r\geq 2$; (i.b) $K(\cdot)$ satisfies $\int K(x)dx=1$, $\kappa_{2}=\int K^{2}(x)dx<\infty$, $\int|K(x)|dx<\infty$, and either $\sup_{x}K(x)<\infty$ and $K(x)=0$ for $|x|>L$ with $L<\infty$, or $|\partial K(x)/\partial x|\leq C$ and, for some $v>1$, $|\partial K(x)/\partial x|\leq C|x|^{-v}$ for $|x|>L$; (i.c) $\mu_{i}(K)=\int x^{i}K(x)dx=0$ for $i=1,\ldots,r-1$, $\int x^{r}K(x)dx\neq 0$, $\int|x|^{r}|K(x)|dx<\infty$, and $\lim_{|x|\rightarrow\infty}|x|K(x)=0$; (i.d) $K(\cdot)$ is Lipschitz continuous, i.e., $|K(x)-K(x^{\prime})|\leq C|x-x^{\prime}|$ for all $x,x^{\prime}\in\mathbb{R}$.

(ii) The bandwidth $h$ is such that, as $T\rightarrow\infty,$ $h\rightarrow 0$ , $Th\rightarrow\infty$ and $Th\big{/}(m_{T}\log T)\rightarrow\infty$ .
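As a quick numerical check of the moment conditions in Assumption 4.(i) for the Gaussian kernel used in the Monte Carlo experiments of Section 4.2, the Python sketch below integrates the relevant functions with a midpoint rule on $[-10,10]$ (the truncation and step count are arbitrary choices). For this kernel $r=2$: $\int K=1$, $\mu_{1}(K)=0$, $\mu_{2}(K)=1\neq 0$, and $\kappa_{2}=1/(2\sqrt{\pi})\approx 0.2821$. The check covers only these moments, not the smoothness or tail conditions.

```python
import math


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(f, a=-10.0, b=10.0, n=20000):
    """Midpoint rule on [a, b]; Gaussian tails beyond |x| = 10 are negligible."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))


mass = integrate(gaussian_kernel)                        # int K(x) dx, should be 1
mu1 = integrate(lambda x: x * gaussian_kernel(x))        # first moment, should be 0
mu2 = integrate(lambda x: x * x * gaussian_kernel(x))    # second moment, 1 (nonzero, so r = 2)
kappa2 = integrate(lambda x: gaussian_kernel(x) ** 2)    # int K^2, should be 1 / (2 sqrt(pi))
print(mass, mu1, mu2, kappa2, 1.0 / (2.0 * math.sqrt(math.pi)))
```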

Remark 1.

Assumption 2.(i) restricts attention to locally stationary processes and allows us to use the asymptotic independence property for heterogeneous data. The required decay rate of the $\phi$-mixing coefficients is mild; for instance, any exponential decay rate satisfies the condition. In addition, $\phi$-mixing can be relaxed to strong mixing if we restrict the form of $g(\cdot)$; for instance, for regression objective functions, whether linear or nonlinear, strong mixing suffices. Assumption 2.(ii) ensures that, for all $\theta\in\Theta$, there exists an approximating stationary process. It is worth noting that this is a condition on the parameter space $\Theta$: it implicitly restricts the size of $\Theta$ so as to exclude the possibility of generating non-stationary simulated series. Assumption 3 concerns the auxiliary model and its objective function. Assumptions 3.(i) and 3.(ii) ensure uniform continuity of the objective function in a neighborhood of the pseudo-true value, $\rho_{0}(u)$; by the dominated convergence theorem, they also ensure the existence of a well-behaved limit of the objective function. Assumption 3.(iii) concerns the behavior of the first-order condition; the monotonicity of $q(\cdot)$ in $\rho$ ensures that the optimizer of the auxiliary criterion is unique. In general, one can replace this condition with the high-level condition that the auxiliary estimator “nearly minimizes” the criterion function; however, we believe this primitive condition is more informative than the alternative high-level condition. Assumption 3.(iv) is an asymptotic identification condition: the minimizer of $\mathbb{M}_{0}$ is well separated and therefore unique. Assumption 3.(v) is a uniform equicontinuity condition used to obtain a uniform law of large numbers. Assumption 3.(vi) is an identification condition and is akin to a local version of the standard II identification condition. Assumption 3.(vii) requires that the parameter spaces for $\rho(u)$ and $\theta(u)$ be compact. Finally, Assumption 4 describes features of the kernel function and the bandwidth that are standard in nonparametric kernel estimation. $\Box$

Uniform (in $u$ ) consistency of the L-II estimator $\hat{\theta}(u)$ requires the uniform convergence of the auxiliary estimators $\hat{\rho}(u)$ and $\hat{\rho}(u;\theta)$ to their limit counterparts.

Theorem 1.

Under Assumptions 1-4, $\hat{\rho}(u)$ and $\hat{\rho}(u;\theta)$ exist and are unique w.p.1. In addition, the following are satisfied.

1. The auxiliary estimator $\hat{\rho}(u)$, calculated using the observed sample $\{Y_{t,T}\}_{t\leq T}$, satisfies

[TABLE]

2. The auxiliary estimator $\hat{\rho}(u;\theta)$, calculated using the simulated sample $\{\tilde{y}_{u,t}\}_{t\leq T}$, satisfies

[TABLE]

Remark 2.

We note here that Theorem 1 is of independent interest. The result in equation (12) is one of the first results, to our knowledge, on the uniform consistency of estimators in general locally stationary models. The only other results in this direction that the authors are aware of are those for (quasi) maximum likelihood estimators in Kristensen and Lee, (2019) and Dahlhaus et al., (2019). $\Box$

Remark 3.

As stated earlier, a useful class of auxiliary models for L-II is the class of nonlinear regression models. Suppose that the auxiliary model is given by

[TABLE]

where $\eta_{t}$ is strictly stationary and $\phi$ -mixing with $E|\eta_{t}|<\infty$ and independent of the explanatory variables $Z_{t,T}$ . The estimator of the auxiliary parameter is given as

[TABLE]

For this specific choice of auxiliary model and criterion function, we have the following immediate corollary to Theorem 1.

Corollary 1.

Under Assumptions 1-4, for $\hat{\rho}(u)$ defined as in (14) and $\hat{\rho}(u;\theta)$ its simulated counterpart, we have

[TABLE]

$\Box$

The (uniform) consistency of $\hat{\rho}(u)$ and $\hat{\rho}(u;\theta)$ allows us to deduce the uniform consistency of the L-II estimator.

Theorem 2.

Let Assumptions 1-4 be satisfied. For $\Omega$ a symmetric, positive-definite weighting matrix, the estimator

[TABLE]

satisfies

[TABLE]

Remark 4.

Theorem 2 requires, among other things, a condition guaranteeing identification of $\theta_{0}(u)$ for any $u\in\mathcal{U}$ . This requires that, for any $u\in\mathcal{U}$ and for some $\theta\in\Theta$ , $\rho_{0}(u;{\theta})$ is able to match $\rho_{0}(u)$ , and that this matching be unique. Recalling that $\rho_{0}(u)=\rho_{0}(u;\theta_{0}({u}))$ , this identification requires that $\theta_{0}(u)$ be the unique solution, in $\theta$ , to

[TABLE]

for $u\in\mathcal{U}$. In the case $d_{\rho}=d_{\theta}=1$, if $\rho_{0}(u;\theta)$ is continuous and strictly monotonic in $\theta$ for each $u$, this defines $\theta(\cdot)$ as

[TABLE]

Therefore, under continuity and strict monotonicity of $\rho_{0}(u;\theta)$ in $\theta$ for each $u\in\mathcal{U}$, $\theta_{0}(\cdot)$ is identified. In this scalar case, the condition is equivalent to the injectivity condition in Assumption 3.(vi) required by Theorem 2, which mirrors the injectivity condition required in parametric II (Gourieroux et al., 1993). In particular, if $\theta(t/T)=\theta$ for all $t/T$, i.e., the unknown function is constant, this identification condition is equivalent to the one generally employed in parametric II estimation, and Theorem 2 reduces to the standard consistency result for II estimation. $\Box$
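To make the scalar inversion concrete, consider the MA(1)/AR(1) pair of Section 4, under the assumption that the approximating stationary model is $y_{u,t}=\epsilon_{t}+\theta(u)\epsilon_{t-1}$ (the sign convention is an assumption here). The least-squares AR(1) pseudo-true value is the first-order autocorrelation, $\rho_{0}(u;\theta)=\theta/(1+\theta^{2})$, whose derivative $(1-\theta^{2})/(1+\theta^{2})^{2}$ is strictly positive on $(-1,1)$. The map is therefore continuous and strictly increasing there, and solving $\rho(1+\theta^{2})=\theta$ for the invertible root gives $\theta=2\rho/(1+\sqrt{1-4\rho^{2}})$ for $|\rho|<1/2$. A minimal Python sketch with hypothetical function names:

```python
import math


def binding_ma1_ar1(theta):
    """Pseudo-true least-squares AR(1) coefficient for y_t = e_t + theta * e_{t-1}."""
    return theta / (1.0 + theta * theta)


def invert_binding(rho):
    """Root of rho * theta**2 - theta + rho = 0 lying in (-1, 1).

    Written as 2 * rho / (1 + sqrt(1 - 4 rho^2)), which equals
    (1 - sqrt(1 - 4 rho^2)) / (2 rho) but avoids cancellation near rho = 0.
    """
    if abs(rho) >= 0.5:
        raise ValueError("|rho| must be below 1/2 for an invertible MA(1) solution")
    return 2.0 * rho / (1.0 + math.sqrt(1.0 - 4.0 * rho * rho))


for theta in (-0.8, -0.3, 0.0, 0.5, 0.9):
    rho = binding_ma1_ar1(theta)
    print(f"theta={theta:+.1f}  rho0={rho:+.4f}  recovered={invert_binding(rho):+.4f}")
```

As $|\theta|\to 1$, $|\rho_{0}|\to 1/2$ and the inverse becomes steep, which is one way to see why estimation deteriorates near the non-invertibility region.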

3.2 Asymptotic distribution

In what follows, let $\Psi_{T}(\rho;u):=\sum_{t=1}^{T}q[Y_{t,T};f(Z_{t,T},\rho)]K\left(\frac{u-t/T}{h}\right)/Th$ and recall the definitions $\Psi_{0}(\rho;u):=(\partial/\partial\rho)\mathbb{M}_{0}[\rho;u]$ and $\mathcal{E}:=\{\rho\in\Gamma:\|\rho-\rho_{0}(u)\|\leq\varepsilon\}$. We deduce the asymptotic distribution of the L-II estimator under the following high-level regularity conditions.

Assumption 5.

For fixed $u\in\mathcal{U}$ , the following are satisfied.

1. There exists a matrix $V(u)$, satisfying $0<\inf_{u\in\mathcal{U}}\|V(u)\|\leq\sup_{u\in\mathcal{U}}\|V(u)\|<\infty$, such that

[TABLE]

2. For $V(u)$ as in item 1, and for $\tilde{y}^{0}_{u,t}=\tilde{y}_{u,t}(\theta_{0}(u))$ denoting a realization simulated under $\theta_{0}(u)$, $\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left\{q(\tilde{y}^{0}_{u,t},\rho_{0}(u))\right\}\rightarrow_{d}\mathcal{N}\left(0,V(u)\right)$.

3. For some $\varepsilon>0$, $\sup_{u\in\mathcal{U}}\sup_{\rho\in\mathcal{E}}\left\|\frac{\partial\Psi_{T}(\rho;u)}{\partial\rho^{\prime}}-\frac{\partial\Psi_{0}(\rho;u)}{\partial\rho^{\prime}}\right\|=o_{p}(1)$.

4. $\Psi_{0}(\rho;u)$ and $\partial\Psi_{0}(\rho;u)/\partial\rho^{\prime}$ are Lipschitz continuous in both $u$ and $\rho$.

5. $\partial\Psi_{0}(\rho_{0}(u);u)/\partial\rho^{\prime}$ is invertible for all $u\in\mathcal{U}$.

6. $\sup_{u\in\mathcal{U}}\|\partial\rho_{0}(u;\theta_{0}(u))/\partial u\|<\infty$, $\sup_{u\in\mathcal{U}}\|\partial^{2}\rho_{0}(u;\theta_{0}(u))/\partial u^{2}\|<\infty$, and $\partial\rho_{0}(u;\theta_{0}(u))/\partial\theta^{\prime}$ is full column rank for all $u\in\mathcal{U}$.

Remark 5.

Assumption 5 amounts to a local version of the uniform convergence and asymptotic normality conditions required to demonstrate asymptotic normality of parametric II estimators. We note that it is feasible to consider more primitive assumptions that can guarantee the conditions in Assumption 5 (see, e.g., Kristensen and Lee, (2019) and Dahlhaus et al., (2019) for discussion). However, such an approach would require considerable technical effort and is not necessarily germane to the main message of this paper. Therefore, we leave the study of more primitive approaches to obtaining the required regularity in Assumption 5 for future research. $\Box$

Theorem 3.

If Assumptions 1-5 are satisfied, and if $Th^{3}=o(1)$ , then as $T\rightarrow\infty$

[TABLE]

where $Q(u)=\left\{\frac{\partial\rho_{0}(u;\theta)^{\prime}}{\partial\theta}\Omega\frac{\partial\rho_{0}(u;\theta)}{\partial\theta^{\prime}}\right\}\bigg{|}_{\theta=\theta_{0}(u)}$ and $W(u)=\frac{\partial\rho_{0}(u;\theta)^{\prime}}{\partial\theta}\Omega\left(\frac{\partial\Psi_{0}(\rho;u)}{\partial\rho^{\prime}}\right)^{-1}\bigg{|}_{\theta=\theta_{0}(u),\;\rho=\rho_{0}(u)}$ .

Remark 6.

In the L-II context, the bandwidth $h$ affects the structural estimates through the estimated auxiliary parameter $\hat{\rho}(\cdot)$. Therefore, the bandwidth must be chosen with reference to the auxiliary estimator. Theorem 3 demonstrates that, so long as the bandwidth satisfies $Th^{3}=o(1)$, the L-II estimator $\hat{\theta}(\cdot)$ is asymptotically normal and exhibits no asymptotic bias. Indeed, Dahlhaus et al., (2019) argue that, for many different classes of locally stationary models estimated by local maximum likelihood, such a choice of bandwidth is optimal in terms of mean squared error. However, if $h$ converges to zero more slowly, the resulting L-II estimator will be contaminated by an asymptotic bias. In such cases, the results given in Kristensen and Lee, (2019), in particular their Corollary 1, can be used to deduce the general form of the bias. $\Box$

Remark 7.

For $H\geq 1$ denoting the number of model simulations, the reader may notice that the $(1+1/H)$ term that generally appears in the asymptotic distribution of II estimators is absent in Theorem 3. The absence of this term is a consequence of matching a parametric estimator, $\hat{\rho}(u;\theta)$, against a nonparametric estimator, $\hat{\rho}(u)$. Recall that the estimator $\hat{\rho}(u;\theta)$ is based on $H$ simulated paths of length $T$, i.e., $TH$ total observations. Consequently, under regularity conditions, $\|\hat{\rho}(u;\theta_{0}(u))-\rho_{0}(u;\theta_{0}(u))\|=O_{p}\{(HT)^{-1/2}\}$. In contrast, the local nature of the nonparametric estimator ensures that $\hat{\rho}(u)$ is based on an effective sample of $Th$ observations, so that $\|\hat{\rho}(u)-\rho_{0}(u)\|=O_{p}(1/\sqrt{Th})$. Therefore, since $\rho_{0}(u)=\rho_{0}(u;\theta_{0}(u))$,

[TABLE]

and the dominant order is $O_{p}(1/\sqrt{Th})$. Indeed, after scaling the above by $\sqrt{Th}$, the first term is $O_{p}(\sqrt{h/H})=o_{p}(1)$ for any $H\geq 1$, since $h\rightarrow 0$. Therefore, the term $(1+1/H)$ does not appear in the asymptotic distribution of the L-II estimator.

Intuitively, since L-II uses $TH$ simulated data points and $Th$ “observed data points”, we are in a regime where we have more simulated data than observed data. In particular, since $h\rightarrow 0$ as $T\rightarrow\infty$ , the number of simulated observations, $TH$ , diverges faster than the number of “observed data points”, $Th$ . In parametric II estimation it is well-known that if the number of simulated observations diverges faster than the number of observed data points, the $(1+1/H)$ factor does not appear in the asymptotic variance. $\Box$

Remark 8.

We note that our L-II approach and its asymptotic behavior differ from the “kernel-based” II approach of Billio and Monfort, (2003). In the confines of a fully parametric structural model, Billio and Monfort, (2003) generate a simulated conditional auxiliary criterion function, via kernel smoothing, which is used to construct estimators of the auxiliary parameters. Matching auxiliary estimators based on the observed and simulated data then “knocks out” the bias of the nonparametric estimator in the asymptotic distribution of the kernel II estimator. As such, in the authors’ parametric context, the bandwidth used in estimation has little impact on the behavior of the structural parameter estimates, and the researcher is therefore free to choose this tuning parameter as they see fit. However, unlike Billio and Monfort, (2003), our structural model is nonparametric, and we must rely on local simulation (and estimation) of the structural model in the neighborhood of the time point $u$. This local approach is required since, in our context, there is no reason to believe that a global approach will guarantee identification. As a result, we must pay the price of nonparametric estimation, namely a slower rate of convergence. $\Box$

Remark 9.

As is generally true of nonparametric estimators, the asymptotic distributions given in Theorem 3 will only accurately reflect the sampling properties of the estimator in relatively large samples. As such, in cases with moderate sample sizes, we suggest the use of bootstrap techniques to conduct inference. While the bootstrap theory for locally stationary processes is still evolving, the bootstrap procedures of Paparoditis and Politis, (2002), Dowla et al., (2013), and Kreiss and Paparoditis, (2015) have been shown to be consistent in a wide variety of LS models. $\Box$
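One way to generate bootstrap replicates of a locally stationary series is the local block bootstrap (LBB) of Paparoditis and Politis (2002), which resamples each block only from a neighborhood of its original position so that the slowly varying structure of the series is preserved. The Python sketch below reflects our reading of that scheme and is not taken from the cited papers: `block_len` and `locality` (the neighborhood half-width as a fraction of $T$) are hypothetical tuning-parameter names, and their choice is not addressed here. Each bootstrap series would then be passed through the L-II procedure to obtain a bootstrap distribution of $\hat{\theta}(u)$.

```python
import random


def local_block_bootstrap(x, block_len, locality, rng=None):
    """One local block bootstrap replicate of the series x.

    The block covering positions start..start+block_len-1 of the original series
    is replaced by a block of the same length whose starting index is drawn
    uniformly from a window of half-width int(locality * T) around start.
    """
    rng = rng if rng is not None else random.Random()
    T = len(x)
    if not 1 <= block_len <= T:
        raise ValueError("block_len must lie in [1, len(x)]")
    radius = int(locality * T)
    out = []
    for start in range(0, T, block_len):
        hi = min(T - block_len, start + radius)   # last admissible block start
        lo = min(max(0, start - radius), hi)      # keep lo <= hi near the right end
        j = rng.randint(lo, hi)                   # inclusive on both ends
        out.extend(x[j:j + block_len])
    return out[:T]                                # trim the final block if it overshoots
```

With `locality=0` every block is drawn from its original position, so the replicate reproduces the data (when $T$ is a multiple of `block_len`); larger values allow more mixing across nearby time periods.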

4 Simple example

In this section, we consider a simple generalization of the time-varying moving average model that allows the roots of the moving average lag polynomial to be time-varying. After presenting the model, we demonstrate how our L-II approach can be applied to estimate the model and present simulation results on the effectiveness of this strategy.

4.1 MA(1) time-varying parameters

We consider the semiparametric locally stationary MA(1)-process

[TABLE]

We further assume that $\epsilon_{t}$ is a white noise process with mean zero and unit variance, and $E|\epsilon_{t}|^{4+\eta}<\infty$ for any arbitrarily small positive number $\eta$ , and we have that $\sup_{u\in\mathcal{U}}|\theta_{0}(u)|<1$ .

Our goal is to estimate the unknown function $\theta_{0}(\cdot)$ via our L-II approach. To do so, we approximate (15) by a family of stationary MA(1) processes indexed by $u\in\mathcal{U}$, with a small positive trimming parameter $\delta=o(1)$,

[TABLE]

We consider an auxiliary model with a locally stationary AR(1) structure:

[TABLE]

For fixed $u$ , the auxiliary model is a simple AR(1) model.

Recall that, in parametric MA models, when the roots lie near the region of non-invertibility, the resulting estimators can display a loss in accuracy. Since, for any fixed $u$, the structural model is well approximated by a parametric MA(1) model, the same issue is likely to be present if $\sup_{u}|\theta_{0}(u)|$ is close to unity.

We use the above auxiliary model to construct an L-II estimator of $\theta_{0}(u)$. Algorithm 1 describes the L-II estimation procedure for (15).

In terms of the general structure, the time-varying AR(1) auxiliary model in (17) corresponds to taking $z_{u,t}=y_{u,t-1}$ and $g(y_{u,t};\rho)=(y_{u,t}-\rho(u)y_{u,t-1})^{2}$. Additional lags of $y_{u,t}$ could also be included in $z_{u,t}$ to accommodate LS-MA models of higher order. It is also useful to note that, under weak conditions on the error term, the process $y_{t,T}$ defined in the auxiliary model (17) is strongly mixing; see Orbe et al., (2005).
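The steps of the L-II procedure for this model can be sketched in Python as follows. This is a hedged illustration, not a reproduction of Algorithm 1: it assumes the structural model $Y_{t,T}=\epsilon_{t}+\theta_{0}(t/T)\epsilon_{t-1}$ with Gaussian simulation errors, a Gaussian kernel, a simulated auxiliary estimate that pools all $HT$ simulated observations, and a grid search over $\theta\in(-0.99,0.99)$ in place of a numerical optimizer. Because the stationary approximation (16) depends on $u$ only through $\theta$, the simulated map $\theta\mapsto\hat{\rho}(u;\theta)$ is the same at every $u$ and is tabulated once, with the simulated errors held fixed. Function names are hypothetical.

```python
import math
import random


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_ar1(y, u, h):
    """rho_hat(u): kernel-weighted least-squares AR(1) coefficient on the observed data."""
    T = len(y)
    num = den = 0.0
    for i in range(1, T):
        w = gaussian_kernel((u - (i + 1) / T) / h)  # list index i <-> time t = i + 1
        num += w * y[i] * y[i - 1]
        den += w * y[i - 1] ** 2
    return num / den


def pooled_ar1(paths):
    """rho_hat(u; theta): least-squares AR(1) coefficient pooled over all simulated paths."""
    num = sum(p[i] * p[i - 1] for p in paths for i in range(1, len(p)))
    den = sum(p[i - 1] ** 2 for p in paths for i in range(1, len(p)))
    return num / den


def l_ii_ma1(y, u_grid, h, H=2, n_grid=99, seed=0):
    """Grid-search L-II estimates of theta_0(u) for each u in u_grid."""
    rng = random.Random(seed)
    T = len(y)
    # H error paths of length T + 1, drawn once and held fixed across theta and u.
    errors = [[rng.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(H)]
    candidates = [-0.99 + 1.98 * k / (n_grid - 1) for k in range(n_grid)]
    # Simulate y_t = e_t + theta * e_{t-1} and tabulate theta -> rho_hat(u; theta) once.
    rho_sim = {}
    for th in candidates:
        paths = [[e[t] + th * e[t - 1] for t in range(1, T + 1)] for e in errors]
        rho_sim[th] = pooled_ar1(paths)
    estimates = {}
    for u in u_grid:
        rho_u = local_ar1(y, u, h)
        estimates[u] = min(candidates, key=lambda th: (rho_u - rho_sim[th]) ** 2)
    return estimates
```

With $d_{\rho}=d_{\theta}=1$, minimizing the squared distance is equivalent to the norm criterion used in Corollary 2, and the weighting matrix $\Omega$ plays no role.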

In this specific model, using the result of Corollary 1, we can specialize the consistency result in Theorem 2 to obtain the following uniform convergence of $\hat{\theta}(u)$ to $\theta_{0}(u)$ in the LS-MA(1) model.³

³ The proof of Corollary 2 follows from Corollary 1; however, for clarity, we give a more primitive proof in the Supplementary appendix.

Corollary 2.

Under Assumptions 1-4, if $\sup_{u\in\mathcal{U}}|\rho_{0}(u)|<1$ with uniformly bounded second derivatives, the estimator $\hat{\theta}(u):=\arg\max_{{\theta}\in\Theta}-\|\hat{\rho}(u)-\hat{\rho}(u;\theta)\|$ satisfies $\sup_{u\in\mathcal{U}}\|\hat{\theta}(u)-\theta_{0}(u)\|=o_{p}(1).$

4.2 Monte Carlo experiments

We demonstrate the usefulness of the L-II approach using a series of Monte Carlo experiments. We consider a sample size of $T=1000$ generated according to the LS-MA(1) model

[TABLE]

Data are generated according to one of three functional specifications for $\theta_{0}(u)$:

(a) $\theta_{0}(t/T)=0.5\cdot(t/T)^{2}$;

(b) $\theta_{0}(t/T)=0.25+(t/T)-(t/T)^{2}$;

(c) $\theta_{0}(t/T)=0.5$.

For inference on $\theta_{0}(\cdot)$, we use Algorithm 1 with a Gaussian kernel and the rule-of-thumb bandwidth $h=1.06T^{-1/5}$. We take $H=2$ for all simulation experiments.⁴ We estimate $\theta_{0}(\cdot)$ across the grid of points $u\in\{.05,.10,.20,\dots,.90,.95\}$.

⁴ As demonstrated in Theorem 3, the choice of $H$ does not have an asymptotic impact on the estimates. However, in finite samples this choice may affect the estimated values of $\theta_{0}(\cdot)$, since a larger value of $H$ generally yields a smoother criterion function and potentially a more accurate optimizer.

We consider 5,000 replications of the above design for each of the three specifications of $\theta_{0}(\cdot)$. The following three figures illustrate the sampling distribution of the estimator across the Monte Carlo replications for each specification.

Figure 1 demonstrates the ability of the L-II approach to obtain consistent estimators of the unknown function $\theta_{0}(\cdot)$ over $u\in\{.05,.10,.20,\dots,.90,.95\}$ across the three Monte Carlo designs. The range of $u$ is truncated away from the endpoints because of the well-known boundary bias associated with local constant nonparametric estimation; outside this range, and given the relatively short time series, the estimators are likely to be poorly behaved. This issue can be addressed through local-linear smoothing approaches.

5 Time-varying multiplicative stochastic volatility model

The use of stochastic volatility to capture the conditional heteroskedastic movements of asset returns is now commonplace in economics and finance. Recently, however, several authors have suggested that volatility should be decomposed into short- and long-run components (see, e.g., Engle and Rangel, 2008, and Engle et al., 2013). Such a decomposition has given rise to the class of multiplicative time-varying GARCH models, e.g., Koo and Linton, (2015). These models decompose volatility into a short-run component, conveniently captured via a GARCH model, and a long-run component that varies slowly with broader macroeconomic factors and is captured nonparametrically.

The class of multiplicative GARCH models can capture both short- and long-run features; however, it is generally accepted that stochastic volatility (SV) models are superior to GARCH models in terms of modeling flexibility and their overall ability to capture fluctuations in short-run volatility. Given this feature, one would expect a multiplicative extension of the standard SV model to perform well in many cases. While such a model would be similar to multiplicative GARCH models, the latent stochastic volatility renders direct estimation approaches infeasible. However, this issue is immaterial for our L-II estimation approach, since we can simulate the latent volatility.

To this end, in this section we propose a new model where volatility evolves as the product of a short and long-run component: the long-run component is captured by a slowly time-varying function, and the short-run component is captured via an autoregressive SV model. In the context of simulation experiments, we demonstrate that our L-II approach can accurately estimate this new model. We then apply this model to analyze the volatility of monthly returns on twenty-five Fama-French portfolios, with the results indicating that long-run volatility changes dramatically over the sample period under analysis.

Given the general nature of this paper, we leave a thorough discussion on the theoretical properties of this new SV model for future study.

5.1 Model

We now consider a multiplicative extension of the traditional stochastic volatility model. The observed demeaned data are generated according to

[TABLE]

and where

[TABLE]

with $\gamma_{\nu}$ the correlation coefficient between $\nu_{1,t}$ and $\nu_{2,t}$. In this model, the long-run trend is captured by the deterministic function $\sqrt{\xi(t/T)}$, whereas the short-run dynamics, $h_{t}$, are represented by the stochastic volatility model. We implicitly assume that $\{Y_{t,T}\}$ changes smoothly over time and that, were it not for the slowly time-varying long-run trend $\xi(\cdot)$, $\{Y_{t,T}\}$ would be stationary. That is, we implicitly maintain that $\xi(\cdot)$ is uniformly positive and twice continuously differentiable and that $h_{t}$ is stationary, so that the process $\{Y_{t,T}/\sqrt{\xi(t/T)}\}$ is stationary. In the supplementary material, we give precise conditions on the function $\xi(\cdot)$ and the remaining parameters that ensure the resulting model is locally stationary.
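Because the structural equations (18) are not reproduced here, the following Python sketch assumes a standard SV form with leverage for the stationary approximation at a fixed $u$: $y_{u,t}=\sqrt{\xi(u)}\exp(h_{t}/2)\nu_{1,t}$ and $h_{t+1}=\mu+\phi(h_{t}-\mu)+\sigma\nu_{2,t}$, with $\mathrm{corr}(\nu_{1,t},\nu_{2,t})=\gamma_{\nu}$. The timing of the volatility shock, the Gaussian errors, and the burn-in length are all assumptions, as is the function name. This is the kind of simulator the L-II step would call for each candidate $\theta(u)$; the default $\mu=0$ matches the identification restriction adopted in Section 5.1.1.

```python
import math
import random


def simulate_ls_sv(xi_u, phi, gamma_nu, sigma, T, rng, mu=0.0, burn_in=500):
    """Simulate T draws of the (assumed) stationary SV approximation at a fixed u.

    y_t     = sqrt(xi_u) * exp(h_t / 2) * nu1_t
    h_{t+1} = mu + phi * (h_t - mu) + sigma * nu2_t,  corr(nu1_t, nu2_t) = gamma_nu
    """
    if not (abs(phi) < 1.0 and abs(gamma_nu) < 1.0):
        raise ValueError("need |phi| < 1 for stationarity and |gamma_nu| < 1")
    h = mu  # start the log-volatility at its unconditional mean
    out = []
    for t in range(T + burn_in):
        nu1 = rng.gauss(0.0, 1.0)
        # Correlated Gaussian shock with corr(nu1, nu2) = gamma_nu.
        nu2 = gamma_nu * nu1 + math.sqrt(1.0 - gamma_nu ** 2) * rng.gauss(0.0, 1.0)
        y = math.sqrt(xi_u) * math.exp(h / 2.0) * nu1
        h = mu + phi * (h - mu) + sigma * nu2
        if t >= burn_in:  # discard the burn-in so retained draws are close to stationary
            out.append(y)
    return out
```

Because $\xi(u)$ enters only as a scale factor, two runs with the same seed and $\xi(u)=1$ versus $\xi(u)=4$ differ exactly by a factor of 2.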

Directly estimating the structural model (18), and conducting statistical inference on the resulting estimates, is generally infeasible with existing methods. Instead, we propose to conduct inference on the structural model through L-II and by using as our auxiliary model the following locally stationary multiplicative GJR-GARCH model:

[TABLE]

where $z_{t}\overset{iid}{\sim}\mathcal{N}(0,1)$ and $I_{t}=0$ if ${y_{u,t}}/{\sqrt{\tau(u)}}\geq 0$ , and $I_{t}=1$ if ${y_{u,t}}/{\sqrt{\tau(u)}}<0$ . In this setting, we will use the parameters in the auxiliary model, $\rho(\cdot)=(\tau(\cdot),\omega,\alpha,\beta,\gamma)^{\prime}$ , to conduct inference on the parameters of interest in the structural model, $\theta(\cdot)=(\xi(\cdot),\mu,\phi,\gamma_{\nu},\sigma)^{\prime}$ .

Koo and Linton, (2015) demonstrate that locally stationary multiplicative GARCH models can be estimated relatively easily. Note, however, that the symmetry of the GARCH(1,1) model makes it an unsuitable auxiliary model, as it has no parameter that can be readily matched to the correlation coefficient $\gamma_{\nu}$. We therefore employ the GJR-GARCH(1,1) model, so that the leverage effect $\gamma_{\nu}$ is captured by the asymmetry parameter $\gamma$ in the auxiliary model.
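For reference, the Python sketch below evaluates the conditional variance recursion and the Gaussian quasi-log-likelihood of a GJR-GARCH(1,1) model for the rescaled returns $\varepsilon_{t}=y_{u,t}/\sqrt{\tau(u)}$. Since (19) is not reproduced here, it assumes the common specification $\sigma^{2}_{t}=\omega+(\alpha+\gamma I_{t-1})\varepsilon^{2}_{t-1}+\beta\sigma^{2}_{t-1}$ with $I_{t-1}=1$ when $\varepsilon_{t-1}<0$; the initial variance $\sigma^{2}_{0}=1$ and the function names are also assumptions. The helper `unit_variance_omega` encodes the restriction $\omega=1-\alpha-\beta-\gamma/2$ discussed in Section 5.1.1, which delivers unit unconditional variance when $z_{t}$ is symmetric, so that $P(z_{t}<0)=1/2$.

```python
import math


def gjr_garch_variances(eps, omega, alpha, beta, gamma, sigma2_0=1.0):
    """Conditional variances of a GJR-GARCH(1,1) model (assumed specification).

    sigma2_t = omega + (alpha + gamma * I_{t-1}) * eps_{t-1}**2 + beta * sigma2_{t-1},
    with I_{t-1} = 1 if eps_{t-1} < 0 and 0 otherwise.
    """
    sig2 = [sigma2_0]
    for t in range(1, len(eps)):
        ind = 1.0 if eps[t - 1] < 0.0 else 0.0
        sig2.append(omega + (alpha + gamma * ind) * eps[t - 1] ** 2 + beta * sig2[-1])
    return sig2


def gjr_quasi_loglik(eps, omega, alpha, beta, gamma):
    """Gaussian quasi-log-likelihood of the rescaled returns eps_t = y_t / sqrt(tau(u))."""
    sig2 = gjr_garch_variances(eps, omega, alpha, beta, gamma)
    return -0.5 * sum(math.log(2.0 * math.pi * s) + e * e / s for e, s in zip(eps, sig2))


def unit_variance_omega(alpha, beta, gamma):
    """omega implied by the restriction omega / (1 - alpha - beta - gamma / 2) = 1."""
    return 1.0 - alpha - beta - gamma / 2.0
```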

5.1.1 Estimation procedure

Before we discuss estimation of the LS-SV model, we note that, due to the multiplicative nature of the model for $Y_{t,T}$ in (18), an additional restriction is required to identify the unknown parameters. The restriction can be imposed on either the long-run or the short-run component. For instance, Koo and Linton, (2015) impose a restriction on the long-run component, while Engle et al., (2013) impose a restriction on the short-run component. For L-II, we impose a restriction on the short-run component of the LS-SV model: because L-II is applied at a finite number of fixed time points, a restriction on the long-run component of the structural model is difficult to implement.

In particular, we impose the restriction that $\mu=0$ for the structural model. Equivalently, for the auxiliary multiplicative GJR-GARCH model, we restrict $\omega=1-\alpha-\beta-\frac{\gamma}{2}$ such that the GJR-GARCH process has unit unconditional variance ( $\frac{\omega}{1-\alpha-\beta-\frac{\gamma}{2}}$ = 1). Under this setup, we conduct our L-II as follows.

Estimation of the auxiliary model:

Using the observations $\{Y_{t,T}\}_{t\leq T}$ , we estimate the auxiliary multiplicative GJR-GARCH model à la Engle and Rangel, (2008) and Koo and Linton, (2015). Specifically, from (19), for $\mathcal{I}_{t}$ denoting the information set at $t$ ,

[TABLE]

under the stationarity of $\sigma^{2}_{t}$ and $z_{t}$ and $\tau^{\ast}(u)=\tau(u)\exp(C)$ with $C=E(\log\sigma^{2}_{t}z_{t}^{2}|\mathcal{I}_{t-1})$ .

We obtain an initial estimate $\log\hat{\tau}^{\ast}(u)$ as

[TABLE]

where $K_{h}(\cdot)=K(\cdot/h)/h$ with a bandwidth $h$ . Once we obtain $\hat{\tau}^{\ast}(u)$ , we calculate the intermediate estimator $\check{\tau}(u)$ :

[TABLE]

because

[TABLE]

when we impose a restriction that $\int_{0}^{1}\tau(u)du=1$ .
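The two smoothing steps can be sketched in Python as follows. The sketch assumes that the initial estimator is the local constant (Nadaraya-Watson) regression of $\log Y_{t,T}^{2}$ on $t/T$ with a Gaussian kernel, and that $\int_{0}^{1}\hat{\tau}^{\ast}(v)dv$ is approximated by the average over the design points $t/T$. Because $\tau^{\ast}(u)=\tau(u)\exp(C)$, dividing by this integral removes the constant $\exp(C)$ under $\int_{0}^{1}\tau(u)du=1$. The exact displayed expressions may differ; the function names and the small floor on $Y_{t,T}^{2}$ are assumptions.

```python
import math


def gaussian_kernel(x):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def log_tau_star_hat(y, u, h, floor=1e-12):
    """Local constant estimate of log tau*(u): a K_h-weighted average of
    log y_t^2 over the design points t/T, t = 1, ..., T."""
    T = len(y)
    num = 0.0
    den = 0.0
    for t, y_t in enumerate(y, start=1):
        w = gaussian_kernel((u - t / T) / h) / h  # K_h(u - t/T) = K((u - t/T)/h)/h
        num += w * math.log(max(y_t * y_t, floor))  # floor (assumption) avoids log(0)
        den += w
    return num / den


def tau_check(y, h):
    """Intermediate estimate tau_check(t/T) = tau_hat*(t/T) / int_0^1 tau_hat*(v) dv,
    with the integral replaced by the average over the design points."""
    T = len(y)
    tau_star = [math.exp(log_tau_star_hat(y, t / T, h)) for t in range(1, T + 1)]
    integral = sum(tau_star) / T
    return [v / integral for v in tau_star]
```

For example, `tau_check(y, h=0.1)` returns the estimate on the grid $t/T$. The double loop is $O(T^{2})$, which is adequate for $T=200$ but not tuned for long series.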

Note that the restriction $\int_{0}^{1}\tau(u)du=1$ is not a model restriction but an estimation restriction, which can be re-normalized or reconstructed arbitrarily. Once $\check{\tau}(u)$ is obtained, we estimate the GJR-GARCH parameters by maximum likelihood using the transformed data $\check{y}_{u,t}=y_{u,t}\big{/}\sqrt{\check{\tau}(u)}$ and obtain the estimators $(\check{\omega},\check{\alpha},\check{\beta},\check{\gamma})^{\prime}$ . However, $\check{\rho}=(\check{\tau}(\cdot),\check{\omega},\check{\alpha},\check{\beta},\check{\gamma})^{\prime}$ does not satisfy the restriction $\omega=1-\alpha-\beta-\frac{\gamma}{2}$ . To obtain parameter estimates that do, we calculate $\hat{\tau}(u)=\check{\tau}(u)\,\check{\omega}\big/\left(1-\check{\alpha}-\check{\beta}-\frac{\check{\gamma}}{2}\right)$ and use $\hat{\tau}(u)$ to construct $\hat{y}_{u,t}=y_{u,t}\big{/}\sqrt{\hat{\tau}(u)}$ . Estimating the GJR-GARCH parameters on the transformed data $\{\hat{y}_{u,t}\}_{t\leq T}$ then yields $(\hat{\omega},\hat{\alpha},\hat{\beta},\hat{\gamma})^{\prime}$ . The vector of estimates $\hat{\rho}=(\hat{\tau}(u),\hat{\omega},\hat{\alpha},\hat{\beta},\hat{\gamma})^{\prime}$ is then used in L-II as the auxiliary parameter estimate. [Footnote 5: Imposing a restriction in maximum likelihood estimation is usually difficult; this two-pass approach avoids complicated constrained optimization. The restriction is important for L-II in this particular model, and other structural and auxiliary models will require other types of constraints. We believe that imposing general constraints in the context of L-II is an important research topic in its own right, and we leave the analysis of constrained L-II for future research.]
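The rescaling step and the likelihood maximized in each pass can be sketched as follows. The sketch assumes a Gaussian quasi-likelihood and initializes $\sigma^{2}_{1}$ at the sample second moment, a common convention that the text does not specify; any numerical optimizer can be used to minimize `gjr_neg_loglik`. All function names are hypothetical.

```python
import math


def gjr_neg_loglik(params, y):
    """Gaussian negative quasi-log-likelihood of a GJR-GARCH(1,1) for data y."""
    omega, alpha, beta, gamma = params
    # Positivity and covariance-stationarity constraints; infeasible points get +inf.
    if omega <= 0.0 or alpha < 0.0 or beta < 0.0 or alpha + gamma < 0.0:
        return math.inf
    if alpha + beta + gamma / 2.0 >= 1.0:
        return math.inf
    sigma2 = sum(v * v for v in y) / len(y)  # assumed initialization of sigma_1^2
    nll = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            indicator = 1.0 if y[t - 1] < 0.0 else 0.0
            sigma2 = omega + (alpha + gamma * indicator) * y[t - 1] ** 2 + beta * sigma2
        nll += 0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + y_t * y_t / sigma2)
    return nll


def renormalize_tau(tau_check, omega, alpha, beta, gamma):
    """tau_hat(u) = tau_check(u) * omega / (1 - alpha - beta - gamma/2)."""
    scale = omega / (1.0 - alpha - beta - gamma / 2.0)
    return [v * scale for v in tau_check]


def rescale(y, tau):
    """Transformed data y_t / sqrt(tau_t), pointwise."""
    return [y_t / math.sqrt(tau_t) for y_t, tau_t in zip(y, tau)]
```

The second pass then fits `gjr_neg_loglik` to `rescale(y, tau_hat)`. Returning `+inf` outside the admissible region is a simple way to keep an unconstrained optimizer inside it.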

Simulation of the structural model:

Based on (18), for a given $u\in\mathcal{U}$ , we simulate $H$ independent structural processes under the restriction $\mu=0$ , for some value of $\theta\in\Theta$ according to:

[TABLE]

with

[TABLE]

In the simulation step, we restrict $\mu=0$ to impose unit unconditional variance for the multiplicative SV model, which is compatible with the restriction on the auxiliary model, $\omega=1-\alpha-\beta-\frac{\gamma}{2}$ .

Estimation of the simulated structural model via the auxiliary model and L-II:

For a given time point $u\in\mathcal{U}$ , based on the simulated data $\{\tilde{y}^{[j]}_{u,t};j=1,...,H\}$ , we first obtain a set of estimators $\{\hat{\rho}^{[j]}(u;\theta)\}_{j=1}^{H}$ . Note that when $\{\hat{\rho}^{[j]}(u;\theta)\}_{j=1}^{H}$ is estimated at a fixed time point $u$ , the parameter $\tau(u)$ in the auxiliary model is an unknown constant, not a function. Hence we simply estimate the GJR-GARCH model on each simulated path $\{\tilde{y}^{[j]}_{u,t}\}$ to obtain $\{\check{\omega}^{[j]},\check{\alpha}^{[j]},\check{\beta}^{[j]},\check{\gamma}^{[j]}\}_{j=1}^{H}$ , and then obtain $\{\hat{\tau}^{[j]}(u)\}_{j=1}^{H}$ as $\hat{\tau}^{[j]}(u)=\check{\omega}^{[j]}\big/(1-\check{\alpha}^{[j]}-\check{\beta}^{[j]}-\check{\gamma}^{[j]}/2)$ , which follows from the restriction $\omega=1-\alpha-\beta-\frac{\gamma}{2}$ . We then create the normalized data $\hat{y}^{[j]}_{u,t}=\tilde{y}^{[j]}_{u,t}\big{/}\sqrt{\hat{\tau}^{[j]}(u)}$ and re-estimate the model to obtain $\{\hat{\omega}^{[j]},\hat{\alpha}^{[j]},\hat{\beta}^{[j]},\hat{\gamma}^{[j]}\}_{j=1}^{H}$ . From $\{\hat{\rho}^{[j]}(u;\theta)\}_{j=1}^{H}$ we then construct $\hat{\rho}(u;\theta)=\sum_{j=1}^{H}\hat{\rho}^{[j]}(u;\theta)/{H}$ .
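The per-path two-pass fit and the averaging can be written generically. In the Python sketch below, `simulate_path(theta, rng)` and `fit_gjr(data)` are hypothetical user-supplied callables (a simulator for the structural model at the fixed time point, and a GJR-GARCH fitter returning $(\omega,\alpha,\beta,\gamma)$); neither is defined in the paper. Each path $j$ uses its own fixed seed, so the same random numbers are reused for every candidate $\theta$, as is usual in indirect inference.

```python
import math
import random


def simulated_auxiliary_estimate(theta, simulate_path, fit_gjr, H=2, seed=0):
    """rho_hat(u; theta): average over H simulated paths of
    (tau_hat, omega_hat, alpha_hat, beta_hat, gamma_hat) from the two-pass fit."""
    total = [0.0] * 5
    for j in range(H):
        rng = random.Random(seed + j)  # fixed seed per path: common random numbers
        path = simulate_path(theta, rng)
        # First pass: "check" estimates on the raw simulated path.
        omega_c, alpha_c, beta_c, gamma_c = fit_gjr(path)
        tau_hat = omega_c / (1.0 - alpha_c - beta_c - gamma_c / 2.0)
        # Second pass: "hat" estimates on the normalized path.
        normalized = [y / math.sqrt(tau_hat) for y in path]
        rho_j = (tau_hat,) + tuple(fit_gjr(normalized))
        total = [acc + r for acc, r in zip(total, rho_j)]
    return [acc / H for acc in total]
```

Keeping the seeds fixed across $\theta$ makes $\hat{\rho}(u;\theta)$ a smooth-looking function of $\theta$ for a given set of draws, which helps the subsequent minimization.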

Based on $\hat{\rho}(u)$ and $\hat{\rho}(u;\theta)$ , we search for the best candidate for the given time point $u$ and define the estimator $\hat{\theta}(u)$ as the solution to: $\operatorname*{arg\,max}_{\theta\in\Theta}-\|\hat{\rho}(u)-\hat{\rho}(u;\theta)\|^{2}_{\Omega}$ where $\Theta$ is the parameter space for $\theta_{0}(u)$ . The above procedure can then be repeated across a grid of points, say $\{u_{i}\}_{i=1,...,m}$ to estimate the whole functional form of $\theta_{0}(\cdot)$ .
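To make the criterion and the grid search concrete, the sketch below replaces the LS-SV/GJR-GARCH pair with the simpler pair analysed in the supplementary appendix: an LS-MA(1) structural model with an AR(1) auxiliary model, scalar $\theta$ and $\Omega=1$. Assumptions: a Gaussian kernel, a kernel-weighted least-squares AR(1) coefficient for $\hat{\rho}(u)$, unweighted full-sample fits on the simulated stationary paths (kernel weight replaced by one), and a finite grid search in place of a numerical optimizer. The names are hypothetical. In this toy example the simulated paths depend on $u$ only through $\theta$, so the binding table over the grid is computed once and reused at every $u$.

```python
import math
import random


def ar1_coef(y, weights=None):
    """(Weighted) least-squares AR(1) coefficient without intercept:
    sum_t w_t y_t y_{t-1} / sum_t w_t y_{t-1}^2."""
    num = 0.0
    den = 0.0
    for t in range(1, len(y)):
        w = 1.0 if weights is None else weights[t]
        num += w * y[t] * y[t - 1]
        den += w * y[t - 1] * y[t - 1]
    return num / den


def local_ar1_coef(y, u, h):
    """Observed auxiliary estimate rho_hat(u): Gaussian-kernel weights in (u - t/T)/h."""
    T = len(y)
    weights = [math.exp(-0.5 * ((u - (t + 1) / T) / h) ** 2) for t in range(T)]
    return ar1_coef(y, weights)


def binding_table(theta_grid, T_sim, H, seed=0):
    """rho_hat(theta) on a grid: average AR(1) fit over H simulated stationary
    MA(1) paths y_t = e_t + theta * e_{t-1}; the same shocks are reused for every
    theta (common random numbers)."""
    rng = random.Random(seed)
    shocks = [[rng.gauss(0.0, 1.0) for _ in range(T_sim + 1)] for _ in range(H)]
    table = []
    for theta in theta_grid:
        total = 0.0
        for e in shocks:
            path = [e[t] + theta * e[t - 1] for t in range(1, T_sim + 1)]
            total += ar1_coef(path)
        table.append(total / H)
    return table


def l_ii_estimate(y, u_grid, h, theta_grid, T_sim=5000, H=2, seed=0):
    """theta_hat(u) = argmin over theta_grid of (rho_hat(u) - rho_hat(u; theta))^2
    at each u in u_grid (scalar case, Omega = 1)."""
    table = binding_table(theta_grid, T_sim, H, seed)
    estimates = []
    for u in u_grid:
        rho_obs = local_ar1_coef(y, u, h)
        best = min(range(len(theta_grid)), key=lambda i: (rho_obs - table[i]) ** 2)
        estimates.append(theta_grid[best])
    return estimates
```

For instance, `l_ii_estimate(y, [0.25, 0.75], 0.1, [round(-0.94 + 0.02 * i, 2) for i in range(95)])` returns one estimate per time point. The grid is kept inside $|\theta|<1$ because the AR(1) binding function is injective only on the invertible region.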

Summing up, Algorithm 2 is employed for the L-II estimation of the locally stationary multiplicative stochastic volatility model.

5.1.2 Monte Carlo experiment

We now conduct a Monte Carlo experiment to illustrate L-II estimation of the locally stationary multiplicative stochastic volatility (LS-SV) model. We fix the sample size at $T=200$ and generate 5000 Monte Carlo replications from the LS-SV model in equation (18) with parameter values given by

[TABLE]

and where the long-run volatility component is given by

[TABLE]

We take as our auxiliary model for this Monte Carlo experiment the LS-GJR-GARCH(1,1) auxiliary model in equation (19).

As in the Monte Carlo experiments for the LS-MA(1) model, we estimate the auxiliary parameters via local constant estimation with a Gaussian kernel and a rule-of-thumb bandwidth. We again set the number of simulations to $H=2$ . For full details of the estimation procedure, please refer to Algorithm 2. In each Monte Carlo replication we apply the L-II approach and record the estimated function $\hat{\xi}(\cdot)$ . [Footnote 6: Results for the parametric components of the model are similar to those obtained for other II estimators, and are not presented for the sake of brevity.] The estimation results for the unknown function are presented graphically in Figure 2. As for the LS-MA(1) model, the L-II procedure yields good estimates of the unknown function. [Footnote 7: As in the previous Monte Carlo experiment, we truncate the function estimate because of the boundary bias associated with the local-constant smoothing approach used in this implementation.]

5.2 Empirical application: LS-SV model

Herein, we analyse the behavior of monthly returns from January 1952 until December 2018 on 25 Fama-French portfolios formed from the intersection of five portfolios on size and five portfolios on book-to-market, where the breakpoints for the portfolios are taken from the NYSE quintiles and are ordered from smallest to largest. [Footnote 8: The data are freely available from Kenneth French’s website.] The monthly return series on the Fama-French portfolios cover a long period of observation, and it is unlikely that these series display constant conditional covariance features over the entire sample period. In particular, while it is fairly widely accepted that these portfolios seem to display constant mean dynamics, the large fluctuations in the volatility of these series do not engender confidence that the conditional variance is constant throughout the sample period. [Footnote 9: An ARCH test with five lags, applied to the demeaned returns of each of the 25 portfolios, returns overwhelming support for the alternative hypothesis for all portfolios. The specific values can be found in the supplementary appendix.]
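The footnote does not name the ARCH test. Assuming it is Engle's Lagrange-multiplier test, the statistic is the number of regression observations times $R^{2}$ from regressing squared demeaned returns on a constant and five of their own lags, compared with the $\chi^{2}_{5}$ distribution (99% quantile about 15.09). The self-contained Python sketch below uses hypothetical function names and a small normal-equations solver in place of a linear-algebra library; the demo series is a simulated ARCH(1), not the portfolio data.

```python
import math
import random

CHI2_5_CRIT_1PCT = 15.086  # 99% quantile of the chi-squared distribution with 5 df


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def arch_lm_statistic(returns, lags=5):
    """Engle's LM statistic: (number of regression observations) * R^2 from regressing
    squared demeaned returns on a constant and `lags` of their own lags."""
    mean = sum(returns) / len(returns)
    e2 = [(r - mean) ** 2 for r in returns]
    y = e2[lags:]
    X = [[1.0] + [e2[t - i] for i in range(1, lags + 1)] for t in range(lags, len(e2))]
    k = lags + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    coef = solve_linear(XtX, Xty)
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    return len(y) * (1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    rng = random.Random(7)
    eps, series = 0.0, []
    for _ in range(2000):  # ARCH(1): sigma_t^2 = 0.7 + 0.3 * eps_{t-1}^2
        eps = math.sqrt(0.7 + 0.3 * eps * eps) * rng.gauss(0.0, 1.0)
        series.append(eps)
    lm = arch_lm_statistic(series)
    print(lm, lm > CHI2_5_CRIT_1PCT)
```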

Moreover, given the long time span over which the data are measured, we argue that it is not realistic to assume that the volatility dynamics present in the 1950s persisted unchanged until 2018. In particular, underlying macroeconomic factors are likely to cause these portfolios to exhibit volatility patterns with both short-term and long-run fluctuations, which cannot be adequately captured by a stationary volatility model. To capture the long-run volatility patterns in the data, we consider an LS-SV version of the Fama-French three-factor model. For $r_{t,j},\;j\in\{1,...,25\}$ , denoting excess returns on the $j$ -th portfolio, we assume that $r_{t,j}$ evolves according to

[TABLE]

where $r_{t,m}$ denotes excess returns on the market factor, $\text{SMB}_{t}$ is the size factor, and $\text{HML}_{t}$ is the value factor. We model the short-term volatility component $h_{t,j}$ as

[TABLE]

where we require that the mean of the short-term SV component be zero to ensure the scale of $\xi(\cdot)$ can be properly identified. The above LS-SV model considers that volatility is the composition of two components: a long-run volatility trend that moves slowly and is captured by $\xi_{j}(t/T)$ , and a term, measured by $h_{t,j}$ , that captures short-term fluctuations around $\xi_{j}(t/T)$ .

Estimation in the above LS-SV model can be carried out in two steps: first, we estimate the regression parameters to obtain $\hat{\alpha},\hat{\beta}_{1},\hat{\beta}_{2},\hat{\beta}_{3}$ ; in the second step, the residuals

[TABLE]

are used within the L-II algorithm for the LS-SV model, along with an LS-GJR-GARCH auxiliary model (we refer the reader to Algorithm 2 for specific implementation details). Before moving on, we note that the two-step nature of the L-II approach in this example makes it straightforward to treat more complicated regression functions, such as models with time-varying $\alpha$ and $\beta$ . We refer the interested reader to the supplementary appendix, where we consider an alternative specification for the conditional mean function that allows $\alpha,\;\beta$ to be time-varying. [Footnote 10: These results largely mirror those given in the main text, and so we relegate these details to the supplementary material. In particular, we find that time-varying versions of $\alpha$ and $\beta$ do not meaningfully deviate from constants for the sample period under analysis.]
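A Python sketch of the first step follows. It assumes that the conditional mean in the three-factor regression is $\alpha+\beta_{1}r_{t,m}+\beta_{2}\text{SMB}_{t}+\beta_{3}\text{HML}_{t}$ (the displayed equation is not reproduced here) and that it is estimated by ordinary least squares; the residuals are what the second step passes to the L-II algorithm. Function names and the synthetic demo data are hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def three_factor_residuals(r, mkt, smb, hml):
    """OLS of r_t on (1, r_{t,m}, SMB_t, HML_t); returns the coefficient vector
    (alpha, beta_1, beta_2, beta_3) and the residuals used in the second step."""
    X = [[1.0, m, s, h] for m, s, h in zip(mkt, smb, hml)]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
    Xty = [sum(row[i] * v for row, v in zip(X, r)) for i in range(4)]
    coef = solve_linear(XtX, Xty)
    resid = [v - sum(c * x for c, x in zip(coef, row)) for v, row in zip(r, X)]
    return coef, resid


if __name__ == "__main__":
    # Synthetic, noise-free factors (illustration only; not Fama-French data).
    mkt = [math.sin(t) for t in range(60)]
    smb = [math.cos(2 * t) for t in range(60)]
    hml = [(t % 7) - 3.0 for t in range(60)]
    r = [0.1 + 1.1 * m + 0.3 * s - 0.2 * h for m, s, h in zip(mkt, smb, hml)]
    coef, resid = three_factor_residuals(r, mkt, smb, hml)
    print([round(c, 6) for c in coef])  # recovers the coefficients used above
```

Because the mean is fitted separately, swapping in a richer regression (for example, one with time-varying coefficients) only changes this step; the residuals enter L-II in the same way.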

L-II is used to estimate the short-term and long-run volatility components for all 25 portfolios. However, given the nature of the above estimation approach, uncertainty quantification is carried out using the local block-bootstrap (LBB) of Paparoditis and Politis, (2002). The LBB is operationally similar to the block bootstrap but accounts for the changing stochastic structure of the observation process. Given observed data $y_{1},\dots,y_{T}$ the LBB generates a bootstrapped series of data, $y_{1}^{*},\dots,y_{T}^{*}$ , via the following steps.

•

Select an integer block size $b$ , and a fraction $B\in(0,1]$ such that $T\cdot B$ is an integer.

•

For $\lceil x\rceil$ the smallest integer that is greater than or equal to $x$ , define $q:=\lceil T/b\rceil-1$ . Let $k_{0},\dots,k_{q}$ be i.i.d. integers generated from the uniform distribution that assigns probability $w(k)=1/(2TB+1)$ to the value $k$ when $-TB\leq k\leq TB$ and zero otherwise.

•

Construct the bootstrap series $y_{1}^{*},\dots,y_{T}^{*}$ by setting $y^{*}_{j+ib}=y_{j+ib+k_{i}}$ for $j=1,\dots,b$ and $i=0,\dots,q$ , with $k_{i}$ as given above, and keeping the first $T$ values.

In the following examples, across each of the 25 portfolios, we implement the LBB using $R=999$ bootstrap replications. Furthermore, we set the LBB block size, $b$ , to be $b=10$ , and take the local bootstrap parameter, $B$ , to be $B\approx 0.11$ .
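The three LBB steps can be sketched in Python as follows. The text does not say how to handle shifts that would take a block outside $1,\dots,T$; this sketch assumes such draws are rejected and redrawn, which is one practical choice and not necessarily the one used in the paper. $TB$ is rounded to an integer, and the function names are hypothetical.

```python
import math
import random
import statistics


def local_block_bootstrap(y, b, B, rng):
    """One local block bootstrap series y*_1, ..., y*_T."""
    T = len(y)
    TB = round(T * B)               # half-width of the local shift window
    q = math.ceil(T / b) - 1        # blocks i = 0, ..., q cover all T positions
    out = []
    for i in range(q + 1):
        start = i * b               # 0-based index of position 1 + i*b
        length = min(b, T - start)  # the last block is truncated at T
        while True:
            k = rng.randint(-TB, TB)  # uniform on {-TB, ..., TB}
            # Assumption: shifts that would leave the sample are redrawn.
            if 0 <= start + k and start + k + length <= T:
                break
        out.extend(y[start + k:start + k + length])
    return out


def lbb_standard_error(y, estimator, b=10, B=0.11, R=999, seed=0):
    """Standard deviation of estimator(y*) across R local block bootstrap series."""
    rng = random.Random(seed)
    stats = [estimator(local_block_bootstrap(y, b, B, rng)) for _ in range(R)]
    return statistics.stdev(stats)
```

With the settings above ($b=10$, $B\approx 0.11$, $R=999$), `lbb_standard_error(resid, estimator)` would return a bootstrap standard error for any user-supplied `estimator` applied to a residual series. Restricting each shift to $|k|\leq TB$ is what keeps the resampled blocks local in time.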

The estimation results for $\alpha$ and $\beta$ are given in Table 1, and the results for the parametric SV components are given in Table 2. Focusing on the values of $\alpha,\;\beta$ , we see that these estimated parameters are generally statistically significant and have the anticipated signs. Analysing Table 2, we see that the short-term volatility parameters generally have statistically significant autocorrelation coefficients between 0.5 and 0.7, which indicates a moderate amount of short-term volatility persistence. The majority of the estimated values for $\sigma_{v}$ are between 1.5 and 2.0, indicating a relatively large level of noise in the short-term volatility process. Interestingly, none of the estimated leverage effects are statistically significant for the short-term volatility process. To ensure that this insignificance is not an artifact of the chosen auxiliary model, in Table 3 we report 99% confidence intervals for the corresponding LS-GJR-GARCH auxiliary parameter $\gamma$ , which captures the impact of asymmetric news on volatility, and where the confidence intervals are calculated using QMLE sandwich form standard errors. For 24 out of the 25 portfolios, the resulting LS-GJR-GARCH asymmetry parameter is statistically insignificant at the one percent significance level.
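As a worked check on the significance statements, the Python snippet below computes $t$-ratios and 99% confidence intervals from the $\alpha$ and $\beta_{1}$ rows of Table 1 (columns Size-1 to Size-5), assuming approximate normality of the estimators so that the two-sided 1% critical value is about 2.576. The names are hypothetical.

```python
# Estimates and LBB standard errors for alpha and beta_1 in Table 1
# (columns Size-1 to Size-5).
TABLE_1 = {
    "alpha": [(-0.544, 0.181), (-0.056, 0.108), (-0.054, 0.075), (0.229, 0.079), (0.285, 0.080)],
    "beta_1": [(1.103, 0.030), (1.003, 0.022), (0.929, 0.014), (0.888, 0.014), (0.958, 0.016)],
}
Z_99 = 2.576  # two-sided 1% critical value of the standard normal


def wald_summary(est, se, z=Z_99):
    """t-ratio, 99% confidence interval and 1%-level significance flag."""
    t_ratio = est / se
    return t_ratio, (est - z * se, est + z * se), abs(t_ratio) > z


if __name__ == "__main__":
    for name, rows in TABLE_1.items():
        for col, (est, se) in enumerate(rows, start=1):
            t_ratio, (lo, hi), sig = wald_summary(est, se)
            print(f"{name} Size-{col}: t = {t_ratio:6.2f}, "
                  f"99% CI = [{lo:.3f}, {hi:.3f}], significant = {sig}")
```

Under this approximation, the $\alpha$ estimates for Size-1, Size-4 and Size-5 are significant at the 1% level while those for Size-2 and Size-3 are not, and all five $\beta_{1}$ estimates are significant.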

Leverage effects account for asymmetric reactions of volatility to positive and negative returns, possibly due to larger macroeconomic forces. By their very nature, these macroeconomic forces are generally slowly varying, so their impact on volatility can be adequately captured using the time-varying volatility approach considered herein. The insignificance of the estimated leverage effects can then be interpreted as follows: by decomposing volatility into a short-term and a long-run component, and by modeling the impact of such macroeconomic forces nonparametrically, the leverage effect is soaked up by the long-run volatility component; its inclusion in the short-term volatility component is then redundant and, hence, statistically insignificant.

We present the estimates of $\xi(\cdot)$ graphically in Figures 3-7 in the appendix. The reported confidence bounds are pointwise bounds, computed for each value of $u=t/T$ using the LBB.

The long-run volatility component captures gradual changes in volatility, possibly due to slowly varying macroeconomic factors that affect returns (see, e.g., Engle and Rangel, 2008, and Engle et al., 2013, for a detailed discussion). Given this aim, the results in Figures 3-7 are compelling, as they closely align with the broader macroeconomic risk profile of returns over the sample period under analysis. In particular, from the 1950s to the early 1960s most series display relatively low volatility that is either flat or slightly increasing until the early-to-mid 1960s, with the overall trend of most series decreasing after about 1965. This overall trend is maintained all the way through the Great Moderation of the 1980s. However, after the end of the Great Moderation, virtually every series exhibits a significant upswing in long-run volatility. This pattern continues and culminates around the time of the global financial crisis in the late 2000s, after which there is another sustained decrease in long-run volatility.

Given how closely our results correspond to the overarching long-run volatility patterns, we note that more than half of these return series now exhibit an additional steepening of long-run volatility. This may indicate that since 2016 we have entered a new period of long-run macroeconomic volatility.

6 Discussion

We propose a novel indirect inference estimator for locally stationary processes and thereby extend, for the first time, the use of indirect inference estimation to general classes of semiparametric models with slowly time-varying parameters. As part of this study, we also propose a novel locally stationary multiplicative stochastic volatility (LS-SV) model. We leave two important topics for future research: first, the efficiency of the L-II estimator and the ensuing semiparametric efficiency bound for the class of locally stationary models considered in this paper; and second, the incorporation of shape restrictions into nonparametric estimation within L-II, which may improve efficiency (see, e.g., Horowitz and Lee, (2017)), at the cost of a more complicated estimation approach.

Appendix A Proofs of main results

Proof of Lemma 1.

From the triangle inequality, for all $u_{0}\in\mathcal{U}$ , we have

[TABLE]

where the $O_{p}(T^{-1})$ term follows from Definition 1. Now, consider $\left|y_{t/T,t}-y_{u_{0},t}\right|$ and expand $y_{t/T,t}$ , via (4), in a neighborhood of $u_{0}$ :

[TABLE]

From Assumption 1, in particular the (uniform) bounded second-derivatives of $r(\cdot),\varphi(\cdot),\theta_{0}(\cdot)$ , it follows that $\left|y_{t/T,t}-y_{u_{0},t}\right|=O_{p}\left(\left|{t}/{T}-u_{0}\right|\right)$ . We then have that

[TABLE]

∎

A.1 Proof of Theorem 1

Proof.

Theorem 1 consists of two uniform consistency results: 1. uniform consistency of the auxiliary estimator, $\hat{\rho}(u)$ based on the observed sample $\{Y_{t,T}\}$ to the pseudo-true value $\rho_{0}(u)$ ; 2. uniform consistency of the auxiliary estimator, $\hat{\rho}(u;\theta)$ based on the simulated sample $\{\tilde{y}_{u,t}\}$ to the pseudo-true value $\rho_{0}(u;\theta)$ .

Our proof strategy is twofold. In Part 1, firstly we show that for the true $\theta_{0}(u)$ ,

[TABLE]

which proves the first part as in (12) of Theorem 1.

In Part 2, combined with Part 1, we show the uniform consistency over $u\in\mathcal{U}$ and $\theta\in\Theta$ , i.e. the second part as in (13) of Theorem 1 by using the simulated data and local stationarity.

Part 1: In what follows, we suppress the dependence of $Y_{t,T}$ and $\rho(u)$ on $\theta_{0}$ .

Define

[TABLE]

where $w_{t}(u)=(Th)^{-1}K_{ut}$ , $q(Y_{t,T};f(Z_{t,T},\rho))=(\partial/\partial\rho)g(Y_{t,T};f(Z_{t,T},\rho))$ and $K_{ut}=K\left((u-t/T)/h\right)$ . By construction,

[TABLE]

For an arbitrarily small number $\varepsilon>0$ , let $\|\hat{\rho}(u)-\rho_{0}(u)\|\leq\varepsilon$ . We first focus on the existence of a unique minimizer of $M_{T}(\rho)$ , or a unique solution to (22). Without loss of generality, we take $D$ to be a compact $d_{\rho}$ -dimensional set in the vicinity of the origin. We cover $D$ by $N$ balls of the form $B_{j}=\{\delta:\|\delta-\delta_{j}\|\leq\epsilon_{T}\};j=1,...,N$ , for some $\epsilon_{T}>0$ with $\epsilon_{T}=o(1)$ . Since $D$ is compact, it can be covered by a finite number of such $B_{j}$ s, with $N\leq c/\epsilon_{T}$ .

[TABLE]

Due to Assumption 3.(iii) and Assumption 4.(i),

[TABLE]

where $r_{T}=((m_{T}\log T)\big{/}Th)^{1/2}$ . For $\mathcal{S}_{3}$ , in a similar way, for some $\epsilon$ ,

[TABLE]

For $\mathcal{S}_{2}$ ,

[TABLE]

Due to Lemma 3, for some finite numbers, $\epsilon$ , $N$ , $C_{1}$ and $C_{2}$ ,

[TABLE]

Note that $e^{-C_{2}Th/m}<T^{-C_{2}\tau}$ with $\tau\rightarrow\infty$ as $T\rightarrow\infty$ . This implies that

[TABLE]

Combining all the above results with the Borel-Cantelli Lemma yields

[TABLE]

By Assumptions 3.(iii) and 3.(iv), and Assumption 4, for any $\delta\in\mathbb{R}^{d_{\rho}}$ such that $\Psi_{0}(u,\rho_{0}(u)+\delta)\neq 0$ , (24) implies that the event $\Psi_{T}(u,\rho_{0}(u)+\delta)=0$ has probability approaching zero for all $u\in\mathcal{U}$ as $T$ tends to infinity. For uniform consistency, by Assumption 3.(iii), the strict monotonicity of $q(\cdot)$ at the pseudo-true value $\rho_{0}$ implies, for $u\in\mathcal{U}$ and for $\iota$ a $d_{\rho}$ -dimensional vector of ones,

[TABLE]

where $\Psi_{0}(u,\rho)$ is defined as in (11) and where, for $X\in\mathbb{R}^{d_{\rho}}$ , $[X]_{j}$ denotes the $j$ -th element of the vector. This implies that for all $u\in\mathcal{U}$ , as $T\rightarrow\infty$ ,

[TABLE]

By construction, (25) means that for all $u\in\mathcal{U}$ , w.p.1.,

[TABLE]

due to (22) and $K(\cdot)>0$ in Assumption 4. In combination with Assumption 3.(v), for $\theta_{0}$ , the first part of Theorem 1 as in (12) holds:

[TABLE]

Part 2: We first show that, for a given $u$ , the resulting auxiliary criterion function based on the observed data is uniformly well-behaved and close to its limit counterpart. By virtue of the stationary nature of the simulated data, and in particular Assumptions 2.(ii) and 3.(v), we show that the same conclusion holds for the simulated criterion function. Lastly, continuity of the simulated objective function in $\rho$ and compactness of the parameter spaces $\Theta$ and $\Gamma$ can be used to show that $\hat{\rho}(u;\theta)$ is uniformly close to $\rho_{0}(u;\theta)$ in $\theta$ , for all $u\in\mathcal{U}$ , which yields the result. When no confusion will result, we again suppress the dependence of observed quantities on $\theta_{0}$ and of simulated quantities on $\theta\in\Theta$ , respectively.

Simplify notation by denoting $g(\rho):=g(Y_{t,T};f(Z_{t,T},\rho))$ and $g(\rho_{0}):=g(Y_{t,T};f(Z_{t,T},\rho_{0}))$ and define $p_{t}(\rho)=\left[g(\rho)-g(\rho_{0})\right]$ . Consider

[TABLE]

First, regarding $\mathcal{M}_{1}(\rho)$ : by Assumptions 3.(i), (ii) and (vi) and the dominated convergence theorem, $\mathcal{M}_{1}(\rho)=E[g(\rho)-g(\rho_{0})]$ is continuous at $\rho_{0}(u)$ , and $\mathcal{M}_{1}(\rho)$ is nonstochastic and constant with respect to $\mathcal{E}$ . For identifiability, by Assumption 3.(iv), $|\mathcal{M}_{1}(\rho)|>0$ for all $\rho\in\Gamma$ except $\rho_{0}$ , i.e., $|\mathcal{M}_{1}(\rho)|>0$ whenever $\rho\neq\rho_{0}(u)$ . This and the continuity of $\mathcal{M}_{1}(\rho)$ imply that $\mathcal{M}_{1}(\rho)$ is bounded away from 0 whenever $\rho\in\mathcal{E}^{c}$ , i.e., whenever $\rho$ lies outside a neighborhood of $\rho_{0}(u)$ . Furthermore, by compactness of $\Gamma$ and continuity, $\sup_{\rho\in\Gamma}|\mathcal{M}_{1}(\rho)|<\infty$ .

Meanwhile, with respect to $\mathcal{M}_{2}(\rho)$ , we have two components. Firstly, for the first term of $\mathcal{M}_{2}$ ,

[TABLE]

since, as $T\rightarrow\infty$ , $\frac{1}{Th}\sum_{t=1}^{T}K_{ut}\rightarrow 1$ and $\sup_{\rho\in\Gamma}\left|\mathcal{M}_{1}(\rho)\right|<\infty$ as mentioned previously. For the second term of $\mathcal{M}_{2}(\rho)$ , we need to show

[TABLE]

We treat two cases: (1) the middle part and (2) the tail part. For some constant $C<\infty$ , define $p_{t}^{\ast}(\rho)=p_{t}(\rho)1(|p_{t}(\rho)|\leq C)$ , where $1(\cdot)$ is the indicator function, and $p_{t}^{\ast\ast}(\rho)=p_{t}(\rho)1(|p_{t}(\rho)|>C)=p_{t}(\rho)-p_{t}^{\ast}(\rho)$ .

[TABLE]

For any fixed $\rho$ ,

[TABLE]

which can be arbitrarily small for $C$ and $T$ large enough irrespective of $\rho$ .

For some constant $0<J<C$ such that the kernel selects only observations with $|u-t/T|/h\leq J$ ,

[TABLE]

The second term tends to zero as $T\rightarrow\infty$ . For the first term,

[TABLE]

where $\phi(\cdot)$ is the $\phi$ -mixing coefficient defined as in Assumption 2.(i). Due to Assumption 2.(i), the term tends to zero in probability for each fixed $\rho\in\Gamma$ and consequently, $\sup_{\rho\in\Gamma}\left|\mathcal{M}_{2}(\rho)\right|\stackrel{{\scriptstyle p}}{{\rightarrow}}0$ .

From Assumption 2.(ii), it can be directly verified that the above result follows if we replace $\{Y_{t,T}\}$ , $\{Z_{t,T}\}$ and $K_{ut}$ in the above with the simulated counterparts $\{\tilde{y}_{u,t}\}$ , $\{\tilde{z}_{u,t}\}$ and $1$ , respectively (and for any $\theta\in\Theta$ ). From this we conclude, with obvious notations for this simulated counterpart, for any fixed $\theta\in\Theta$ and any fixed $u\in\mathcal{U}$ ,

[TABLE]

where $\tilde{g}(\rho,\theta)=g[\tilde{y}_{u,t};f(\tilde{z}_{u,t},\rho)]$ . Moreover, due to Assumption 3.(iv), the right-hand side of the above satisfies, uniformly in $\theta$ , $\sup_{\rho\in\Gamma}|{E}\left[\tilde{g}(\rho,\theta)-\tilde{g}(\rho_{0},\theta)\right]|>0$ , so that, by Assumption 3.(iv) applied to the simulated data, we can conclude that the right-hand side is uniquely minimized at $\rho_{0}(\cdot;\theta)$ . The above pointwise convergence, the continuity of $\tilde{M}_{T}(\rho,\theta)$ in $\rho$ , and the compactness of $\Theta$ and $\Gamma$ allow us to conclude, via the usual equicontinuity arguments (Assumption 3.(v)), that

[TABLE]

Now, using continuity of $\theta\mapsto\rho_{0}(\cdot;\theta)$ , Assumption 3.(vi), conclude that, for any $\delta>0$ there exists some $\varepsilon>0$ such that

[TABLE]

The remainder of the result follows the same lines as Theorem 5.7 in Van der Vaart, (1998), and hence is omitted. ∎

In what follows, we provide Lemma 3 and its proof. For the proof of Lemma 3, we need Lemma 2.

Lemma 2.

Let $\{W_{t,T}\}$ be a triangular array such that

[TABLE]

with $\left|W_{t,T}\right|\leq d$ and $E\left|W_{t,T}\right|\leq\delta$ and $EW_{t,T}^{2}\leq D$ . $\{W_{t,T}\}$ are also $\phi$ -mixing and we denote $\phi(k)$ as the $\phi$ -mixing coefficient such that $\tilde{\phi}(m)=\sum_{j=1}^{m}\phi(j)$ . Let there exist an increasing sequence $m_{T}:T\in\mathbb{N}$ of positive integers such that

[TABLE]

Then, for any positive number $\epsilon$ and $c$ , we have

[TABLE]

where $\phi(m_{T})\rightarrow 0$ as $m_{T}\rightarrow\infty$ , $c_{1}=2e^{\frac{3T}{m_{T}}e^{1/2}\phi(m_{T})}$ and $c_{2}=6c^{2}[D+4\delta d\tilde{\phi}(m_{T})]$ .

Proof of Lemma 2.

Define $S=\sum_{t=1}^{T}W_{t,T}$ . Let $n_{0}$ be such that $2m(n_{0}-1)\leq T\leq 2mn_{0}$ with $m=m_{T}$ . For $j=1,2$ and $k=1,...,n_{0}$ , consider $A_{j,k}=\sum_{t=t_{1}}^{t_{2}}W_{t,T}$ where $t_{1}=\min[(2k+j-3)m+1,T]$ and $t_{2}=\min[t_{1}+m-1,T]$ . Each block $A_{j,k}$ thus sums at most $m$ consecutive terms, and blocks with $j=1$ and $j=2$ alternate. Then,

[TABLE]

where $B_{j,k}=\sum_{t=1}^{k}A_{j,t}$ for $j=1,2$ with $B_{j,0}=0$ . By construction, for some constant $c$ ,

[TABLE]

From (27), applying inequality (20.28) in Billingsley, (1968, p. 171), we have

[TABLE]

Setting $cmd=1/4$ yields

[TABLE]

Since $e^{x}\leq 1+x+x^{2}$ for $|x|\leq 1/2$ , this implies that

[TABLE]

Moreover, from $1+x\leq e^{x}$ , $1+4c^{2}EA_{j,k}^{2}\leq e^{4c^{2}EA_{j,k}^{2}}$ . Combining the above two inequalities,

[TABLE]

From the definition of $A_{j,k}$ ,

[TABLE]

where the inequality comes from $\left|EW_{t,T}W_{s,T}\right|\leq 2\delta d\phi(|t-s|)$ . With this and (31),

[TABLE]

where $C=[D+4\delta d\tilde{\phi}(m)]$ . In combination with (29) and (30), the inequality leads to

[TABLE]

Iterating the same procedure yields

[TABLE]

Recalling that $n_{0}$ is chosen such that $2m(n_{0}-1)\leq T\leq 2mn_{0}$ , we have $n_{0}\leq\frac{T}{2m}+1\leq\frac{3T}{2m}$ , where the last inequality uses $m\leq T$ . From (28),

[TABLE]

where $c_{1}=[1+2e^{1/2}\phi(m)]^{\frac{3T}{2m}}=\exp\{\frac{3T}{2m}\log[1+2e^{1/2}\phi(m)]\}\leq\exp\{\frac{3T}{m}e^{1/2}\phi(m)\}$ and $c_{2}=6c^{2}[D+4\delta d\tilde{\phi}(m)]$ ; the inequality for $c_{1}$ uses $\log(1+x)\leq x$ for all $x\geq 0$ . Finally, by Markov's inequality,

[TABLE]

This completes the proof. ∎

Lemma 3.

Under the Assumptions of Theorem 1, for some positive constants, $\epsilon$ , $C_{1}$ and $C_{2}$ ,

[TABLE]

where

[TABLE]

with $\Psi_{T}(u,\rho(u))=\sum_{t=1}^{T}w_{t}(u)g(Y_{t,T};f(Z_{t,T},\rho))$ .

Proof of Lemma 3.

Let $S(\delta_{j}):=\sum_{t=1}^{T}W_{t,T}(\delta_{j})=\Psi_{T}(u,\rho(u)+\delta_{j})-E\Psi_{T}(u,\rho(u)+\delta_{j})$ where $|W_{t,T}(\delta_{j})|\leq d_{j}$ . Under the Assumptions of Theorem 1, $g(\cdot)$ is bounded and the kernel function is bounded and Lipschitz continuous. Due to Assumption 2, there exists $m_{T}$ satisfying (26). Setting a constant $c$ proportional to $Th/m_{T}$ and applying Lemma 2 yields that, for some finite positive constants $C_{1}$ and $C_{2}$ ,

[TABLE]

where $\kappa(d_{j},\tilde{\phi}(m_{T})/m_{T})$ is proportional to $c_{2}$ in Lemma 2.

Note that

[TABLE]

From (32),

[TABLE]

which completes the proof. ∎

A.2 Proof of Corollary 1

Proof.

By construction, $\sup_{u\in\mathcal{U};|u-t/T|\leq T^{-1}}|f(Z_{t,T};\rho_{0}(t/T))-f(Z_{t,T};\rho_{0}(u))|=O(T^{-1})$ and therefore $Y_{t,T}=f(Z_{t,T};\rho(t/T))+\eta_{t}=f(Z_{t,T};\rho(u))+\eta_{t}+O(T^{-1})$ . In what follows, $O(T^{-1})$ is suppressed.

Define $p_{t}(\rho)=f(Z_{t,T};\rho)-f(Z_{t,T};\rho_{0})$ for a given $u\in\mathcal{U}$ where $\rho:=\rho(u)\in\Gamma$ and $\rho_{0}:=\rho_{0}(u)$ . Then, we have

[TABLE]

where, for a given $u\in\mathcal{U}$ such that $|u-t/T|\leq T^{-1}$ ,

[TABLE]

Noting that absolute summability implies square summability, the rest of the argument is analogous to the proof of Theorem 1. This completes the proof. ∎

A.3 Proof of Theorem 2

Proof.

The proof is similar to others found in the literature on semiparametric estimation (see, e.g., Chen et al., (2003, p. 1604)) and, in particular, to Lemma 1 in Frazier, (2019, pp. 136-137).

From the definitions of $\rho_{0}(u)$ and $\rho_{0}(u;\theta)$ , and the injectivity and continuity of $\rho_{0}(\cdot;\theta)$ , for all $\delta>0$ there exists some $\epsilon>0$ such that, if $\sup_{u}\|\theta-\theta_{0}(u)\|\geq\delta$ , then

[TABLE]

Applying this fact we see that

[TABLE]

and the result follows if the right-hand side of the above is $o_{p}(1)$ .

To this end, first note that, by the definitions of $Q_{T}(u,\theta)$ and $Q_{0}(u,\theta)$ ,

[TABLE]

where the second inequality follows from the reverse triangle inequality and the third from the regular triangle inequality. The uniform convergence now follows from the results in Theorem 1.

Now, we show that for any $\tau>0$

[TABLE]

From the definition of $\hat{\theta}(u)$ , for every $u\in\mathcal{U}$ ,

[TABLE]

and

[TABLE]

Moreover, by uniform convergence of $Q_{T}[u,\theta]$ to $Q_{0}[u,\theta]$ we have,

[TABLE]

Now, consider

[TABLE]

where the last inequality comes from equation (34). Therefore, from the uniform convergence in (35) and (36),

[TABLE]

The result then follows by taking $\tau=\epsilon$ in (33).

∎

A.4 Proof of Theorem 3

We break the proof into two parts: first, we derive the asymptotic expansion of the estimating equations based on the observed estimator and determine the order of the terms in this expansion; we then use this result to deduce the stated result.

Part 1: To simplify notation, in what follows we take $q(Y_{t,T},\rho(u))=q[Y_{t,T};f(Z_{t,T},\rho(u))]$ . By the definition of $\hat{\rho}(u)$ ,

[TABLE]

where $\partial q(x_{0})\big{/}\partial x:=\partial q(x)\big{/}\partial x\big{|}_{x=x_{0}}$ . It can be rewritten as

[TABLE]

where $\partial\Psi_{0}(\rho_{0}(u);u)\big{/}\partial\rho^{\prime}=\lim_{T\rightarrow\infty}(Th)^{-1}\sum_{t=1}^{T}E\left[\partial q(Y_{t,T},\rho_{0}(u))/\partial\rho^{\prime}\right]K((u-t/T)/h)$ .

First, the term (38).3 is $o_{p}(1)$ given that, for each $u\in\mathcal{U}$ and $t\in\{1,\dots,T\}$ such that $|u-t/T|<T^{-1}$ ,

[TABLE]

due to local stationarity of ${Y_{t,T}}$ and Assumptions 2-5. To see this, for each $u_{0}=t_{0}/T$ ,

[TABLE]

where $Z_{t,T}=\left[\frac{\partial q(Y_{t,T},\rho_{0}(u))}{\partial\rho^{\prime}}-E\left[\frac{\partial q(Y_{t,T},\rho_{0}(u))}{\partial\rho^{\prime}}\right]\right]$ , and $M=ThL$ with $L$ the bound of the support of the kernel function in Assumption 4. Regarding (40).2,

[TABLE]

where $\mathcal{S}_{k}=\sum_{i=1}^{k}Z_{i-t_{0},T}$ and $C$ is a generic constant. The first equality comes from summation by parts, and the inequality is due to Assumption 4, i.e., $K(\cdot)$ is of bounded variation. The convergence to zero in probability is ensured by the ergodic theorem. Applying the same argument to the term (40).1, the result in equation (39) follows. Arguments similar to (39) are used under various modeling set-ups; see, for instance, Lemma A.5 in Dahlhaus and Subba Rao, (2006), Lemma A.1 in Fryzlewicz et al., (2008) and the proof of Theorem 2 in Koo and Linton, (2012).

From Assumption 5.1, the term (38).1 in (38) satisfies,

[TABLE]

and we can rearrange terms in equation (38) to obtain the result: for $|u-t/T|<T^{-1}$ , apply Assumption 5.5 to obtain

[TABLE]

where $\Psi_{T}(\rho_{0};u)=\frac{1}{Th}\sum_{t=1}^{T}q(Y_{t,T},{\rho}_{0}(u))K\left(\frac{u-t/T}{h}\right)$ .

Part 2: We now use the above expansion to deduce the asymptotic distribution of the L-II estimator.

From the definition of $\hat{\theta}:=\hat{\theta}(u)$ ,

[TABLE]

Note that

[TABLE]

and

[TABLE]

Using equation (42) within the FOCs, and the consistency of $\hat{\rho}(u)$ and $\hat{\theta}(u)$ obtained in Theorem 1 and Theorem 2 respectively, we obtain

[TABLE]

which implies

[TABLE]

where we have used the injectivity of $\rho(u,\theta)$ in $\theta$ . The result now follows by substituting in the expansion for $\{\hat{\rho}(u)-\rho_{0}(u)\}$ given in (41) and multiplying by $\sqrt{Th}$ .

Appendix B Figures and tables

B.1 Figures

B.2 Tables

Supplementary Appendix

Appendix C Proof of Corollary 2

Note that, due to $\sup_{u\in\mathcal{U}}|\theta_{0}(u)|<1$ and local stationarity, there exists an invertible moving average process corresponding to the structural model (1) in the vicinity of any given time point $u\in\mathcal{U}$. Therefore, there exists an autoregressive (AR($\infty$)) process such that

[TABLE]

in the neighborhood of a given time point $u\in\mathcal{U}$. The AR(1) auxiliary model is therefore a misspecified version of this representation, with the wrong lag order. Define $M_{T}(\cdot)$ and $M_{0}(\cdot)$ as

[TABLE]

and $M_{0}(\cdot)=\lim_{T\rightarrow\infty}{E}M_{T}(\cdot)$. In this setting, $M_{T}(\cdot)$ and $M_{0}(\cdot)$ are well defined and well behaved. Furthermore, for any given time point $u\in\mathcal{U}$, the pseudo-true value $\rho_{0}(u)$ is given by $\rho_{0}(u)={\theta(u)}/(1+{\theta(u)}^{2})$, the first-order autocorrelation of an MA(1) process with parameter $\theta(u)$. This minimizer is continuous and strictly monotonic in $\theta(u)$ on $(-1,1)$. Together, these facts ensure that the map $\theta\mapsto\rho_{0}(u;\theta)$ is continuous and injective in $\theta$, so Assumption 3.(vi) is met. Also, the assumption that $\sup_{u\in\mathcal{U}}|\rho_{0}(u)|<1$, in conjunction with injectivity, implies the compactness of $\Theta$ and $\Gamma$. Since Lemma 4 ensures that Theorem 1 and Corollary 1 apply, the proof of Corollary 2 follows directly from verifying the regularity conditions stated in Theorem 2.
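The binding function in Corollary 2 can be illustrated directly. The sketch below assumes the parameterisation $y_{t}=\varepsilon_{t}+\theta\varepsilon_{t-1}$ at a fixed time point (consistent with $\rho_{0}=\theta/(1+\theta^{2})$), fits the AR(1) auxiliary model by least squares on a long simulated path, and compares the estimate with $\theta/(1+\theta^{2})$. It also inverts the binding function on the invertible branch, $\theta=(1-\sqrt{1-4\rho^{2}})/(2\rho)$, which exists because the map is strictly increasing on $(-1,1)$. This illustrates injectivity only; it is not the simulation-based L-II estimator.

```python
import math
import random


def binding(theta):
    """Pseudo-true AR(1) coefficient for an MA(1) with parameter theta."""
    return theta / (1.0 + theta * theta)


def invert_binding(rho):
    """Invertible-branch inverse of binding() on |rho| < 1/2."""
    if rho == 0.0:
        return 0.0
    return (1.0 - math.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)


def ar1_ols(y):
    """Least-squares AR(1) coefficient: sum y_{t-1} y_t / sum y_{t-1}^2."""
    num = sum(y[t - 1] * y[t] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


random.seed(2)
theta = 0.6
eps = [random.gauss(0.0, 1.0) for _ in range(50_001)]
y = [eps[t] + theta * eps[t - 1] for t in range(1, len(eps))]

rho_hat = ar1_ols(y)
print(f"rho_hat = {rho_hat:.4f}, binding(theta) = {binding(theta):.4f}")
print(f"theta recovered from rho_hat: {invert_binding(rho_hat):.4f}")
```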

Lemma 4.

Under Assumptions 2 and 4, the following are satisfied,

[TABLE]

Proof.

For (44), under Assumptions 2 and 4, the following result is a direct consequence of Theorem 2 in Kristensen, (2009).

[TABLE]

For (45), recall that

[TABLE]

where $\hat{\psi}_{1}(u;\theta(u))=T^{-1}\sum_{t=1}^{T}\tilde{y}_{u,t-1}(u;\theta(u))\tilde{y}_{u,t}(u;\theta(u))$ and $\hat{\psi}_{2}(u;\theta(u))=T^{-1}\sum_{t=1}^{T}\tilde{y}_{u,t-1}^{2}(u;\theta(u))$. For notational simplicity, we drop the argument $(u)$, since the time point $u$ is held fixed throughout.

Due to the mean value theorem,

[TABLE]

where $\theta\in\Theta=[-1+\delta,1-\delta]$ and, for $k=1,2$, $\bar{\psi}_{k}(\theta)$ lies between $\hat{\psi}_{k}(\theta)$ and ${\psi}_{k}(\theta)$, with $|\bar{\psi}_{k}(\theta)|<\infty$. This implies that the uniform convergence rate of the left-hand side is determined only by $|\hat{\psi}_{1}(\theta)-{\psi}_{1}(\theta)|$ and $|\hat{\psi}_{2}(\theta)-\psi_{2}(\theta)|$.
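The mean value step can be complemented by an exact algebraic identity. Assuming, as the definitions of $\hat{\psi}_{1}$ and $\hat{\psi}_{2}$ suggest, that the recalled estimator is the ratio $\hat{\psi}_{1}(\theta)/\hat{\psi}_{2}(\theta)$ with population counterpart $\psi_{1}(\theta)/\psi_{2}(\theta)$, one has $\hat{\psi}_{1}/\hat{\psi}_{2}-\psi_{1}/\psi_{2}=\{(\hat{\psi}_{1}-\psi_{1})-(\psi_{1}/\psi_{2})(\hat{\psi}_{2}-\psi_{2})\}/\hat{\psi}_{2}$, so whenever $\hat{\psi}_{2}$ is bounded away from zero the error is controlled by the two moment errors alone. The check below verifies the identity and the implied bound on random inputs; it is a sketch, not part of the paper's proof.

```python
import random


def ratio_error(psi1_hat, psi2_hat, psi1, psi2):
    """Exact decomposition of psi1_hat/psi2_hat - psi1/psi2."""
    rho = psi1 / psi2
    return ((psi1_hat - psi1) - rho * (psi2_hat - psi2)) / psi2_hat


def ratio_bound(psi1_hat, psi2_hat, psi1, psi2):
    """Upper bound (|d1| + |rho| |d2|) / |psi2_hat| implied by the identity."""
    rho = psi1 / psi2
    return (abs(psi1_hat - psi1) + abs(rho) * abs(psi2_hat - psi2)) / abs(psi2_hat)


rng = random.Random(3)
draws = []
for _ in range(10_000):
    psi2 = rng.uniform(0.5, 2.0)            # second moment, bounded away from zero
    psi1 = rng.uniform(-0.45, 0.45) * psi2  # keeps |rho| < 1/2 as in the MA(1) example
    psi1_hat = psi1 + rng.uniform(-0.1, 0.1)
    psi2_hat = psi2 + rng.uniform(-0.1, 0.1)
    draws.append((psi1_hat, psi2_hat, psi1, psi2))

worst = max(abs(ratio_error(*d)) for d in draws)
print(f"largest |rho_hat - rho| over the draws: {worst:.4f}")
```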

[TABLE]

Term A.2 is $o_{p}(1)$ by construction. For A.1, we have to show that

[TABLE]

The proof of (47) is organized as follows. Define $Z_{t}(\theta)=\tilde{y}_{u,t-1}(\theta)\tilde{y}_{u,t}(\theta)$. First, we replace $Z_{t}(\theta)$ with the truncated process $Z_{t}(\theta)\mathbb{I}(|Z_{t}(\theta)|\leq\gamma_{T})$, where $\mathbb{I}$ is the indicator function, $\gamma_{T}=\tau_{T}^{-1/(k-1)}$ for some $k>2$, and $\tau_{T}=\sqrt{\ln T/T}$. Note that $\tau_{T}=o(1)$. Next, we replace the supremum in (47) with a maximum over a finite grid of $N$ points. Finally, we use the exponential inequality in Theorem 2.1 of Liebscher, (1996) to bound the remainder.
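To see how the tuning sequences behave, the sketch below tabulates $\tau_{T}=\sqrt{\ln T/T}$, $\gamma_{T}=\tau_{T}^{-1/(k-1)}$ and $m=\gamma_{T}^{-1}\tau_{T}^{-1}$ (the quantity set later in the proof) for an illustrative choice $k=4$. It confirms that $\tau_{T}\rightarrow 0$, that the truncation level $\gamma_{T}$ grows, and that $\gamma_{T}^{k-1}\tau_{T}=1$, which is consistent with the $O_{p}(\tau_{T})$ truncation error claimed in the proof.

```python
import math


def tuning_sequences(T, k):
    """Return (tau_T, gamma_T, m) as defined in the proof of (47)."""
    tau = math.sqrt(math.log(T) / T)
    gamma = tau ** (-1.0 / (k - 1))
    m = 1.0 / (gamma * tau)
    return tau, gamma, m


k = 4  # illustrative choice of k > 2
rows = [(T, *tuning_sequences(T, k)) for T in (10**2, 10**3, 10**4, 10**5, 10**6)]
for T, tau, gamma, m in rows:
    print(f"T={T:>8d}  tau={tau:.4f}  gamma={gamma:.3f}  m={m:.1f}")
```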

First, consider truncation of $Z_{t}(\theta)$ .

[TABLE]

where $Z_{t}(\theta)=\tilde{y}_{u,t-1}(\theta)\tilde{y}_{u,t}(\theta)$ . Then,

[TABLE]

Due to Markov’s inequality,

[TABLE]

Therefore, we can focus on $Z_{t}(\theta)\mathbb{I}(|Z_{t}(\theta)|\leq\gamma_{T})$, since replacing $Z_{t}(\theta)$ with its truncated version incurs an approximation error of only $O_{p}(\tau_{T})$, which vanishes as $T\rightarrow\infty$. In what follows, we therefore take $|Z_{t}(\theta)|\leq\gamma_{T}$.
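The truncation step rests on the elementary pointwise inequality $|Z|\,\mathbb{I}(|Z|>\gamma)\leq|Z|^{k}/\gamma^{k-1}$ (valid because $|Z|/\gamma>1$ on the event $\{|Z|>\gamma\}$), so that with $\gamma_{T}=\tau_{T}^{-1/(k-1)}$ one gets $E[|Z|\,\mathbb{I}(|Z|>\gamma_{T})]\leq\tau_{T}\,E|Z|^{k}$. The sketch below checks this on the empirical distribution of a hypothetical $Z_{t}=\tilde{y}_{t-1}\tilde{y}_{t}$ built from a simulated MA(1) path with $\theta=0.5$; the process, $T$ and $k$ are illustrative assumptions.

```python
import math
import random


def truncation_error_and_bound(z, T, k):
    """Empirical E|Z| 1(|Z| > gamma_T) and its bound tau_T * E|Z|^k."""
    tau = math.sqrt(math.log(T) / T)
    gamma = tau ** (-1.0 / (k - 1))
    n = len(z)
    tail = sum(abs(v) for v in z if abs(v) > gamma) / n
    kth_moment = sum(abs(v) ** k for v in z) / n
    return tail, tau * kth_moment, gamma


rng = random.Random(4)
eps = [rng.gauss(0.0, 1.0) for _ in range(20_001)]
y = [eps[t] + 0.5 * eps[t - 1] for t in range(1, len(eps))]  # hypothetical MA(1) path
z = [y[t - 1] * y[t] for t in range(1, len(y))]              # Z_t = y_{t-1} y_t

tail, bound, gamma = truncation_error_and_bound(z, T=1000, k=4)
print(f"gamma_T={gamma:.3f}  tail mean={tail:.5f}  bound={bound:.5f}")
```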

Next, cover the parameter space with the balls $B_{j}=\{\theta:\|\theta-\theta_{j}\|\leq\tau_{T}\}$, $j=1,\dots,N$. Since $\Theta$ is compact, it can be covered by finitely many such balls, with $N\leq c/\tau_{T}$ for some constant $c$. Note that
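For the one-dimensional parameter space of the MA(1) example, $\Theta=[-1+\delta,1-\delta]$, such a covering is easy to construct explicitly. The sketch below (with an illustrative $\delta=0.05$) places $N=\lceil|\Theta|/(2\tau_{T})\rceil$ equally spaced centres $\theta_{j}$, checks that every point of $\Theta$ lies within $\tau_{T}$ of some centre, and confirms $N\leq c/\tau_{T}$ with $c=|\Theta|/2+1$ whenever $\tau_{T}\leq 1$.

```python
import math


def cover_interval(lo, hi, radius):
    """Centres theta_j of balls {|theta - theta_j| <= radius} covering [lo, hi]."""
    n = max(1, math.ceil((hi - lo) / (2.0 * radius)))
    step = (hi - lo) / n  # step <= 2 * radius, so neighbouring balls overlap
    return [lo + (j + 0.5) * step for j in range(n)]


delta = 0.05
lo, hi = -1.0 + delta, 1.0 - delta
c = (hi - lo) / 2.0 + 1.0
coverings = {}
for T in (100, 1_000, 10_000):
    tau = math.sqrt(math.log(T) / T)
    coverings[T] = (tau, cover_interval(lo, hi, tau))
    print(f"T={T:>6d}  tau={tau:.4f}  N={len(coverings[T][1])}  c/tau={c / tau:.1f}")
```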

[TABLE]

For $\mathcal{S}_{1}$, by the Lipschitz condition, i.e. the boundedness of the first derivative $\dot{Z}_{t}(\cdot)$,

[TABLE]

For $\mathcal{S}_{3}$, a similar argument applies, and hence

[TABLE]

For $\mathcal{S}_{2}$, let $T^{-1}\sum_{t=1}^{T}D_{t}(\theta_{j})=\hat{\psi}_{1}(\theta_{j})-{E}\hat{\psi}_{1}(\theta_{j})$, i.e. $D_{t}(\theta_{j})=Z_{t}(\theta_{j})-{E}Z_{t}(\theta_{j})$,

[TABLE]

Here, we apply Theorem 2.1 in Liebscher, (1996, p. 71) on the strong convergence of sums of dependent, strongly mixing random variables, whose mixing coefficients $\alpha(k)$ are defined, for $k>0$, by

[TABLE]

where $\alpha(k)$ converges to zero exponentially fast as $k\rightarrow\infty$. (Note that the $\phi$-mixing in Assumption 2 implies strong mixing, so this condition is consistent with Assumption 2.) For a stationary, zero-mean, real-valued process $M_{t}$ such that $|M_{t}|\leq b_{T}$, with strong mixing coefficients $\alpha_{m}$,

[TABLE]

where $\sigma^{2}_{m}={E}\left(\sum_{i=1}^{m}D_{i}\right)^{2}$ . We will use this exponential inequality to prove (47). Set $m=\gamma_{T}^{-1}\tau_{T}^{-1}$ and note that $m<T$ and $m<\varepsilon b/4$ where $\varepsilon=T\tau_{T}$ and $b=\tau_{T}$ for any $\theta$ and sufficiently large $T$ . Also, note that

[TABLE]

From (53),

[TABLE]

where $C$ is a constant and the second term tends to zero under the assumption on the strong mixing coefficients made in the statement of the result. Since the last bound does not depend on $\theta$, it holds uniformly in $\theta$. Then,

[TABLE]

Moreover, for a sufficiently large decay rate $\beta$ of the strong mixing coefficients,

[TABLE]

The desired result then follows from the Borel-Cantelli lemma. Combining (51), (52) and (54) proves (47). The proof for $\hat{\psi}_{2}(\theta)$ is similar and is therefore omitted. This completes the proof. ∎

Appendix D Additional details for the locally stationary multiplicative SV model example

D.1 Local stationarity

Recall the locally stationary SV (LS-SV) model: for all $u\in[0,1]$ , $0<\xi(u)<\infty$ ,

[TABLE]

where $\nu_{1,t}\sim_{iid}N(0,1)$. Let $\eta_{t}\sim_{iid}N(0,1)$, with $\nu_{1,t}$ and $\eta_{t}$ mutually independent, and define

[TABLE]

Partition the unknown parameter $\theta$ as

[TABLE]

Using these definitions, the LS-SV model can be placed in the general form of the structural model in equation (3):

[TABLE]

where $h_{t}(\theta,\nu_{t})=\mu+\phi h_{t-1}(\theta,\nu_{t-1})+\nu_{2,t}$.

Under a mild Lipschitz condition on $\sqrt{\xi(\cdot)}$, and compactness of the parameter space for the remaining components of $\theta$, this model is locally stationary.
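A minimal simulation sketch of the LS-SV model may help fix ideas. It assumes $y_{t,T}=\sqrt{\xi(t/T)}\exp(h_{t}/2)\nu_{1,t}$ and $h_{t}=\mu+\phi h_{t-1}+\nu_{2,t}$ and, because the display defining $\nu_{2,t}$ is not reproduced here, a hypothetical leverage construction $\nu_{2,t}=\sigma(\rho\,\nu_{1,t-1}+\sqrt{1-\rho^{2}}\,\eta_{t})$. The scale function $\xi(u)=(1+u)^{2}$ is also illustrative: it makes $\sqrt{\xi}$ Lipschitz with constant $K=1$, so the simulated path can be compared with the stationary approximation $\tilde{y}_{t}(u)=\sqrt{\xi(u)}\exp(h_{t}/2)\nu_{1,t}$, whose distance from $y_{t,T}$ is at most $|t/T-u|\exp(h_{t}/2)|\nu_{1,t}|$.

```python
import math
import random


def simulate_ls_sv(T, xi, mu, phi, sigma, rho, seed=0):
    """Simulate a hypothetical LS-SV path; returns (y, h, nu1) lists of length T.

    Assumed leverage construction: nu2_t = sigma * (rho * nu1_{t-1} + sqrt(1 - rho^2) * eta_t).
    """
    rng = random.Random(seed)
    # Start h_0 from its stationary N(mu / (1 - phi), sigma^2 / (1 - phi^2)) law.
    h = mu / (1.0 - phi) + rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    nu1_prev = rng.gauss(0.0, 1.0)
    ys, hs, nu1s = [], [], []
    for t in range(1, T + 1):
        nu2 = sigma * (rho * nu1_prev + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        h = mu + phi * h + nu2
        nu1 = rng.gauss(0.0, 1.0)
        ys.append(math.sqrt(xi(t / T)) * math.exp(h / 2.0) * nu1)
        hs.append(h)
        nu1s.append(nu1)
        nu1_prev = nu1
    return ys, hs, nu1s


def xi(u):
    """Illustrative long-run scale: sqrt(xi(u)) = 1 + u is Lipschitz with K = 1."""
    return (1.0 + u) ** 2


T = 100_000
mu, phi, sigma, rho = 0.0, 0.9, 0.3, -0.5
ys, hs, nu1s = simulate_ls_sv(T, xi, mu, phi, sigma, rho)
mean_h = sum(hs) / T
var_h = sum((v - mean_h) ** 2 for v in hs) / T
print(f"sample Var(h) = {var_h:.4f}, sigma_h^2 = {sigma**2 / (1 - phi**2):.4f}")
```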

Corollary 3.

For all $u_{1},u_{2}\in[0,1]$ , assume that $|\sqrt{\xi(u_{1})}-\sqrt{\xi(u_{2})}|\leq K|u_{1}-u_{2}|$ , with $K$ finite and independent of $u_{1},u_{2}$ . Assume that $\theta_{2}\in\Theta_{2}$ , and $\Theta_{2}$ compact, with $|\rho|\leq 1-\epsilon$ and $|\phi|\leq 1-\epsilon$ for some $\epsilon>0$ . Then, there exists a measurable random variable $C_{t}$ such that, for some $\eta>0$ , $\sup_{t\leq T}E(|C_{t}|^{\eta})<\infty$ , and

[TABLE]

where $C_{T}=\sup_{t\leq T}C_{t}$ .

Proof.

Note that

[TABLE]

Now, for any $u\in[0,1]$ ,

[TABLE]

By the Lipschitz assumption on $\zeta(\cdot):=\sqrt{\xi(\cdot)}$, for some finite $K$ and any $t/T,u\in[0,1]$,

[TABLE]

As a consequence, for any $t\leq T$ ,

[TABLE]

Define $A_{t}=K\left\{\exp(h_{t}/2)\nu_{1,t}\right\}$ . We now demonstrate that $E[A_{t}^{2}]<\infty$ for all $t\geq 1$ . By construction

[TABLE]

By assumption, $\nu_{t}$ and $\nu_{t-1}$ are independent for any $t\geq 1$, and it then follows that

[TABLE]

where $\sigma^{2}_{h}=\sigma^{2}/(1-\phi^{2})$, from which we conclude that

[TABLE]

Now, $A_{t}^{2}=K^{2}\nu_{1,t}^{2}\exp(h_{t})$, so it suffices to calculate $E[\nu_{1,t}^{2}\exp(h_{t})]$:

[TABLE]

Let $Z_{t}\sim_{iid}\mathcal{N}(0,1)$, with $Z_{t}\perp\nu_{1,t}$, and define

[TABLE]

Then the moments of the random variable in (55) coincide with those of $\tilde{Z}_{t}-\gamma_{\nu}\sigma_{h}\nu_{1,t}$. Moreover, since $\nu_{t}\perp\nu_{t-1}$, it follows that $\tilde{Z}_{t}\perp\nu_{1,t}$. Using this independence and the log-normality of $\exp(\tilde{Z}_{t})$, we deduce that

[TABLE]

Apply this formula to calculate

[TABLE]

where $M>0$ is finite and does not depend on any parameters.

Taking $C_{t}=A_{t}^{2}$ , for all $t\geq 1$ , the definition is then satisfied with $\eta=1$ .

∎
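The moment calculation in the proof above relies on the log-normal formula $E[\exp(X)]=\exp(m+s^{2}/2)$ for $X\sim N(m,s^{2})$. When $\exp(h_{t})$ also loads on $\nu_{1,t}$, a related standard Gaussian identity, $E[Z^{2}\exp(aZ)]=(1+a^{2})\exp(a^{2}/2)$ for $Z\sim N(0,1)$, shows that the second moment of $A_{t}$ stays finite. The Monte Carlo sketch below checks both identities; the values of $m$, $s$ and $a$ are arbitrary, and the second identity is added for illustration rather than taken from the paper.

```python
import math
import random

rng = random.Random(5)
n = 200_000
m, s, a = 0.0, 0.7, 0.5  # arbitrary illustrative values

draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
mc_lognormal = sum(math.exp(m + s * z) for z in draws) / n  # X = m + s Z
mc_weighted = sum(z * z * math.exp(a * z) for z in draws) / n

exact_lognormal = math.exp(m + s * s / 2.0)
exact_weighted = (1.0 + a * a) * math.exp(a * a / 2.0)
print(f"E[exp(X)]:      MC {mc_lognormal:.4f}  exact {exact_lognormal:.4f}")
print(f"E[Z^2 exp(aZ)]: MC {mc_weighted:.4f}  exact {exact_weighted:.4f}")
```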

D.2 Additional empirical results

D.2.1 Alternative conditional mean specification

Recall the locally stationary SV (LS-SV) model considered in Section 5.2: for $r_{t,j}$ denoting excess returns on the $j$ -th portfolio under analysis, $r_{t,j}$ evolves according to

[TABLE]

where the regression function $m_{t,j}$ is given by

[TABLE]

and where $r_{t,m}$ denotes excess returns on the market factor, $\text{SMB}_{t}$ is the size factor, and $\text{HML}_{t}$ is the value factor. The short-run volatility component $h_{t,j}$ is modeled as an AR(1) stochastic volatility process with leverage effects:

[TABLE]

where we fix the mean of the short-run SV component to zero to ensure the scale of $\xi(\cdot)$ is identified.

In this section, we revisit the empirical analysis of Section 5.2 under an alternative specification for the conditional mean: an LS-SV model in which the conditional mean function is a time-varying regression of the form

[TABLE]

where $r_{t,m}$ denotes the market factor. Such a model is akin to a time-varying $\alpha,\;\beta$ model with a locally stationary volatility component.

Estimation and inference for this modified LS-SV model are carried out using a two-step procedure similar to that used for the linear regression version of the model. First, we estimate the time-varying regression parameters to obtain $\hat{\alpha}(t/T),\;\hat{\beta}(t/T)$; second, the residuals

[TABLE]

are used within the L-II algorithm for the LS-SV model, along with a LS-GJR-GARCH auxiliary model (see Algorithm 2 for specific implementation details).
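A sketch of the first step of this procedure follows. It assumes a kernel-weighted least-squares regression of $r_{t,j}$ on $(1,r_{t,m})$ at each $u=t/T$, with an Epanechnikov kernel; the paper's exact smoother and bandwidth are not specified here, so both are assumptions, and the simulated returns are purely illustrative. The residuals it returns are what would be passed to the second (L-II) step.

```python
import random


def epanechnikov(x):
    """Assumed kernel with compact support [-1, 1]."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def local_alpha_beta(r_j, r_m, u, h):
    """Kernel-weighted LS of r_j on (1, r_m) at rescaled time u (locally constant alpha, beta)."""
    T = len(r_j)
    sw = sx = sy = sxx = sxy = 0.0
    for t in range(T):
        w = epanechnikov((u - (t + 1) / T) / h)
        if w > 0.0:
            x, y = r_m[t], r_j[t]
            sw += w
            sx += w * x
            sy += w * y
            sxx += w * x * x
            sxy += w * x * y
    x_bar, y_bar = sx / sw, sy / sw
    beta = (sxy - sw * x_bar * y_bar) / (sxx - sw * x_bar * x_bar)
    return y_bar - beta * x_bar, beta


def first_step_residuals(r_j, r_m, h):
    """Residuals r_j - alpha_hat(t/T) - beta_hat(t/T) r_m for every t."""
    T = len(r_j)
    resid = []
    for t in range(T):
        alpha, beta = local_alpha_beta(r_j, r_m, (t + 1) / T, h)
        resid.append(r_j[t] - alpha - beta * r_m[t])
    return resid


# Illustrative data with alpha(u) = 0.1 and beta(u) = 1 + 0.2 u.
rng = random.Random(6)
T, h = 800, 0.15
r_m = [rng.gauss(0.0, 1.0) for _ in range(T)]
r_j = [0.1 + (1.0 + 0.2 * (t + 1) / T) * r_m[t] + 0.5 * rng.gauss(0.0, 1.0) for t in range(T)]
alpha_mid, beta_mid = local_alpha_beta(r_j, r_m, 0.5, h)
resid = first_step_residuals(r_j, r_m, h)
print(f"alpha_hat(0.5)={alpha_mid:.3f}  beta_hat(0.5)={beta_mid:.3f}")
```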

We estimate this modified version of the LS-SV model for all 25 portfolios used in Section 5.2. Pointwise confidence intervals for each estimated function are formed using the local block-bootstrap (LBB) approach of Section 5.2, with all elements of the LBB replicated for this analysis.

The estimates of $\beta_{j}(\cdot)$ for each portfolio are shown in Figures 8-12, and those of $\alpha_{j}(\cdot)$ in Figures 13-17. To ease interpretation, we only present results for $u\in[0.05,0.95]$: outside this region, local kernel methods are well known to display significant boundary bias and can be unreliable. Across virtually all portfolios, the estimated functions for both $\alpha_{j}$ and $\beta_{j}$ are nearly constant over the majority of the sample. Some of the estimates do appear to display nonlinear behavior at the beginning and end of the sample; we believe this is due to the boundary bias of the local kernel estimation method rather than genuine nonlinearity. All told, these results demonstrate that treating $\alpha(\cdot),\beta(\cdot)$ as fixed, as was done in Section 5.2 of the main paper, is likely a tenable empirical specification.
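The boundary effect can be seen even without noise. The sketch below applies a local-constant (Nadaraya-Watson) smoother with an Epanechnikov kernel, an assumed stand-in for the paper's local kernel estimator, to the deterministic trend $f(u)=u$ observed at $u=t/T$. In the interior the symmetric kernel weights leave the linear trend essentially unbiased, whereas at $u=0$ only points to the right receive weight, producing a bias of roughly $3h/8$ (and symmetrically at $u=1$).

```python
def epanechnikov(x):
    """Assumed kernel with compact support [-1, 1]."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def nw_smooth(values, u, h):
    """Local-constant (Nadaraya-Watson) estimate at rescaled time u."""
    T = len(values)
    num = den = 0.0
    for t in range(T):
        w = epanechnikov((u - (t + 1) / T) / h)
        num += w * values[t]
        den += w
    return num / den


T, h = 1000, 0.1
trend = [(t + 1) / T for t in range(T)]  # f(u) = u observed at u = t/T
bias = {u: nw_smooth(trend, u, h) - u for u in (0.0, 0.05, 0.5, 0.95, 1.0)}
for u, b in bias.items():
    print(f"u={u:.2f}  bias={b:+.4f}")
```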

The resulting estimates of $\xi_{j}(\cdot)$ are given in Figures 18-22. Broadly speaking, the results for $\xi_{j}$ follow a very similar pattern to those obtained under the linear regression specification for the conditional mean of returns. There is no reason to expect the two sets of results to be identical, since they are based on two entirely different sets of residuals.

D.2.2 Estimated Values of $\beta(t/T)$

D.2.3 Figures: $\alpha(t/T)$

D.2.4 Figures: $\xi(t/T)$

D.2.5 ARCH Testing Results

Bibliography

Altonji, J. G., Smith, A. A., and Vidangos, I. (2013). Modeling earnings dynamics. Econometrica, 81(4):1395–1454.
Billingsley, P. (1968). Convergence of Probability Measures. John Wiley & Sons.
Billio, M. and Monfort, A. (2003). Kernel-based indirect inference. Journal of Financial Econometrics, 1(3):297–326.
Bruins, M., Duffy, J. A., Keane, M. P., and Smith Jr, A. A. (2018). Generalized indirect inference for discrete choice models. Journal of Econometrics, 205(1):177–203.
Chen, X., Linton, O., and Van Keilegom, I. (2003). Estimation of semiparametric models when the criterion function is not smooth. Econometrica, 71(5):1591–1608.
Dahlhaus, R. (1996). On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Processes and their Applications, 62(1):139–168.
Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics, 25(1):1–37.
Dahlhaus, R. and Polonik, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli, 15(1):1–39.
