A Statistical Recurrent Stochastic Volatility Model for Stock Markets

Trong-Nghia Nguyen, Minh-Ngoc Tran, David Gunawan, and R. Kohn

arXiv:1906.02884·econ.EM·January 25, 2022

TL;DR

This paper introduces the SR-SV model, combining stochastic volatility models with recurrent neural networks to better capture complex volatility dynamics in stock markets, demonstrating superior forecasting performance.

Contribution

It presents a novel statistical recurrent stochastic volatility model that captures non-linearity and long-memory effects, improving volatility forecasting in financial markets.

Findings

01

Model captures complex volatility effects like non-linearity and long-memory.

02

Demonstrates superior out-of-sample forecast performance.

03

Validated on five international stock index datasets.

Abstract

The Stochastic Volatility (SV) model and its variants are widely used in the financial sector while recurrent neural network (RNN) models are successfully used in many large-scale industrial applications of Deep Learning. Our article combines these two methods in a non-trivial way and proposes a model, which we call the Statistical Recurrent Stochastic Volatility (SR-SV) model, to capture the dynamics of stochastic volatility. The proposed model is able to capture complex volatility effects (e.g., non-linearity and long-memory auto-dependence) overlooked by the conventional SV models, is statistically interpretable and has an impressive out-of-sample forecast performance. These properties are carefully discussed and illustrated through extensive simulation studies and applications to five international stock index datasets: The German stock index DAX30, the Hong Kong stock index HSI50,…

Tables (18)

Table 1: Jeffreys’ scale of interpretation of the Bayes factor $F_{M_{1}, M_{2}}$.

Grade	$F_{M_{1}, M_{2}}$	${log}_{10} F_{M_{1}, M_{2}}$	$ln F_{M_{1}, M_{2}}$	Strength of evidence
0	$< 10^{0}$	$< 0$	$< 0$	Negative (supports $M_{2}$ )
1	$10^{0} - 10^{1 / 2}$	$0.0 - 0.5$	$0.0 - 1.2$	Barely worth mentioning
2	$10^{1 / 2} - 10^{1}$	$0.5 - 1.0$	$1.2 - 2.3$	Substantial
3	$10^{1} - 10^{3 / 2}$	$1.0 - 1.5$	$2.3 - 3.5$	Strong
4	$10^{3 / 2} - 10^{2}$	$1.5 - 2.0$	$3.5 - 4.6$	Very strong
5	$> 10^{2}$	$> 2.0$	$> 4.6$	Decisive
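As an illustration of how Table 1 is read, the sketch below maps a natural-log Bayes factor to its Jeffreys grade. The function name and the treatment of boundary values (a value exactly on a boundary is assigned to the higher grade) are assumptions made for this example; the table itself does not say which grade a boundary value belongs to.

```python
import math

# Upper bounds of grades 0 to 4 on the log10 scale, as listed in Table 1.
_LOG10_UPPER_BOUNDS = [0.0, 0.5, 1.0, 1.5, 2.0]
_LABELS = [
    "Negative (supports M2)",
    "Barely worth mentioning",
    "Substantial",
    "Strong",
    "Very strong",
    "Decisive",
]


def jeffreys_grade(ln_bayes_factor):
    """Return (grade, label) for a natural-log Bayes factor ln F_{M1,M2}."""
    log10_bf = ln_bayes_factor / math.log(10.0)
    for grade, upper in enumerate(_LOG10_UPPER_BOUNDS):
        # Assumption: a value exactly on a boundary goes to the higher grade.
        if log10_bf < upper:
            return grade, _LABELS[grade]
    return 5, _LABELS[5]
```

For instance, a difference of 2.6 between two log marginal likelihoods (the SR-SV and SV values reported for SIM II in Table 5) falls in the 2.3 to 3.5 band of the natural-log column and is therefore graded as strong evidence.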

Table 2: Prior distributions for the parameters in the SR-SV, SV and N-SV models. The notations $𝒩$, $IG$ and Beta denote the Gaussian, inverse-Gamma and Beta distributions, respectively.

SR-SV		SV		N-SV
Parameter	Prior	Parameter	Prior	Parameter	Prior
$β_{0}$	$𝒩 (0, 0.1)$	$μ$	$𝒩 (0, 25)$	$μ$	$𝒩 (0, 25)$
$\frac{ϕ + 1}{2}$	Beta(20,1.5)	$\frac{ϕ + 1}{2}$	Beta(20,1.5)	$\frac{ϕ + 1}{2}$	Beta(20,1.5)
$σ^{2}$	$I G (2.5, 0.25)$	$σ^{2}$	$I G (2.5, 0.25)$	$σ^{2}$	$I G (2.5, 0.25)$
$β_{1}$	$I G (2.5, 1)$			$δ$	$𝒩 (0, 0.1)$
$α$	Beta(2,2)
$w_{h}, w_{ϕ}$ , $w_{η}$	$𝒩 (0, 0.1)$
$b_{r}, b_{ϕ}$	$𝒩 (0, 0.1)$
$w_{z}$	$I G (2.5, 1)$

Table 3: Definition of the predictive scores used to measure the out-of-sample performance on simulation and real index data. Here, $\hat{σ}_{t}$ is an estimate of the volatility $σ_{t}$, $T_{test}$ is the number of observations in the test data $D_{test}$ and $\hat{θ}$ is a posterior mean estimate of $θ$.

Score	Definition	Score	Definition
PPS	$- T_{test}^{-1} \sum_{D_{test}} \log p (y_{t} \mid y_{1 : t - 1}, \hat{θ})$	${MSE}_{1}$	$T_{test}^{-1} \sum_{D_{test}} {(σ_{t} - {\hat{σ}}_{t})}^{2}$
QLIKE	$T_{test}^{-1} \sum_{D_{test}} (\log ({\hat{σ}}_{t}^{2}) + σ_{t}^{2} {\hat{σ}}_{t}^{-2})$	${MSE}_{2}$	$T_{test}^{-1} \sum_{D_{test}} {(σ_{t}^{2} - {\hat{σ}}_{t}^{2})}^{2}$
$R^{2} LOG$	$T_{test}^{-1} \sum_{D_{test}} {[\log (σ_{t}^{2} {\hat{σ}}_{t}^{-2})]}^{2}$	${MAE}_{1}$	$T_{test}^{-1} \sum_{D_{test}} | σ_{t} - {\hat{σ}}_{t} |$
		${MAE}_{2}$	$T_{test}^{-1} \sum_{D_{test}} | σ_{t}^{2} - {\hat{σ}}_{t}^{2} |$
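The volatility-based scores in Table 3 can be computed directly from two aligned arrays. The following is a minimal sketch, assuming `sigma` holds the proxy for the true volatility and `sigma_hat` the forecasts over the test period; the function name is hypothetical, and PPS is left out because it needs the model's one-step-ahead predictive density rather than a volatility forecast.

```python
import numpy as np


def volatility_scores(sigma, sigma_hat):
    """Average the Table 3 loss functions over the test observations."""
    sigma = np.asarray(sigma, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    var, var_hat = sigma**2, sigma_hat**2
    return {
        "MSE1": np.mean((sigma - sigma_hat) ** 2),
        "MSE2": np.mean((var - var_hat) ** 2),
        "MAE1": np.mean(np.abs(sigma - sigma_hat)),
        "MAE2": np.mean(np.abs(var - var_hat)),
        "QLIKE": np.mean(np.log(var_hat) + var / var_hat),
        "R2LOG": np.mean(np.log(var / var_hat) ** 2),
    }
```

All six scores are losses, so smaller values indicate better forecasts.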

Table 4: Simulation: Data generating process.

Data	Model	Parameters
SIM I	$\begin{matrix} σ_{t}^{2} & = μ + α y_{t - 1}^{2} + β σ_{t - 1}^{2}, t = 2, \dots, T \\ y_{t} & = σ_{t} ϵ_{t}, ϵ_{t} \sim 𝒩 (0, 1), t = 1, \dots, T \end{matrix}$	$σ_{1}^{2} = 0.1$ , $μ = 0.1$
		$α = 0.07$ , $β = 0.92$
SIM II	$\begin{matrix} h_{t} & = μ + α \frac{{(y_{t - 1}^{2})}^{δ} - 1}{δ} + β h_{t - 1}, t = 2, \dots, T \\ y_{t} & = {(1 + δ h_{t})}^{1 / (2 δ)} ϵ_{t}, ϵ_{t} \sim 𝒩 (0, 1), t = 1, \dots, T \end{matrix}$	$h_{1} = 0.1$ , $μ = 0.1$
		$α = 0.15$ , $β = 0.82$
		$δ = 0.9$
SIM III	$\begin{matrix} σ_{t}^{2} & = μ + [1 - β B - (1 - ϕ B) {(1 - B)}^{d}] y_{t}^{2} + β σ_{t - 1}^{2}, t = 2, \dots, T \\ y_{t} & = σ_{t} ϵ_{t}, ϵ_{t} \sim 𝒩 (0, 1), t = 1, \dots, T \end{matrix}$	$σ_{1}^{2} = 0.1$ , $μ = 0.01$
		$ϕ = 0.01$ , $β = 0.5$
		$d = 0.62$
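The SIM I process in Table 4 is a GARCH(1,1) recursion and is simple to simulate. The sketch below uses the parameter values listed in the table as defaults; the function name and the seed handling are assumptions for illustration, not the authors' simulation code.

```python
import numpy as np


def simulate_sim1(T, mu=0.1, alpha=0.07, beta=0.92, sigma2_1=0.1, seed=0):
    """Simulate T observations from the SIM I data generating process."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    sigma2 = np.empty(T)
    y = np.empty(T)
    sigma2[0] = sigma2_1                      # initial variance sigma_1^2
    y[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, T):
        # sigma_t^2 = mu + alpha * y_{t-1}^2 + beta * sigma_{t-1}^2
        sigma2[t] = mu + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
        y[t] = np.sqrt(sigma2[t]) * eps[t]    # y_t = sigma_t * eps_t
    return y, sigma2
```

With these values α + β = 0.99, so the simulated variance is highly persistent and its unconditional mean is μ / (1 − α − β) = 10.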

Table 5: Simulation: Posterior means of the parameters, with the posterior standard deviations in brackets. The last column shows the estimated log marginal likelihood, with the Monte Carlo standard errors in brackets, averaged over 10 different runs of the DT-SMC sampler. The asterisks indicate the cases where the Bayes factors strongly support the SR-SV model over the SV model. The marginal likelihoods are reported on the natural log scale.

	$μ$	$ϕ$	$σ^{2}$	$α$	$β_{0}$	$β_{1}$	$w_{z}$	Mar.llh
SIM I
SV	2.145	0.985	0.019					$- 5100.8$
	(0.237)	(0.004)	(0.003)					(0.131)
SR-SV		0.974	0.020	0.534	0.027	0.388	$- 0.205$	$- 5099.9$
		(0.023)	(0.005)	(0.166)	(0.031)	(0.235)	(0.261)	(0.300)
SIM II
SV	1.050	0.967	0.032					$- 4060.3$
	(0.125)	(0.008)	(0.006)					(0.164)
SR-SV		0.792	0.041	0.515	0.043	0.423	0.530	$- {4057.7}^{*}$
		(0.106)	(0.010)	(0.156)	(0.044)	(0.207)	(0.256)	(0.306)
SIM III
SV	0.134	0.984	0.041					$- 3146.9$
	(0.329)	(0.005)	(0.007)					(0.195)
SR-SV		0.896	0.056	0.645	$- 0.093$	0.325	0.290	$- {3144.2}^{*}$
		(0.035)	(0.013)	(0.240)	(0.045)	(0.132)	(0.115)	(0.316)

Table 6: Simulation: Forecast performance of the SR-SV and SV models. In each panel, the bold numbers indicate the best predictive scores and the count indicates the number of times a model has better forecast scores than the other one. Monte Carlo standard errors, averaged over 10 different runs, are given in brackets.

	PPS	${MSE}_{1}$	${MSE}_{2}$	${MAE}_{1}$	${MAE}_{2}$	QLIKE	$R^{2} LOG$	Count
SIM I
SV	2.355	0.480	0.485	0.506	0.540	0.481	0.482	1
	(0.001)	(0.003)	(0.003)	(0.002)	(0.004)	(0.001)	(0.004)
SR-SV	2.357	0.435	0.381	0.342	0.319	0.450	0.395	6
	(0.000)	(0.002)	(0.003)	(0.001)	(0.002)	(0.001)	(0.003)
SIM II
SV	1.881	0.189	0.282	0.309	0.383	0.359	0.512	0
	(0.000)	(0.000)	(0.001)	(0.001)	(0.001)	(0.001)	(0.001)
SR-SV	1.878	0.076	0.122	0.139	0.189	0.172	0.284	7
	(0.000)	(0.002)	(0.003)	(0.002)	(0.000)	(0.003)	(0.003)
SIM III
SV	1.722	0.919	0.967	0.707	0.796	0.754	0.529	0
	(0.001)	(0.004)	(0.004)	(0.003)	(0.003)	(0.004)	(0.004)
SR-SV	1.720	0.733	0.775	0.548	0.625	0.588	0.399	7
	(0.000)	(0.004)	(0.003)	(0.002)	(0.003)	(0.003)	(0.002)

Table 7: Descriptions of the five index datasets.

	In-sample Period	Out-of-sample Period	$T_{in}$	$T_{out}$
DAX	23 Apr 2004 – 21 Feb 2012	22 Feb 2012 – 05 Feb 2016	2000	1000
HSI	27 Oct 2003 – 28 Nov 2011	29 Nov 2011 – 21 Dec 2015	2000	1000
FCHI	09 Jun 2004 – 22 Mar 2012	23 Mar 2012 – 23 Feb 2016	2000	1000
SPX	27 Feb 2004 – 06 Feb 2012	07 Feb 2012 – 28 Jan 2016	2000	1000
TSX	03 Feb 2004 – 01 Feb 2012	02 Feb 2012 – 27 Jan 2016	2000	1000

Table 8: Descriptive statistics for the demeaned returns of the DAX, HSI, FCHI, SPX and TSX datasets. $V_{n}(q)$, $q = 10$, 20 and 30, shows the test statistics of Lo’s modified R/S test of long memory with lag $q$. The upper and lower values in the last three columns are Lo’s test statistics for absolute and squared returns, respectively. The asterisks indicate significance at the 5% level.

	Min	Max	Std	Skew	Kurtosis	$V_{n} (10)$	$V_{n} (20)$	$V_{n} (30)$
DAX	$- 7.437$	9.993	1.267	0.115	10.960	${3.226}^{*}$	${2.501}^{*}$	${2.146}^{*}$
DAX						${2.456}^{*}$	${1.926}^{*}$	1.670
HSI	$- 11.616$	12.155	1.186	0.307	17.551	${3.934}^{*}$	${3.030}^{*}$	${2.587}^{*}$
HSI						${2.564}^{*}$	${2.088}^{*}$	${1.844}^{*}$
FCHI	$- 7.215$	6.663	1.132	$- 0.320$	7.383	${3.782}^{*}$	${2.976}^{*}$	${2.575}^{*}$
FCHI						${3.018}^{*}$	${2.453}^{*}$	${2.165}^{*}$
SPX	$- 9.351$	10.220	1.307	$- 0.256$	12.502	${3.188}^{*}$	${2.412}^{*}$	${2.047}^{*}$
SPX						${2.664}^{*}$	${2.040}^{*}$	${1.748}^{*}$
TSX	$- 9.879$	9.194	1.262	$- 0.727$	12.202	${3.558}^{*}$	${2.692}^{*}$	${2.277}^{*}$
TSX						${2.877}^{*}$	${2.199}^{*}$	${1.875}^{*}$

Table 9: Applications: Posterior means of the parameters, with the posterior standard deviations in brackets. The last column shows the estimated log marginal likelihood, with the Monte Carlo standard errors in brackets, averaged over 10 different runs of the DT-SMC sampler. The single and double asterisks indicate the cases where the Bayes factors strongly and very strongly support the SR-SV model over the SV model, respectively. The marginal likelihoods are reported on the natural log scale.

	$μ$	$ϕ$	$σ^{2}$	$δ$	$α$	$β_{0}$	$β_{1}$	$w_{z}$	Mar.llh
DAX
SV	$- 0.098$	0.979	0.038						$- 2871.3$
	(0.233)	(0.006)	(0.008)						(0.171)
N-SV	$- 0.138$	0.977	0.037	$- 0.198$					$- 2872.4$
	(0.212)	(0.006)	(0.008)	(0.086)					(0.224)
SR-SV		0.863	0.064		0.605	$- 0.117$	0.410	0.397	$- {2868.8}^{*}$
		(0.052)	(0.021)		(0.204)	(0.061)	(0.201)	(0.153)	(0.301)
HSI
SV	$- 0.205$	0.987	0.022						$- 2692.0$
	(0.320)	(0.004)	(0.008)						(0.184)
N-SV	$- 0.366$	0.987	0.021	$- 0.242$					$- 2691.0$
	(0.270)	(0.004)	(0.004)	(0.081)					(0.214)
SR-SV		0.824	0.054		0.784	$- 0.196$	0.536	0.387	$- {2687.8}^{* *}$
		(0.061)	(0.021)		(0.137)	(0.083)	(0.262)	(0.139)	(0.337)
FCHI
SV	$- 0.213$	0.977	0.047						$- 2787.1$
	(0.230)	(0.007)	(0.010)						(0.225)
N-SV	$- 0.217$	0.979	0.041	$- 0.198$					$- 2787.3$
	(0.257)	(0.006)	(0.009)	(0.089)					(0.234)
SR-SV		0.843	0.093		0.780	$- 0.179$	0.449	0.363	$- {2784.2}^{*}$
		(0.049)	(0.027)		(0.197)	(0.070)	(0.199)	(0.134)	(0.326)
SPX
SV	$- 0.228$	0.985	0.034						$- 2748.3$
	(0.344)	(0.005)	(0.006)						(0.201)
N-SV	$- 0.267$	0.9837	0.036	$- 0.121$					$- 2749.4$
	(0.268)	(0.004)	(0.007)	(0.080)					(0.211)
SR-SV		0.844	0.056		0.527	$- 0.180$	0.481	0.373	$- {2745.6}^{*}$
		(0.060)	(0.017)		(0.186)	(0.186)	(0.241)	(0.132)	(0.311)
TSX
SV	$- 0.200$	0.985	0.028						$- 2770.1$
	(0.323)	(0.004)	(0.006)						(0.231)
N-SV	$- 0.249$	0.984	0.029	-0.141					$- 2769.9$
	(0.298)	(0.005)	(0.006)	(0.077)					(0.245)
SR-SV		0.868	0.051		0.697	$- 0.129$	0.414	0.355	$- {2767.2}^{*}$
		(0.056)	(0.015)		(0.195)	(0.071)	(0.201)	(0.141)	(0.347)

Table 10: Applications: Model diagnostics of the filtered log volatility and the residuals $\hat{ϵ}_{t}^{y}$. The LB p-values are the p-values from the Ljung-Box test with 10 lags.

	Filtered volatility				Residual ${\hat{ϵ}}_{t}^{y}$
	Mean	Std	Kurtosis	Skew	Std	Kurtosis	Skew	LB- ${\hat{ϵ}}_{t}$
DAX
SV	1.531	1.930	24.465	4.075	0.985	2.817	$- 0.215$	0.978
N-SV	1.591	2.290	38.239	5.112	0.982	2.742	$- 0.213$	0.978
LMSV	1.423	1.670	13.734	2.885	0.999	3.584	$- 0.163$	0.887
SR-SV	1.325	1.559	26.276	4.182	0.991	2.819	$- 0.207$	0.983
HSI
SV	1.343	2.084	55.696	6.187	0.966	2.801	$- 0.040$	0.162
N-SV	1.428	3.133	179.820	11.399	0.965	2.776	$- 0.040$	0.226
LMSV	1.271	1.632	22.216	3.720	0.999	3.946	$- 0.028$	0.326
SR-SV	1.101	1.651	50.909	5.821	0.978	2.768	$- 0.052$	0.132
FCHI
SV	1.416	1.563	13.561	2.851	0.982	2.724	$- 0.123$	0.101
N-SV	1.483	1.867	22.140	3.705	0.981	2.690	$- 0.117$	0.103
LMSV	1.287	1.451	11.815	2.645	1.000	3.548	$- 0.058$	0.108
SR-SV	1.167	1.159	13.285	2.756	0.985	2.722	$- 0.136$	0.105
SPX
SV	1.648	2.712	25.696	4.395	0.994	2.769	$- 0.254$	0.136
N-SV	1.729	3.139	33.443	5.010	0.995	2.737	$- 0.251$	0.143
LMSV	1.592	2.810	25.516	4.381	0.999	3.623	$- 0.103$	0.219
SR-SV	1.892	3.862	29.532	4.856	1.002	2.746	$- 0.252$	0.149
TSX
SV	1.543	2.516	27.582	4.641	0.971	2.705	$- 0.311$	0.977
N-SV	1.590	2.846	35.343	5.256	0.971	2.683	$- 0.307$	0.969
LMSV	1.395	2.124	20.325	3.893	0.999	3.353	$- 0.301$	0.915
SR-SV	1.389	2.347	28.167	4.702	0.973	2.659	$- 0.310$	0.961

Table 11: Applications: Summary statistics of the one-step-ahead out-of-sample forecast conditional variances $\hat{σ}_{t}^{2}$ and residuals $\hat{ϵ}_{t}$. The LB p-values are the p-values from the Ljung-Box test with 10 lags.

	Forecast Volatility				Forecast Residual ${\hat{ϵ}}_{t}^{y}$
	Mean	Std	Kurtosis	Skew	Std	Kurtosis	Skew	LB- ${\hat{ϵ}}_{t}$
DAX
SV	1.069	0.580	2.992	0.711	0.997	3.964	$- 0.334$	0.652
N-SV	1.072	0.588	3.464	0.893	0.992	3.937	$- 0.342$	0.659
LMSV	1.083	0.616	3.467	0.879	1.001	3.942	$- 0.310$	0.577
SR-SV	0.943	0.458	3.261	0.879	1.034	3.835	$- 0.328$	0.597
HSI
SV	0.655	0.380	14.099	2.823	0.982	4.353	0.036	0.390
N-SV	0.649	0.408	23.440	3.827	0.981	4.298	0.021	0.379
LMSV	0.740	0.386	9.633	2.065	0.928	4.465	0.057	0.283
SR-SV	0.491	0.215	15.963	3.102	1.091	4.135	$- 0.008$	0.368
FCHI
SV	1.087	0.620	3.796	0.989	0.963	4.590	$- 0.436$	0.734
N-SV	1.089	0.646	4.629	1.249	0.965	4.415	$- 0.432$	0.686
LMSV	1.144	0.633	5.126	1.400	0.971	4.656	$- 0.371$	0.657
SR-SV	0.894	0.434	3.626	0.918	0.995	4.174	$- 0.375$	0.609
SPX
SV	0.675	0.438	9.938	2.135	0.983	3.970	$- 0.456$	0.352
N-SV	0.679	0.453	12.583	2.503	0.978	3.970	$- 0.458$	0.365
LMSV	0.763	0.422	4.993	1.291	0.920	4.075	$- 0.416$	0.547
SR-SV	0.523	0.287	9.978	2.242	1.074	3.797	$- 0.406$	0.450
TSX
SV	0.551	0.314	3.231	0.976	0.960	3.859	$- 0.537$	0.123
N-SV	0.551	0.304	3.412	1.035	0.952	3.811	$- 0.533$	0.105
LMSV	0.541	0.272	3.156	0.798	0.983	4.060	$- 0.583$	0.098
SR-SV	0.500	0.229	3.948	1.234	0.973	3.734	$- 0.502$	0.077

Table 12: SPX data: Forecast performance of the SR-SV and benchmark models using different realized measures. In each panel, the bold numbers indicate the best predictive scores.

Measure		PPS	${MSE}_{1}$	${MSE}_{2}$	${MAE}_{1}$	${MAE}_{2}$	QLIKE	$R^{2} LOG$	Count
	SV	1.122	0.103	1.333	0.229	0.392	0.347	0.667	0
		(0.001)	(0.001)	(0.002)	(0.001)	(0.002)	(0.001)	(0.002)
BV	N-SV	1.122	0.103	1.327	0.229	0.392	0.346	0.669	0
		(0.000)	(0.001)	(0.002)	(0.001)	(0.002)	(0.001)	(0.003)
	LMSV		0.125	1.406	0.265	0.451	0.402	0.837	0
	SR-SV	1.113	0.091	1.321	0.209	0.354	0.330	0.572	7
		(0.001)	(0.000)	(0.002)	(0.000)	(0.001)	(0.002)	(0.000)
	SV		0.114	0.830	0.256	0.428	0.323	0.862	0
			(0.000)	(0.002)	(0.001)	(0.001)	(0.001)	(0.003)
MedRV	N-SV		0.114	0.826	0.257	0.428	0.323	0.866	0
			(0.000)	(0.002)	(0.001)	(0.000)	(0.001)	(0.004)
	LMSV		0.140	0.901	0.294	0.488	0.385	1.064	0
	SR-SV		0.102	0.821	0.235	0.389	0.308	0.757	6
			(0.000)	(0.002)	(0.000)	(0.000)	(0.001)	(0.001)
	SV		0.114	0.834	0.256	0.419	0.363	0.915	0
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.002)
RKV	N-SV		0.114	0.829	0.256	0.420	0.361	0.918	0
			(0.000)	(0.002)	(0.000)	(0.000)	(0.000)	(0.001)
	LMSV		0.137	0.904	0.290	0.476	0.405	1.100	0
	SR-SV		0.101	0.822	0.237	0.384	0.345	0.808	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
	SV		0.121	1.864	0.245	0.421	0.331	0.796	0
			(0.000)	(0.002)	(0.001)	(0.002)	(0.000)	(0.002)
RV	N-SV		0.120	1.863	0.246	0.422	0.329	0.799	0
			(0.000)	(0.002)	(0.001)	(0.001)	(0.001)	(0.003)
	LMSV		0.145	1.95	0.283	0.484	0.385	0.983	0
	SR-SV		0.108	1.861	0.224	0.382	0.316	0.692	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.001)	(0.000)

Table 13: Applications: Posterior means of the parameters of the LMSV model, with the posterior standard deviations in brackets. We also report the estimates of the scale factor $κ$ and the constant $c$.

	$d$	$ϕ$	$σ_{η}^{2}$	$σ_{ξ}^{2}$	$κ$	$c$
DAX	$0.442$	0.708	0.054	4.933	1.974	$- 1.616$
	(0.026)	(0.082)	(0.024)	(0.170)
HSI	$0.431$	0.747	0.052	5.469	2.047	$- 1.526$
	(0.030)	(0.069)	(0.021)	(0.183)
FCHI	$0.434$	0.716	0.062	5.241	1.984	$- 1.562$
	(0.029)	(0.083)	(0.029)	(0.182)
SPX	$0.428$	0.809	0.040	5.488	2.017	$- 1.650$
	(0.035)	(0.060)	(0.016)	(0.177)
TSX	$0.445$	0.714	0.047	4.964	1.918	$- 1.493$
	(0.026)	(0.084)	(0.021)	(0.160)

Table 14: Implementation settings of the DT-SMC sampler.

Variable	Description	Value
$K$	Number of annealing levels	10000
$M$	Number of particles	10000
$N$	Number of particles in the particle filter	200
$ρ$	Correlation factor in the CPM algorithm	0.999
$c$	Constant of the ESS threshold	0.800
$N_{CPM}$	Number of CPM moves	20

Table 15: DAX data: Forecast performance of the SR-SV and benchmark models using different realized measures. In each panel, the bold numbers indicate the best predictive scores.

Measure		PPS	${MSE}_{1}$	${MSE}_{2}$	${MAE}_{1}$	${MAE}_{2}$	QLIKE	$R^{2} LOG$	Count
	SV	1.368	0.099	0.763	0.234	0.485	0.853	0.423	0
		(0.000)	(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.000)
BV	N-SV	1.368	0.099	0.769	0.234	0.487	0.850	0.422	0
		(0.000)	(0.001)	(0.003)	(0.001)	(0.002)	(0.001)	(0.002)
	LMSV		0.108	0.805	0.245	0.501	0.862	0.435	0
	SR-SV	1.365	0.09	0.745	0.220	0.452	0.847	0.386	7
		(0.000)	(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)
	SV		0.094	0.582	0.237	0.486	0.810	0.443	0
			(0.000)	(0.002)	(0.000)	(0.000)	(0.000)	(0.001)
MedRV	N-SV		0.094	0.585	0.238	0.489	0.810	0.447	0
			(0.001)	(0.004)	(0.001)	(0.002)	(0.001)	(0.002)
	LMSV		0.099	0.612	0.245	0.504	0.825	0.467	0
	SR-SV		0.088	0.580	0.227	0.462	0.807	0.417	6
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)
	SV		0.126	0.994	0.268	0.558	0.858	0.581	0
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
RKV	N-SV		0.126	1.001	0.269	0.559	0.854	0.581	0
			(0.001)	(0.003)	(0.001)	(0.001)	(0.001)	(0.002)
	LMSV		0.135	1.023	0.279	0.578	0.878	0.603	0
	SR-SV		0.116	0.974	0.254	0.516	0.851	0.538	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
	SV		0.107	0.909	0.239	0.501	0.859	0.442	0
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
RV	N-SV		0.108	0.915	0.239	0.503	0.855	0.441	0
			(0.001)	(0.003)	(0.001)	(0.002)	(0.001)	(0.002)
	LMSV		0.115	0.936	0.250	0.527	0.875	0.464	0
	SR-SV		0.098	0.889	0.223	0.462	0.851	0.400	6
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)

Table 16: HSI data: Forecast performance of the SR-SV and benchmark models using different realized measures. In each panel, the bold numbers indicate the best predictive scores.

Measure		PPS	${MSE}_{1}$	${MSE}_{2}$	${MAE}_{1}$	${MAE}_{2}$	QLIKE	$R^{2} LOG$	Count
	SV	1.131	0.069	0.497	0.186	0.319	0.359	0.390	1
		(0.000)	(0.000)	(0.001)	(0.000)	(0.001)	(0.001)	(0.001)
BV	N-SV	1.130	0.067	0.499	0.182	0.313	0.357	0.376	0
		(0.000)	(0.000)	(0.001)	(0.000)	(0.001)	(0.001)	(0.002)
	LMSV		0.076	0.516	0.205	0.343	0.370	0.423	0
	SR-SV	1.127	0.060	0.504	0.152	0.261	0.355	0.294	5
		(0.000)	(0.00)	(0.002)	(0.000)	(0.000)	(0.000)	(0.001)
	SV		0.066	0.371	0.191	0.317	0.347	0.435	0
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)
MedRV	N-SV		0.065	0.370	0.188	0.312	0.344	0.423	1
			(0.001)	(0.000)	(0.001)	(0.000)	(0.001)	(0.002)
	LMSV		0.073	0.396	0.207	0.335	0.356	0.469	0
	SR-SV		0.059	0.389	0.164	0.272	0.341	0.338	5
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)
	SV		0.100	0.740	0.230	0.385	0.366	0.665	1
			(0.000)	(0.001)	(0.000)	(0.001)	(0.001)	(0.001)
RKV	N-SV		0.098	0.741	0.226	0.380	0.364	0.648	0
			(0.000)	(0.002)	(0.001)	(0.001)	(0.000)	(0.002)
	LMSV		0.112	0.764	0.246	0.405	0.380	0.744	0
	SR-SV		0.087	0.748	0.194	0.323	0.360	0.519	5
			(0.000)	(0.002)	(0.000)	(0.001)	(0.000)	(0.001)
	SV		0.069	0.522	0.186	0.318	0.367	0.390	1
			(0.000)	(0.001)	(0.000)	(0.001)	(0.001)	(0.001)
RV	N-SV		0.068	0.524	0.181	0.311	0.365	0.374	0
			(0.000)	(0.001)	(0.000)	(0.001)	(0.001)	(0.002)
	LMSV		0.077	0.552	0.204	0.353	0.386	0.419	0
	SR-SV		0.060	0.530	0.150	0.258	0.361	0.291	5
			(0.000)	(0.002)	(0.000)	(0.000)	(0.001)	(0.001)

Table 17: FCHI data: Forecast performance of the SR-SV and benchmark models using different realized measures. In each panel, the bold numbers indicate the best predictive scores and the model with the highest count of best predictive scores is preferred.

Measure		PPS	${MSE}_{1}$	${MSE}_{2}$	${MAE}_{1}$	${MAE}_{2}$	QLIKE	$R^{2} LOG$	Count
	SV	1.384	0.108	1.076	0.235	0.504	0.863	0.426	0
		(0.000)	(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)
BV	N-SV	1.383	0.108	1.086	0.234	0.507	0.860	0.420	0
		(0.000)	(0.000)	(0.002)	(0.000)	(0.001)	(0.000)	(0.001)
	LMSV		0.118	1.104	0.246	0.527	0.872	0.469	0
	SR-SV	1.381	0.095	1.057	0.210	0.448	0.856	0.354	7
		(0.000)	(0.00)	(0.001)	(0.000)	(0.001)	(0.000)	(0.001)
	SV		0.100	0.670	0.238	0.500	0.833	0.543	0
			(0.000)	(0.001)	(0.000)	(0.001)	(0.001)	(0.002)
MedRV	N-SV		0.100	0.672	0.237	0.501	0.832	0.538	0
			(0.000)	(0.002)	(0.001)	(0.000)	(0.000)	(0.001)
	LMSV		0.112	0.695	0.247	0.513	0.849	0.582	0
	SR-SV		0.090	0.665	0.216	0.452	0.828	0.472	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
	SV		0.158	1.271	0.301	0.624	0.901	0.750	0
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.002)
RKV	N-SV		0.159	1.285	0.301	0.628	0.896	0.745	0
			(0.000)	(0.002)	(0.000)	(0.000)	(0.000)	(0.001)
	LMSV		0.168	1.332	0.315	0.656	0.908	0.815	0
	SR-SV		0.139	1.229	0.275	0.562	0.890	0.645	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
	SV		0.103	0.908	0.232	0.495	0.877	0.411	0
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
RV	N-SV		0.103	0.920	0.232	0.498	0.873	0.404	0
			(0.000)	(0.002)	(0.000)	(0.001)	(0.000)	(0.001)
	LMSV		0.113	0.945	0.245	0.521	0.884	0.449	0
	SR-SV		0.090	0.880	0.209	0.440	0.869	0.340	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)

Table 18: TSX data: Forecast performance of the SR-SV and benchmark models using different realized measures. In each panel, the bold numbers indicate the best predictive scores.

Measure		PPS	${MSE}_{1}$	${MSE}_{2}$	${MAE}_{1}$	${MAE}_{2}$	QLIKE	$R^{2} LOG$	Count
	SV	1.004	0.074	0.904	0.192	0.295	0.106	0.542	0
		(0.001)	(0.000)	(0.002)	(0.001)	(0.001)	(0.001)	(0.003)
BV	N-SV	1.003	0.074	0.902	0.192	0.294	0.105	0.543	1
		(0.000)	(0.000)	(0.002)	(0.001)	(0.001)	(0.001)	(0.002)
	LMSV		0.091	0.965	0.217	0.331	0.197	0.676	0
	SR-SV	1.001	0.069	0.900	0.181	0.273	0.107	0.517	6
		(0.000)	(0.00)	(0.001)	(0.000)	(0.001)	(0.001)	(0.001)
	SV		0.074	0.289	0.210	0.310	0.098	1.052	0
			(0.000)	(0.001)	(0.001)	(0.001)	(0.001)	(0.002)
MedRV	N-SV		0.073	0.291	0.210	0.309	0.098	1.044	0
			(0.000)	(0.002)	(0.001)	(0.000)	(0.001)	(0.001)
	LMSV		0.096	0.360	0.239	0.353	0.209	0.839	0
	SR-SV		0.069	0.287	0.201	0.291	0.098	0.985	5
			(0.000)	(0.001)	(0.000)	(0.000)	(0.001)	(0.001)
	SV		0.096	0.349	0.246	0.357	0.134	0.987	0
			(0.000)	(0.001)	(0.000)	(0.001)	(0.000)	(0.002)
RKV	N-SV		0.096	0.347	0.246	0.355	0.134	0.991	0
			(0.000)	(0.001)	(0.001)	(0.000)	(0.001)	(0.001)
	LMSV		0.110	0.392	0.263	0.378	0.220	1.120	0
	SR-SV		0.089	0.341	0.236	0.336	0.131	0.951	6
			(0.000)	(0.001)	(0.000)	(0.000)	(0.000)	(0.001)
	SV		0.087	1.370	0.206	0.319	0.118	0.625	0
			(0.000)	(0.002)	(0.000)	(0.000)	(0.001)	(0.002)
RV	N-SV		0.087	1.368	0.206	0.317	0.117	0.627	1
			(0.000)	(0.002)	(0.000)	(0.001)	(0.001)	(0.002)
	LMSV		0.100	1.418	0.224	0.342	0.195	0.742	0
	SR-SV		0.081	1.363	0.195	0.295	0.119	0.597	5
			(0.000)	(0.002)	(0.000)	(0.000)	(0.001)	(0.001)

Equations (104)

z_{t}

y_{t}

(1 - B)^{d} Φ (B) z_{t}

h_{t}

η_{t}

z_{t} ∣ η_{t}

h_{t} = f\big(x_{t}, f(x_{t-1}, \dots, f(x_{1}, h_{0}))\big), \quad \text{where} \quad f(x_{t}, h_{t-1}) := \Psi(w_{x} x_{t} + w_{h} h_{t-1} + b),
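The recursion above says that each hidden state is obtained by applying the same map f to the current input and the previous hidden state. A minimal scalar sketch follows; the choice of tanh for the activation Ψ, the default h_0 = 0 and the function name are assumptions made for this example.

```python
import numpy as np


def rnn_hidden_states(x, w_x, w_h, b, h0=0.0, activation=np.tanh):
    """Unroll h_t = Psi(w_x * x_t + w_h * h_{t-1} + b) over a scalar input series."""
    h = np.empty(len(x))
    h_prev = h0
    for t, x_t in enumerate(x):
        h_prev = activation(w_x * x_t + w_h * h_prev + b)
        h[t] = h_prev
    return h
```

Because h_{t-1} is itself a function of all earlier inputs, h_t summarises the whole history x_1, ..., x_t through a fixed number of weights.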

r_{t}

φ_{t}

h_{t}^{(α_{j})}

h_{t}

η_{t}

z_{t}

y_{t}

z_{t} = β_{0} + β_{1} SRU (η_{t - 1}, z_{t - 1}, h_{t - 1}) + ϕ z_{t - 1} + ϵ_{t}^{η} .

z_{t} = β_{0} + β_{1} N (η_{t - 1}, w_{z} z_{t - 1}, h_{t - 1}) + ϕ z_{t - 1} + ϵ_{t}^{η},

y_{t} ∣ z_{t} \sim N (0, e^{z_{t}}),

z_{t} ∣ z_{1 : t - 1}, h_{t} \sim N (ϕ z_{t - 1} + β_{0} + β_{1} h_{t}, σ^{2}), t \geq 2, z_{1} \sim N (β_{0}, σ^{2}) .
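The two conditional distributions above define the measurement and state equations once the recurrent component h_t is known. The sketch below simulates from them for a user-supplied sequence `h`. This is a simplification: in the model h_t is produced by the recurrent unit from η_{t-1}, z_{t-1} and h_{t-1}, whose internal form is not reproduced here, so treating `h` as given, as well as the function name, is an assumption of this example.

```python
import numpy as np


def simulate_given_h(h, phi, beta0, beta1, sigma2, seed=0):
    """Draw (y, z) from the state-space equations for a fixed sequence h_1..h_T."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    T = len(h)
    sd = np.sqrt(sigma2)
    z = np.empty(T)
    z[0] = beta0 + sd * rng.standard_normal()          # z_1 ~ N(beta0, sigma^2)
    for t in range(1, T):
        # z_t | z_{t-1}, h_t ~ N(phi * z_{t-1} + beta0 + beta1 * h_t, sigma^2)
        mean = phi * z[t - 1] + beta0 + beta1 * h[t]
        z[t] = mean + sd * rng.standard_normal()
    y = np.exp(0.5 * z) * rng.standard_normal(T)       # y_t | z_t ~ N(0, exp(z_t))
    return y, z
```

Setting `beta1 = 0` removes the recurrent term and leaves a Gaussian AR(1) process for the log volatility z_t.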

π (θ) = p (θ ∣ y_{1 : T}) = \frac{p ( y _{1 : T} ∣ θ ) p ( θ )}{p ( y _{1 : T} )},

p (y_{1 : T} ∣ θ) = \int p (y_{1 : T} ∣ z_{1 : T}, θ) p (z_{1 : T} ∣ θ) d z_{1 : T},

π_{t} (θ) := π_{t} (θ ∣ y_{1 : T}) \propto p (y_{1 : T} ∣ θ, u)^{γ_{t}} p (θ),

w_{t}^{j} = W_{t - 1}^{j} \frac{p ( y _{1 : T} ∣ θ _{t - 1}^{j} , u _{t - 1}^{j} ) ^{γ_{t}} p ( θ _{t - 1}^{j} )}{p ( y _{1 : T} ∣ θ _{t - 1}^{j} , u _{t - 1}^{j} ) ^{γ_{t - 1}} p ( θ _{t - 1}^{j} )} = W_{t - 1}^{j} p (y_{1 : T} ∣ θ_{t - 1}^{j}, u_{t - 1}^{j})^{γ_{t} - γ_{t - 1}}, j = 1, ..., M

W_{t}^{j} = \frac{w _{t}^{j}}{\sum _{s = 1}^{M} w _{t}^{s}}, j = 1, ..., M .

ESS = \frac{1}{\sum _{j = 1}^{M} ( W _{t}^{j} ) ^{2}} .

min (1, \frac{p ( y _{1 : T} ∣ θ _{t}^{j'} , u _{t}^{j'} ) ^{γ_{t}} p ( θ _{t}^{j'} )}{p ( y _{1 : T} ∣ θ _{t}^{j} , u _{t}^{j} ) ^{γ_{t}} p ( θ _{t}^{j} )} \frac{q ( θ _{t}^{j} ∣ θ _{t}^{j'} )}{q ( θ _{t}^{j'} ∣ θ _{t}^{j} )}),

log p (y_{1 : T}) = \sum _{t = 1}^{K} log (\sum _{j = 1}^{M} w_{t}^{j}) .
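The reweighting, normalisation, ESS and log marginal likelihood formulas in this list can be combined into one annealing step. The following sketch works on the log scale for numerical stability. It assumes `log_W_prev` holds the log of the normalised weights from the previous level and `loglik` the (estimated) log-likelihood of each particle; the function name and argument layout are hypothetical and this is not the authors' DT-SMC implementation.

```python
import numpy as np


def reweight(log_W_prev, loglik, gamma_prev, gamma_new):
    """One annealing step: return (W, ESS, log of the sum of unnormalised weights)."""
    # log w_t^j = log W_{t-1}^j + (gamma_t - gamma_{t-1}) * log p(y | theta^j, u^j)
    log_w = np.asarray(log_W_prev) + (gamma_new - gamma_prev) * np.asarray(loglik)
    shift = np.max(log_w)                        # log-sum-exp trick
    log_sum_w = shift + np.log(np.sum(np.exp(log_w - shift)))
    W = np.exp(log_w - log_sum_w)                # normalised weights W_t^j
    ess = 1.0 / np.sum(W**2)                     # effective sample size
    return W, ess, log_sum_w
```

Summing the third return value over all K annealing levels gives the log marginal likelihood estimate, and the ESS is the quantity that would be compared with the threshold before resampling.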

F_{M_{1}, M_{2}} = \frac{p ( y _{1 : T} ∣ M _{1} )}{p ( y _{1 : T} ∣ M _{2} )},

y_{t} = 100 (log \frac{P _{t + 1}}{P _{t}} - \frac{1}{T _{P} - 1} \sum _{i = 1}^{T_{P} - 1} log \frac{P _{i + 1}}{P _{i}}), t = 1, 2, ..., T_{P} - 1,
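The return definition above is a percentage log return with its sample mean removed. A short sketch, assuming `prices` is an array of T_P closing prices (the function name is hypothetical):

```python
import numpy as np


def demeaned_returns(prices):
    """Return the T_P - 1 demeaned percentage log returns of a price series."""
    prices = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(prices))        # log(P_{t+1} / P_t), t = 1..T_P - 1
    return 100.0 * (log_ret - log_ret.mean())
```

By construction the resulting series has zero sample mean, which matches the zero-mean measurement equation used for y_t.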

σ_{t}^{2} = c \cdot R V_{t} where c = \frac{T _{out}^{- 1} \sum _{t = T_{in} + 1}^{T} ( y _{t} - E ( y _{t} ) ) ^{2}}{T _{out}^{- 1} \sum _{t = T_{in} + 1}^{T} R V _{t}}, t = T_{in} + 1, T_{in} + 2, ..., T,
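The scaling above rescales a realized measure so that its average over the out-of-sample period matches the variance of the returns over the same period. A minimal sketch, assuming the expectation E(y_t) is replaced by the out-of-sample mean of the returns (an assumption of this example) and using a hypothetical function name:

```python
import numpy as np


def scale_realized_measure(y_out, rv_out):
    """Return (c * RV_t, c) for the out-of-sample returns and realized measure."""
    y_out = np.asarray(y_out, dtype=float)
    rv_out = np.asarray(rv_out, dtype=float)
    # Numerator: average squared deviation of returns; denominator: average RV.
    c = np.mean((y_out - y_out.mean()) ** 2) / np.mean(rv_out)
    return c * rv_out, c
```

The scaled series can then serve as the volatility proxy σ_t^2 when forecasts are scored against a realized measure.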

(1 - B)^{d} Φ (B) z_{t}

x_{t}

γ_{x} (h) = Cov (x_{t}, x_{t + h}) = γ (h) + σ_{ξ}^{2} 1_{h = 0},

ℓ_{W} (β_{x}) = 2 π T^{- 1} \sum _{k = 1}^{[T / 2]} \{ log f_{β_{x}} (ω_{k}) + \frac{J ( ω _{k} )}{f _{β_{x}} ( ω _{k} )} \},

Taxonomy

Topics: Stock Market Forecasting Methods · Energy Load and Power Forecasting · Complex Systems and Time Series Analysis

Full text

A Statistical Recurrent Stochastic Volatility Model for Stock Markets

T.-N. Nguyen, M.-N. Tran, D. Gunawan, R. Kohn

Nguyen and Tran: Discipline of Business Analytics, University of Sydney Business School and ACEMS. Gunawan: School of Mathematics and Applied Statistics, University of Wollongong and ACEMS. Kohn: School of Economics, UNSW Business School and ACEMS.

Abstract

The Stochastic Volatility (SV) model and its variants are widely used in the financial sector, while recurrent neural network (RNN) models are successfully used in many large-scale industrial applications of Deep Learning. Our article combines these two methods in a non-trivial way and proposes a model, which we call the Statistical Recurrent Stochastic Volatility (SR-SV) model, to capture the dynamics of stochastic volatility. The proposed model is able to capture complex volatility effects (e.g., non-linearity and long-memory auto-dependence) overlooked by the conventional SV models, is statistically interpretable and has an impressive out-of-sample forecast performance. These properties are carefully discussed and illustrated through extensive simulation studies and applications to five international stock index datasets: the German stock index DAX30, the Hong Kong stock index HSI50, the French market index CAC40, the US stock market index SP500 and the Canadian market index TSX250. A user-friendly software package, together with the examples reported in the paper, is available at https://github.com/vbayeslab.

Keywords. Deep Learning, volatility modelling, recurrent neural networks, financial econometrics.

1 Introduction

The volatility of a financial time series, such as stock returns, is defined as the variance of the returns and serves as a measure of the uncertainty about the returns. The volatility, which is of great interest to financial econometricians, is unobserved and is therefore usually estimated through a statistical model. The two model classes most frequently used in volatility modelling are the Generalized Autoregressive Conditional Heteroscedastic (GARCH) models and the Stochastic Volatility (SV) models. The GARCH model (Bollerslev, 1986) expresses the current volatility, conditional on the previous returns and volatilities, as a deterministic and linear function of the squared returns and the conditional volatilities in the previous time period. The SV model (Taylor, 1982, 1986), on the other hand, uses a latent stochastic process to model the volatility, which is usually taken as a first-order autoregressive process. It is well documented that the GARCH and SV models are able to capture important effects exhibited in the variance of financial returns. For example, the volatilities in financial returns are observed to be highly autocorrelated in certain time periods and exhibit periods of both low and high volatility (Mandelbrot, 1967). This so-called volatility clustering phenomenon can be modelled by the volatility processes introduced in the GARCH and SV models, which is why these models are widely employed in financial time series modelling.

Although the GARCH and SV models were introduced independently and almost concurrently, the GARCH models were initially more widely adopted because they are much easier to estimate than SV models. This is because the likelihood of a GARCH model can be obtained explicitly, while the likelihood of an SV model is intractable as it is an integral over the latent process. However, the conditional variance process of GARCH models is deterministic and hence GARCH models might not efficiently capture the random oscillatory behavior of financial volatility (Nelson, 1991). SV models are considered an attractive alternative to GARCH models because they overcome this limitation (Kim et al., 1998; Yu, 2002). Recent advances in Bayesian computation such as particle Markov chain Monte Carlo (PMCMC) (Andrieu et al., 2010) allow straightforward estimation and inference for SV models.
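To make the point about the explicit GARCH likelihood concrete, the sketch below evaluates the Gaussian log-likelihood of a GARCH(1,1) model in the parametrisation used for SIM I in Table 4. Treating the initial variance as a known input and the function name are assumptions of this example. No comparable closed form exists for the SV model, whose likelihood requires integrating over the latent volatility path.

```python
import numpy as np


def garch11_loglik(y, mu, alpha, beta, sigma2_1):
    """Exact Gaussian log-likelihood of a GARCH(1,1) model given sigma_1^2."""
    y = np.asarray(y, dtype=float)
    sigma2 = np.empty_like(y)
    sigma2[0] = sigma2_1
    for t in range(1, len(y)):
        # The conditional variance is a deterministic function of the past.
        sigma2[t] = mu + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    # Sum of log N(y_t; 0, sigma_t^2) over t.
    return -0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + y**2 / sigma2)
```

Each term depends only on past observations, so a single pass through the data is enough to evaluate the likelihood.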

Standard SV models still cannot appropriately capture some important features arising in financial volatility. For example, a large amount of both theoretical and empirical evidence indicates that there exists long-range persistence in the volatility process of many financial returns; see, e.g., Lo (1991), Ding et al. (1993), Crato and de Lima (1994) and Bollerslev and Mikkelsen (1996). The long-memory property of a time series implies that the decay of the autocorrelations of the series is slower than exponential. The standard SV model of Taylor (1982) uses an AR(1) process to model the log of the volatility and hence might fail to capture this type of persistence (Breidt et al., 1998). Another line of the literature shows strong evidence of non-linear auto-dependence in the volatility process of some stock and currency exchange returns (Kiliç, 2011) and that the simple linear AR(1) process cannot effectively capture the underlying non-linear volatility dynamics.

Breidt et al. (1998) proposed the Long Memory Stochastic Volatility (LMSV) model to overcome the short-memory limitation of the standard SV model. LMSV uses an ARFIMA process (Granger and Joyeux, 1980) as an alternative to the AR(1) process to capture the long-memory dependence in the volatility. The empirical evidence in Breidt et al. (1998) suggests that the LMSV model is able to capture the long-memory volatility behaviour in some stock return datasets. However, the literature is unclear about whether the LMSV model can capture non-linear dynamics within the volatility process, because the ARFIMA model is linear. Additionally, it is challenging to estimate the LMSV model as its likelihood is intractable. We are unaware of any available software package that implements the LMSV methodology. In another approach, Yu et al. (2006) introduced a family of non-linear SV (N-SV) models to capture the possible departure from the log transform commonly used in SV models. In the standard SV model, the logarithm of volatility is assumed to follow an AR(1) process; N-SV uses other non-linear transformations, such as the Box-Cox power function, rather than the logarithm. The simulation studies and empirical results on currency exchange and option pricing data in Yu et al. (2006) show that the N-SV model using the Box-Cox transformation is able to detect some interesting effects in the underlying volatility process. The general use of N-SV models requires the user to select an appropriate non-linear transformation for the dataset under consideration, and this might lead to a challenging model selection problem. Neither Breidt et al. (1998) nor Yu et al. (2006) clearly discussed the out-of-sample forecast performance of their LMSV and N-SV models.

Recurrent neural networks (RNN) in the Deep Learning literature have impressive prediction performance and have been successfully deployed in a large number of industrial-level applications (language translation, image captioning, speech synthesis, etc.). RNN models are well known for their ability to efficiently capture the long-range memory and non-linear dependence existing within various types of sequential data, and are considered the state-of-the-art models for many sequence learning problems (Lipton et al., 2015). Many researchers and practitioners have used RNN for mean modelling in financial time series analysis, but the general consensus is that these machine learning models do not clearly outperform traditional time series models such as ARMA and ARIMA (see, e.g., Makridakis et al. (2018) and Zhang (2003)). Makridakis et al. (2018) note that without careful modifications, Machine Learning models are usually less accurate than the statistical approaches that have been extensively investigated in the financial time series literature. Recently, the idea of using RNN models to improve the predictive performance of GARCH-type models has also been proposed for volatility modelling. For example, Kim and Won (2018) use the volatility estimates from several GARCH-type models as inputs to an RNN model, which then non-linearly transforms these inputs to output the final estimate of the volatility. The empirical results on the Korean stock market KOSPI 200 index show a significant improvement in the forecast performance of the proposed hybrid model over several GARCH-type benchmark models. However, similar to many engineering-oriented Machine Learning models, Kim and Won's model overlooks the interpretation aspect of volatility modelling, which is often of main interest to econometricians.
One of the main motivations of our article is to develop deep learning based volatility models that not only produce accurate predictions, but are also interpretable and support meaningful in-sample analysis. These models should not overlook the well-established features of traditional econometric models, which are motivated by well-known stylized facts in financial time series such as volatility clustering and fat tails.

In the SV literature, there is still a lack of research using RNN structures to model the stochastic volatility dynamics of financial time series, perhaps for two reasons. First, it is non-trivial to sensibly incorporate RNN into statistical volatility models. Simple adaptations of RNN to volatility models easily overlook the important stylized facts exhibited in financial volatility, which are well captured by the AR(1) process in the SV model. It is important to select appropriate RNN structures that not only produce accurate out-of-sample volatility forecasts, but also explain the volatility dynamics well. Second, a stochastic volatility model that incorporates an RNN structure into its latent stochastic process is highly sophisticated and thus challenging to estimate.

This paper combines the SV and RNN models in a non-trivial way, and proposes a new model, called the Statistical Recurrent Stochastic Volatility (SR-SV) model. In particular, we use the Statistical Recurrent Unit (SRU) structure of Oliva et al. (2017), which is a special type of RNN, to capture complex volatility effects overlooked by an AR(1) process in the standard SV model while still retaining the essential components of the SV model. This combination allows the SR-SV model to enjoy many of the advances from both deep learning (e.g., flexibility and excellent predictive performance) and econometric volatility modelling (e.g., excellent interpretability of volatility effects). The SR-SV model belongs to the class of parametric state space models whose Bayesian inference can be performed using recent advances in the Sequential Monte Carlo (SMC) and particle MCMC literature (Andrieu and Roberts, 2009; Andrieu et al., 2010; Duan and Fulop, 2015; Deligiannidis et al., 2018). The simulation studies and empirical results on the five stock index datasets demonstrate that the SR-SV model can efficiently capture the potential non-linear and long-memory effects in the underlying volatility dynamics, and provide better out-of-sample forecasts than the standard SV, N-SV and LMSV models. We note that we have tested SR-SV on a wider range of stock returns but only report in the paper the results for five of them, as we consistently observed a similar improvement of the model over the other three counterparts. A Matlab software package implementing Bayesian estimation and inference for SR-SV, together with the examples reported in this paper, is available on GitHub (the link is provided in the unblinded version).

The article is organized as follows. Section 2 briefly reviews the SV and SRU models, and presents the SR-SV model. Section 3 discusses in detail Bayesian estimation and inference for the SR-SV model. Section 4 presents the simulation study and applies the SR-SV model to analyze the five stock index datasets. Section 5 concludes. The Appendix gives details of the implementation and further empirical results.

2 The SR-SV model

2.1 The SV model and its possible weaknesses

Let $y=\{y_{t},\ t=1,...,T\}$ be a series of financial returns. We consider the basic version of the SV model (Taylor, 1982):

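$$y_{t}=e^{\frac{1}{2}z_{t}}\epsilon^{y}_{t},\quad \epsilon^{y}_{t}\sim{\cal N}(0,1),\quad t=1,...,T, \tag{1}$$

$$z_{t}=\mu+\phi(z_{t-1}-\mu)+\epsilon^{\eta}_{t},\quad \epsilon^{\eta}_{t}\sim{\cal N}(0,\sigma^{2}),\quad t=2,...,T,\quad z_{1}\sim{\cal N}\big(\mu,\sigma^{2}/(1-\phi^{2})\big). \tag{2}$$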

The persistence parameter $\phi$ is assumed to be in $(-1,1)$ to enforce stationarity of both the $z$ and $y$ processes. In this SV model, the log volatility process $z$ is assumed to follow an AR(1) model. It is well documented in the financial econometrics literature that financial time series data often exhibit long-term auto-dependence, which forces the persistence parameter $\phi$ to be close to 1 (Jacquier et al., 1994; Kim et al., 1998). Write $p(z|\theta)$ for the density of $z$ given the model parameters $\theta=(\mu,\phi,\sigma^{2})$ and $p(y|z)$ for the density of the data $y$ conditional on $z$. We can view $p(z|\theta)$ as the prior with $\theta$ being the hyper-parameters and $p(y|z)$ as the likelihood (Jacquier et al., 1994). Under this perspective, the SV model (1)-(2) puts non-zero prior mass on AR(1) stochastic processes, and zero or almost-zero mass on stochastic processes that are far from being well approximated by an AR(1). This means that the SV model in (1)-(2) might not be able to capture more complex dynamics in the posterior behavior of the log volatility process $z$, such as long-term memory or non-linear auto-dependence, and that a more flexible prior distribution should be put on $z$. We will design such a flexible prior by combining the attractive features of both SV and RNN time series modeling techniques.
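To make the AR(1) prior on the log volatility concrete, the following minimal sketch simulates returns from the basic SV model. It assumes the common parameterization $z_{t}=\mu+\phi(z_{t-1}-\mu)+\sigma\varepsilon_{t}$ with a stationary start and $y_{t}=e^{z_{t}/2}e_{t}$; the function name `simulate_sv` and the parameter values are illustrative choices, not part of the paper's software.

```python
import math
import random


def simulate_sv(T, mu, phi, sigma, seed=0):
    """Simulate returns y and log volatilities z from a basic SV model.

    Assumed form (illustrative parameterization):
      z_1 ~ N(mu, sigma^2 / (1 - phi^2)),
      z_t = mu + phi * (z_{t-1} - mu) + sigma * eps_t,
      y_t = exp(z_t / 2) * e_t,
    with eps_t and e_t independent standard normal draws.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for stationarity")
    rng = random.Random(seed)
    # Draw the first log volatility from the stationary distribution.
    z = [rng.gauss(mu, sigma / math.sqrt(1.0 - phi ** 2))]
    for _ in range(1, T):
        z.append(mu + phi * (z[-1] - mu) + sigma * rng.gauss(0.0, 1.0))
    # Returns are conditionally Gaussian with variance exp(z_t).
    y = [math.exp(0.5 * zt) * rng.gauss(0.0, 1.0) for zt in z]
    return y, z


y, z = simulate_sv(T=1000, mu=-1.0, phi=0.98, sigma=0.15)
```

With $\phi$ close to 1, as is typical for fitted SV models, the simulated $z$ path moves slowly and the returns show volatility clustering.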

Yu et al. (2006) propose a class of non-linear SV (N-SV) models as a variant of SV that allows a more flexible link between the variance $\text{\rm Var}(y_{t}|z_{t})$ and the AR(1) process $z$. Their N-SV model, using the Box-Cox transformation for $\text{\rm Var}(y_{t}|z_{t})$, is written as

[TABLE]

where $\delta$ is an auxiliary parameter that measures how far the transformation departs from the log transform. As $\delta\rightarrow 0$, $(1+\delta z_{t})^{1/2\delta}\rightarrow e^{\frac{1}{2}z_{t}}$ and hence the N-SV model includes the SV model as a special case. The term non-linearity here might cause some confusion, as it does not refer to non-linear auto-dependence within the log volatility process $z$, but to the non-linearity between $\text{\rm Var}(y_{t}|z_{t})$ and $z_{t}$.
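The limit stated above can be checked numerically. The short sketch below evaluates the scale factor $(1+\delta z)^{1/2\delta}$ for decreasing $\delta$ and compares it with $e^{z/2}$; the helper name `nsv_scale` and the test value of $z$ are illustrative.

```python
import math


def nsv_scale(z, delta):
    """Return (1 + delta * z) ** (1 / (2 * delta)); delta = 0 gives exp(z / 2)."""
    if delta == 0.0:
        return math.exp(0.5 * z)
    base = 1.0 + delta * z
    if base <= 0.0:
        raise ValueError("1 + delta * z must be positive")
    return base ** (1.0 / (2.0 * delta))


z = -1.2
for delta in (0.5, 0.1, 0.01, 0.001):
    # The gap to exp(z / 2) shrinks as delta approaches zero.
    print(delta, nsv_scale(z, delta), math.exp(0.5 * z))
```

The positivity check reflects a practical constraint of the Box-Cox form: for $\delta\neq 0$ the quantity $1+\delta z_{t}$ must stay positive for the scale to be defined.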

Breidt et al. (1998) suggest using an ARFIMA $(p,d,q)$ process (Granger and Joyeux, 1980; Hosking, 1981) for the log volatility $z_{t}$ to capture the long-memory auto-dependence exhibited in financial time series. Their LMSV model is written as

[TABLE]

where $\Phi(B)=1-\phi_{1}B-\phi_{2}B^{2}-...-\phi_{p}B^{p}$, $\Theta(B)=1+\theta_{1}B+\theta_{2}B^{2}+...+\theta_{q}B^{q}$, and $B$ is the backshift operator, i.e., $B^{s}X_{t}=X_{t-s}$. To ensure the stationarity and invertibility of the log volatility process $z_{t}$, the fractional integration parameter $d$ is assumed to be in $(-0.5,0.5)$ and the roots of $\Phi(B)$ and $\Theta(B)$ have to lie outside the unit circle.
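The long-memory behaviour comes from the fractional filter. As a minimal sketch, the code below computes the coefficients $\psi_{k}$ in the expansion $(1-B)^{-d}=\sum_{k\geq 0}\psi_{k}B^{k}$ using the standard recursion $\psi_{k}=\psi_{k-1}(k-1+d)/k$, and contrasts them with the geometric weights $\phi^{k}$ of an AR(1) process. The function name and the values $d=0.4$ and $\phi=0.9$ are illustrative.

```python
def fractional_ma_weights(d, n):
    """First n coefficients psi_k of (1 - B)^(-d) = sum_k psi_k B^k."""
    psi = [1.0]
    for k in range(1, n):
        # Ratio of consecutive binomial-series coefficients.
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


psi = fractional_ma_weights(d=0.4, n=201)
ar1 = [0.9 ** k for k in range(201)]
for k in (1, 10, 50, 200):
    # Fractional weights decay hyperbolically, AR(1) weights geometrically.
    print(k, psi[k], ar1[k])
```

For $0<d<0.5$ the weights decay roughly like $k^{d-1}$, so shocks far in the past keep a non-negligible influence, whereas the AR(1) weights become negligible after a few dozen lags.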

Another notable line of research in the SV literature is the class of semi-parametric stochastic volatility models that incorporate non-parametric techniques into modelling the conditional distribution of financial returns. For example, the stochastic volatility Dirichlet process mixture (SV-DPM) model of Jensen and Maheu (2010) uses a Dirichlet process prior (Ferguson, 1973) to characterize the conditional distribution of $y_{t}$. Semi-parametric models differ from the SR-SV model in two important aspects. First, the SV-DPM model is proposed to capture the asymmetric and leptokurtic behavior of financial returns, while the SR-SV model focuses on modeling the non-linearity and long-memory auto-dependence in the log-volatility dynamics. Second, the SV-DPM model is semi-parametric in the sense that it cannot be described using a finite number of parameters: it uses a non-parametric prior, e.g. a Dirichlet process, to model the conditional return distribution, while retaining the parametric structure, e.g. AR(1), of the log-volatility in the standard SV model. The SR-SV model, on the other hand, is a parametric model with eleven parameters whose mathematical representation will be discussed in Section 2.3. Our article therefore uses parametric models, including the standard SV, N-SV and LMSV models, as the benchmarks to evaluate the SR-SV model.

2.2 The SRU model

There are at least two approaches to modeling time series data. One approach is to represent time effects explicitly via some simple function, often a linear function, of the lagged values of the time series. This is the mainstream approach to time series analysis in the statistics literature, with well-known models such as AR or ARMA. The alternative approach is to represent time effects implicitly via latent variables, which are designed to store the memory of the dynamics in the data. These latent variables, also called hidden states, are updated in a recurrent and deterministic manner using the information carried over by their values from the previous time steps and the information from the data at the current time step. Recurrent neural networks (RNN), which belong to the second category, were first developed in cognitive science and have been successfully used in computer science and other fields. Another class of models that represent time implicitly is state space models, which are widely used in econometrics and statistics, although their recurrent update is stochastic. The SV model discussed in Section 2.1 is an example of a state space model.

For the purpose of this section, we denote the time series data as $\{D_{t}=(x_{t},z_{t}),\ t=1,2,...\}$ where $x_{t}$ is the vector of inputs and $z_{t}$ the scalar output. In our article, it is useful to think of $x_{t}$ as scalar; however, the RNN approach is often efficiently used to model multivariate time series. If the time series of interest has the form $\{z_{t},\ t=1,2,...\}$, it can be written as $\{(x_{t},z_{t}),\ t=2,...\}$ with $x_{t}=z_{t-1}$. Our goal is to model the conditional distribution $p(z_{t}|x_{t},D_{1:t-1})$. If the serial dependence structure is ignored, then a feedforward neural network (FNN) can be used to transform the raw input data $x_{t}$ into a set of hidden units $h_{t}$, also called learned features or summary statistics, for the purpose of explaining or predicting $z_{t}$. However, this approach is unsuitable for time series data as the time effects or the serial dependence are totally ignored. The main idea behind RNN is to let the set of hidden units $h_{t}$ feed itself using its lagged value $h_{t-1}$ from the previous time step $t-1$. Hence, an RNN can be best thought of as an FNN that allows a connection of the hidden units to their value from the previous time step, enabling the network to possess memory. Mathematically, this RNN model (Elman, 1990) is written as

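$$h_{t}=\Psi(w_{x}x_{t}+w_{h}h_{t-1}+b), \tag{7}$$

$$\eta_{t}=\beta_{0}+\beta_{1}h_{t}, \tag{8}$$

$$z_{t}|\eta_{t}\sim p(z_{t}|\eta_{t}). \tag{9}$$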

The model parameters are $w_{x}$, $w_{h}$, $b$, $\beta_{0}$ and $\beta_{1}$; $\Psi(\cdot)$ is a non-linear activation function, common choices being the sigmoid $\Psi(z)=1/(1+e^{-z})$ and the tanh $\Psi(z)=(e^{z}-e^{-z})/(e^{z}+e^{-z})$; and $p(z_{t}|\eta_{t})$ is a probability density depending on the learning task. For example, if $z_{t}$ is continuous, then typically $p(z_{t}|\eta_{t})$ is a Gaussian density with mean $\eta_{t}$; if $z_{t}$ is binary, then $z_{t}|\eta_{t}$ follows a Bernoulli distribution with probability $\Psi(\eta_{t})=1/(1+e^{-\eta_{t}})$. Usually one sets $h_{1}=0$, i.e. the neural network initially does not have any memory.
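A minimal scalar sketch of this recurrence is given below. It assumes the Simple RNN update $h_{t}=\Psi(w_{x}x_{t}+w_{h}h_{t-1}+b)$ with $\Psi=\tanh$ and the linear read-out $\eta_{t}=\beta_{0}+\beta_{1}h_{t}$; the function name `elman_forward` and the numerical values are illustrative.

```python
import math


def elman_forward(x, w_x, w_h, b, beta0, beta1, h_init=0.0):
    """Run a scalar Simple (Elman) RNN over the inputs x.

    h_init plays the role of the initial hidden state (no memory), and each
    step applies h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) followed by
    eta_t = beta0 + beta1 * h_t.
    """
    h_prev = h_init
    etas = []
    for x_t in x:
        h_t = math.tanh(w_x * x_t + w_h * h_prev + b)
        etas.append(beta0 + beta1 * h_t)
        h_prev = h_t  # the hidden state carries memory to the next step
    return etas


z = [0.2, -0.1, 0.4, 0.3]
x = z[:-1]  # x_t = z_{t-1}: predict each value from its predecessor
etas = elman_forward(x, w_x=1.0, w_h=0.5, b=0.0, beta0=0.1, beta1=2.0)
```

For a continuous $z_{t}$, each $\eta_{t}$ would be used as the mean of the Gaussian density $p(z_{t}|\eta_{t})$.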

Figure 1 illustrates graphically the RNN model (7)-(9). We follow Goodfellow et al. (2016) and use a black square to indicate the delay of a single time step in the circuit diagram (left). The circuit diagram can be interpreted as an unfolded computational graph (right), where each node is associated with a particular time step.

The unfolded graph in Figure 1 suggests that the hidden state at time $t$ is the output of a composite function

[TABLE]

which somewhat resembles a multiplication structure in terms of the weight $w_{h}$. Consequently, the gradient of $h_{t}$ with respect to the model parameters might either explode or vanish if $t$ is sufficiently large and $w_{h}$ is not equal to 1, which makes it inefficient for the Simple RNN model to learn from long time series. See Goodfellow et al. (2016) for further explanation.
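The multiplicative structure can be made explicit with the chain rule. The following worked step is a sketch for the scalar case, assuming the Simple RNN update $h_{t}=\Psi(w_{x}x_{t}+w_{h}h_{t-1}+b)$; the symbols $a_{t}$ and $c$ are introduced here for illustration only.

```latex
\[
h_{t} = \Psi(a_{t}), \qquad a_{t} = w_{x}x_{t} + w_{h}h_{t-1} + b
\quad\Longrightarrow\quad
\frac{\partial h_{t}}{\partial h_{s}} = \prod_{k=s+1}^{t} w_{h}\,\Psi'(a_{k}), \qquad s<t .
\]
\[
\left|\frac{\partial h_{t}}{\partial h_{s}}\right| \le \big(|w_{h}|\,c\big)^{t-s},
\qquad c = \sup_{a}|\Psi'(a)| \quad (c=1 \text{ for tanh},\; c=1/4 \text{ for the sigmoid}).
\]
```

When $|w_{h}|c<1$ the bound shrinks geometrically in $t-s$, so the influence of distant states on the gradient vanishes; when the factors $|w_{h}\Psi'(a_{k})|$ stay above 1, the product can instead grow geometrically.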

Many sophisticated RNN structures have been proposed to overcome the aforementioned problem in the Simple RNN model; for example, the Long Short-term Memory model of Hochreiter and Schmidhuber (1997), the Gated Recurrent Unit of Cho et al. (2014) and the Statistical Recurrent Unit (SRU) of Oliva et al. (2017). The SRU allows the vector of summary statistics $h_{t}$ to traverse through the network using a moving average. We will use the SRU in this paper as its structure and some of its main parameters carry statistical meaning; see Section 2.3. A general SRU structure is mathematically written as

[TABLE]

where $\alpha=(\alpha_{1},...,\alpha_{m})\in(0,1)^{m}$ is a vector of moving average weights, and $W_{h}$, $b_{r}$, $W_{r}$, $W_{x}$ and $b_{\varphi}$ are the model parameters. We denote the functional learning structure in (11a)-(11c) as $h_{t}=\text{SRU}(x_{t},h_{t-1})$, which takes $x_{t}$ (the input data at the current time $t$) and $h_{t-1}$ (the previous output of the SRU) as its input arguments. See Figure 2(a) for the graphical representation of this SRU structure. The moving average structure of the state $h_{t}$ gives an RNN with SRU units some advantages over other RNN models. The current state $h_{t}$ is related to the previous state $h_{t-1}$ both directly and indirectly, which mitigates the problem of multiplying the same quantities multiple times as in the Simple RNN model. The novel architecture of the SRU allows the model to capture long-term dependencies in data via simple moving averages.
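A minimal scalar sketch of one SRU update is shown below. It assumes the three-step form $r_{t}=\Psi(w_{h}h_{t-1}+b_{r})$, $\varphi_{t}=\Psi(w_{r}r_{t}+w_{x}x_{t}+b_{\varphi})$ and $h_{t}=\alpha h_{t-1}+(1-\alpha)\varphi_{t}$, with the rectified linear unit as the activation $\Psi$ (an assumption made here, following the usual choice for the SRU); the function names are illustrative.

```python
def relu(v):
    """Rectified linear unit, used here as the assumed activation."""
    return max(0.0, v)


def sru_step(x_t, h_prev, alpha, w_h, b_r, w_r, w_x, b_phi):
    """One scalar SRU update h_t = SRU(x_t, h_{t-1}) under the assumed form."""
    r_t = relu(w_h * h_prev + b_r)                # feedback from the previous state
    phi_t = relu(w_r * r_t + w_x * x_t + b_phi)   # new summary of input and feedback
    return alpha * h_prev + (1.0 - alpha) * phi_t  # exponential moving average


h = 0.0  # no memory at the start
for x_t in (0.3, -0.2, 0.5):
    h = sru_step(x_t, h, alpha=0.9, w_h=0.5, b_r=0.0, w_r=0.8, w_x=1.0, b_phi=0.1)
```

The last line of `sru_step` is where the previous state enters directly, through $\alpha h_{t-1}$, in addition to its indirect effect through $r_{t}$.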

2.3 The SR-SV model

This section proposes the SR-SV model that combines SV and SRU for financial volatility modelling. The key idea is to use the SRU structure to capture complicated effects in the volatility dynamics, such as long-term memory and non-linear auto-dependence, that are overlooked by the basic SV models. This leads to a prior distribution for the log volatility process $z$ that is much more flexible than the AR(1) prior (cf. Section 2.1). Our proposed SR-SV model is as follows:

[TABLE]

that is, we use an SRU to model the dynamics of the hidden states $h_{t}$. Here, $z_{0}$ is the initial value of the log volatility process and a convenient choice of $z_{0}$ is the log of the unconditional variance of the observed series $y$, i.e., $z_{0}=\log(\text{var}(y))$. We follow the literature and initialize $h_{1}=0$ as the recurrent units initially have no memory. Figure 3 plots the graphical representation of the SR-SV model. See Appendix A.2 for the fully-written version of the SR-SV model.
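The generative mechanism can be sketched in a few lines. The code below is a hedged illustration rather than the authors' implementation: it assumes the recursion $h_{t}=\text{SRU}((\eta_{t-1},z_{t-1}),h_{t-1})$, $\eta_{t}=\beta_{0}+\beta_{1}h_{t}+\epsilon^{\eta}_{t}$, $z_{t}=\eta_{t}+\phi z_{t-1}$ and $y_{t}=e^{z_{t}/2}\epsilon^{y}_{t}$, a scalar SRU with a rectified linear activation, Gaussian shocks, and a simple treatment of the first time step. The function name `simulate_srsv` and all numerical values are hypothetical.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def simulate_srsv(T, z0, beta0, beta1, phi, sigma, alpha,
                  w_h, b_r, w_r, w_eta, w_z, b_phi, seed=0):
    """Simulate (y, z, h) from an assumed scalar form of the SR-SV model."""
    rng = random.Random(seed)
    h = 0.0                                        # h_1 = 0: no memory initially
    eta = beta0 + beta1 * h + sigma * rng.gauss(0.0, 1.0)
    z = eta + phi * z0
    zs, hs = [z], [h]
    for _ in range(2, T + 1):
        # SRU update driven by the previous eta and the previous log volatility.
        r = relu(w_h * h + b_r)
        cand = relu(w_r * r + w_eta * eta + w_z * z + b_phi)
        h = alpha * h + (1.0 - alpha) * cand
        # Log-volatility innovation and linear AR part.
        eta = beta0 + beta1 * h + sigma * rng.gauss(0.0, 1.0)
        z = eta + phi * z
        zs.append(z)
        hs.append(h)
    ys = [math.exp(0.5 * zt) * rng.gauss(0.0, 1.0) for zt in zs]
    return ys, zs, hs


ys, zs, hs = simulate_srsv(T=500, z0=0.0, beta0=0.02, beta1=0.1, phi=0.95,
                           sigma=0.15, alpha=0.9, w_h=0.1, b_r=0.0, w_r=0.1,
                           w_eta=0.5, w_z=0.1, b_phi=0.0)
```

Setting `beta1=0` switches off the SRU contribution, so the simulated log volatility follows a plain AR(1) recursion with intercept `beta0`.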

We note the following important properties of the SR-SV model. First, the SR-SV model in (12)-(15) retains the measurement equation (15) and the linear part $\phi z_{t-1}$ of the AR(1) process from the standard SV model, and captures the volatility effects not captured by the AR(1) process, e.g. non-linear and long-memory auto-dependence, via the latent state $h_{t}$ of the SRU structure. The log volatility at time $t$ in (14) can be written as

[TABLE]

Therefore, the parameter $\beta_{1}$ characterizes all the effects in the underlying log volatility process $z$ other than the short-term linear effect captured by the AR(1) process. We refer to $\beta_{1}$ as the non-linearity long-memory coefficient. If $\beta_{1}=0$ and $\epsilon_{1}^{\eta}{\sim}{\cal N}(\beta_{0}/(1-\phi),\sigma^{2}/(1-\phi^{2}))$, the SR-SV model becomes the SV model (1)-(2) and hence the SV model is a special case of the SR-SV model. We therefore follow the SV literature and assume that $|\phi|<1$. The $z$ process, and thus the $y$ process of the SR-SV model, is not guaranteed to be stationary unless $\beta_{1}=0$ and $\epsilon_{1}^{\eta}{\sim}{\cal N}(\beta_{0}/(1-\phi),\sigma^{2}/(1-\phi^{2}))$. Non-stationarity for volatility is often argued to be more realistic in practice (e.g., van Bellegem, 2012), although it might be mathematically less appealing. The equation in (16) can be further written out as

[TABLE]

where $\mathfrak{N}(\cdot)$ is a non-linear function and $w_{z}$ is the weight corresponding to $z_{t-1}$; see the full version of the SR-SV model in the Appendix. If $w_{z}=0$ in (17), then $z_{t}$ depends only linearly on $z_{t-1}$. This equation therefore indicates that the parameter $w_{z}$ characterizes the serial dependence of $z_{t}$ on the previous log volatility $z_{t-1}$ beyond the linear one. We will analyse $w_{z}$ in more detail in Section 4.

Second, Oliva et al. (2017) set the scales $\alpha$ of the SRU model to several pre-specified values to obtain a vector of summary statistics $h_{t}$ at different moving average weights. We, however, treat $\alpha$ as a model parameter and learn it from the data. We note that a higher $\alpha$ puts more weight on the historical information while a smaller $\alpha$ puts more weight on the current information. We show later in the empirical study that this parameter $\alpha$ is able to quantify the existence of the long-memory auto-dependence commonly exhibited in the volatility dynamics of financial time series.
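The role of $\alpha$ as a memory parameter can be seen by unrolling the moving average. The step below is a sketch that assumes the scalar SRU state update $h_{t}=\alpha h_{t-1}+(1-\alpha)\varphi_{t}$ with $h_{1}=0$; the half-life $k_{1/2}$ is a quantity introduced here for illustration.

```latex
\[
h_{t} = \alpha h_{t-1} + (1-\alpha)\varphi_{t}, \quad h_{1}=0
\quad\Longrightarrow\quad
h_{t} = (1-\alpha)\sum_{k=0}^{t-2}\alpha^{k}\,\varphi_{t-k}.
\]
\[
\alpha^{k_{1/2}} = \tfrac{1}{2}
\quad\Longrightarrow\quad
k_{1/2} = \frac{\ln(1/2)}{\ln\alpha},
\qquad k_{1/2}\approx 6.6 \text{ for } \alpha=0.9,
\qquad k_{1/2}\approx 69 \text{ for } \alpha=0.99 .
\]
```

The weight on a summary statistic $k$ steps in the past is $(1-\alpha)\alpha^{k}$, so values of $\alpha$ near 1 spread the weight over a long history.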

Third, neural networks are highly flexible but often subject to overfitting, i.e., they have an over-confident in-sample fit and poor out-of-sample forecasts. Regularization is often needed to avoid overfitting. Injecting noise into the layers of the network has been found to be an effective regularization approach in the Machine Learning literature, and is seen as a form of data augmentation at multiple levels of abstraction (Sietsma and Dow, 1991; Poole et al., 2014; Goodfellow et al., 2016; Dieng et al., 2018). In the SR-SV model, by allowing $z_{t-1}$ and $\eta_{t-1}$ to be the inputs of the SRU structure at time $t$, we inject the noise $\epsilon^{\eta}_{t-1}$ of the volatility process into the input and hidden layers of the SRU. This noise-injecting regularization approach makes the SR-SV model perform well on both in-sample fitting and out-of-sample forecasting, even with the simplest specification of the SRU structure where all the $r_{t}$, $\varphi_{t}$ and $h_{t}$ are scalars. Our SR-SV model can be categorized as a parametric model with the vector of model parameters $\theta$ consisting of eleven parameters: the four main parameters $\beta_{0}$, $\beta_{1}$, $\phi$, $\sigma^{2}$ and the SRU parameters $\alpha,w_{h},b_{r},w_{r},w_{\eta},w_{z}$ and $b_{\varphi}$.

Finally, $\beta_{0}$ plays the role of the scale factor $\tau=e^{\beta_{0}/2}$ for the variance of $y_{t}$. One could set $\beta_{0}=0$ and modify (15) to $y_{t}=\tau e^{\frac{1}{2}z_{t}}\epsilon^{y}_{t}$; however, this parameterization might be less statistically efficient in terms of Bayesian estimation, especially for the parameter $\tau$ (see Kim et al., 1998).

It is straightforward to extend the SR-SV model in (12)-(15) by incorporating other advances in the SV literature. For example, we can use a Student’s $t$ distribution instead of a Gaussian for the measurement shock $\epsilon^{y}_{t}$ and take into account the leverage effect by correlating $\epsilon^{y}_{t}$ with the volatility shock $\epsilon_{t}^{\eta}$ . We do not consider these extensions here, however, because using the most basic version makes it easier to understand the strengths and weaknesses of the new model.

3 Bayesian inference

This section discusses Bayesian estimation and inference for the SR-SV model. For a generic sequence $\{x_{t}\}$ we use $x_{i:j}$ to denote the series $(x_{i},...,x_{j})$ . The SR-SV model is a state-space model with the measurement equation

[TABLE]

and the state transition equation

[TABLE]

We are interested in sampling from the posterior distribution of $\theta$

$$\pi(\theta)=p(\theta|y_{1:T})=\frac{p(y_{1:T}|\theta)p(\theta)}{p(y_{1:T})}, \tag{20}$$

where $p(y_{1:T}|\theta)$ is the likelihood function, $p(\theta)$ is the prior and $p(y_{1:T})=\int_{\Theta}p(y_{1:T}|\theta)p(\theta)d\theta$ is the marginal likelihood. Recall that the vector of model parameters $\theta$ consists of $\beta_{0}$, $\beta_{1}$, $\phi$, $\sigma^{2}$ and the seven parameters within the SRU model (11a)-(11c).

The likelihood function in (20) is

$$p(y_{1:T}|\theta)=\int p(y_{1:T}|z_{1:T},\theta)\,p(z_{1:T}|\theta)\,dz_{1:T}, \tag{21}$$

which is computationally intractable for non-linear non-Gaussian state space models like the SV and SR-SV models, but can be estimated unbiasedly by a particle filter (Del Moral, 2004). Bayesian inference for SR-SV can be performed using recent advances in the Sequential Monte Carlo literature that we present next.
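As an illustration of how a particle filter estimates such a likelihood, the sketch below implements a bootstrap particle filter for the basic SV model, assuming the parameterization $z_{t}=\mu+\phi(z_{t-1}-\mu)+\sigma\varepsilon_{t}$ and $y_{t}|z_{t}\sim{\cal N}(0,e^{z_{t}})$. It uses multinomial resampling at every step, which is a simplifying choice made here and not necessarily the scheme used for the SR-SV model; the function names are illustrative.

```python
import math
import random


def log_norm_pdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def bootstrap_pf_loglik(y, mu, phi, sigma, N=200, seed=0):
    """Log of a particle filter estimate of p(y | mu, phi, sigma)."""
    rng = random.Random(seed)
    sd0 = sigma / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(mu, sd0) for _ in range(N)]  # stationary start
    loglik = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            # Propagate each particle through the AR(1) transition.
            particles = [mu + phi * (p - mu) + sigma * rng.gauss(0.0, 1.0)
                         for p in particles]
        # Weight by the measurement density y_t | z_t ~ N(0, exp(z_t)).
        logw = [log_norm_pdf(y_t, math.exp(p)) for p in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / N)  # log of the average weight
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=w, k=N)
    return loglik


y = [0.5, -1.2, 0.3, 2.0, -0.7]
print(bootstrap_pf_loglik(y, mu=-0.5, phi=0.95, sigma=0.2))
```

The exponential of the returned value is the unbiased likelihood estimate; the returned logarithm itself is biased, which is why the variance of the log estimate matters for the pseudo-marginal methods discussed in Section 3.1.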

3.1 The Density Tempered Sequential Monte Carlo for the SR-SV model

Duan and Fulop (2015) propose the Density Tempered Sequential Monte Carlo (DT-SMC) approach to Bayesian inference for state space models where the likelihood is intractable. The DT-SMC sampler generalizes the SMC method of Neal (2001) and Del Moral et al. (2006), which applies when the likelihood can be computed analytically. In order to sample from the posterior $\pi(\theta)$, the DT-SMC method first samples a set of $M$ weighted particles $\{W^{j}_{0},\theta_{0}^{j}\}^{M}_{j=1}$ from an easy-to-sample distribution $\pi_{0}(\theta)$, such as the prior $p(\theta)$, and then traverses these particles through intermediate distributions $\pi_{t}(\theta),\;t=1,...,K$, the last of which is the posterior distribution, i.e. $\pi_{K}(\theta)=\pi(\theta)$. The DT-SMC method uses the following intermediate distributions

[TABLE]

where $\gamma_{t}$ is referred to as the temperature level and $0=\gamma_{0}<\gamma_{1}<\gamma_{2}<...<\gamma_{K}=1$, $\widehat{p}(y_{1:T}|\theta,u)$ is the unbiased estimator of the likelihood $p(y_{1:T}|\theta)$ and $u$ is the set of pseudo random numbers used within a particle filter to estimate the likelihood $p(y_{1:T}|\theta)$. For the purpose of this paper, where it is possible to sample from the prior $p(\theta)$, we set $\pi_{0}(\theta)=p(\theta)$. Algorithm 1 summarizes the DT-SMC method for the SR-SV model.

The DT-SMC method consists of three main steps: reweighting, resampling and Markov move. At the beginning of SMC iteration $t$, the set of weighted particles $\{W_{t-1}^{j},\theta_{t-1}^{j}\}^{M}_{j=1}$ that approximate the intermediate distribution $\pi_{t-1}(\theta)$ is reweighted to approximate the target $\pi_{t}(\theta)$. The efficiency of these weighted particles is often measured by the effective sample size (ESS) (Kass et al., 1998; Liu and Chen, 1998) defined in (25). If the ESS is below a prespecified threshold, the particles are resampled; the resulting equally-weighted resamples, which are now approximate samples from $\pi_{t}(\theta)$, are then refreshed by a Markov kernel whose invariant distribution is $\pi_{t}(\theta)$. For example, Duan and Fulop (2015) use the pseudo marginal Metropolis-Hastings (PMMH) kernel of Andrieu et al. (2010) with the likelihood estimated unbiasedly by the particle filter in the Markov move step. However, Pitt et al. (2012) suggest that the PMMH approach works efficiently when the variance of the log of the estimated likelihood is around 1. For some state space models like the SR-SV model, a large number of particles might be required to obtain a likelihood estimator whose log has variance around 1, which is computationally inefficient. To tackle this problem, we incorporate the Correlated Pseudo Marginal (CPM) approach of Deligiannidis et al. (2018) into the Markov move step. The CPM method makes the current set of random numbers $u$ and the proposal $u^{\prime}$ correlated, and helps reduce the variance of the ratio $\widehat{p}(y_{1:T}|\theta^{\prime},u^{\prime})/\widehat{p}(y_{1:T}|\theta,u)$ in (26), thus leading to a better mixing Markov chain while using fewer particles in the particle filter.
Similar to the SMC methods of Del Moral et al. (2006) and Neal (2001), the DT-SMC method is parallelizable as the particles move independently in the Markov move step, and provides an estimate of the marginal likelihood as a by-product.
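The reweighting step, the ESS diagnostic and the correlated refresh of the random numbers can be sketched as follows. The code assumes the usual tempering weights $W_{t}^{j}\propto W_{t-1}^{j}\,\widehat{p}(y_{1:T}|\theta^{j},u^{j})^{\gamma_{t}-\gamma_{t-1}}$, the common definition $\text{ESS}=1/\sum_{j}(W^{j})^{2}$ for normalized weights, and standard normal random numbers $u$ refreshed by $u^{\prime}=\rho u+\sqrt{1-\rho^{2}}\,\varepsilon$; the function names and the value of $\rho$ are illustrative.

```python
import math
import random


def reweight(weights_prev, loglik_hat, gamma_prev, gamma_new):
    """Normalized weights after moving from gamma_prev to gamma_new.

    weights_prev must be strictly positive; loglik_hat holds the log of the
    estimated likelihood for each particle.
    """
    step = gamma_new - gamma_prev
    logw = [math.log(w) + step * ll for w, ll in zip(weights_prev, loglik_hat)]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    return [v / total for v in w]


def ess(weights):
    """Effective sample size of normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def cpm_refresh(u, rho, rng):
    """Propose u' correlated with u while keeping N(0, 1) marginals."""
    c = math.sqrt(1.0 - rho * rho)
    return [rho * ui + c * rng.gauss(0.0, 1.0) for ui in u]


rng = random.Random(0)
W = reweight([0.25] * 4, [-10.0, -11.0, -12.5, -10.3], 0.0, 0.3)
print(ess(W))
u_new = cpm_refresh([rng.gauss(0.0, 1.0) for _ in range(5)], rho=0.99, rng=rng)
```

With $\rho$ close to 1 the proposed random numbers stay close to the current ones, so the noise in the two likelihood estimates largely cancels in their ratio.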

In Algorithm 1, we use a random walk proposal for $q(\theta^{\prime}|\theta)$. We follow Gunawan et al. (2018) and choose the tempering sequence $\gamma_{t}$ adaptively to ensure a sufficient level of particle efficiency, by selecting the next value of $\gamma_{t}$ such that the ESS stays above a threshold.
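One common way to implement such an adaptive choice is bisection on the next temperature. The sketch below assumes equally weighted particles before reweighting (for example, just after resampling) and searches for the largest step whose ESS stays at or above the target; the function names and the bisection tolerance are illustrative, and this is not claimed to be the exact rule used in Algorithm 1.

```python
import math


def tempered_ess(loglik_hat, step):
    """ESS of equally weighted particles after a temperature increase `step`."""
    logw = [step * ll for ll in loglik_hat]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    return sum(w) ** 2 / sum(v * v for v in w)


def next_temperature(loglik_hat, gamma_prev, ess_target, tol=1e-8):
    """Pick the next gamma in (gamma_prev, 1] keeping the ESS above the target."""
    if tempered_ess(loglik_hat, 1.0 - gamma_prev) >= ess_target:
        return 1.0  # the posterior can be reached in one step
    lo, hi = gamma_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tempered_ess(loglik_hat, mid - gamma_prev) >= ess_target:
            lo = mid  # mid is still efficient enough, move further
        else:
            hi = mid
    return lo


loglik_hat = [0.0, -1.0, -2.0, -50.0]
gamma_1 = next_temperature(loglik_hat, gamma_prev=0.0, ess_target=3.0)
```

A higher ESS target gives smaller temperature steps and therefore more intermediate distributions, trading computing time for particle efficiency.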

3.2 Model choice by marginal likelihood

The marginal likelihood is often used to choose between models via the Bayes factor (Jeffreys, 1935; Kass and Raftery, 1995). In order to compare the relative performance of two models $M_{1}$ and $M_{2}$ on a given dataset $y_{1:T}$, we can use the Bayes factor defined by

$$F_{M_{1},M_{2}}=\frac{p(y_{1:T}|M_{1})}{p(y_{1:T}|M_{2})},$$

providing a Bayesian alternative to hypothesis testing. The larger the Bayes factor $F_{M_{1},M_{2}}$, the more strongly the data support $M_{1}$ over $M_{2}$. Jeffreys (1961) suggests a scale of interpretation of the Bayes factor $F_{M_{1},M_{2}}$ as listed in Table 1. We note that the DT-SMC sampler in the previous section provides an efficient way to compute the marginal likelihood.
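In practice, marginal likelihoods are handled on the log scale. The sketch below shows the standard SMC by-product estimator, in which the log marginal likelihood is accumulated as $\sum_{t}\log\sum_{j}W_{t-1}^{j}\,\widehat{p}(y_{1:T}|\theta_{t-1}^{j},u^{j})^{\gamma_{t}-\gamma_{t-1}}$, together with a log Bayes factor computed as a difference. The function names and the numerical values are illustrative and do not come from the paper's results.

```python
import math


def log_evidence_increment(weights_prev, loglik_hat, gamma_prev, gamma_new):
    """Log of sum_j W_j * exp((gamma_new - gamma_prev) * loglik_hat_j)."""
    step = gamma_new - gamma_prev
    vals = [step * ll for ll in loglik_hat]
    m = max(vals)  # log-sum-exp trick to avoid underflow
    return m + math.log(sum(w * math.exp(v - m)
                            for w, v in zip(weights_prev, vals)))


def log_bayes_factor(log_ml_1, log_ml_2):
    """Log Bayes factor of model 1 against model 2."""
    return log_ml_1 - log_ml_2


# One tempering step from the prior (gamma = 0) to the posterior (gamma = 1).
inc = log_evidence_increment([0.5, 0.5], [-1200.0, -1201.0], 0.0, 1.0)
# Hypothetical log marginal likelihoods of two competing models.
print(log_bayes_factor(-1200.4, -1204.4))
```

Summing the increments over all tempering steps gives the log marginal likelihood estimate for one model; working with differences of logs avoids the underflow that would occur if likelihoods of this magnitude were exponentiated directly.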

Bibliography


Andersen, T. G. and Bollerslev, T. (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review, 39(4):885–905.
Andersen, T. G., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation. Journal of Econometrics, 169(1):75–93. Recent Advances in Panel Data, Nonlinear and Nonparametric Models: A Festschrift in Honor of Peter C.B. Phillips.
Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B, 72:1–33.
Andrieu, C. and Roberts, G. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37:697–725.
Baillie, R. T., Bollerslev, T., and Mikkelsen, H. O. (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 74(1):3–30.
Barndorff-Nielsen, O., Hansen, P., Lunde, A., and Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76(6):1481–1536.
Barndorff-Nielsen, O. E. and Shephard, N. (2004). Power and Bipower Variation with Stochastic Volatility and Jumps. Journal of Financial Econometrics, 2(1):1–37.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.