Spectral Density-Based and Measure-Preserving ABC for partially observed diffusion processes. An illustration on Hamiltonian SDEs

Evelyn Buckwar; Massimiliano Tamborrino; Irene Tubikanec

arXiv:1903.01138·stat.CO·July 8, 2019·Stat. Comput.

TL;DR

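This paper introduces an ABC approach for partially observed stochastic differential equations that uses the estimated invariant density and invariant spectral density of the data as summaries and generates synthetic data with measure-preserving numerical splitting schemes. It is illustrated on Hamiltonian type SDEs with simulated data and real EEG data.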

Contribution

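It proposes a property-based, measure-preserving ABC method: summaries are built from the invariant density and the invariant spectral density of the output process, and synthetic data are generated with measure-preserving splitting schemes. The ingredients apply to any SDE with an invariant distribution for which a measure-preserving numerical method can be derived.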

Findings

1. Effective inference on Hamiltonian SDEs, demonstrated with simulated data.

2. Application to real EEG data shows practical utility.

3. The invariant measure-based summaries are less sensitive to the intrinsic stochasticity of the model, which makes ABC more robust for stochastic models.

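Tables

Table 1: Parameters of interest $\theta$, true parameter values $\theta^{t}$ and ABC posterior means $\hat{\theta}_{\mathrm{ABC}}$

| | $\theta$ | $\theta^{t}$ | $\hat{\theta}_{\mathrm{ABC}}$ |
|---|---|---|---|
| MP1 | $\gamma$ | $1$ | $1.004$ |
| MP1 | $(\gamma, \sigma)$ | $(1, 2)$ | $(0.9995, 1.991)$ |
| MP2 | $\lambda$ | $20$ | $20.014$ |
| MP2 | $(\lambda, \gamma)$ | $(20, 1)$ | $(20.005, 1.009)$ |
| MP2 | $(\lambda, \gamma, \sigma)$ | $(20, 1, 2)$ | $(20.015, 1.002, 2.011)$ |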

Equations

dX(t) = f(t, X(t); \theta)\,dt + \mathcal{G}(t, X(t); \theta)\,dW(t), \quad X(0) = X_{0}, \quad t \in [0, T].

\mathbf{Y}_{\theta} = (Y_{\theta}(t))_{t \in [0, T]} = g(\mathbf{X}),

\mathbb{E}[Y_{\theta}(t)] = \eta_{\mu} \in \mathbb{R}, \quad \mathrm{Cov}[Y_{\theta}(t), Y_{\theta}(s)] := r_{\theta}(t, s) = r_{\theta}(t - s), \ s \leq t, \quad \mathrm{Var}[Y_{\theta}(t)] = r_{\theta}(0) = \eta_{\sigma^{2}} \in \mathbb{R}^{+}.

\pi(\theta \mid y) \approx \pi_{\mathrm{ABC}}(\theta \mid y) = \pi\big(\theta \mid d(s(y), s(y_{\theta})) < \epsilon\big).

S_{\mathbf{Y}_{\theta}}(\omega) = \mathcal{F}\{r_{\theta}\}(\omega) = \int_{-\infty}^{\infty} r_{\theta}(\tau)\, e^{-i\omega\tau}\, d\tau,

s(y_{\theta}) := (\hat{S}_{y_{\theta}}, \hat{f}_{y_{\theta}}).

\text{IAE}(g_{1}, g_{2}) := \int_{\mathbb{R}} \bigl|g_{1}(x) - g_{2}(x)\bigr|\, dx \in \mathbb{R}^{+}.

d(s(y), s(y_{\theta})) := \text{IAE}(\hat{S}_{y}, \hat{S}_{y_{\theta}}) + w \cdot \text{IAE}(\hat{f}_{y}, \hat{f}_{y_{\theta}}),

D = \mathrm{median}\left\{\left(\text{IAE}(\hat{S}_{y_{k}}, \hat{S}_{y_{\theta}}) + w \cdot \text{IAE}(\hat{f}_{y_{k}}, \hat{f}_{y_{\theta}})\right)_{k=1}^{M}\right\}

\pi(\theta \mid y) \approx \pi_{\mathrm{ABC}}^{\mathrm{num}}(\theta \mid y) := \pi\big(\theta \mid d(s(y), s(\tilde{y}_{\theta})) < \epsilon\big).

X(t_{i+1}) = X(t_{i}) + f(t_{i}, X(t_{i}); \theta)\,\Delta + \mathcal{G}(t_{i}, X(t_{i}); \theta)\,\xi_{i},

f(t, X(t); \theta) = \sum_{j=1}^{d} f^{[j]}(t, X(t); \theta), \quad \mathcal{G}(t, X(t); \theta) = \sum_{j=1}^{d} \mathcal{G}^{[j]}(t, X(t); \theta), \quad d \in \mathbb{N}.

dX(t) = f^{[j]}(t, X(t); \theta)\,dt + \mathcal{G}^{[j]}(t, X(t); \theta)\,dW(t),

\left(\varphi_{\Delta/2}^{[1]} \circ \cdots \circ \varphi_{\Delta/2}^{[d-1]} \circ \varphi_{\Delta}^{[d]} \circ \varphi_{\Delta/2}^{[d-1]} \circ \cdots \circ \varphi_{\Delta/2}^{[1]}\right)(x), \quad x \in \mathbb{R}^{n},

\mathbf{X} := (Q, P)^{\prime} = (Q(t), P(t))_{t \in [0, T]}^{\prime},

Q = (X_{1}, \ldots, X_{d})^{\prime} \quad \text{and} \quad P = (X_{d+1}, \ldots, X_{2d})^{\prime},

\underbrace{d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix}}_{dX(t)} = \underbrace{\begin{pmatrix}\nabla_{P} H(Q(t), P(t))\\ -\nabla_{Q} H(Q(t), P(t)) - 2\Gamma_{\theta} P(t) + G(Q(t); \theta)\end{pmatrix}}_{f(X(t); \theta)} dt + \underbrace{\begin{pmatrix}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{pmatrix}}_{\mathcal{G}(\theta)} dW(t).

H(Q, P) := \frac{1}{2}\left(\|P\|_{\mathbb{R}^{d}}^{2} + \|\Lambda_{\theta} Q\|_{\mathbb{R}^{d}}^{2}\right),

d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix} = \underbrace{\begin{pmatrix}\nabla_{P} H(Q(t), P(t))\\ -\nabla_{Q} H(Q(t), P(t)) - 2\Gamma_{\theta} P(t)\end{pmatrix}}_{f^{[1]}(X(t); \theta)} dt + \underbrace{\begin{pmatrix}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{pmatrix}}_{\mathcal{G}^{[1]}(\theta)} dW(t),

d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix} = \underbrace{\begin{pmatrix}0_{d}\\ G(Q(t); \theta)\end{pmatrix}}_{f^{[2]}(Q(t); \theta)} dt,

dX(t) = A \cdot X(t)\,dt + B\,dW(t), \quad t \geq 0,

X(t_{i+1}) = e^{A\Delta} \cdot X(t_{i}) + \xi_{i},

\dot{C}(t) = A C(t) + C(t) A^{\prime} + B B^{\prime},

X(t_{i+1}) = X(t_{i}) + \begin{pmatrix}0_{d}\\ \Delta\, G(Q(t_{i}); \theta)\end{pmatrix}.

\left(\varphi_{\Delta/2}^{b} \circ \varphi_{\Delta}^{a} \circ \varphi_{\Delta/2}^{b}\right)(x), \quad x \in \mathbb{R}^{n},

d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix} = \underbrace{\begin{pmatrix}\nabla_{P} H(Q(t), P(t))\\ -\nabla_{Q} H(Q(t), P(t)) - 2\Gamma_{\theta} P(t)\end{pmatrix}}_{f^{[1]}(X(t); \theta)} dt,

d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix} = \underbrace{\begin{pmatrix}0_{d}\\ G(Q(t); \theta)\end{pmatrix}}_{f^{[2]}(Q(t); \theta)} dt + \underbrace{\begin{pmatrix}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{pmatrix}}_{\mathcal{G}^{[2]}(\theta)} dW(t).

X(t_{i+1}) = e^{A\Delta} \cdot X(t_{i}),

X(t_{i+1}) = \begin{pmatrix}Q(t_{i})\\ P(t_{i}) + \Delta\, G(Q(t_{i}); \theta) + \Sigma_{\theta} \cdot \xi_{i}\end{pmatrix},

\left(\varphi_{\Delta/2}^{c} \circ \varphi_{\Delta}^{d} \circ \varphi_{\Delta/2}^{c}\right)(x), \quad x \in \mathbb{R}^{n},

Code & Models

Repository: massimilianotamborrino/sdbmpABC (official)

Full text

Evelyn Buckwar, Massimiliano Tamborrino, Irene Tubikanec

Institute for Stochastics

Johannes Kepler University Linz, Austria

Abstract

Approximate Bayesian Computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build the statistical method (e.g., the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography (EEG) data. The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.

Keywords

Approximate Bayesian Computation, Likelihood-free inference, Stochastic differential equations, Numerical splitting schemes, Invariant measure, Neural mass models

Acknowledgements

This research was partially supported by the Austrian Science Fund (FWF): W1214-N15, project DK14.

1 Introduction

Over the last decades, SDEs have become an established and powerful tool for modelling time-dependent, real-world phenomena with underlying random effects. They have been successfully applied in a variety of scientific fields, ranging from biology and finance to physics, chemistry, neuroscience and other fields. Diffusion processes obtained as solutions of SDEs are typically characterised by underlying structural properties whose investigation and preservation are crucial. Examples are boundary properties, symmetries, the preservation of invariants, or qualitative behaviour such as ergodicity or the conservation of energy. Here, we focus on a specific structural property, namely the existence of a unique invariant measure. Besides modelling, estimating the underlying model parameters is of primary interest. This is particularly difficult when the multivariate stochastic process is only partially observed through a $1$-dimensional function of its coordinates (the output process), which is the scenario we tackle here. Moreover, due to the increasing complexity of the SDEs needed to understand and reproduce real data, the underlying likelihood is often unknown or intractable. Among several likelihood-free inference approaches, we focus on the simulation-based ABC method. We refer to Marin et al. (2012) and to the recently published "Handbook of Approximate Bayesian Computation" (Sisson et al. 2018) for an exhaustive discussion.

ABC has become one of the major tools for parameter inference in complex mathematical models over the last decade. The method derives an approximate posterior density, targeting the true (unavailable) posterior, by replacing the intractable likelihood with massive simulations from the model. It was first introduced in the context of population genetics; see, e.g., Beaumont et al. (2002). Since then, it has been successfully applied in a wide range of fields; see, e.g., Barnes et al. (2012); Blum (2010a); Boys et al. (2008); McKinley et al. (2017); Moores et al. (2015); Toni et al. (2009). Moreover, ABC has also been proposed to infer parameters of time series models (see, e.g., Drovandi et al. 2016; Jasra 2015), state space models (see, e.g., Martin et al. 2019; Tancredi 2019) and SDE models (see, e.g., Kypraios et al. 2017; Maybank et al. 2017; Picchini 2014; Picchini and Forman 2016; Picchini and Samson 2018; Sun et al. 2015; Zhu et al. 2016). Several advanced ABC algorithms have been proposed in the literature, such as ABC-SMC, ABC-MCMC, sequential-annealing ABC and noisy ABC; see, e.g., Fan and Sisson (2018) and the references therein for a recent review. The idea of the basic acceptance-rejection algorithm is to keep a parameter value sampled from the prior as a realisation from the approximate posterior if the distance between the summary statistics of the synthetic dataset, generated conditionally on this parameter value, and the summaries of the original reference data is smaller than some tolerance level. The goal of this paper is to illustrate that building the ABC method on the structural properties of the underlying SDE, and using a numerical method capable of preserving them when generating data from the model, leads to successful inference even when ABC is applied in this basic acceptance-rejection form.

The performance of any ABC method depends heavily on the choice of "informative enough" summary statistics, a suitable distance measure and a proper tolerance level $\epsilon$. The quality of the approximation improves as $\epsilon$ decreases, and it has been shown that, under some conditions, the approximate ABC posterior converges to the true one as $\epsilon\to 0$ (Jasra 2015). At the same time, however, the computational cost increases as $\epsilon$ decreases. One possibility is to use ad-hoc threshold selection procedures; see, e.g., Barber et al. (2015); Blum (2010b); Lintusaari et al. (2017); Prangle et al. (2014); Robert (2016). Here, we fix the tolerance level $\epsilon$ as a percentile of the calculated distances. This is another common practice, used, for example, in Beaumont et al. (2002); Biau et al. (2015); Sun et al. (2015); Vo et al. (2015). Guidelines for constructing effective summaries and distances are rare and depend on the problem under consideration; see, e.g., Fearnhead and Prangle (2012) for a semi-automatic linear regression approach, Jiang et al. (2017) for an automatic construction approach based on training deep neural networks, and Blum (2010b); Prangle (2018) for two recent reviews. To avoid the information loss caused by using non-sufficient summary statistics, another common procedure is to work with the entire dataset; see, e.g., Jasra (2015); Sun et al. (2015). This requires more sophisticated distances $d$, such as the Wasserstein metric (Bernton et al. 2019; Muskulus and Verduyn-Lunel 2011) or other distances designed for time series; for an overview, see, e.g., Mori et al. (2016) and the references therein.

When working with stochastic models, simulations from the stochastic simulator under the same parameter configuration yield different trajectories. To obtain summary statistics that are less sensitive to the intrinsic stochasticity of the model (Wood 2010), we choose them based on the structural property of an underlying invariant measure. The idea is to map the data, i.e., the realisations of the output process, to an object that is invariant across repeated simulations under the same parameter setting and that is sensitive to small changes in the parameters. In particular, we map the data to their estimated invariant density and invariant spectral density, thus taking the dependence structure of the dynamical model into account. The distance measure can then be chosen according to the mapped data.

Like other simulation-based statistical methods, e.g., MCMC, SMC or machine learning algorithms, ABC relies on the ability to simulate data from the model. However, exact simulation from complex stochastic models is rarely possible, and thus numerical methods need to be applied. This introduces a new level of approximation into the ABC framework. When the statistical method is built upon the structural properties of the underlying model, successful inference can only be guaranteed if these properties are preserved in the synthetic data generated from the model. However, the issue of deriving a property-preserving numerical method when applying ABC to SDEs is usually considered of minor relevance, and the Euler-Maruyama scheme or one of the higher-order approximation methods described in Kloeden and Platen (1992) is commonly recommended; see, e.g., Picchini (2014); Picchini and Forman (2016); Picchini and Samson (2018); Sun et al. (2015). In general, these standard methods do not preserve the underlying structural properties of the model; see, e.g., Ableidinger et al. (2017); Malham and Wiese (2013); Moro and Schurz (2007); Strømmen Melbø and Higham (2004).
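A small sketch can make the last claim concrete. It assumes the damped stochastic harmonic oscillator $dQ = P\,dt$, $dP = (-\lambda^{2}Q - 2\gamma P)\,dt + \sigma\,dW$, i.e., a Hamiltonian type SDE with $d=1$, $\Lambda_{\theta}=\lambda$, $\Gamma_{\theta}=\gamma$, $\Sigma_{\theta}=\sigma$ and no nonlinear term (our choice for illustration), with $\lambda=20$, $\gamma=1$, $\sigma=2$ borrowed from Table 1. Both the Euler-Maruyama scheme and the exact scheme $X(t_{i+1}) = e^{A\Delta}X(t_{i}) + \xi_{i}$, $\xi_{i}\sim N(0, C(\Delta))$, are linear with Gaussian increments, so their stationary covariances follow from Lyapunov equations without any simulation. The increment covariance $C(\Delta)$ is computed with Van Loan's block-matrix exponential; the helper name `exact_step` is hypothetical.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov, solve_discrete_lyapunov

# Assumed model and illustrative values: dQ = P dt, dP = (-lam^2 Q - 2 gam P) dt + sig dW
lam, gam, sig = 20.0, 1.0, 2.0
dt = 0.002  # with these values, I + A*dt has spectral radius < 1 only for dt < 2*gam/lam**2 = 0.005

A = np.array([[0.0, 1.0], [-lam**2, -2.0 * gam]])
B = np.array([[0.0], [sig]])
BBt = B @ B.T

# Invariant (stationary) covariance of the SDE: A C + C A' + B B' = 0
C_inf = solve_continuous_lyapunov(A, -BBt)

# Euler-Maruyama: X_{i+1} = (I + A dt) X_i + B * N(0, dt);
# its stationary covariance solves C = M C M' + dt B B'
M_em = np.eye(2) + dt * A
C_em = solve_discrete_lyapunov(M_em, dt * BBt)


def exact_step(A, BBt, dt):
    """Matrices of the exact scheme X_{i+1} = e^{A dt} X_i + xi_i, xi_i ~ N(0, C(dt)),
    where C(dt) = int_0^dt e^{As} BB' e^{A's} ds (Van Loan's method)."""
    n = A.shape[0]
    block = np.zeros((2 * n, 2 * n))
    block[:n, :n] = -A
    block[:n, n:] = BBt
    block[n:, n:] = A.T
    F = expm(block * dt)
    Phi = F[n:, n:].T        # equals e^{A dt}
    C_dt = Phi @ F[:n, n:]   # covariance of the Gaussian increment
    return Phi, C_dt


Phi, C_dt = exact_step(A, BBt, dt)

print("Var[Q], Var[P] (SDE):", np.diag(C_inf))
print("Var[Q], Var[P] (EM): ", np.diag(C_em))
print("exact scheme keeps C_inf invariant:", np.allclose(Phi @ C_inf @ Phi.T + C_dt, C_inf))
```

With these values the Euler-Maruyama stationary variances come out roughly 1.67 times the true ones, $\sigma^{2}/(4\gamma\lambda^{2}) = 0.0025$ and $\sigma^{2}/(4\gamma) = 1$, even though the step size is small enough for the scheme to be stable, whereas the exact scheme leaves the invariant covariance unchanged.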

Here, we propose to apply structure-preserving numerical splitting schemes within the ABC algorithm. The idea of these methods is to split the SDE into explicitly solvable subequations and to apply a proper composition of the resulting exact solutions. Standard procedures are, for example, the Lie-Trotter method and the usually more accurate Strang approach; see, e.g., Leimkuhler et al. (2016). Since the only approximation enters through the composition of the derived explicit solutions, numerical splitting schemes usually preserve the structural properties of the underlying SDE and accurately reproduce its qualitative behaviour. Moreover, they usually have the same order of convergence as the frequently applied Euler-Maruyama method and are comparably efficient. We refer to Blanes et al. (2009) and McLachlan and Quispel (2002) for an exhaustive discussion of splitting methods for broad classes of ordinary differential equations (ODEs), some of which have already been carried over to SDEs; see, e.g., Misawa (2001) for a general class of SDEs, Ableidinger and Buckwar (2016) for the stochastic Landau-Lifshitz equations, Bréhier and Goudenège (2019) for the Allen-Cahn equation and Ableidinger et al. (2017) for Hamiltonian type SDEs.
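The Strang idea can be sketched on a simple example. The sketch assumes the damped stochastic harmonic oscillator $dQ = P\,dt$, $dP = (-\lambda^{2}Q - 2\gamma P)\,dt + \sigma\,dW$ with illustrative values $\lambda=20$, $\gamma=1$, $\sigma=2$. It splits the oscillator into two exactly solvable subequations: the energy-preserving rotation $dQ = P\,dt$, $dP = -\lambda^{2}Q\,dt$, and the Ornstein-Uhlenbeck part $dQ = 0$, $dP = -2\gamma P\,dt + \sigma\,dW$. These are composed as OU($\Delta/2$) ∘ rotation($\Delta$) ∘ OU($\Delta/2$). This particular split is chosen here for illustration and is not necessarily one of the integrators used in the paper. Each subflow leaves the Gaussian invariant law with covariance $\mathrm{diag}(\sigma^{2}/(4\gamma\lambda^{2}), \sigma^{2}/(4\gamma))$ unchanged, so the composition does too. The function names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Assumed model and illustrative values: dQ = P dt, dP = (-lam^2 Q - 2 gam P) dt + sig dW
lam, gam, sig = 20.0, 1.0, 2.0
dt = 0.01  # larger than 2*gam/lam**2 = 0.005, where Euler-Maruyama becomes unstable


def flow_rotation(q, p, h):
    """Exact flow over time h of dQ = P dt, dP = -lam^2 Q dt (conserves the energy)."""
    c, s = np.cos(lam * h), np.sin(lam * h)
    return c * q + s * p / lam, -lam * s * q + c * p


def flow_ou(q, p, h, rng):
    """Exact flow over time h of dQ = 0, dP = -2 gam P dt + sig dW (Ornstein-Uhlenbeck in P)."""
    decay = np.exp(-2.0 * gam * h)
    sd = np.sqrt(sig**2 / (4.0 * gam) * (1.0 - decay**2))
    return q, decay * p + sd * rng.standard_normal(np.shape(p))


def strang_step(q, p, h, rng):
    """Strang composition OU(h/2) o rotation(h) o OU(h/2)."""
    q, p = flow_ou(q, p, h / 2, rng)
    q, p = flow_rotation(q, p, h)
    return flow_ou(q, p, h / 2, rng)


def strang_linear_map(h):
    """Mean map M and noise covariance N of one Strang step: X_{i+1} = M X_i + noise."""
    decay = np.exp(-gam * h)  # OU decay over h/2
    O = np.diag([1.0, decay])
    D = np.diag([0.0, sig**2 / (4.0 * gam) * (1.0 - decay**2)])
    c, s = np.cos(lam * h), np.sin(lam * h)
    R = np.array([[c, s / lam], [-lam * s, c]])
    return O @ R @ O, O @ R @ D @ R.T @ O.T + D


# Stationary covariance of the scheme vs. the invariant covariance of the SDE
M, N = strang_linear_map(dt)
C_split = solve_discrete_lyapunov(M, N)
C_true = np.diag([sig**2 / (4 * gam * lam**2), sig**2 / (4 * gam)])
print("invariant law preserved:", np.allclose(C_split, C_true))

# Simulate a path of the Q component with the splitting scheme
rng = np.random.default_rng(0)
q, p = 0.0, 0.0
path = np.empty(2000)
for i in range(path.size):
    q, p = strang_step(q, p, dt, rng)
    path[i] = q
```

The design relies on each piece being solved exactly, so the only error comes from the composition. In this linear example, that composition error affects the time correlations but not the invariant law.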

The main contribution of this work lies in the combination of the proposed invariant measure-based summary statistics and the measure-preserving numerical splitting schemes within the ABC framework. We demonstrate that a simulation-based inference method, here ABC, can only perform well if the underlying simulation method preserves the structural properties of the SDE. While the use of preserving splitting schemes within the ABC method yields successful results, applying a general-purpose numerical method, such as the Euler-Maruyama discretisation, may result in seriously wrong inferences. We illustrate the proposed Spectral Density-Based and Measure-Preserving ABC method on the class of stochastic Hamiltonian type equations, for which the existence of an underlying unique invariant distribution and measure-preserving numerical splitting schemes have already been intensively studied in the literature; see, e.g., Ableidinger et al. (2017); Mattingly et al. (2002); Leimkuhler and Matthews (2015); Milstein and Tretyakov (2004). Hamiltonian type SDEs have been investigated in molecular dynamics, where they are typically referred to as Langevin equations; see, e.g., Leimkuhler and Matthews (2015). Recently, they have also received considerable attention in neuroscience as so-called neural mass models (Ableidinger et al. 2017).

The paper is organised as follows. In Section 2, we recall the acceptance-rejection ABC setting, introduce the invariant measure-based summary statistics and propose a proper distance. We then discuss the importance of using measure-preserving numerical schemes for the synthetic data generation when exact simulation methods are not applicable, and provide a short introduction to numerical splitting methods. In Section 3, we introduce Hamiltonian type SDEs and recall two splitting integrators preserving the invariant measure of the model. In Section 4, we validate the proposed method on the stochastic harmonic oscillator, for which exact simulation is possible. In Section 5, we apply the proposed ABC method to the stochastic Jansen and Rit neural mass model (JR-NMM). We refer to Jansen and Rit (1995) for the original version, an ODE with a stochastic input function, and to Ableidinger et al. (2017) for its reformulation as a Hamiltonian type SDE. This model has been reported to successfully reproduce EEG data. We illustrate the performance of the proposed ABC method with both simulated and real data. Final remarks, possible extensions and conclusions are reported in Section 6. Further illustrations of the proposed ABC method are available in the supplementary material, reported here as Section 7. Sample code used to generate the main results is available on GitHub.

2 Spectral Density-Based and Measure-Preserving ABC for partially observed SDEs with an invariant distribution

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a complete probability space with a right-continuous and complete filtration $\mathbb{F}=\{\mathcal{F}_{t}\}_{t\in[0,T]}$. Let $\theta=(\theta_{1},\ldots,\theta_{k})$, $k\in\mathbb{N}$, be a vector of relevant model parameters. We consider the following $n$-dimensional, $n\in\mathbb{N}$, non-autonomous SDE of Itô type describing the time evolution of a system of interest:

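$$dX(t) = f(t, X(t); \theta)\,dt + \mathcal{G}(t, X(t); \theta)\,dW(t), \quad X(0) = X_{0}, \quad t \in [0, T]. \tag{1}$$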

The initial value $X_{0}$ is either deterministic or an $\mathbb{R}^{n}$-valued random variable, measurable with respect to $\mathcal{F}_{0}$. Here, $\textbf{W}=(W(t))_{t\in[0,T]}$ is an $r$-dimensional, $r\in\mathbb{N}$, Wiener process with independent and $\mathbb{F}$-adapted components. We further assume that the drift component $f:[0,T]\times\mathbb{R}^{n}\to\mathbb{R}^{n}$ and the diffusion component $\mathcal{G}:[0,T]\times\mathbb{R}^{n}\to\mathbb{R}^{n\times r}$ fulfil the global Lipschitz and linear growth conditions that guarantee the existence and pathwise uniqueness of an $\mathbb{F}$-adapted strong solution process $\textbf{X}=(X(t))_{t\in[0,T]}\in\mathbb{R}^{n}$ of (1); see, e.g., Arnold (1974).

We aim to infer the parameter vector $\theta$ inherent in the SDE (1), when the $n$ -dimensional solution process X is only partially observed through the 1-dimensional and parameter-dependent output process

$$\textbf{Y}_{\theta} = (Y_{\theta}(t))_{t \in [0, T]} = g(\textbf{X}), \tag{2}$$

where $g:\mathbb{R}^{n}\to\mathbb{R}$ is a real-valued continuous function of the components of X.

Further, we assume a specific underlying structural model property, namely the existence of a unique invariant measure $\eta_{\textbf{Y}_{\theta}}$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ of the output process $\textbf{Y}_{\theta}$, where $\mathcal{B}$ denotes the Borel $\sigma$-algebra. The process has invariant density $f_{\textbf{Y}_{\theta}}$, and its mean, autocovariance and variance are given by

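$$\begin{aligned}
\mathbb{E}[Y_{\theta}(t)] &= \eta_{\mu} \in \mathbb{R},\\
\mathrm{Cov}[Y_{\theta}(t), Y_{\theta}(s)] &:= r_{\theta}(t, s) = r_{\theta}(t - s), \quad s \leq t,\\
\mathrm{Var}[Y_{\theta}(t)] &= r_{\theta}(0) = \eta_{\sigma^{2}} \in \mathbb{R}^{+}.
\end{aligned} \tag{3}$$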

If the solution process X of SDE (1) admits an invariant distribution $\eta_{\textbf{X}}$ on $(\mathbb{R}^{n},\mathcal{B}(\mathbb{R}^{n}))$ , then the output process $\textbf{Y}_{\theta}$ inherits this structural property by means of the marginal invariant distributions of $\eta_{\textbf{X}}$ . Furthermore, if $X(0)\sim\eta_{\textbf{X}}$ , then the process $\textbf{Y}_{\theta}=(Y_{\theta}(t))_{t\in[0,\infty)}$ evolves according to the distribution $\eta_{\textbf{Y}_{\theta}}$ for all $t\geq 0$ . Our goal is to perform statistical inference for the parameter vector $\theta$ of the SDE (1), when the solution process X is partially observed through discrete time measurements of the output process $\textbf{Y}_{\theta}$ given in (2), by benefiting from the (in general unknown) invariant distribution $\eta_{\textbf{Y}_{\theta}}$ satisfying (3).

2.1 The ABC method

Let $y=(y(t_{i}))_{i=1}^{l}$, $l\in\mathbb{N}$, be the reference data, corresponding to discrete time observations of the output process $\textbf{Y}_{\theta}$. Let us denote by $\pi(\theta)$ and $\pi(\theta|y)$ the prior and the posterior density, respectively. For multivariate complex SDEs, the underlying likelihood is often unknown or intractable. The idea of the ABC method is to derive an approximate posterior density for $\theta$ by replacing the unknown likelihood with a large number (possibly billions) of synthetic datasets simulated from the underlying model (1) and mapped to ${\bf Y}_{\theta}$ through (2). The basic acceptance-rejection ABC algorithm consists of three steps: i. Sample a value $\theta^{\prime}$ from the prior $\pi(\theta)$; ii. Conditionally on $\theta^{\prime}$, simulate a new artificial dataset from the model (1) and derive the synthetic data $y_{\theta^{\prime}}=(y_{\theta^{\prime}}(t_{i}))_{i=0}^{m}$, $t_{0}=0$, $t_{m}=T$, $m\in\mathbb{N}$, from the process $\bf Y_{\theta^{\prime}}$ given by (2); iii. Keep the sampled parameter value $\theta^{\prime}$ as a realisation from the approximate posterior if the distance $d$ between a vector of summary statistics $s=(s_{1},\ldots,s_{h})$, $h\in\mathbb{N}$, of the original and the synthetic data is smaller than some threshold level $\epsilon\geq 0$, i.e., $d(s(y),s(y_{\theta^{\prime}}))<\epsilon$.

When $\epsilon=0$ and $s$ is a vector of sufficient statistics for $\theta$, the acceptance-rejection ABC (summarised in Algorithm 1) produces samples from the true posterior $\pi(\theta|y)$. Due to the complexity of the underlying SDE (1), we cannot derive non-trivial sufficient statistics $s$ for $\theta$. Moreover, due to the underlying stochasticity of the model, $\mathbb{P}(d(s(y),s(y_{\theta^{\prime}}))=0)=0$. Thus, $\epsilon$ must be strictly positive, and the acceptance-rejection ABC Algorithm 1 yields samples from an approximate posterior $\pi_{\textrm{ABC}}(\theta|y)$ according to

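$$\pi(\theta \mid y) \approx \pi_{\textrm{ABC}}(\theta \mid y) = \pi\big(\theta \mid d(s(y), s(y_{\theta})) < \epsilon\big).$$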

Besides the tolerance level $\epsilon$ , which we fix as a percentile of the calculated distances, the quality of the ABC method depends strongly on the choice of suitable summary statistics combined with a proper distance measure and on the numerical method used to generate the synthetic data from the model. In the following, we introduce summaries that are very effective for the class of models having an underlying invariant distribution, we suggest a proper distance based on them and we propose the use of measure-preserving numerical splitting schemes.
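The acceptance-rejection scheme (steps i. to iii.), with the tolerance fixed as a percentile of the simulated distances, can be sketched as follows. This is not the authors' implementation: all names (`abc_rejection`, `sample_prior`, `simulate`, `summary`, `distance`) are hypothetical. A toy Gaussian model with a scalar parameter and the sample mean as summary stands in for the SDE and the summaries introduced below. All distances are computed first, and $\epsilon$ is then set to their `quantile`-th quantile.

```python
import numpy as np


def abc_rejection(y_obs, sample_prior, simulate, summary, distance,
                  n_sims=10_000, quantile=0.01, rng=None):
    """Acceptance-rejection ABC with eps fixed as a percentile of the distances.

    Returns the accepted parameter draws and the tolerance eps that was used.
    """
    rng = np.random.default_rng() if rng is None else rng
    s_obs = summary(y_obs)
    thetas, dists = [], []
    for _ in range(n_sims):
        theta = sample_prior(rng)       # step i: draw theta' from the prior
        y_sim = simulate(theta, rng)    # step ii: synthetic data conditionally on theta'
        thetas.append(theta)
        dists.append(distance(s_obs, summary(y_sim)))
    thetas, dists = np.asarray(thetas), np.asarray(dists)
    eps = np.quantile(dists, quantile)  # tolerance = chosen percentile of the distances
    return thetas[dists < eps], eps     # step iii: keep theta' whenever d < eps


# Toy usage (hypothetical model): y ~ N(theta, 1), prior theta ~ U(-5, 5), summary = sample mean
rng = np.random.default_rng(1)
y_obs = rng.normal(2.0, 1.0, size=200)
accepted, eps = abc_rejection(
    y_obs,
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda th, r: r.normal(th, 1.0, size=200),
    summary=np.mean,
    distance=lambda a, b: abs(a - b),
    n_sims=20_000,
    quantile=0.01,
    rng=rng,
)
print(len(accepted), accepted.mean(), eps)
```

Fixing $\epsilon$ as a percentile means the number of accepted draws (here about 1% of `n_sims`) is known in advance, at the price of storing all simulated distances.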

2.2 An effective choice of summaries and distances: Spectral Density-Based ABC

When applying ABC to stochastic models, an important statistical challenge arises. Due to the intrinsic randomness, repeated simulations of the process $\textbf{Y}_{\theta}$ under the same parameter vector $\theta$ may yield very different trajectories. An illustration is given in Figure 1 (top and middle panels), where we report two trajectories of the output process of the stochastic JR-NMM (25) generated with an identical parameter configuration. This model is a specific SDE of type (1), observed through $\textbf{Y}_{\theta}$ as in (2), and admitting an invariant distribution $\eta_{\textbf{Y}_{\theta}}$ satisfying (3). See Section 5 for a description of the model. In the top panel, we visualise the full paths for a time $T=200$ , while in the middle panel we provide a zoom, showing only the initial part.

Proposal 1: Use the existence of an invariant measure $\eta_{\bf Y_{\theta}}$ and map the data $y_{\theta}$ to their estimated invariant density $\hat{f}_{y_{\theta}}$ and invariant spectral density $\hat{S}_{y_{\theta}}$.

Instead of working with the output process $\textbf{Y}_{\theta}$ directly, we take advantage of the structural model property $\eta_{\textbf{Y}_{\theta}}$ and focus on its invariant density $f_{\textbf{Y}_{\theta}}$ and its invariant spectral density $S_{\textbf{Y}_{\theta}}$. Both are deterministic functions characterised by the underlying parameters $\theta$, and thus invariant for repeated simulations under the same parameter configuration. The invariant spectral density is obtained as the Fourier transform of the autocovariance function $r_{\theta}$, and it is given by

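$$S_{\textbf{Y}_{\theta}}(\omega) = \mathcal{F}\{r_{\theta}\}(\omega) = \int_{-\infty}^{\infty} r_{\theta}(\tau)\, e^{-i\omega\tau}\, d\tau, \tag{4}$$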

for $\omega\in[-\pi,\pi]$. The angular frequency $\omega$ relates to the ordinary frequency $\nu$ via $\omega=2\pi\nu$. Since both $f_{\textbf{Y}_{\theta}}$ and $S_{\textbf{Y}_{\theta}}$ are typically unknown, we estimate them from a dataset $y_{\theta}$. First, we estimate the invariant density $f_{\textbf{Y}_{\theta}}$ with a kernel density estimator, denoted by $\hat{f}_{y_{\theta}}$; see, e.g., Pons (2011). Second, we estimate the invariant spectral density $S_{\textbf{Y}_{\theta}}$ (4) with a smoothed periodogram estimator (Cadonna et al. 2017; Quinn et al. 2014), denoted by $\hat{S}_{y_{\theta}}$, which is typically evaluated at the Fourier frequencies. Unlike the invariant density, the invariant spectral density does not account for the mean $\mathbb{E}[\textbf{Y}_{\theta}]$, but it captures the dependence structure of the data coming from the model. We define the invariant measure-based summary statistics $s$ of a dataset $y_{\theta}$ as

$$s(y_{\theta}) := (\hat{S}_{y_{\theta}}, \hat{f}_{y_{\theta}}). \tag{5}$$

Figure 1 shows the two estimated invariant densities (left lower panel) and invariant spectral densities (right lower panel), all derived from the full paths of the output process $\textbf{Y}_{\theta}$ (top panel).
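A sketch of the data mapping (5) with SciPy follows, under several assumptions. The invariant density is estimated with `scipy.stats.gaussian_kde` on a fixed grid. The spectral density is estimated with `scipy.signal.periodogram` followed by a simple moving-average (Daniell-type) smoother over `span` Fourier frequencies. The helper name, the smoother, `span` and the grid are illustrative choices, not taken from the paper. SciPy returns a one-sided estimate in ordinary frequency $\nu$ (cycles per time unit), which relates to $\omega$ in (4) via $\omega = 2\pi\nu$ and differs from (4) by constant scaling conventions.

```python
import numpy as np
from scipy.signal import periodogram
from scipy.stats import gaussian_kde


def invariant_summaries(y, dt, x_grid, span=5):
    """Hypothetical helper mapping a series y (sampled every dt) to the summaries (5).

    Returns the Fourier frequencies, the smoothed periodogram S_hat and the
    kernel density estimate f_hat evaluated on x_grid.
    """
    # Raw periodogram; the default detrend="constant" removes the sample mean,
    # so, like the spectral density, the estimate ignores the mean of the process.
    freqs, pgram = periodogram(y, fs=1.0 / dt)
    # Daniell-type smoothing: moving average over `span` neighbouring frequencies
    S_hat = np.convolve(pgram, np.ones(span) / span, mode="same")
    # Gaussian KDE of the marginal (invariant) density on a fixed grid
    f_hat = gaussian_kde(y)(x_grid)
    return freqs, S_hat, f_hat


# Usage on white noise (variance 1, dt = 0.01) as a stand-in for simulated output data
rng = np.random.default_rng(0)
y = rng.standard_normal(5000)
x_grid = np.linspace(-5.0, 5.0, 501)
freqs, S_hat, f_hat = invariant_summaries(y, dt=0.01, x_grid=x_grid)
```

Evaluating $\hat{f}$ on one fixed grid for all datasets keeps the summaries of the observed and the synthetic data directly comparable in the distance that follows.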

After performing the data mapping (5), which significantly reduces the randomness in the output of the stochastic simulator, the distance $d$ can be chosen among distance measures between two real-valued functions. Here, we consider the integrated absolute error (IAE) defined by

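$$\text{IAE}(g_{1}, g_{2}) := \int_{\mathbb{R}} \bigl|g_{1}(x) - g_{2}(x)\bigr|\, dx \in \mathbb{R}^{+}. \tag{6}$$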
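For functions known only on a grid, the IAE (6) can be approximated with the trapezoidal rule (`scipy.integrate.trapezoid`). A minimal sketch (the helper name `iae` is hypothetical) is checked below against a closed form. For the $N(0,1)$ and $N(1,1)$ densities, $\text{IAE} = 2\,(2\Phi(1/2) - 1)$, where $\Phi$ is the standard normal distribution function.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm


def iae(g1, g2, x):
    """Integrated absolute error (6) between two functions sampled on the grid x,
    approximated with the trapezoidal rule."""
    return trapezoid(np.abs(np.asarray(g1) - np.asarray(g2)), x)


# Closed-form check: the two densities cross at x = 1/2
x = np.linspace(-10.0, 11.0, 21001)
value = iae(norm.pdf(x), norm.pdf(x, loc=1.0), x)
exact = 2.0 * (2.0 * norm.cdf(0.5) - 1.0)
print(value, exact)
```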

Another natural choice would be a distance from the class of so-called f-divergences (see, e.g., Sason and Verdú 2016), or the Wasserstein distance, recently proposed for ABC (Bernton et al. 2019). Within the ABC framework (see Step $7$ in Algorithm 1), we suggest using the following distance

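$$d(s(y), s(y_{\theta})) := \text{IAE}(\hat{S}_{y}, \hat{S}_{y_{\theta}}) + w \cdot \text{IAE}(\hat{f}_{y}, \hat{f}_{y_{\theta}}), \tag{7}$$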

returning a weighted sum of the areas between the densities estimated from the original and the synthetic datasets. Here, $w\geq 0$ is a weight assigned to the IAE of the invariant densities such that the two errors are of the same "order of magnitude". This is needed because, unlike the invariant density, the invariant spectral density does not integrate to 1. We obtain a value for the weight by performing an ABC pilot simulation, which consists of repeating the following steps $L$ times:

1:Draw $\theta^{\prime}$ from the prior $\pi(\theta)$

2:Conditionally on $\theta^{\prime}$ , simulate two artificial datasets

$y_{\theta^{\prime}}^{1}$ and $y_{\theta^{\prime}}^{2}$ from the output process $\textbf{Y}_{\theta}$

3:Compute the corresponding summaries (5), i.e., $s(y_{\theta^{\prime}}^{1})=(\hat{S}_{y_{\theta^{\prime}}^{1}},\hat{f}_{y_{\theta^{\prime}}^{1}})$ and $s(y_{\theta^{\prime}}^{2})=(\hat{S}_{y_{\theta^{\prime}}^{2}},\hat{f}_{y_{\theta^{\prime}}^{2}})$

4:Determine a value for the weight by equating the two terms in (7), i.e., $w^{\prime}=\frac{\text{IAE}(\hat{S}_{y_{\theta^{\prime}}^{1}},\hat{S}_{y_{\theta^{\prime}}^{2}})}{\text{IAE}(\hat{f}_{y_{\theta^{\prime}}^{1}},\hat{f}_{y_{\theta^{\prime}}^{2}})}$

Then, we take the median of the resulting $L$ values $w^{\prime}$. See, e.g., Prangle (2017) for alternative approaches to weighting summary statistics. Since the densities $\hat{f}_{y_{\theta}}$ and $\hat{S}_{y_{\theta}}$ are estimated at discrete points, the IAE (6) is approximated using trapezoidal integration.
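The distance (7) and the pilot procedure for $w$ can be sketched as follows, assuming NumPy and the (points, values) representation of each estimate. The callables `sample_prior`, `simulate` and `summarize` are hypothetical placeholders for the prior, a (exact or splitting-based) simulator of the output process, and the summaries (5).

```python
import numpy as np

def trapezoid(x, g):
    """Trapezoidal rule for samples g on the sorted grid x."""
    return float(np.sum(np.diff(x) * (g[1:] + g[:-1]) / 2.0))

def iae(x1, g1, x2, g2):
    """Trapezoidal IAE after interpolating both functions onto the union grid."""
    grid = np.union1d(x1, x2)
    a = np.interp(grid, x1, g1, left=0.0, right=0.0)
    b = np.interp(grid, x2, g2, left=0.0, right=0.0)
    return trapezoid(grid, np.abs(a - b))

def distance(s_obs, s_sim, w):
    """Distance (7); a summary is ((freqs, S_hat), (grid, f_hat))."""
    (fq1, S1), (x1, f1) = s_obs
    (fq2, S2), (x2, f2) = s_sim
    return iae(fq1, S1, fq2, S2) + w * iae(x1, f1, x2, f2)

def pilot_weight(sample_prior, simulate, summarize, L, rng):
    """Median over L prior draws of IAE(S^1, S^2) / IAE(f^1, f^2)."""
    ratios = []
    for _ in range(L):
        theta = sample_prior(rng)                      # Step 1
        s1 = summarize(simulate(theta, rng))           # Steps 2 and 3
        s2 = summarize(simulate(theta, rng))
        (fq1, S1), (x1, f1) = s1
        (fq2, S2), (x2, f2) = s2
        ratios.append(iae(fq1, S1, fq2, S2) / iae(x1, f1, x2, f2))  # Step 4
    return float(np.median(ratios))
```

Taking the median rather than the mean of the $L$ ratios makes $w$ robust to the occasional pair of very similar invariant densities, for which the denominator is close to zero.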

In Algorithm 1, we assume that we observe $M\in\mathbb{N}$ datasets, corresponding to $M$ realisations of the output process $\textbf{Y}_{\theta}$ sampled at $l\in\mathbb{N}$ discrete time points, resulting in a matrix $y\in\mathbb{R}^{M\times l}$ of observed data. The median of the distances (7) computed for each of the $M$ datasets

[TABLE]

is then returned as the global distance in Step 7. Other strategies can be adopted; for example, using the mean instead yields similar results in all our experiments. One can interpret $y$ as a long trajectory (when using simulated observed reference data) or as a long recording of the modelled phenomenon (when using real observed reference data) that is cut into $M$ pieces. Alternatively, $y$ may consist of $M$ independent repeated simulations or experiments, when dealing with simulated or real data, respectively. As expected, having $M>1$ datasets improves the quality of the estimation due to the larger number of observations.
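A minimal sketch of this step, assuming NumPy: one long path is cut into an $M\times l$ data matrix, and the global distance is the median over the $M$ rows. The function names are ours, and `distance` stands for any implementation of (7) passed in by the caller.

```python
import numpy as np

def split_into_datasets(long_path, M):
    """Cut one long trajectory or recording into M consecutive pieces of equal length l.

    Returns the M x l data matrix y; trailing samples that do not fit are dropped."""
    long_path = np.asarray(long_path, dtype=float)
    l = long_path.size // M
    return long_path[: M * l].reshape(M, l)

def global_distance(obs_summaries, sim_summary, distance, w):
    """Median over the M observed datasets of distance(s(y_k), s(y_theta), w)."""
    return float(np.median([distance(s_k, sim_summary, w) for s_k in obs_summaries]))
```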

2.3 A new proposal of synthetic data generation: Measure-Preserving ABC

A crucial aspect of ABC, and of all other simulation-based methods, is the ability to simulate from the model (Step $5$ of Algorithm 1). Consider a discretised time grid with equidistant time step $\Delta=t_{i+1}-t_{i}$, and let $\tilde{y}_{\theta}=(\tilde{y}_{\theta}(t_{i}))_{i=1}^{m}$ be a realisation of the process $\widetilde{\textbf{Y}}_{\theta}=(\widetilde{Y}_{\theta}(t_{i}))_{i=1}^{m}$, obtained through a numerical method approximating $\textbf{Y}_{\theta}$ at the discrete time points, i.e., $\widetilde{Y}_{\theta}(t_{i})\approx Y_{\theta}(t_{i})$. The lack of exact simulation schemes, i.e., schemes with $\widetilde{Y}_{\theta}(t_{i})=Y_{\theta}(t_{i})$, introduces a new level of approximation in the statistical inference. In particular, Algorithm 1 samples from an approximated posterior density of the form

[TABLE]

As a consequence, $y_{\theta}$ in Step $5$ of Algorithm 1 is replaced by its numerical approximation $\tilde{y}_{\theta}$ .

The commonly used Euler-Maruyama scheme yields discretised trajectories of the solution process X of the SDE (1) through (Kloeden and Platen 1992)

[TABLE]

where $\xi_{i}$ are Gaussian vectors with null mean and variance $\Delta\mathbb{I}_{n}$, and $\mathbb{I}_{n}$ denotes the $n\times n$ identity matrix. As previously discussed, the Euler-Maruyama method does not, in general, preserve the underlying invariant distribution $\eta_{\textbf{Y}_{\theta}}$.
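A generic Euler-Maruyama sketch, assuming NumPy and time-homogeneous coefficients. The interface is our assumption: `f(x)` returns the drift as an $(n,)$ array and `G(x)` the diffusion as an $(n, r)$ matrix, with one $N(0,\Delta\,\mathbb{I}_r)$ increment per step, where $r$ is the number of Wiener components.

```python
import numpy as np

def euler_maruyama(f, G, x0, dt, n_steps, rng):
    """Euler-Maruyama path for dX = f(X) dt + G(X) dW, returned as an (n_steps+1, n) array."""
    x = np.array(x0, dtype=float)
    r = np.atleast_2d(G(x)).shape[1]        # number of Wiener components
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for i in range(n_steps):
        xi = rng.normal(scale=np.sqrt(dt), size=r)   # xi_i ~ N(0, dt * I_r)
        x = x + f(x) * dt + np.atleast_2d(G(x)) @ xi
        path[i + 1] = x
    return path
```

For the oscillator (23), one would pass `f = lambda x: A @ x` and `G = lambda x: np.array([[0.0], [sigma]])`, with `A` and `sigma` supplied by the user.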

Proposal 2: To adopt a numerical method for the synthetic data generation that preserves the underlying invariant measure of the model.

We apply numerical splitting schemes within the ABC framework and provide a brief account of their theory. Let us assume that the drift $f$ and the diffusion $\mathcal{G}$ of SDE (1) can be written as

[TABLE]

The goal is to decompose $f$ and $\mathcal{G}$ in a way such that the resulting $d$ subequations

[TABLE]

for $j\in\{1,\dots,d\}$, can be solved exactly. Note that the terms $\mathcal{G}^{[j]}$ can be zero, resulting in ordinary differential equations (ODEs). Let $X^{[j]}(t)=\varphi_{t}^{[j]}(X_{0})$ denote the exact solutions (flows) of the above subequations at time $t$, starting from $X_{0}$. Once these explicit solutions are derived, they must be composed appropriately. Here we use the Strang approach

[TABLE]

that provides a numerical solution for the original SDE (1).
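The Strang composition itself is independent of the model. A minimal Python sketch, assuming each sub-flow is a callable `phi(x, t)` that advances its subequation exactly over time `t` (stochastic sub-flows can close over a random generator). We use the common symmetric form in which the last flow takes the full step and all others take half steps before and after it; the precise ordering in the text's formula may differ.

```python
import numpy as np

def strang_step(flows, x, dt):
    """One symmetric Strang step for exact sub-flows [phi_1, ..., phi_d].

    Applies phi_1, ..., phi_{d-1} for dt/2, then phi_d for dt, then
    phi_{d-1}, ..., phi_1 for dt/2 again."""
    for phi in flows[:-1]:
        x = phi(x, dt / 2)
    x = flows[-1](x, dt)
    for phi in reversed(flows[:-1]):
        x = phi(x, dt / 2)
    return x
```

As a deterministic check, splitting the harmonic oscillator $q'=p$, $p'=-q$ into an exact "kick" ($p$ only) and an exact "drift" ($q$ only) recovers the leapfrog method, which keeps the energy close to its initial value over long times.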

In Figure 2, we illustrate how the numerical splitting method preserves the underlying invariant measure of the weakly damped stochastic harmonic oscillator (23), independently of the choice of the time step $\Delta$. This is a specific SDE of type (1), observed through $\textbf{Y}_{\theta}$ as in (2), with a known invariant distribution $\eta_{\textbf{Y}_{\theta}}$. See Section 3 for the detailed numerical splitting scheme and Section 4 for a description of the model. In contrast, the performance of the Euler-Maruyama scheme deteriorates as $\Delta$ increases. Each subplot compares the true invariant density (blue solid lines) with the corresponding kernel estimate $\hat{f}_{y_{\theta}}$ based on a path $y_{\theta}$ from the model, generated either with the measure-preserving numerical splitting scheme (22) (dashed orange lines) or with the Euler-Maruyama approach (dotted green lines). The data are generated with $T=10^{3}$ and different time steps, namely $\Delta=10^{-3}$, $3\cdot 10^{-3}$ and $4.5\cdot 10^{-3}$.

2.4 Notation

We apply the summary statistics (5) and the distance (8) in Algorithm 1. We use the notation Algorithm 1 (i) for the Spectral Density-Based ABC method when the synthetic data are simulated exactly, Algorithm 1 (ii) for the Spectral Density-Based and Measure-Preserving ABC method when a measure-preserving numerical splitting scheme is applied, and Algorithm 1 (iii) when the data are generated with the non-preserving Euler-Maruyama scheme.

To evaluate the performance of the proposed ABC method, we analyse the marginal posterior densities, denoted by $\pi_{\textrm{ABC}}^{*}(\theta_{j}|y)$, $j\in\{1,\dots,k\}$, obtained from the posterior density $\pi_{\textrm{ABC}}^{*}(\theta|y)$, which stands for $\pi_{\textrm{ABC}}(\theta|y)$, $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta|y)$ or $\pi_{\textrm{ABC}}^{e}(\theta|y)$, depending on whether it is obtained from Algorithm 1 (i), (ii) or (iii). Following this notation, we denote by $\hat{\theta}_{\textrm{ABC},j}^{*}$ the marginal ABC posterior means.

3 An illustration on Hamiltonian type SDEs

We illustrate the proposed ABC approach on Hamiltonian type SDEs and define the $n$ -dimensional ( $n=2d$ , $d\in\mathbb{N}$ ) stochastic process

$$\textbf{X}=\left(\textbf{Q},\textbf{P}\right)^{\prime},$$

consisting of the two $d$ -dimensional components

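$$\textbf{Q}=\left(\textbf{X}_{1},\dots,\textbf{X}_{d}\right)^{\prime},\qquad\textbf{P}=\left(\textbf{X}_{d+1},\dots,\textbf{X}_{2d}\right)^{\prime},$$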

where $^{\prime}$ denotes the transpose. The $n$-dimensional SDE of Hamiltonian type with initial value $X_{0}=(Q_{0},P_{0})^{\prime}$ and $d$-dimensional ($r=d$) Wiener process $\textbf{W}$ describes the time evolution of the process $\textbf{X}$ by

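$$d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix}=\begin{pmatrix}\nabla_{P}H(Q(t),P(t))\\ -\nabla_{Q}H(Q(t),P(t))-2\Gamma_{\theta}P(t)+G(Q(t);\theta)\end{pmatrix}dt+\begin{pmatrix}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{pmatrix}dW(t). \tag{10}$$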

We denote by $\mathbb{O}_{d}$ the $d\times d$ zero matrix and by $\nabla_{Q}$ and $\nabla_{P}$ the gradients with respect to $Q$ and $P$, respectively. The SDE (10) consists of four parts, each representing a specific type of behaviour. The first is the Hamiltonian part involving $H:\mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R}_{0}^{+}$, given by

$$H(Q,P)=\frac{1}{2}\left(\lVert P\rVert^{2}+\lVert\Lambda_{\theta}Q\rVert^{2}\right),$$

where $\Lambda_{\theta}=\text{diag}[\lambda_{1},...,\lambda_{d}]\in\mathbb{R}^{d\times d}$ is a diagonal matrix. The second is the linear damping part, described by the matrix $\Gamma_{\theta}=\text{diag}[\gamma_{1},...,\gamma_{d}]\in\mathbb{R}^{d\times d}$ . The third is the non-linear displacement part, consisting of the non-linear and globally Lipschitz continuous function $G:\mathbb{R}^{d}\to\mathbb{R}^{d}$ . The fourth corresponds to the diffusion part, given by $\Sigma_{\theta}=\text{diag}[\sigma_{1},...,\sigma_{d}]\in\mathbb{R}^{d\times d}$ .

3.1 Structural model property

Under the requirement of non-degenerate matrices $\Lambda_{\theta}$, $\Gamma_{\theta}$ and $\Sigma_{\theta}$, i.e., strictly positive diagonal entries, Hamiltonian type SDEs as in (10) are often ergodic. As a consequence, the distribution of the solution process $\textbf{X}$ (and thus of the output process $\textbf{Y}_{\theta}$) converges exponentially fast towards a unique invariant measure $\eta_{\textbf{X}}$ on $(\mathbb{R}^{n},\mathcal{B}(\mathbb{R}^{n}))$ (and thus $\eta_{\textbf{Y}_{\theta}}$ on $(\mathbb{R},\mathcal{B}(\mathbb{R}))$); see, e.g., Ableidinger et al. (2017) and the references therein.

3.2 Measure-Preserving numerical splitting schemes

Two splitting approaches for SDE (10) are provided in Ableidinger et al. (2017). Due to the non-linear term $G$, the SDE (10) cannot be solved explicitly. To isolate $G$, the Hamiltonian type SDE (10) is split into the two subsystems

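$$d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix}=\begin{pmatrix}\nabla_{P}H(Q(t),P(t))\\ -\nabla_{Q}H(Q(t),P(t))-2\Gamma_{\theta}P(t)\end{pmatrix}dt+\begin{pmatrix}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{pmatrix}dW(t), \tag{11}$$
$$d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix}=\begin{pmatrix}0_{d}\\ G(Q(t);\theta)\end{pmatrix}dt, \tag{12}$$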

where $0_{d}$ denotes the $d$-dimensional zero vector. This results in the linear SDE with additive noise (11) and the non-linear ODE (12), both of which can be solved explicitly. Indeed, since $\nabla_{P}H(Q(t),P(t))=P(t)$ and $\nabla_{Q}H(Q(t),P(t))=\Lambda_{\theta}^{2}Q(t)$, Subsystem (11) can be rewritten as

$$dX(t)=AX(t)\,dt+B\,dW(t), \tag{13}$$

with $A=\begin{pmatrix}\mathbb{O}_{d}&\mathbb{I}_{d}\\ -\Lambda_{\theta}^{2}&-2\Gamma_{\theta}\end{pmatrix}$ and $B=\left(\begin{array}[]{c}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{array}\right)$ . The exact path of System (13) is obtained through

$$X(t_{i+1})=e^{A\Delta}X(t_{i})+\xi_{i}, \tag{14}$$

where $\xi_{i}$ are $n$-dimensional ($n=2d$) Gaussian vectors with null mean and covariance matrix $C(\Delta)$, and the matrix $C(t)$ follows the dynamics of the matrix-valued ODE

$$\frac{d}{dt}C(t)=AC(t)+C(t)A^{\prime}+BB^{\prime},\qquad C(0)=\mathbb{O}_{2d}, \tag{15}$$

see Arnold (1974). Moreover, since the non-linear term $G$ depends only on the component $Q$, which stays constant under (12), the exact path of Subsystem (12) is obtained through

$$X(t_{i+1})=X(t_{i})+\begin{pmatrix}0_{d}\\ \Delta\,G(Q(t_{i});\theta)\end{pmatrix}. \tag{16}$$

We apply the Strang approach given by

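$$X(t_{i+1})=\left(\varphi_{\Delta/2}^{b}\circ\varphi_{\Delta}^{a}\circ\varphi_{\Delta/2}^{b}\right)\left(X(t_{i})\right), \tag{17}$$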

where $\varphi_{t}^{a}$ and $\varphi_{t}^{b}$ denote the exact solutions (14) and (16) of (11) and (12), respectively. Hence, given $X(t_{i})$ , we obtain the next value $X(t_{i+1})$ by applying the following three steps:

1: $X_{b}=X(t_{i})+\left(\begin{array}[]{c}0_{d}\\ \frac{\Delta}{2}G(Q(t_{i});\theta)\end{array}\right)$

2: $X_{a}=e^{A\Delta}\cdot X_{b}+\xi_{i}$

3: $X(t_{i+1})=X_{a}+\left(\begin{array}[]{c}0_{d}\\ \frac{\Delta}{2}G(Q_{a};\theta)\end{array}\right)$, where $Q_{a}$ denotes the $Q$-component (first $d$ entries) of $X_{a}$
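The three steps above can be sketched for the simplest case $d=1$, assuming NumPy, the weakly damped regime $\lambda>\gamma$ (so that $e^{A\Delta}$ has a closed form), and a hypothetical scalar non-linearity `G(q)` with $\theta$ absorbed into it. Instead of integrating the ODE (15) numerically, we use the identity $C(t)=C_{\infty}-e^{At}C_{\infty}e^{A^{\prime}t}$, valid because all eigenvalues of $A$ have negative real part, with $C_{\infty}=\text{diag}\left(\sigma^{2}/(4\gamma\lambda^{2}),\,\sigma^{2}/(4\gamma)\right)$ the stationary covariance of the linear part. For $d>1$ with diagonal $\Lambda_{\theta}$, $\Gamma_{\theta}$, $\Sigma_{\theta}$, the linear subsystem decouples into $d$ such $2\times 2$ blocks, one per pair $(Q_j,P_j)$.

```python
import numpy as np

def exp_A(lam, gam, t):
    """Closed-form e^{At} for A = [[0, 1], [-lam^2, -2 gam]], assuming lam > gam."""
    k = np.sqrt(lam**2 - gam**2)
    c, s = np.cos(k * t), np.sin(k * t)
    return np.exp(-gam * t) * np.array([[c + gam * s / k, s / k],
                                        [-lam**2 * s / k, c - gam * s / k]])

def cov_C(lam, gam, sig, t):
    """C(t) = C_inf - e^{At} C_inf e^{A't}: covariance of the exact noise term xi_i."""
    c_inf = np.diag([sig**2 / (4 * gam * lam**2), sig**2 / (4 * gam)])
    E = exp_A(lam, gam, t)
    return c_inf - E @ c_inf @ E.T

def strang_17_path(G, x0, lam, gam, sig, dt, n_steps, rng):
    """Path of X = (Q, P)' from the Strang scheme (17) for d = 1."""
    E = exp_A(lam, gam, dt)
    chol = np.linalg.cholesky(cov_C(lam, gam, sig, dt))   # xi_i = chol @ N(0, I_2)
    x = np.array(x0, dtype=float)
    path = np.empty((n_steps + 1, 2))
    path[0] = x
    for i in range(n_steps):
        x_b = x + np.array([0.0, 0.5 * dt * G(x[0])])       # step 1: half step of (12)
        x_a = E @ x_b + chol @ rng.standard_normal(2)       # step 2: exact step of (11)
        x = x_a + np.array([0.0, 0.5 * dt * G(x_a[0])])     # step 3: half step of (12)
        path[i + 1] = x
    return path
```

With `G` identically zero, steps 1 and 3 are the identity and the scheme samples the linear oscillator exactly, so the empirical variance of $Q$ matches $\sigma^{2}/(4\gamma\lambda^{2})$ even for a coarse time step.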

The choice of the two subsystems is not unique. For example, another possibility is to combine the stochastic term with the non-linear part, yielding the subsystems

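$$dX(t)=AX(t)\,dt, \tag{18}$$
$$d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix}=\begin{pmatrix}0_{d}\\ G(Q(t);\theta)\end{pmatrix}dt+\begin{pmatrix}\mathbb{O}_{d}\\ \Sigma_{\theta}\end{pmatrix}dW(t). \tag{19}$$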

The exact path of (18) is given by

$$X(t_{i+1})=e^{A\Delta}X(t_{i}), \tag{20}$$

while the exact path of (19) is obtained through

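$$X(t_{i+1})=X(t_{i})+\begin{pmatrix}0_{d}\\ \Delta\,G(Q(t_{i});\theta)+\Sigma_{\theta}\xi_{i}\end{pmatrix}, \tag{21}$$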

where $\xi_{i}$ are $d$ -dimensional Gaussian vectors with null mean and variance $\Delta\mathbb{I}_{d}$ . The Strang approach is now given by

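$$X(t_{i+1})=\left(\varphi_{\Delta/2}^{c}\circ\varphi_{\Delta}^{d}\circ\varphi_{\Delta/2}^{c}\right)\left(X(t_{i})\right), \tag{22}$$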

where $\varphi_{t}^{c}$ and $\varphi_{t}^{d}$ denote the exact solutions (20) and (21) of (18) and (19), respectively. Thus, given $X(t_{i})$ , the next value $X(t_{i+1})$ is obtained via:

1: $X_{c}=e^{A\frac{\Delta}{2}}\cdot X(t_{i})$

2: $X_{d}=X_{c}+\left(\begin{array}[]{c}0_{d}\\ \Delta G(Q_{c};\theta)+\Sigma_{\theta}\cdot\xi_{i}\end{array}\right)$, where $Q_{c}$ denotes the $Q$-component of $X_{c}$

3: $X(t_{i+1})=e^{A\frac{\Delta}{2}}\cdot X_{d}$
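For comparison, one step of the alternative scheme (22) can be sketched under the same assumptions ($d=1$, $\lambda>\gamma$, NumPy, hypothetical scalar `G(q)`). No covariance matrix is needed here, because the noise enters only through (19) as $\Sigma_{\theta}\xi_{i}$ with $\xi_{i}\sim N(0,\Delta)$. A short calculation shows that, even for $G\equiv 0$, the resulting noise covariance $\Delta\,e^{A\Delta/2}BB^{\prime}e^{A^{\prime}\Delta/2}$ is a midpoint-rule approximation of $C(\Delta)=\int_{0}^{\Delta}e^{As}BB^{\prime}e^{A^{\prime}s}\,ds$, so this step is not exact for the linear oscillator, unlike scheme (17).

```python
import numpy as np

def exp_A(lam, gam, t):
    """Closed-form e^{At} for A = [[0, 1], [-lam^2, -2 gam]], assuming lam > gam."""
    k = np.sqrt(lam**2 - gam**2)
    c, s = np.cos(k * t), np.sin(k * t)
    return np.exp(-gam * t) * np.array([[c + gam * s / k, s / k],
                                        [-lam**2 * s / k, c - gam * s / k]])

def strang_22_step(G, x, lam, gam, sig, dt, rng):
    """One step of the Strang scheme (22) for d = 1."""
    e_half = exp_A(lam, gam, dt / 2)
    x_c = e_half @ x                                          # step 1: half step of (18)
    xi = rng.normal(scale=np.sqrt(dt))                        # xi_i ~ N(0, dt)
    x_d = x_c + np.array([0.0, dt * G(x_c[0]) + sig * xi])    # step 2: exact step of (19)
    return e_half @ x_d                                       # step 3: half step of (18)
```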

3.3 Implementation details

The ABC procedure is implemented in the computing environment R (R Development Core Team 2011), using the package Rcpp (Eddelbuettel and François 2011), which offers a seamless integration of R and C++ and drastically reduces the computational time of the algorithms. The code is then parallelised using the R packages foreach and doParallel, exploiting the fact that the iterations of the for-loop in the algorithm are independent. All simulations are run on the HPC cluster RADON1, a high-performance multi-core cluster located at the Johannes Kepler University Linz. To obtain smoothed periodogram estimates, we apply the R function spectrum, which requires the specification of a smoothing parameter span. In all our experiments, we use span $=5T$. In addition, we avoid a logarithmic scale by setting the log parameter to “no”. To obtain kernel estimates of the invariant density, we apply the R function density. Here, we use the default value for the smoothing bandwidth bw and set the number of points at which the invariant density is estimated to n $=10^{3}$. The invariant spectral density is estimated at the default frequencies of the spectrum function. A sample code is publicly available on GitHub at https://github.com/massimilianotamborrino/sdbmpABC.

4 Validation of the proposed ABC method when exact simulation is possible

In this section, we illustrate the performance of the proposed ABC approach on a model problem (the weakly damped stochastic harmonic oscillator) of Hamiltonian type (10) with vanishing non-linear displacement term $G\equiv 0$. Linear SDEs of this type reduce to (13) and allow for exact simulation of sample paths through (14). Therefore, we can apply the Spectral Density-Based ABC Algorithm 1 (i) under the optimal condition of exact, and thus $\eta_{\textbf{Y}_{\theta}}$-preserving, data generation. Its performance is illustrated in Subsection 4.2. To investigate how the numerical error in the synthetic data generation affects the ABC performance, in Subsection 4.3 we compare $\pi_{\textrm{ABC}}(\theta|y)$ with the posterior densities $\pi_{\textrm{ABC}}^{\text{num}}(\theta|y)$ and $\pi_{\textrm{ABC}}^{\text{e}}(\theta|y)$ obtained from Algorithm 1 (ii) and (iii), using the measure-preserving numerical splitting scheme (22) and the non-preserving Euler-Maruyama method (9), respectively.

4.1 Weakly damped stochastic harmonic oscillator: The model and its properties

We investigate the $2$ -dimensional Hamiltonian type SDE

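$$d\begin{pmatrix}Q(t)\\ P(t)\end{pmatrix}=\begin{pmatrix}P(t)\\ -\lambda^{2}Q(t)-2\gamma P(t)\end{pmatrix}dt+\begin{pmatrix}0\\ \sigma\end{pmatrix}dW(t), \tag{23}$$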

with strictly positive parameters $\gamma$, $\lambda$ and $\sigma$. Depending on the choice of $\gamma$ and $\lambda$, (23) models different types of harmonic oscillators, which are common in nature and of great interest in classical mechanics. Here, we focus on the weakly damped harmonic oscillator, satisfying $\lambda^{2}-\gamma^{2}>0$. Our goal is to estimate $\theta=(\lambda,\gamma,\sigma)$, assuming that the solution process $\textbf{X}=(\textbf{Q},\textbf{P})^{\prime}$ is partially observed through its first coordinate, i.e., $\textbf{Y}_{\theta}=\textbf{Q}$. The performance of Algorithm 1 (i) in the critically damped case $\lambda^{2}-\gamma^{2}=0$, when only the second coordinate is observed, is reported in the supplementary material. The solution process $\textbf{X}$ of SDE (23) is normally distributed according to

$$X(t)\sim\mathcal{N}\left(e^{At}X_{0},\,C(t)\right),$$

with $A$ and $C$ introduced in (13) and (15), respectively. The invariant distribution $\eta_{\textbf{X}}$ of the solution process X is given by

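$$\eta_{\textbf{X}}=\mathcal{N}\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}\frac{\sigma^{2}}{4\gamma\lambda^{2}}&0\\ 0&\frac{\sigma^{2}}{4\gamma}\end{pmatrix}\right).$$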

Consequently, the structural property $\eta_{\textbf{Y}_{\theta}}$ of the output process $\textbf{Y}_{\theta}$ becomes

$$\eta_{\textbf{Y}_{\theta}}=\mathcal{N}\left(0,\frac{\sigma^{2}}{4\gamma\lambda^{2}}\right),$$

and the stationary dependency is captured by the autocovariance function

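$$R_{\textbf{Y}_{\theta}}(\tau)=\frac{\sigma^{2}}{4\gamma\lambda^{2}}\,e^{-\gamma|\tau|}\left(\cos(\kappa\tau)+\frac{\gamma}{\kappa}\sin(\kappa|\tau|)\right),$$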

where $\kappa=\sqrt{\lambda^{2}-\gamma^{2}}$ .
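The stationary quantities of the oscillator can be checked numerically. The sketch below, assuming NumPy, writes out the stationary covariance $C_{\infty}=\text{diag}\left(\sigma^{2}/(4\gamma\lambda^{2}),\,\sigma^{2}/(4\gamma)\right)$ and the standard closed form of the autocovariance of $Q$ for $\lambda>\gamma$, $R(\tau)=\frac{\sigma^{2}}{4\gamma\lambda^{2}}e^{-\gamma|\tau|}\left(\cos(\kappa\tau)+\frac{\gamma}{\kappa}\sin(\kappa|\tau|)\right)$, as we state it here. It then verifies two identities: $C_{\infty}$ solves the stationary version of (15), $AC_{\infty}+C_{\infty}A^{\prime}+BB^{\prime}=0$, and $R(\tau)$ equals the $(1,1)$ entry of $e^{A\tau}C_{\infty}$.

```python
import numpy as np

def stationary_cov(lam, gam, sig):
    """Covariance of the invariant distribution of the oscillator (23)."""
    return np.diag([sig**2 / (4 * gam * lam**2), sig**2 / (4 * gam)])

def autocov_q(tau, lam, gam, sig):
    """Stationary autocovariance of Y = Q, weakly damped case lam > gam."""
    k = np.sqrt(lam**2 - gam**2)
    tau = np.abs(tau)
    return (sig**2 / (4 * gam * lam**2) * np.exp(-gam * tau)
            * (np.cos(k * tau) + gam / k * np.sin(k * tau)))

def expm_eig(A, t):
    """e^{At} via eigendecomposition (A is diagonalisable when lam != gam)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V)).real
```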

4.2 Validation of the Spectral Density-Based ABC Algorithm 1 (i)

To compare the performances of Algorithm 1 (i)-(iii) on the same data, we consider the same $M=10$ observed paths simulated with the exact scheme (14), using a time step $\Delta=10^{-2}$ over a time interval of length $T=10^{3}$ . As true parameters for the simulation of the reference data, we choose

[TABLE]

We use the exact simulation scheme (14) to generate $N=2\cdot 10^{6}$ synthetic datasets in $[0,T]$ and with the same time step as the observed data. We choose independent uniform priors, in particular,

[TABLE]

The tolerance level $\epsilon$ is chosen as the $0.05^{\text{th}}$ percentile of the calculated distances; hence, we keep $10^{3}$ of the $N=2\cdot 10^{6}$ sampled values of $\theta$. In all the considered examples (see also the supplementary material), the performance of the ABC algorithms for the estimation of the parameters of SDE (23) does not improve when the information of the invariant densities is incorporated into the distance (7). This is because the mean of the invariant distribution (24) is zero, and the variance of the Gaussian invariant density of $\textbf{Y}_{\theta}$ is already determined by the invariant spectral density, whose integral equals the variance. Hence, to reduce the computational cost, we set $w=0$ and base our distance only on the invariant spectral density, estimated by the smoothed periodogram.
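Choosing $\epsilon$ as a percentile of all computed distances turns acceptance-rejection ABC into "keep the closest fraction of draws". A minimal sketch, assuming NumPy; `sample_prior` and `simulate_distance` (simulate a synthetic dataset for $\theta$ and return its global distance to the observed data) are hypothetical callables. Note that `np.percentile` expects a percentage, so the $0.05^{\text{th}}$ percentile used above corresponds to `percentile=0.05`, and $0.05\%$ of $2\cdot 10^{6}$ draws is $10^{3}$ kept values.

```python
import numpy as np

def abc_reject_percentile(sample_prior, simulate_distance, N, percentile, rng):
    """Acceptance-rejection ABC with the tolerance set to a percentile of the N distances."""
    thetas = np.array([sample_prior(rng) for _ in range(N)])
    dists = np.array([simulate_distance(theta, rng) for theta in thetas])
    eps = np.percentile(dists, percentile)   # percentile given in percent
    keep = dists <= eps
    return thetas[keep], eps
```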

Figure 3 (top panels) shows the marginal ABC posterior densities $\pi_{\textrm{ABC}}(\theta_{j}|y)$ (blue lines) and their flat uniform priors $\pi(\theta_{j})$ (red lines). The proposed ABC Algorithm 1 (i) provides marginal posterior densities centred around the true values $\theta^{t}$ , represented by the black vertical lines. The posterior means are given by

[TABLE]

In the lower panels of Figure 3, we report the pairwise scatterplots of the kept ABC posterior samples. Note that, since the kept values of $\lambda$ are uncorrelated with those of the other parameters, the support of the obtained marginal posterior density is approximately the same as when estimating only $\theta=\lambda$ or $\theta=(\lambda,\gamma)$ (cf. supplementary material). Conversely, since the kept ABC posterior samples of the parameters $\gamma$ and $\sigma$ are correlated, the support of $\pi_{\textrm{ABC}}(\gamma|y)$ is larger than that obtained when estimating $\theta=(\lambda,\gamma)$. Despite this correlation, Algorithm 1 (i) allows for successful inference of all three parameters.

4.3 Validation of the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii)

In Figure 4, we report the approximated marginal posteriors $\pi_{\textrm{ABC}}(\theta_{j}|y)$ (blue solid lines) and $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$ (orange dashed lines), obtained with the same priors, $\epsilon$, $T$, $w$, $M$ and $N$ as before, for different values of the time step $\Delta$. In particular, we choose $\Delta=5\cdot 10^{-3}$ (top panels), $\Delta=7.5\cdot 10^{-3}$ (middle panels) and $\Delta=10^{-2}$ (lower panels). The posteriors obtained from Algorithm 1 (ii) successfully target $\pi_{\textrm{ABC}}(\theta|y)$, even for a time step as large as $\Delta=10^{-2}$. In contrast, Algorithm 1 (iii) cannot even be applied: the Euler-Maruyama scheme drives the amplitude of the oscillator to infinity, resulting in a numerical overflow, i.e., $\widetilde{Y}_{\theta}(t_{i})\approx\infty$. Thus, neither $\hat{f}_{\tilde{y}_{\theta}}$ nor $\hat{S}_{\tilde{y}_{\theta}}$ can be computed, and the density $\pi_{\textrm{ABC}}^{e}(\theta|y)$ cannot be derived.
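A standard linear-stability argument (ours, not part of the original analysis) makes this overflow plausible. For the oscillator (23), i.e., (13) with $d=1$, $A=\begin{pmatrix}0&1\\-\lambda^{2}&-2\gamma\end{pmatrix}$ and $B=(0,\sigma)^{\prime}$, one Euler-Maruyama step and the spectrum of its amplification matrix read:

```latex
\begin{aligned}
&\widetilde{X}(t_{i+1})=(\mathbb{I}_{2}+A\Delta)\,\widetilde{X}(t_{i})+B\,\xi_{i},
\qquad \xi_{i}\sim\mathcal{N}(0,\Delta),\\
&\operatorname{spec}(A)=\{-\gamma\pm i\kappa\}
\;\Longrightarrow\;
\bigl|1+\Delta(-\gamma\pm i\kappa)\bigr|^{2}
=(1-\gamma\Delta)^{2}+\kappa^{2}\Delta^{2}
=1-2\gamma\Delta+\lambda^{2}\Delta^{2},\\
&\rho(\mathbb{I}_{2}+A\Delta)>1
\iff \Delta>\frac{2\gamma}{\lambda^{2}}.
\end{aligned}
```

Once $\Delta$ exceeds $2\gamma/\lambda^{2}$, the second moments of the scheme, which obey $\widetilde{C}_{i+1}=(\mathbb{I}_{2}+A\Delta)\widetilde{C}_{i}(\mathbb{I}_{2}+A\Delta)^{\prime}+\Delta BB^{\prime}$, grow geometrically. Whether a given time step crosses this threshold depends on the values of $\lambda$ and $\gamma$ drawn from the prior, whereas the exact step (14) is stable for every $\Delta$.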

To further illustrate the poor performance of the Euler-Maruyama scheme, even for smaller values of $\Delta$, we now consider the simplest possible scenario, in which we estimate only one parameter, namely $\theta=\lambda$. We set $N=10^{5}$, $M=10$, $\epsilon$ equal to the $1^{\text{st}}$ percentile, and choose a uniform prior $\lambda\sim U(10,30)$. To be able to derive $\pi_{\textrm{ABC}}^{e}(\lambda|y)$, we simulate the synthetic data using the Euler-Maruyama method with the time steps $\Delta=10^{-3}$, $2.5\cdot 10^{-3}$ and $3.5\cdot 10^{-3}$. Figure 5 shows the three ABC posterior densities $\pi_{\textrm{ABC}}(\theta|y)$ (blue solid lines), $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta|y)$ (orange dashed lines) and $\pi_{\textrm{ABC}}^{e}(\theta|y)$ (green dotted lines) for the different choices of $\Delta$. The horizontal red lines and the vertical black lines denote the uniform prior and the true parameter value, respectively. In all cases, Algorithm 1 (iii) does not lead to successful inference. In addition, its results are not stable across the different choices of $\Delta$, and the derived ABC posterior density may not even cover the true parameter value.

5 Validation of the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii) on simulated and real data

We now illustrate the performance of Algorithm 1 (ii) by applying it to the stochastic JR-NMM. We rely on the efficient numerical splitting scheme (17) to guarantee measure-preserving synthetic data generation within the ABC framework. After estimating the parameters from simulated data, we infer them from real EEG data. In the available supplementary material, we illustrate the performance of Algorithm 1 (ii) also on the non-linear damped stochastic oscillator, an extended version of the weakly damped harmonic oscillator discussed in Section 4.

5.1 The stochastic Jansen and Rit neural mass model

The stochastic JR-NMM describes the electrical activity of an entire population of neurons through their average properties by modelling the interaction of the main pyramidal cells with the surrounding excitatory and inhibitory interneurons. The model has been reported to successfully reproduce EEG data, and is applied in the research of neurological disorders such as epilepsy or schizophrenia (Wendling et al. 2000, 2002). The model is a $6$ -dimensional SDE of the form

[TABLE]

where the $6$-dimensional solution process is given by $\textbf{X}=(\textbf{Q},\textbf{P})^{\prime}$ with the two components $\mathbf{Q}=(\mathbf{X_{1}},\mathbf{X_{2}},\mathbf{X_{3}})^{\prime}$ and $\mathbf{P}=(\mathbf{X_{4}},\mathbf{X_{5}},\mathbf{X_{6}})^{\prime}$. None of the coordinates of $\textbf{X}$ is directly observed. Only the difference between the second and third coordinates can be measured with EEG recording techniques, yielding the output process

$$\textbf{Y}_{\theta}=\textbf{X}_{2}-\textbf{X}_{3}.$$

In (25), the diagonal diffusion matrix is given by $\Sigma_{\theta}=\text{diag}[\sigma_{4},\sigma_{5},\sigma_{6}]\in\mathbb{R}^{3\times 3}$ with coefficients $\sigma_{i}>0$, $i=4,5,6$. The matrix $\Gamma=\text{diag}[a,a,b]\in\mathbb{R}^{3\times 3}$ is also diagonal, with coefficients $a,b>0$ representing the time constants of the excitatory and inhibitory postsynaptic potentials, respectively. The non-linear displacement term is given by

[TABLE]

where the sigmoid function Sigm: $\mathbb{R}\to[0,v_{max}]$ is defined as

$$\text{Sigm}(x)=\frac{v_{max}}{1+e^{r(v_{0}-x)}},$$

with $v_{max}>0$ referring to the maximum firing rate of the neural populations, $v_{0}\in\mathbb{R}$ describing the value at which $50\%$ of the maximum firing rate is attained, and $r>0$ controlling the slope of the sigmoid function at $v_{0}$. The parameters entering $G$ are $\mu$, $A$, $B$ and $C_{i}$, $i=1,2,3,4$, all in $\mathbb{R}^{+}$. The coefficients $A$ and $B$ describe the average excitatory and inhibitory synaptic gains, respectively. The parameters $C_{i}$ are internal connectivity constants, which reduce to a single parameter $C$ through the relations $C_{1}=C$, $C_{2}=0.8C$, $C_{3}=0.25C$ and $C_{4}=0.25C$; see Jansen and Rit (1995).
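A small NumPy sketch of the sigmoid and of the connectivity relations, assuming the standard Jansen-Rit form $\text{Sigm}(x)=v_{max}/(1+e^{r(v_{0}-x)})$. Under this form $\text{Sigm}(v_{0})=v_{max}/2$ and the slope at $v_{0}$ equals $v_{max}r/4$, so $r$ sets the slope rather than being equal to it. The values $v_{max}=5$, $v_{0}=6$, $r=0.56$ in the check are those commonly quoted for the Jansen-Rit model and serve only as illustration; $C=135$ is the literature value mentioned later in the text.

```python
import numpy as np

def sigm(x, v_max, v0, r):
    """Sigmoid firing-rate function, assumed to be v_max / (1 + exp(r (v0 - x)))."""
    return v_max / (1.0 + np.exp(r * (v0 - x)))

def connectivity(C):
    """Internal connectivity constants (C_1, C_2, C_3, C_4) expressed through C."""
    return np.array([1.0, 0.8, 0.25, 0.25]) * C
```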

5.2 Parameter inference from simulated data

Not all model parameters of the JR-NMM are of biological interest or can be simultaneously identified. For example, the noise coefficients $\sigma_{4}$ and $\sigma_{6}$ were introduced mainly for mathematical convenience in Ableidinger et al. (2017): to guarantee the existence of a unique invariant measure $\eta_{\textbf{X}}$ on $(\mathbb{R}^{6},\mathcal{B}(\mathbb{R}^{6}))$, they are required to be strictly positive. However, from a modelling point of view, only the parameter $\sigma:=\sigma_{5}$ plays a role. Hence, we fix $\sigma_{4}=0.01$ and $\sigma_{6}=1$. The coefficients $A$, $B$, $a$, $b$, $v_{0}$, $v_{max}$ and $r$ have been determined experimentally; see, e.g., Jansen et al. (1993); Jansen and Rit (1995); van Rotterdam et al. (1982). Thus, we fix them to the reported values, listed, for example, in Table $1$ of Ableidinger et al. (2017). In contrast, the connectivity parameter $C$, which represents the average number of synapses between the neural subpopulations and controls the extent to which the main population interacts with the interneurons, varies under different physiological constraints. Changing $C$ allows, for example, a transition from $\alpha$-rhythmic activity to epileptic spiking behaviour; see, e.g., Ableidinger et al. (2017). Here, we focus on $\alpha$-rhythmic activity. Since the parameters $\sigma$ and $\mu$ are new in the SDE version (25), they have not yet been estimated. They can be interpreted as stochastic and deterministic external inputs coming from neighbouring or more distant cortical columns, respectively. Thus, together with the internal connectivity parameter $C$, they are of specific interest. Before inferring $\theta=(\sigma,\mu,C)$, we consider the coefficients $A$ and $B$ to discuss a model-specific identifiability issue.

5.2.1 Identifiability issues: The detection of an invariant manifold, i.e., a set of parameters yielding the same type of data

For the original JR-NMM, it has been shown that different combinations of the parameters $A$, $B$ and $C$ yield the same type of output, namely $\alpha$-rhythmic brain activity. Applying the proposed Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii) to infer $\theta=(A,B,C)$, with $\mu=220$ and $\sigma=2000$ fixed, we confirm that the same non-identifiability arises for the SDE version (25). We use $M=30$ observed paths generated under

[TABLE]

as suggested in the literature (Jansen and Rit 1995). The reference and synthetic data are generated over a time interval of length $T=200$ with a time step $\Delta=2\cdot 10^{-3}$. Within the algorithm, we generate $N=2.5\cdot 10^{6}$ synthetic datasets. We choose the weight $w$ in (7) according to the pilot procedure introduced in Subsection 2.2 (based on $L=10^{5}$ iterations) and fix the tolerance level $\epsilon$ as the $0.04^{\text{th}}$ percentile of the distances. Further, we choose independent uniform prior distributions, namely

[TABLE]

Figure 6 (top panels) shows the marginal ABC posterior densities $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$ and the uniform prior densities $\pi(\theta_{j})$. Clearly, the parameters cannot be inferred simultaneously. The kept ABC posterior values of the parameters $A$, $B$ and $C$ are strongly correlated, as seen in the pairwise scatterplots (middle panels) and in the $3$-dimensional scatterplot (two different views, lower panels), where the cuboid covers all possible values of $\theta$ drawn from the prior. After running the ABC algorithm, the kept values of $\theta$ form an invariant manifold, in the sense that all parameter values $\theta$ lying on this manifold yield similar paths $\tilde{y}_{\theta}$ of the output process. This is shown in Figure 7, where we report four trajectories simulated with the same random numbers but using the parameters $\theta^{t}$ (green dot in Figure 6) and three of the kept ABC posterior samples lying on the invariant manifold (red, orange and grey dots in Figure 6). A segment of length $T=10$ is shown, split across the top and middle panels. In addition, we display the corresponding estimated invariant densities (bottom left) and invariant spectral densities (bottom right). This explains why the parameters $A$, $B$ and $C$ are not simultaneously identifiable from the observed data. Since the internal connectivity parameter $C$ has an important neuronal meaning, in the following we assume $A$ and $B$ to be known and infer $\theta=(\sigma,\mu,C)$. The estimation of $\theta=(\sigma,\mu)$ when $C$ is known is reported in the supplementary material.

5.2.2 Inference of $\theta=(\sigma,\mu,C)$

Now, we keep the same ABC setting as before and choose independent uniform priors $\pi(\theta_{j})$ according to

[TABLE]

The reference data are simulated under

[TABLE]

In Figure 8, we report the marginal ABC posterior densities $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$ (blue lines), the uniform prior densities $\pi(\theta_{j})$ (red lines) and the true parameter values $\theta^{t}$ (black vertical lines). We obtain unimodal posterior densities, centred around the true parameter values. The posterior density of $\sigma$ is slightly broader compared to that obtained when $C$ is known (cf. Figure 19 of the supplementary material). This results from a weak correlation that we detect among the kept ABC posterior samples of the parameters $\sigma$ and $C$ (figures not reported). The posterior means are equal to

[TABLE]

and are thus close to $\theta^{t}$ . These results suggest an excellent performance of the proposed Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii).

Similar satisfactory results are obtained even when adding a fourth parameter, for example, when inferring $\theta=(\sigma,\mu,C,b)$ (cf. Figure $20$ of the supplementary material). When applying Algorithm 1 (ii) to real EEG data (cf. Figure $21$ of the supplementary material), the marginal posterior for $b$ is centred around the value $b=50$ , which is that reported in the literature. Due to the existence of underlying invariant manifolds, identifiability issues, similar to those reported in Figure 6, arise when adding further or other coefficients, revealing model-specific issues for the stochastic JR-NMM.

To illustrate again the importance of structure preservation within the ABC method, we now apply Algorithm 1 (iii), combined with the Euler-Maruyama scheme (9). We use the same conditions as before, except that the observed reference data are generated with the Euler-Maruyama method using a smaller time step $\Delta=10^{-4}$, aiming for a realistic data structure. In Figure 9, we report the marginal ABC posterior densities $\pi_{\textrm{ABC}}^{e}(\theta_{j}|y)$ (top panels) and the uniform prior densities. In the $3$-dimensional scatterplot of Figure 9 (lower panel), the green dots in the middle of the cuboid represent the kept ABC posterior samples obtained with Algorithm 1 (ii) (see the results reported in Figure 8), which are nicely spread out around the true parameter vector $\theta^{t}$ (black dot). The red dots correspond to the kept ABC posterior samples from $\pi_{\textrm{ABC}}^{e}(\theta|y)$. Hence, Algorithm 1 (iii), based on the Euler-Maruyama scheme, provides a posterior that is far from the true parameter vector.

5.3 Parameter inference from real EEG data

Finally, we use the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii) to estimate the parameter vector $\theta=(\sigma,\mu,C)$ of the stochastic JR-NMM from real EEG recordings. We use $M=3$ $\alpha$-rhythmic recordings, rescaled to a realistic range. The EEG data were sampled at 173.61 Hz, i.e., with a time step $\Delta$ of approximately $5.76\ \textrm{ms}$, over a time interval of length $T=23.6$ s. All measurements were carried out with a standardised electrode placement scheme; see Andrzejak et al. (2001) for further information on the data, which are available at http://ntsa.upf.edu/downloads/andrzejak-rg-et-al-2001-indications-nonlinear-deterministic-and-finite-dimensional. Figure 10 shows the first $20$ seconds of one of the observed EEG datasets. Here, we simulate $N=5\cdot 10^{6}$ synthetic paths from the output process of the stochastic JR-NMM (25) over the same time interval $T$ as the real data, with a time step $\Delta=2\cdot 10^{-3}$, and set $\epsilon$ to the $0.02^{\text{nd}}$ percentile. We choose independent uniform priors $\pi(\theta_{j})$ according to

[TABLE]
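The sampling parameters quoted above can be checked with a few lines of plain Python. The computation assumes that the simulation time step $\Delta=2\cdot 10^{-3}$ is in seconds, the same unit as $T$. Under that assumption, a synthetic path has about $2.9$ times as many points as an observed recording; both periodograms then share the frequency spacing $1/T$, but the synthetic one extends to a Nyquist frequency of $250$ Hz versus about $86.8$ Hz for the EEG data. The text does not state how this difference is handled, so treat it as an implementation detail to be checked against the published code.

```python
fs_hz = 173.61                       # EEG sampling rate
T_s = 23.6                           # recording length in seconds
dt_sim_s = 2e-3                      # simulation time step, assumed to be in seconds

dt_obs_ms = 1000.0 / fs_hz           # sampling interval of the EEG data in ms
n_obs = round(T_s * fs_hz)           # samples per EEG recording
n_sim = round(T_s / dt_sim_s)        # points per synthetic path
nyquist_obs_hz = fs_hz / 2.0         # highest Fourier frequency of the EEG periodogram
nyquist_sim_hz = 1.0 / (2.0 * dt_sim_s)
```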

Figure 11 shows the resulting marginal ABC posterior densities $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$ and the uniform prior densities $\pi(\theta_{j})$ . All ABC marginal posteriors are unimodal, with means given by

[TABLE]

Since $\mu$ and $\sigma$ have not been estimated before, we cannot compare the obtained results with those available in the literature. The ABC posterior density for $C$ is centred around $C=135$, which is the reference literature value for $\alpha$-rhythmic EEG data.

In Figure 12, we report the first $10$ seconds of a trajectory of the output process of the fitted stochastic JR-NMM (25), generated with the numerical splitting scheme (17), choosing $\Delta=2\cdot 10^{-3}$ and $T=23.6$. Note that the path shows an oscillatory behaviour similar to that in Figure 10. This is confirmed by the satisfactory matches between the invariant densities (bottom left) and the invariant spectral densities (bottom right) estimated from the EEG recording (red dashed lines) and from the fitted model (blue solid lines). The match is poor only at low frequencies of the invariant spectral density, even when choosing broader priors. This may result from a lack of fit of the JR-NMM or from a lack of stationarity in the considered EEG data. A deeper investigation of the model and of its ability to reproduce real EEG data is ongoing, but it is beyond the scope of this work.

6 Conclusion

When performing parameter inference through ABC, crucial and non-trivial tasks are the choice of suitable summary statistics and of a distance to compare the observed and the synthetic datasets. When the underlying models are stochastic, repeated simulations under the same parameter setting yield different outputs, which makes the comparison between observed and synthetic data more difficult. To derive summary statistics that are less sensitive to the intrinsic randomness of the stochastic model, we propose to map the data to their invariant density and invariant spectral density, estimated by a kernel density estimator and a smoothed periodogram, respectively. In this way, different trajectories of the output process are mapped to the same objects only when they are generated from the same underlying parameters, provided that all parameters are simultaneously identifiable. These transformations rely on the existence of an underlying invariant measure for the model, fully characterised by the parameters. A necessary condition for ABC, and for all other simulation-based methods, is the ability to generate data from the model. This is often taken for granted, but in general it is not the case: exact simulation is rarely possible, and property-preserving numerical methods have to be derived.
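A minimal NumPy sketch of these two summaries, and of a distance built on them, is given below. It is an illustration under our own assumptions, not the authors' implementation: the invariant density is estimated by a Gaussian kernel density estimator with the normal-reference bandwidth $1.06\,\hat{s}\,n^{-1/5}$, the invariant spectral density by a periodogram smoothed with a moving-average (Daniell-type) kernel, and two estimated curves are compared through their integrated absolute error (IAE). The function `abc_distance`, which adds the spectral IAE to `w` times the density IAE and averages over the observed paths, is a hypothetical stand-in for the distances (7) and (8) defined earlier in the paper; all function names, the evaluation grid and the kernel `span` are illustrative choices.

```python
import numpy as np


def kde_on_grid(x, grid, chunk=10_000):
    """Gaussian kernel density estimate of the invariant density, evaluated on `grid`.

    Bandwidth: normal-reference rule of thumb (an illustrative choice).
    The data are processed in chunks to bound memory use for long paths.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = 1.06 * np.std(x) * n ** (-1 / 5)
    dens = np.zeros_like(grid, dtype=float)
    for start in range(0, n, chunk):
        z = (grid[:, None] - x[None, start:start + chunk]) / h
        dens += np.exp(-0.5 * z**2).sum(axis=1)
    return dens / (n * h * np.sqrt(2 * np.pi))


def smoothed_periodogram(x, dt, span=5):
    """Periodogram of the centred path, smoothed by a moving average of `span` ordinates.

    Returns the Fourier frequencies (cycles per time unit) and the smoothed estimate;
    the same normalisation is used for observed and synthetic paths.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=dt)
    pgram = np.abs(np.fft.rfft(x)) ** 2 * dt / n
    smooth = np.convolve(pgram, np.ones(span) / span, mode="same")
    return freqs, smooth


def iae(f, g, grid):
    """Integrated absolute error between two curves on a common grid (trapezoidal rule)."""
    y = np.abs(np.asarray(f) - np.asarray(g))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))


def abc_distance(observed_paths, synthetic_path, dt, w=0.0, span=5, n_grid=512):
    """Hypothetical distance: spectral IAE + w * density IAE, averaged over observed paths.

    All paths must have the same length, so that their Fourier frequencies coincide.
    """
    all_obs = np.concatenate([np.ravel(y) for y in observed_paths])
    pad = 0.1 * (all_obs.max() - all_obs.min())
    grid = np.linspace(all_obs.min() - pad, all_obs.max() + pad, n_grid)
    freqs, s_syn = smoothed_periodogram(synthetic_path, dt, span)
    f_syn = kde_on_grid(synthetic_path, grid) if w != 0.0 else None
    total = 0.0
    for y in observed_paths:
        _, s_obs = smoothed_periodogram(y, dt, span)
        total += iae(s_obs, s_syn, freqs)
        if w != 0.0:
            total += w * iae(kde_on_grid(y, grid), f_syn, grid)
    return total / len(observed_paths)
```

Both summaries are computed from the whole path, so they average out the path-to-path randomness that makes a direct comparison of trajectories uninformative; setting `w=0` keeps only the spectral term.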

The combination of measure-preserving numerical splitting schemes and spectral density-based distances in the ABC algorithm leads to successful inference of the parameters, as illustrated on stochastic Hamiltonian type equations. We validated the proposed ABC approach on both linear model problems, which allow for an exact simulation of the synthetic data, and non-linear problems, including an application to real EEG data. Our choice of the crucial ingredients (summary statistics and distances based on the underlying invariant distribution, and a measure-preserving numerical method) yields excellent results even when applied to ABC in its basic acceptance-rejection form, and these ingredients can be directly applied to more advanced ABC algorithms. In contrast, the ABC method based on the Euler-Maruyama scheme fails drastically. Its performance may improve for “small enough” time steps, but there is a trade-off between the runtime and the acceptance performance of Algorithm 1 (iii). Indeed, simulating one trajectory with a time step $10^{-4}$ takes approximately one hundred times longer than simulating one trajectory with a time step $10^{-2}$, since one hundred times as many steps are needed. Hence, a runtime of a few hours would turn into months. In addition, even for “arbitrarily small” time steps, one cannot guarantee that the Euler-Maruyama scheme preserves the underlying invariant measure. For these reasons, it is crucial to base our ABC method on the reliable measure-preserving numerical splitting scheme combined with the invariant measure-based distances. We discussed our results for a 1-dimensional observed output process. However, the approach can be directly applied to $d$-dimensional output processes, $d>1$, as long as the underlying SDEs are characterised by an invariant distribution and a measure-preserving numerical method can be derived.
In particular, one can compute the distances in (8) for each of the $d$ components and derive a global distance by combining them, e.g., via their sum. Moreover, to account for possible dependences between the observed components, one can incorporate the cross-spectral densities, which are expected to provide further information and thus to improve the performance of the method. An investigation in this direction is ongoing. Finally, the proposed ABC method may also be used to investigate invariant manifolds, i.e., sets of parameters yielding the same type of data, as illustrated on the stochastic JR-NMM. This may lead to a better understanding of the qualitative behaviour of the underlying model and of its ability to reproduce the true features of the modelled phenomenon.
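The remark that the Euler-Maruyama scheme need not preserve the invariant measure can be made concrete on a linear example, where everything is computable in closed form. The sketch below assumes that the harmonic oscillator (23) reads $dQ_t = P_t\,dt$, $dP_t = (-\lambda^{2} Q_t - 2\gamma P_t)\,dt + \sigma\,dW_t$, whose invariant law is Gaussian with covariance $\mathrm{diag}(\sigma^{2}/(4\gamma\lambda^{2}), \sigma^{2}/(4\gamma))$. The Euler-Maruyama recursion $X_{i+1} = (I+\Delta M)X_i + \sqrt{\Delta}\,B\,\xi_i$ has a stationary covariance only if the spectral radius of $A = I + \Delta M$ is below one, in which case that covariance solves the discrete Lyapunov equation $C = ACA^{\top} + \Delta BB^{\top}$. The values $\lambda=20$, $\gamma=1$, $\sigma=2$ and the function names are illustrative choices of ours.

```python
import numpy as np


def oscillator_matrices(lam, gam, sigma):
    """Drift matrix M and diffusion vector B of the assumed linear oscillator."""
    M = np.array([[0.0, 1.0], [-lam**2, -2.0 * gam]])
    B = np.array([[0.0], [sigma]])
    return M, B


def exact_invariant_cov(lam, gam, sigma):
    """Covariance of the exact Gaussian invariant law of (Q, P)."""
    return np.diag([sigma**2 / (4 * gam * lam**2), sigma**2 / (4 * gam)])


def em_invariant_cov(lam, gam, sigma, dt):
    """Stationary covariance of the Euler-Maruyama chain, or None if the chain is unstable."""
    M, B = oscillator_matrices(lam, gam, sigma)
    A = np.eye(2) + dt * M
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        return None  # spectral radius >= 1: no stationary law, the variance grows without bound
    # vec(A C A^T) = (A kron A) vec(C), hence (I - A kron A) vec(C) = dt * vec(B B^T)
    K = np.eye(4) - np.kron(A, A)
    rhs = dt * (B @ B.T).reshape(-1, order="F")
    return np.linalg.solve(K, rhs).reshape(2, 2, order="F")


exact = exact_invariant_cov(20.0, 1.0, 2.0)
for dt in (1e-2, 1e-3, 1e-4):
    C_em = em_invariant_cov(20.0, 1.0, 2.0, dt)
    if C_em is None:
        print(f"dt={dt:g}: Euler-Maruyama chain unstable, no invariant law")
    else:
        rel = C_em[1, 1] / exact[1, 1] - 1.0
        print(f"dt={dt:g}: relative error of Var(P) = {rel:.2%}")
```

For these illustrative values, the recursion has no invariant law at $\Delta=10^{-2}$, and for stable step sizes the stationary variance of $P$ carries an error of order $\Delta$ that vanishes only in the limit.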

7 Supplementary Material

In this supplementary material, we further illustrate the performance of the proposed ABC approach with additional examples. In particular, we consider two additional SDEs. First, the critically damped harmonic oscillator (23), fulfilling $\lambda^{2}-\gamma^{2}=0$, for which exact simulation of sample paths is possible, allowing for a validation of Algorithm 1 (i). Second, a non-linear weakly damped stochastic oscillator, for which a measure-preserving numerical splitting scheme is required, so that we investigate Algorithm 1 (ii). Moreover, we report the simultaneous inference of the new parameters $\theta=(\sigma,\mu)$ of the stochastic JR-NMM (25) when the connectivity parameter $C$ is known (whereas in the main manuscript we estimate $\theta=(\sigma,\mu,C)$). Finally, we report the estimation of $\theta=(\sigma,\mu,C,b)$ from both simulated and real EEG data.

7.1 Validation of the Spectral Density-Based ABC Algorithm 1 (i)

We denote by Model Problem $1$ (MP1) the critically damped harmonic oscillator obtained from (23) with $\lambda^{2}-\gamma^{2}=0$ (introduced below), and by Model Problem $2$ (MP2) the weakly damped harmonic oscillator, satisfying $\lambda^{2}-\gamma^{2}>0$ (see Section 4 of the main manuscript). Figure 13 shows two realisations of the output process of MP1 generated with the same choice of parameters, and Figure 14 shows two paths of the output process of MP2 simulated under the same parameter setting. We perform a step-by-step investigation of Algorithm 1 (i), starting with the estimation of a single model parameter and closing with the successful inference of all parameters.

7.1.1 Critically damped stochastic harmonic oscillator: The model and its properties

We recall the harmonic oscillator (23), focusing on the critically damped case, i.e., $\lambda^{2}-\gamma^{2}=0$. We assume that the $2$-dimensional process $\textbf{X}=(\textbf{Q},\textbf{P})^{\prime}$ is partially observed through its second component, i.e., $\textbf{Y}_{\theta}=\textbf{P}$. The invariant distribution $\eta_{\textbf{X}}$ of the process $\textbf{X}$ is given by

[TABLE]

Consequently, the invariant distribution $\eta_{\textbf{Y}_{\theta}}$ of the output process $\textbf{Y}_{\theta}$ equals

[TABLE]

and the autocovariance function is given by

[TABLE]
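As a cross-check of the invariant law and autocovariance in the critically damped case, assume that (23) is the linear SDE $d\textbf{X}_t = M\textbf{X}_t\,dt + B\,dW_t$ with $M = \begin{pmatrix}0 & 1\\ -\lambda^{2} & -2\gamma\end{pmatrix}$ and $B=(0,\sigma)^{\prime}$. Under this assumption, solving $MC + CM^{\top} + BB^{\top} = 0$ with $\lambda=\gamma$ gives $C=\mathrm{diag}(\sigma^{2}/(4\gamma^{3}), \sigma^{2}/(4\gamma))$, so that $\textbf{Y}_{\theta}=\textbf{P}\sim\mathcal{N}(0,\sigma^{2}/(4\gamma))$, and $r(t) = [e^{Mt}C]_{22} = \frac{\sigma^{2}}{4\gamma}e^{-\gamma t}(1-\gamma t)$ for $t\ge 0$. These closed forms are our own derivation under the stated assumption. The NumPy sketch below checks them numerically, using a hand-written matrix exponential (scaling and squaring of a truncated Taylor series) to stay dependency-free; the helper names are hypothetical.

```python
import numpy as np


def lyapunov_cov(M, B):
    """Solve M C + C M^T + B B^T = 0 for the stationary covariance C (vectorised)."""
    d = M.shape[0]
    I = np.eye(d)
    # Column-major vec: vec(M C) = (I kron M) vec(C), vec(C M^T) = (M kron I) vec(C)
    K = np.kron(I, M) + np.kron(M, I)
    vecC = np.linalg.solve(K, -(B @ B.T).reshape(-1, order="F"))
    return vecC.reshape(d, d, order="F")


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = A / 2.0**s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


gam, sigma = 1.5, 2.0
lam = gam  # critical damping: lambda^2 - gamma^2 = 0
M = np.array([[0.0, 1.0], [-lam**2, -2.0 * gam]])
B = np.array([[0.0], [sigma]])
C = lyapunov_cov(M, B)
print(np.round(C, 6))  # should equal diag(sigma^2/(4 gam^3), sigma^2/(4 gam))

for t in (0.0, 0.5, 2.0):
    r_num = (expm(M * t) @ C)[1, 1]  # Cov(P_{s+t}, P_s) in stationarity
    r_closed = sigma**2 / (4 * gam) * np.exp(-gam * t) * (1 - gam * t)
    print(f"t={t}: numeric {r_num:.6f}, closed form {r_closed:.6f}")
```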

7.1.2 Task 1: Inferring one parameter

First, we estimate a single parameter $\theta$ of each model problem, keeping the others fixed. For MP1, we set $\theta=\gamma$ and fix $\sigma=2$. For MP2, we focus on $\theta=\lambda$, fixing $\gamma=1$ and $\sigma=2$. The ABC Algorithm 1 (i) is applied to both model problems with $M=10$ observed paths simulated with the exact scheme (14), using a time step $\Delta=10^{-2}$ over a time interval of length $T=10^{3}$. In addition, we generate $N=10^{5}$ synthetic datasets over the same time domain with the same time step, using the exact simulation scheme (14). We set the tolerance level $\epsilon$ to the $1^{\text{st}}$ percentile of the calculated distances. Furthermore, we choose uniform prior distributions $\pi(\theta)$ according to

[TABLE]

We use the same parameter setting as in Figure 13 and Figure 14 for the simulation of the observed reference datasets. In particular, the true parameter values are

[TABLE]
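Both the observed and the synthetic data in these tasks come from the exact simulation scheme (14). A minimal sketch, assuming that (14) is the standard exact update for a linear SDE $d\textbf{X}_t = M\textbf{X}_t\,dt + B\,dW_t$, namely $\textbf{X}(t+\Delta) = e^{M\Delta}\textbf{X}(t) + \xi$ with $\xi\sim\mathcal{N}(0, C - e^{M\Delta}Ce^{M^{\top}\Delta})$, where $C$ is the invariant covariance. We further assume that (23) has $M = \begin{pmatrix}0 & 1\\ -\lambda^{2} & -2\gamma\end{pmatrix}$ and $B=(0,\sigma)^{\prime}$, so that $C=\mathrm{diag}(\sigma^{2}/(4\gamma\lambda^{2}), \sigma^{2}/(4\gamma))$; starting each path in the invariant law is our choice, and the function names are hypothetical.

```python
import numpy as np


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = A / 2.0**s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def simulate_exact(lam, gam, sigma, dt, n_steps, rng):
    """Exact sampling of X = (Q, P)' on a grid of step dt, started in the invariant law.

    Returns an array of shape (n_steps + 1, 2); column 0 is Q, column 1 is P.
    """
    M = np.array([[0.0, 1.0], [-lam**2, -2.0 * gam]])
    C = np.diag([sigma**2 / (4 * gam * lam**2), sigma**2 / (4 * gam)])  # invariant covariance
    A = expm(M * dt)
    Sigma = C - A @ C @ A.T  # covariance of the Gaussian innovation over one step
    L = np.linalg.cholesky(0.5 * (Sigma + Sigma.T))
    X = np.empty((n_steps + 1, 2))
    X[0] = np.sqrt(np.diag(C)) * rng.standard_normal(2)  # C is diagonal here
    for i in range(n_steps):
        X[i + 1] = A @ X[i] + L @ rng.standard_normal(2)
    return X
```

Because the Gaussian transition is sampled exactly, no time-discretisation error enters, whatever $\Delta$ is; the observed output is the column of the observed component, e.g. column 1 for MP1, where $\textbf{Y}_{\theta}=\textbf{P}$.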

In Figure 15, we report the results obtained from the proposed Spectral Density-Based ABC Algorithm 1 (i). The left panel (MP1) and the right panel (MP2) show the ABC posterior densities $\pi_{\text{ABC}}(\theta|y)$ (blue lines). The horizontal red and vertical black lines denote the prior densities and the true parameter values, respectively. The flat uniform priors are updated by the observed data into narrow, unimodal posterior densities centered around the true parameter values. The ABC posterior means for this and the other scenarios (i.e., the inference of two and three parameters) are reported in Table 1.
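The acceptance-rejection step with a percentile-based tolerance can be sketched generically as follows. The callables `sample_prior`, `simulate` and `distance` are hypothetical placeholders (in the tasks above they would be the uniform priors, the exact scheme (14) and the spectral density-based distance), and the tolerance is realised by keeping the fraction `percentile / 100` of the `n_draws` draws with the smallest distances. With $N=10^{5}$ and the $1^{\text{st}}$ percentile, this keeps $10^{3}$ draws. The toy Gaussian-mean model in the usage part is ours, chosen only because its posterior is easy to check.

```python
import numpy as np


def abc_rejection(sample_prior, simulate, distance, y_obs, n_draws, percentile, rng):
    """Basic acceptance-rejection ABC with the tolerance set to a percentile of the distances.

    Keeps the round(n_draws * percentile / 100) draws with the smallest distances and
    returns them together with the implied tolerance epsilon.
    """
    thetas = np.array([sample_prior(rng) for _ in range(n_draws)])
    dists = np.array([distance(y_obs, simulate(theta, rng)) for theta in thetas])
    n_keep = int(round(n_draws * percentile / 100))
    keep = np.argsort(dists)[:n_keep]
    eps = dists[keep[-1]]  # largest accepted distance, i.e. the empirical percentile
    return thetas[keep], eps


# Toy usage (hypothetical model): infer the mean of a Gaussian with unit variance.
rng = np.random.default_rng(0)
y_obs = rng.normal(1.5, 1.0, size=100)
kept, eps = abc_rejection(
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda mu, r: r.normal(mu, 1.0, size=100),
    distance=lambda y, z: abs(np.mean(y) - np.mean(z)),
    y_obs=y_obs, n_draws=20_000, percentile=1.0, rng=rng,
)
print(kept.size, kept.mean(), eps)
```

Fixing the tolerance as a percentile rather than as an absolute threshold makes the number of kept samples predictable, which is why increasing $N$ while lowering the percentile can keep the posterior sample size unchanged.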

7.1.3 Task 2: Inferring two parameters

We now estimate two parameters simultaneously, keeping $\sigma=2$ fixed in MP2. In particular, we consider $\theta=(\gamma,\sigma)$ for MP1 and $\theta=(\lambda,\gamma)$ for MP2. We apply Algorithm 1 (i) combined with the exact scheme (14), with the same values of $M$, $\Delta$ and $T$ as before. We now generate $N=5\cdot 10^{5}$ synthetic datasets and set $\epsilon$ to the $0.2^{\text{nd}}$ percentile of the calculated distances, so that the number of kept ABC posterior samples ($5\cdot 10^{5}\times 0.002=10^{3}$) is the same as before ($10^{5}\times 0.01=10^{3}$). We choose independent uniform priors $\pi(\theta_{j})$ according to

[TABLE]

The true parameter values are

[TABLE]

The ABC marginal posterior densities $\pi_{\text{ABC}}(\theta_{j}|y)$ (blue lines) are reported in the left and middle panels of Figure 16 for MP1 (top panels) and MP2 (lower panels), while the right panels of Figure 16 show the scatterplots of the kept ABC posterior samples. Also in this case, the posteriors are unimodal and centered around the true parameter values. Note that the support of $\pi_{\text{ABC}}(\lambda|y)$ for MP2 is approximately the same as in Figure 15, suggesting that, when inferring two parameters, the proposed ABC method identifies the same region for $\lambda$ as when estimating one parameter. The reason is that the kept ABC posterior samples of $\lambda$ and $\gamma$ are uncorrelated, as can be observed in the right lower panel of Figure 16. In contrast, the support of $\pi_{\text{ABC}}(\gamma|y)$ for MP1 is broader than in Figure 15, due to a correlation between the kept ABC posterior samples of $\gamma$ and $\sigma$ (cf. right top panel of Figure 16). In spite of this, the ABC marginal posterior density resembles that obtained when estimating only one parameter (cf. left panel of Figure 15).

7.1.4 Task 3: Inferring three parameters

The last goal is the simultaneous inference of all three parameters $\theta=(\lambda,\gamma,\sigma)$ of MP2 (Task 3 was already presented in Subsection 4.2 of the main manuscript; for completeness, we recall it here). In Figure 3 (top panels), we report the ABC marginal posterior densities (blue lines) and the prior densities (red lines). In the lower panels, we show the pairwise scatterplots of the kept ABC posterior samples. The kept posterior values of $\lambda$ turn out to be uncorrelated with those of the other two parameters, yielding approximately the same support as in Figures 15 and 16. As for MP1, the kept ABC posterior samples of $\gamma$ and $\sigma$ are correlated (cf. right lower panel of Figure 3), leading to a support for $\gamma$ broader than that in Figure 16. The ABC marginal posterior densities shown in Figures 3, 15 and 16, and the results reported in Table 1, highlight the good performance of the proposed Spectral Density-Based ABC Algorithm 1 (i) under the optimal condition of exact, and thus measure-preserving, data simulation from the underlying model.

7.2 Validation of the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii) on an extended version of MP2

We now consider an extended non-linear version of the previously studied Model Problem $2$. Due to the non-linearity, an exact simulation scheme is not available. Hence, we use the measure-preserving numerical splitting scheme (17) and thus investigate the performance of Algorithm 1 (ii).

7.2.1 A non-linear weakly damped stochastic oscillator

We consider a stochastic oscillator that incorporates a high-amplitude sine wave, represented by the non-linear displacement term $G(\textbf{Q})=-10^{3}\sin(\textbf{Q})$. In particular, we study the $2$-dimensional SDE

[TABLE]

with strictly positive parameters $\theta=(\lambda,\gamma,\sigma)$. The condition $\lambda^{2}-\gamma^{2}>0$ ensures weak damping. The $2$-dimensional solution process $\textbf{X}=(\textbf{Q},\textbf{P})^{\prime}$ is partially observed through its first coordinate, i.e., $\textbf{Y}_{\theta}=\textbf{Q}$. Figure 17 shows two realisations of the output process generated with the same choice of parameters.
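Since no exact scheme is available for this SDE, the data are generated with the splitting scheme (17). Below is a minimal sketch of a Strang-type splitting in that spirit, resting on assumptions of ours: that the SDE above reads $d\textbf{Q}_t=\textbf{P}_t\,dt$, $d\textbf{P}_t=(-\lambda^{2}\textbf{Q}_t-2\gamma\textbf{P}_t+G(\textbf{Q}_t))\,dt+\sigma\,dW_t$; that it is split into the linear damped oscillator (sampled exactly through its Gaussian transition) and the ODE $d\textbf{Q}_t=0$, $d\textbf{P}_t=G(\textbf{Q}_t)\,dt$ (whose exact flow over a time $h$ is $(q,p)\mapsto(q,p+hG(q))$); and that the sub-flows are composed as half step, full step, half step. The precise composition in (17) may differ, and all function names are hypothetical.

```python
import numpy as np


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = A / 2.0**s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def linear_step_factory(lam, gam, sigma, dt):
    """Exact one-step sampler for the (assumed) linear part dX = M X dt + B dW."""
    M = np.array([[0.0, 1.0], [-lam**2, -2.0 * gam]])
    C = np.diag([sigma**2 / (4 * gam * lam**2), sigma**2 / (4 * gam)])  # invariant covariance
    A = expm(M * dt)
    Sigma = C - A @ C @ A.T  # covariance of the Gaussian innovation over one step
    L = np.linalg.cholesky(0.5 * (Sigma + Sigma.T))
    return lambda x, rng: A @ x + L @ rng.standard_normal(2)


def strang_splitting(lam, gam, sigma, G, dt, n_steps, x0, rng):
    """Half step of dP = G(Q) dt, exact linear step over dt, half step of dP = G(Q) dt."""
    linear_step = linear_step_factory(lam, gam, sigma, dt)
    X = np.empty((n_steps + 1, 2))
    X[0] = x0
    for i in range(n_steps):
        q, p = X[i]
        p = p + 0.5 * dt * G(q)  # exact flow of the non-linear ODE over dt/2
        q, p = linear_step(np.array([q, p]), rng)
        p = p + 0.5 * dt * G(q)
        X[i + 1] = (q, p)
    return X
```

Each sub-flow is solved exactly, so the only approximation is the splitting itself; in particular, no Euler-type discretisation of the linear part enters. Setting `G` to zero recovers exact simulation of the linear oscillator, e.g. `strang_splitting(20.0, 1.0, 2.0, lambda q: -1e3 * np.sin(q), 1e-2, 10_000, np.zeros(2), np.random.default_rng(0))` for the non-linear case with illustrative parameter values.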

7.2.2 Parameter inference from simulated data

We assume that we observe $M=30$ paths of the output process, simulated with the measure-preserving numerical scheme (17) over a time interval of length $T=10^{3}$ using a time step $\Delta=10^{-2}$ and the same true parameter values as in Figure 17, i.e.,

[TABLE]

We then use the same $T$ and $\Delta$ to generate $N=2\cdot 10^{6}$ synthetic datasets within ABC. We further set the tolerance level $\epsilon$ to the $0.05^{\text{th}}$ percentile of the calculated distances, set $w=0$ in (7) and use independent uniform prior distributions

[TABLE]

Figure 18 shows the ABC marginal posterior densities $\pi_{\text{ABC}}^{\textrm{num}}(\theta_{j}|y)$. They are unimodal, narrow and centered around the true parameter values. The ABC posterior means are given by

[TABLE]

In spite of the presence of the non-linear term $G$, the inference via Algorithm 1 (ii) yields results similar to those obtained for MP2 when applying Algorithm 1 (i) with exact data generation.

7.3 Application of the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii) for the inference of the new parameters $\theta=(\sigma,\mu)$ of the stochastic JR-NMM

We now estimate $\theta=(\sigma,\mu)$ of the stochastic JR-NMM (25) (see Section 5 of the main manuscript). These are new parameters introduced by Ableidinger et al. (2017) in the SDE reformulation of the original JR-NMM (Jansen and Rit 1995). Unlike the other parameters, they have not yet been estimated in the literature. Here, we fix $C=135$ and apply Algorithm 1 (ii) with $M=30$, $N=5\cdot 10^{5}$, $\Delta=2\cdot 10^{-3}$ and $T=200$. We set $\epsilon$ to the $0.2^{\text{nd}}$ percentile of the calculated distances and choose uniform prior distributions according to

[TABLE]

The true parameter values used to generate the observed data are given by

[TABLE]

Figure 19 shows the ABC marginal posterior densities $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$ (left and middle top panels) obtained by applying the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii). The posteriors are centered around the true parameter values, leading to marginal ABC posterior means given by

[TABLE]

From the scatterplot of the kept ABC posterior samples of $\sigma$ and $\mu$ (right top panel), we observe that they are uncorrelated. The successful performance of the proposed ABC approach is also visible in the contour plot of the ABC posterior density (lower panel). Indeed, the proposed algorithm detects a clear region of posterior values for $\theta$ around $\theta^{t}$.

7.4 Application of the Spectral Density-Based and Measure-Preserving ABC Algorithm 1 (ii) for the inference of $\theta=(\sigma,\mu,C,b)$ of the stochastic JR-NMM

We now demonstrate that satisfactory results are obtained even when inferring the four parameters $\theta=(\sigma,\mu,C,b)$ of the stochastic JR-NMM (25). Since the parameters of main interest are $\sigma$, $\mu$ and $C$, in the main manuscript (see Section $5$) we did not estimate the coefficient $b$, whose value $b=50$ is well established in the literature; see, e.g., Jansen and Rit (1995) and the references therein.

7.4.1 Inference from simulated data

We start by inferring $\theta=(\sigma,\mu,C,b)$ from simulated data and apply Algorithm 1 (ii) with $M=30$, $N=5\cdot 10^{6}$, $\Delta=2\cdot 10^{-3}$ and $T=200$. We set $\epsilon$ to the $0.004^{\text{th}}$ percentile of the calculated distances and use the following uniform priors

[TABLE]

The reference data are generated under

[TABLE]

In Figure 20, we report the marginal ABC posterior densities $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$, which are again centered around the true parameter values. The marginal posterior means are given by

[TABLE]

7.4.2 Inference from real EEG data

Finally, we infer $\theta=(\sigma,\mu,C,b)$ from real EEG data. Algorithm 1 (ii) is applied under the same conditions as in Subsection 5.3 of the main manuscript, except that $\epsilon$ is set to the $0.002^{\text{nd}}$ percentile of the calculated distances and the uniform priors are chosen according to

[TABLE]

Figure 21 shows the unimodal marginal ABC posterior densities $\pi_{\textrm{ABC}}^{\textrm{num}}(\theta_{j}|y)$, yielding posterior means given by

[TABLE]

The marginal posterior density of the coefficient $b$ is centered around $b=50$, the value reported in the literature.

Bibliography

Ableidinger, M., Buckwar, E.: Splitting Integrators for the Stochastic Landau–Lifshitz Equation. SIAM J. Sci. Comput. 38, A1788–A1806 (2016)
Ableidinger, M., Buckwar, E., Hinterleitner, H.: A Stochastic Version of the Jansen and Rit Neural Mass Model: Analysis and Numerics. J. Math. Neurosci. 7(8) (2017)
Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., Elger, C. E.: Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 64, 061907 (2001)
Arnold, L.: Stochastic differential equations: theory and applications. Wiley, New York (1974)
Barber, S., Voss, J., Webster, M.: The rate of convergence for approximate Bayesian computation. Electron. J. Stat. 9(1), 80–105 (2015)
Barnes, C., Filippi, S., Stumpf, M., Thorne, T.: Considerate approaches to constructing summary statistics for ABC model selection. Stat. Comput. 22(6), 1181–1197 (2012)
Beaumont, M. A., Zhang, W., Balding, D. J.: Approximate Bayesian Computation in Population Genetics. Genetics 162(4), 2025–2035 (2002)
Bernton, E., Jacob, P. E., Gerber, M., Robert, C. P.: Approximate Bayesian computation with the Wasserstein distance. J. Roy. Stat. Soc. B (2019)
