Bayesian inference of ocean diffusivity from Lagrangian trajectory data

Y. K. Ying; J. R. Maddison; J. Vanneste

arXiv:1812.04264·physics.ao-ph·September 4, 2019

Bayesian inference of ocean diffusivity from Lagrangian trajectory data

Y. K. Ying, J. R. Maddison, J. Vanneste

PDF

TL;DR

This paper introduces a Bayesian method to infer spatially-variable ocean diffusivity from Lagrangian trajectory data, providing uncertainty quantification and validated through flow simulations.

Contribution

It develops a novel Bayesian inference scheme for eddy-diffusivity fields using Lagrangian data, incorporating uncertainty quantification and validation against theoretical and simulation data.

Findings

01

Accurately estimates anisotropic diffusivity fields from limited data.

02

Provides posterior distributions for model parameters, including uncertainty.

03

Successfully applied to both simple and complex ocean flow simulations.

Abstract

A Bayesian approach is developed for the inference of an eddy-diffusivity field from Lagrangian trajectory data. The motion of Lagrangian particles is modelled by a stochastic differential equation associated with the advection-diffusion equation. An inference scheme is constructed for the unknown parameters that appear in this equation, namely the mean velocity, velocity gradient, and diffusivity tensor. The scheme provides a posterior probability distribution for these parameters, which is sampled using the Metropolis-Hastings algorithm. The approach is applied first to a simple periodic flow, for which the results are compared with the prediction from homogenisation theory, and then to trajectories in a three-layer quasigeostrophic double-gyre simulation. The statistics of the inferred diffusivity tensor are examined for varying sampling interval and compared with a standard…

Tables2

Table 1. Table 1: Parameters used in the idealised Taylor–Green vortex configuration.

Parameter	Symbol	Value(s)
Spatial period	$L$	$100$ km
Maximum vortex speed	$U_{TG}$	$40$ cm s^-1
Background flow speed	$U_{M}$	$20$ cm s^-1
Background flow angle	$ϕ_{M}$	$30^{\circ}$
Small-scale diffusivity	$κ$	$50$ m² s^-1
Particle integration time step size	$Δ t$	$84.3750$ s
Total particle integration time	$T$	$256$ days
Number of particles	$N$	$256$
Data sampling interval	$s$	$3$ hours, $6$ hours, … $120$ days
Markov Chain Monte Carlo iterations	$N_{m h}$	$10^{5}$
Number of independent Markov Chains	$M$	$3$

Table 2. Table 2: Parameters used to initialise the Metropolis–Hastings Algorithm in the ‘burn-in’ phase.

Parameter $𝜽$	Initial value $𝜽^{(0)}$	Proposal standard deviation $\sqrt{V_{j}}$
$U_{0}$	0 m s^-1	0.001 m s^-1
$ϕ_{0}$	0 rad	0.05 rad
$ν_{1}$	0 s^-1	$2.5 \times 10^{- 8}$ s^-1
$ν_{2}$	0 s^-1	$2.5 \times 10^{- 8}$ s^-1
$ϕ_{𝖠}$	0 rad	0.05 rad
$γ_{1}$	1000 m² s^-1	100 m² s^-1
$γ_{2}$	500 m² s^-1	50 m² s^-1
$ϕ_{𝖪}$	0 rad	0.05 rad

Equations62

κ_{abs} (τ) = \frac{1}{2} \frac{d}{d τ} ⟨ S_{i} (τ) \otimes S_{i} (τ) ⟩,

κ_{abs} (τ) = \frac{1}{2} \frac{d}{d τ} ⟨ S_{i} (τ) \otimes S_{i} (τ) ⟩,

κ_{abs} (τ) = \frac{1}{2} \frac{d}{d τ} ⟨ (S_{i} (τ) - ⟨ S_{i} (τ) ⟩) \otimes (S_{i} (τ) - ⟨ S_{i} (τ) ⟩) ⟩,

κ_{abs} (τ) = \frac{1}{2} \frac{d}{d τ} ⟨ (S_{i} (τ) - ⟨ S_{i} (τ) ⟩) \otimes (S_{i} (τ) - ⟨ S_{i} (τ) ⟩) ⟩,

κ_{rel} (τ) = \frac{1}{4} \frac{d}{d τ} ⟨ (X_{i_{1}} (τ) - X_{i_{2}} (τ)) \otimes (X_{i_{1}} (τ) - X_{i_{2}} (τ)) ⟩,

κ_{rel} (τ) = \frac{1}{4} \frac{d}{d τ} ⟨ (X_{i_{1}} (τ) - X_{i_{2}} (τ)) \otimes (X_{i_{1}} (τ) - X_{i_{2}} (τ)) ⟩,

κ_{Davis} (x; τ) = \int_{- τ}^{0} ⟨ [\dot{X_{i}} (t) - \overline{u} (x)] \otimes [\dot{X_{i}} (t + s) - \overline{u} (X_{i} (t + s))] ⟩_{{X_{i} (t) = x}} d s,

κ_{Davis} (x; τ) = \int_{- τ}^{0} ⟨ [\dot{X_{i}} (t) - \overline{u} (x)] \otimes [\dot{X_{i}} (t + s) - \overline{u} (X_{i} (t + s))] ⟩_{{X_{i} (t) = x}} d s,

\frac{d X}{d t} = u (X (t), t),

\frac{d X}{d t} = u (X (t), t),

d X = [U (X (t)) + \nabla \cdot K (X (t))] d t + 2 K (X (t)) d W .

d X = [U (X (t)) + \nabla \cdot K (X (t))] d t + 2 K (X (t)) d W .

\frac{\partial π}{\partial t} + \nabla \cdot (U π) = \nabla \cdot (K \nabla π),

\frac{\partial π}{\partial t} + \nabla \cdot (U π) = \nabla \cdot (K \nabla π),

U (x) = U (x; θ) and K (x) = K (x; θ),

U (x) = U (x; θ) and K (x) = K (x; θ),

p (θ ∣ R) = \frac{p ( R ∣ θ ) p ( θ )}{\int p ( R ∣ θ ) p ( θ ) d θ} \propto p (R ∣ θ) p (θ),

p (θ ∣ R) = \frac{p ( R ∣ θ ) p ( θ )}{\int p ( R ∣ θ ) p ( θ ) d θ} \propto p (R ∣ θ) p (θ),

R = {(i, X_{i} (t_{j}), t_{j}) : i = 1, \dots, N, j = 1, \dots, P},

R = {(i, X_{i} (t_{j}), t_{j}) : i = 1, \dots, N, j = 1, \dots, P},

R = {(X_{i} (t_{j}), X_{i} (t_{j + 1}), t_{j + 1} - t_{j}) : i = 1, \dots, N, j = 1, \dots, P - 1} .

R = {(X_{i} (t_{j}), X_{i} (t_{j + 1}), t_{j + 1} - t_{j}) : i = 1, \dots, N, j = 1, \dots, P - 1} .

p (R ∣ θ) = i = 1 \prod N j = 1 \prod P - 1 π (X_{i} (t_{j + 1}), t_{j + 1} - t_{j} ∣ X_{i} (t_{j})) .

p (R ∣ θ) = i = 1 \prod N j = 1 \prod P - 1 π (X_{i} (t_{j + 1}), t_{j + 1} - t_{j} ∣ X_{i} (t_{j})) .

\int f (θ) p (θ ∣ R) d θ

\int f (θ) p (θ ∣ R) d θ

\mathbf{u}(\mathbf{x})=U_{\mathrm{TG}}\left(\begin{array}[]{c}-\sin\left({2\pi x}/{L}\right)\cos\left({2\pi y}/{L}\right)\\ \cos\left({2\pi x}/{L}\right)\sin\left({2\pi y}/{L}\right)\end{array}\right)+U_{\mathrm{M}}\left(\begin{array}[]{c}\cos\phi_{\mathrm{M}}\\ \sin\phi_{\mathrm{M}}\end{array}\right),

\mathbf{u}(\mathbf{x})=U_{\mathrm{TG}}\left(\begin{array}[]{c}-\sin\left({2\pi x}/{L}\right)\cos\left({2\pi y}/{L}\right)\\ \cos\left({2\pi x}/{L}\right)\sin\left({2\pi y}/{L}\right)\end{array}\right)+U_{\mathrm{M}}\left(\begin{array}[]{c}\cos\phi_{\mathrm{M}}\\ \sin\phi_{\mathrm{M}}\end{array}\right),

d X = u (X (t)) d t + 2 κ d W,

d X = u (X (t)) d t + 2 κ d W,

U

U

K

\mathsf{R}\left(\phi_{\mathsf{K}}\right)=\left(\begin{array}[]{cc}\cos\phi_{\mathsf{K}}&-\sin\phi_{\mathsf{K}}\\ \sin\phi_{\mathsf{K}}&\cos\phi_{\mathsf{K}}\end{array}\right)

\mathsf{R}\left(\phi_{\mathsf{K}}\right)=\left(\begin{array}[]{cc}\cos\phi_{\mathsf{K}}&-\sin\phi_{\mathsf{K}}\\ \sin\phi_{\mathsf{K}}&\cos\phi_{\mathsf{K}}\end{array}\right)

θ = (U, ϕ, γ_{1}, γ_{2}, ϕ_{K})^{T} .

θ = (U, ϕ, γ_{1}, γ_{2}, ϕ_{K})^{T} .

π (X_{i} (t_{j + 1}), s ∣ X_{i} (t_{j})) = \frac{1}{2 π det Σ _{s}} exp (- \frac{1}{2} ∥ X_{i} (t_{j + 1}) - m_{s} (X_{i} (t_{j})) ∥_{Σ_{s}^{- 1}}^{2}),

π (X_{i} (t_{j + 1}), s ∣ X_{i} (t_{j})) = \frac{1}{2 π det Σ _{s}} exp (- \frac{1}{2} ∥ X_{i} (t_{j + 1}) - m_{s} (X_{i} (t_{j})) ∥_{Σ_{s}^{- 1}}^{2}),

m_{s} (x) = x + U s, Σ_{s} = 2 s K

m_{s} (x) = x + U s, Σ_{s} = 2 s K

∥ v ∥_{Σ_{s}^{- 1}}^{2} = v^{T} Σ_{s}^{- 1} v .

∥ v ∥_{Σ_{s}^{- 1}}^{2} = v^{T} Σ_{s}^{- 1} v .

U (x)

U (x)

K

u_{0}

u_{0}

A

θ = (U_{0}, ϕ_{0}, ν_{1}, ν_{2}, ϕ_{A}, γ_{1}, γ_{2}, ϕ_{K})^{T} .

θ = (U_{0}, ϕ_{0}, ν_{1}, ν_{2}, ϕ_{A}, γ_{1}, γ_{2}, ϕ_{K})^{T} .

π (X_{i} (t_{j + 1}), s ∣ X_{i} (t_{j})) = \frac{1}{2 π det ( Σ _{s} )} exp (- \frac{1}{2} ∥ X_{i} (t_{j + 1}) - m_{s} (X_{i} (t_{j})) ∥_{Σ_{s}^{- 1}}^{2}),

π (X_{i} (t_{j + 1}), s ∣ X_{i} (t_{j})) = \frac{1}{2 π det ( Σ _{s} )} exp (- \frac{1}{2} ∥ X_{i} (t_{j + 1}) - m_{s} (X_{i} (t_{j})) ∥_{Σ_{s}^{- 1}}^{2}),

m_{s} (x)

m_{s} (x)

Σ_{s}

θ^{(k + 1)} = {ϑ θ^{(k)} with probability α, with probability 1 - α,

θ^{(k + 1)} = {ϑ θ^{(k)} with probability α, with probability 1 - α,

α = min (1, \frac{p ( ϑ ∣ R )}{p ( θ ^{(k)} ∣ R )} \frac{P ( θ ^{(k)} ∣ ϑ )}{P ( ϑ ∣ θ ^{(k)} )}),

α = min (1, \frac{p ( ϑ ∣ R )}{p ( θ ^{(k)} ∣ R )} \frac{P ( θ ^{(k)} ∣ ϑ )}{P ( ϑ ∣ θ ^{(k)} )}),

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Bayesian inference of ocean diffusivity from Lagrangian trajectory data

Y. K. Ying

J. R. Maddison

J. Vanneste

School of Mathematics and Maxwell Institute for Mathematical Sciences, The University of Edinburgh, Edinburgh, EH9 3FD, United Kingdom

Abstract

A Bayesian approach is developed for the inference of an eddy-diffusivity field from Lagrangian trajectory data. The motion of Lagrangian particles is modelled by a stochastic differential equation associated with the advection–diffusion equation. An inference scheme is constructed for the unknown parameters that appear in this equation, namely the mean velocity, velocity gradient, and diffusivity tensor. The scheme provides a posterior probability distribution for these parameters, which is sampled using the Metropolis–Hastings algorithm. The approach is applied first to a simple periodic flow, for which the results are compared with the prediction from homogenisation theory, and then to trajectories in a three-layer quasigeostrophic double-gyre simulation. The statistics of the inferred diffusivity tensor are examined for varying sampling interval and compared with a standard diagnostic of ocean diffusivity. The Bayesian approach proves capable of estimating spatially-variable anisotropic diffusivity fields from a relatively modest amount of data while providing a measure of the uncertainty of the estimates.

keywords:

Bayesian inference; Lagrangian particles; ocean diffusivity; stochastic differential equations; Markov Chain Monte Carlo

1 Introduction

Turbulent processes can lead, on sufficiently long time scales, to diffusive mixing of tracer quantities (Taylor, 1922; Majda and Kramer, 1999). In the ocean large-scale instabilities gives rise to geostrophic eddies. These energetic eddies dominate the redistribution of heat and tracers both laterally and vertically (e.g. Jayne and Marotzke, 2002) and contribute to the formation of large-scale circulation patterns (e.g. Marshall and Radko, 2003; Hallberg and Gnanadesikan, 2006). The mixing induced by these eddies is typically modelled through an “eddy diffusivity”. Diffusive models can be shown to be valid in limiting cases (e.g. Davis, 1987; Majda and Kramer, 1999), although the empirically long ( $\sim 100$ days) time for the diffusive regime to come into effect in some parts of the ocean (Rypina et al., 2012) makes their general applicability questionable.

There are multiple approaches for the diagnosis of turbulent ocean eddy diffusivities, which are not obviously equivalent. One can diagnose a diffusivity from turbulent eddy fluxes (e.g. Bachman and Fox-Kemper, 2013), although this may be prone to ambiguity due to the possible presence of rotational fluxes (Marshall and Shutts, 1981). Alternatively, observations of the motion of tracer contours can be used to define an eddy diffusivity (Nakamura, 1996; Marshall et al., 2006). A separate broad class of diffusivity diagnostics is based upon observations of the motion of fluid parcels (e.g. LaCasce, 2008; van Sebille et al., 2018), which may for example be obtained from simulated Lagrangian trajectories, or from ocean drifter data. For comparisons between these approaches see Klocker et al. (2012) and Abernathey et al. (2013).

Consider Lagrangian particles, where the $i$ th particle has position $\mathbf{X}_{i}\left(t\right)$ and corresponding displacement $\mathbf{S}_{i}\left(t\right)=\mathbf{X}_{i}\left(t\right)-\mathbf{X}_{i}\left(0\right)$ . In a statistically stationary and homogeneous flow one may define an absolute diffusivity based upon the absolute dispersion of particles (Taylor, 1922; LaCasce, 2008)

[TABLE]

where $\left<\cdot\right>$ denotes an appropriate average over particles, such as an ensemble average, and $\tau$ represents a time window over which the particle trajectories are considered. As $\tau\to\infty$ , $\kappa_{\mathrm{abs}}\left(\tau\right)$ converges to a constant and characterises the asymptotic growth rate of particle dispersion. This definition makes no correction for the possible presence of a background mean flow, which can for example be accounted for via

[TABLE]

correcting for a mean drift (e.g. Sallée et al., 2008).

Retaining the assumption of a statistically stationary and homogeneous flow, one may define a relative diffusivity (e.g. LaCasce, 2008)

[TABLE]

where now the average is taken over all distinct pairs of particles ( $i_{1}\neq i_{2}$ ). This automatically takes account of the presence of a uniform background mean flow. Such a relative diffusivity has been used to study energy spectra in fluid turbulence (e.g. Koszalka et al., 2009; Lumpkin and Elipot, 2010).

The above definitions make use of statistical homogeneity to yield a single bulk uniform diffusivity. This is problematic if the diffusivity is expected to vary in different regions of the ocean. To account for this, Davis (1987, 1991) defines the spatially dependent diffusivity

[TABLE]

where the conditional average $\left<\cdot\right>_{\{\mathbf{X}_{i}\left(t\right)=\mathbf{x}\}}$ is taken over all trajectories $\mathbf{X}_{i}\left(t\right)$ that pass through position $\mathbf{x}$ at some time $t$ . While this definition captures spatial variations in diffusivity, it requires the choice of an appropriate background mean flow $\overline{\mathbf{u}}\left(\mathbf{x}\right)$ . Its implementation is further complicated by the need for past history information of particles which arrive at a common point – in practice this necessitates local binning of particles which arrive in the vicinity of a point, and may also be replaced with future information of particles which leave the vicinity (e.g. Oh et al., 2000; Griesel et al., 2010; Klocker et al., 2012; Rühs et al., 2018).

A concern in the Davis (1987) diffusivity is its dependence on the time-lag parameter $\tau$ . One may hope for convergence in the large- $\tau$ limit, after some characteristic decorrelation time, but this decorrelation time may be sufficiently large that the particles have left the neighbourhood of $\mathbf{x}$ . As a result, particles involved in the calculation experience different flow regions, with different diffusivity properties, over the timescale $\tau$ over which the integral is taken. These non-local effects mean that care needs to be exercised when interpreting the spatial dependence of the Davis (1987) diffusivity. Further, there is the concern that in general this diffusivity need not be non-negative definite, nor even symmetric.

In this article we present a new approach for the diagnosis of ocean eddy diffusivity from Lagrangian particle data using Bayesian inference. Given a stochastic model for the particle motion, discretely observed Lagrangian particle positions, and prior information, the approach infers a joint posterior probability distribution for both a local flow velocity and a local anisotropic diffusivity tensor. This probability distribution makes it possible, for example, to compute mean quantities or to find maximum a posteriori estimates, and to quantify the uncertainty of these estimates.

The paper is organised as follow. In section 2 the Bayesian inference approach and its implementation using Monte Carlo Markov Chain are described. Section 3 provides an application in an idealised configuration. In section 4 the approach is applied to Lagrangian particle data obtained from a three-layer quasigeostrophic double-gyre calculation, and the resulting diffusivity diagnosis is compared against the Davis (1987) diffusivity. The paper concludes in section 5 with an outlook towards more general applications of Bayesian inference to the analysis of Lagrangian drifter data.

2 Mathematical background

2.1 Stochastic Lagrangian particle dynamics

The position $\mathbf{X}\left(t\right)$ of particles advected in a time-dependent velocity field $\mathbf{u}(\mathbf{x},t)$ satisfies the ordinary differential equation

[TABLE]

subject to some initial condition $\mathbf{X}\left(0\right)=\mathbf{x}_{0}$ . The concept of eddy diffusivity arises when attempting to coarse-grain this equation: it might be expected that over sufficiently long time scales the behaviour of its solutions is well captured by the Markov-0 model (Berloff and McWilliams, 2002), which is a stochastic differential equation (SDE)

[TABLE]

Now $\mathbf{U}$ is a time-independent average velocity field, $\mathsf{K}$ is the eddy diffusivity which is a symmetric positive definite tensor (whose square root is uniquely defined by requiring that it too be symmetric positive definite), and $\mathbf{W}$ is multi-dimensional Brownian motion. The reduction from (5) to the Markov-0 model (6) can only be justified rigorously, and explicit expressions for $\mathbf{U}$ and $\mathsf{K}$ can only be obtained, when $\mathbf{u}(\mathbf{x},t)$ satisfies strong assumptions of scale separation in time and/or space that are not met in the context of the ocean (see Griffa, 1996, and the reference therein). Here we adopt a heuristic approach and seek to estimate values for $\mathbf{U}$ and $\mathsf{K}$ that are most consistent – in a sense to be explained – with a set of observed particle trajectories $\mathbf{X}_{i}(t)$ .

The evolution of $\mathbf{X}(t)$ according to (6) is entirely characterised by the transition probability density $\pi\left(\mathbf{x},t|\mathbf{x}_{0}\right)$ which defines the probability of finding the particle in the neighbourhood of $\mathbf{x}$ at time $t$ given it is initially at $\mathbf{x}_{0}$ . The transition probability evolves under the Fokker–Planck equation (e.g. Evans, 2013; Pavliotis, 2014)

[TABLE]

with initial condition $\pi\left(\mathbf{x},0|\mathbf{x}_{0}\right)=\delta\left(\mathbf{x}-\mathbf{x}_{0}\right)$ . This is the advection–diffusion equation, and hence (6) is a natural stochastic model for advective and diffusive processes.

The velocity and diffusivity fields $\mathbf{U}$ and $\mathsf{K}$ are fields defined over the entire spatial domain. For practical computations it is necessary to first discretise these fields over space,

[TABLE]

where $\boldsymbol{\theta}$ denotes the degrees of freedom for both $\mathbf{U}$ and $\mathsf{K}$ – that is, $\boldsymbol{\theta}$ is a finite-length vector of parameters which specifies the discrete approximation for $\mathbf{U}$ and $\mathsf{K}$ . Hereafter the dependence of quantities on $\boldsymbol{\theta}$ is omitted, but it should be borne in mind that most objects of interest, the transition probability $\pi$ for instance, have such a dependence. The problem of estimating the discretised velocity and diffusivity fields now reduces to the estimation of $\boldsymbol{\theta}$ . In the Bayesian-inference approach we adopt, $\boldsymbol{\theta}$ is regarded as a random variable and its entire probability distribution, and hence a probability distribution for $(\mathbf{U},\mathsf{K})$ , is estimated from trajectory data.

2.2 Bayesian inference

Given $N$ particles each observed at $P$ distinct times $t_{j}$ , evolving under the SDE (6), Bayes’ theorem gives (a thorough textbook reference for Bayesian statistics is Gelman et al., 2013)

[TABLE]

where the integral is over the full parameter space. $R$ denotes the data, and can be set equal to the full trajectory,

[TABLE]

where $\mathbf{X}_{i}\left(t_{j}\right)$ is the position of the $i$ th particle at the $j$ th observation time. Equivalently, as the SDE is Markovian, $R$ can be replaced with

[TABLE]

That is, the data consist of the start and end positions of each particle between consecutive pairs of observations, and the time separation between the observations. Note that this is easily generalised for the case of differing observation times for each particle and differing particle trajectory lengths.

Three key probability distributions appear in (9): the posterior $p\left(\boldsymbol{\theta}|R\right)$ , the likelihood $p\left(R|\boldsymbol{\theta}\right)$ , and the prior $p\left(\boldsymbol{\theta}\right)$ . The posterior $p\left(\boldsymbol{\theta}|R\right)$ is the probability distribution of the parameter $\boldsymbol{\theta}$ given the observations and the model, and its determination is the goal of the inference. It should be interpreted as an objective measure of the plausibility of a certain value of $\boldsymbol{\theta}$ (and hence of $\mathbf{U}$ and $\mathsf{K}$ ) in view of the observations, assuming the model is perfect. The likelihood $p\left(R|\boldsymbol{\theta}\right)$ is the probability that particles evolving according to (6), and with $(\mathbf{U},\mathsf{K})$ fixed by $\boldsymbol{\theta}$ , have positions matching $R$ . It is given explicitly in terms of a product of transition probabilities

[TABLE]

The prior $p\left(\boldsymbol{\theta}\right)$ is a subjective choice for the plausibility of a given set of parameters $\boldsymbol{\theta}$ in the absence of data. Its importance for the posterior diminishes as the number $N(P-1)$ of data points increases.

2.3 Sampling: Metropolis–Hastings

Assuming we can evaluate the transition probability in (12), Bayes’ formula (9) gives the probability density for the parameters $\boldsymbol{\theta}$ and therefore for $\mathbf{U}$ and $\mathsf{K}$ in an explicit form. This is however a probability density in a high-dimensional space which cannot be visualised and from which derived quantities cannot be computed directly. Instead, one is interested in computing integrals of various quantities against the posterior – that is, in evaluating

[TABLE]

for some $f\left(\boldsymbol{\theta}\right)$ . For example $f\left(\boldsymbol{\theta}\right)=\mathsf{K}$ yields the posterior mean diffusivity, $\bar{\mathsf{K}}$ say, which can used as an estimate for the eddy diffusivity, while $f\left(\boldsymbol{\theta}\right)=\|\mathsf{K}-\bar{\mathsf{K}}\|^{2}$ yields a variance characterising the uncertainty of the estimate $\bar{\mathsf{K}}$ .

Markov Chain Monte Carlo (MCMC) methods can be used to obtain numerical approximations for integrals of the form (13). These methods generate sequences of random samples $\boldsymbol{\theta}^{(k)}$ using a transition probability $T(\boldsymbol{\theta}^{(k+1)}|\boldsymbol{\theta}^{(k)})$ chosen to ensure that, for large $k$ , the $\boldsymbol{\theta}^{(k)}$ are distributed according to $p\left(\boldsymbol{\theta}|R\right)$ . The integrals (13) are then estimated simply by the arithmetic mean of $f(\boldsymbol{\theta}^{(k)})$ . Here we use the well-known Metropolis–Hastings algorithm, based on an acceptance/rejection definition of $T(\boldsymbol{\theta}^{(k+1)}|\boldsymbol{\theta}^{(k)})$ , and more specifically the Gibbs sampler (e.g. Geman and Geman, 1984) for which the successive samples $\boldsymbol{\theta}^{(k)}$ and $\boldsymbol{\theta}^{(k+1)}$ differ in at most one component. The reliable estimation of integrals using MCMC requires monitoring the convergence of the estimates and ensuring that the $\boldsymbol{\theta}^{(k)}$ properly explore the support of $p\left(\boldsymbol{\theta}|R\right)$ ; we adopt the Gelman and Rubin (1992) diagnostic (also in A and in section 11.4 of Gelman et al. (2013)) to verify this.

2.4 Local inference

The specific inference problem considered in this article is conducted in a local cell-wise manner. The domain of interest is partitioned into a coarse mesh, and we seek to obtain information on the flow velocity and diffusivity for each mesh cell. The result of the inference is expected to be dependent on the choice of mesh, and in particular on the mesh cell size. This is consistent with the coarse graining involved in approximating (5) by (6) – the eddy diffusivity obtained is dependent upon the spatial scales.

Note that a meaningful eddy diffusivity is only realised after a decorrelation time scale. Over short time scales, correlated advection associated with the so-called “ballistic” regime (e.g. Pasquero et al., 2007; Rypina et al., 2012) dominates and is incompatible with the diffusive model (6). It is therefore necessary to ensure that the pairs of observed particle positions employed are separated by a sufficient time interval – a principle noted in a multi-scale system in Pavliotis and Stuart (2007) (see also Cotter and Pavliotis, 2009, for an application to eddy diffusivity). An optimal sampling interval, which discards the minimum number of position records while preserving the validity of the model (6), is rarely known a priori. In practice the inference is performed with varying sampling intervals and the convergence of the various estimates is examined. In the local inference approach we take here it is also necessary for the particles to remain in (or at least close to) the cell considered over the sampling interval. There is therefore a trade-off between two competing requirements: the sampling interval must be long enough that the particles do decorrelate, and short enough that they are not transported far from the considered cell. One must therefore take care to choose an appropriate sampling interval between observations, and be aware that this may not always exist. The possibility for a more advanced “non-local” inference, which alleviates this difficulty, is discussed in the conclusions.

3 Idealised example: Taylor–Green vortices with a background flow

3.1 Configuration

A highly idealised model of oceanic eddies in a background current is constructed by superimposing a constant mean flow on top of Taylor–Green vortices, leading to the two-dimensional and doubly-periodic steady velocity field

[TABLE]

where $U_{\mathrm{TG}}$ is the maximum vortex speed, $U_{\mathrm{M}}$ is a background flow speed, and $\phi_{\mathrm{M}}$ is the angle of the background flow to the $x$ -axis. The small-scale advection–diffusion of particles according to

[TABLE]

is considered, where $\kappa\not=0$ is here a small-scale scalar diffusivity. Note that $\kappa$ , which governs the small-scale motion of the particles, is not the object to be inferred in this problem. Rather we seek to infer information about a large-scale effective diffusivity, which governs the long-time behaviour.

Homogenisation theory (e.g. Majda and McLaughlin, 1993; Majda and Kramer, 1999) provides rigorous coarse-graining results for this problem. Specifically, over scales much larger than the vortex period $L$ , the motion of particles is approximated by the SDE (6) with a uniform mean velocity $\mathbf{U}=U_{\mathrm{M}}(\cos\phi_{\mathrm{M}},\sin\phi_{\mathrm{M}})$ and an effective diffusivity tensor $\mathsf{K}$ . The effective diffusivity tensor $\mathsf{K}$ can be computed by solving a two-dimensional elliptic problem known as the “cell problem” (Pavliotis and Stuart, 2008).111 Note that the “effective diffusivity” appearing here should not be confused with the “effective diffusivity” in Marshall et al. (2006).

3.2 Bayesian inference

We apply Bayesian inference to this problem for the uniform velocity and diffusivity

[TABLE]

where

[TABLE]

is a rotation matrix. Thus the parameters to infer are

[TABLE]

The representation (16f) of the diffusivity $\mathsf{K}$ is motivated by its eigendecomposition, and guarantees that it is symmetric positive-definite when $\gamma_{1}$ and $\gamma_{2}$ are positive.

Parameters used in this example are provided in Table 1. The domain size, flow speeds, and small-scale diffusivity are chosen so as to yield an ocean-like regime. Particle trajectory data are generated by solving the SDE (15) for $N$ particles initially located on a uniform square grid in the doubly-periodic domain $\left(x,y\right)\in\left[-L,L\right]^{2}$ . The SDE is solved numerically using the Euler–Maruyama method with a small timestep size of $\Delta t=84.3750$ s. For the purposes of the Bayesian inference their positions are sampled with a sampling interval $s=t_{j}-t_{j-1}$ over a total time of $T=256$ days.

3.3 Posterior evaluation

For the uniform velocity and diffusivity (16), the Fokker–Planck equation can be solved analytically, yielding the Gaussian transition probability density

[TABLE]

where

[TABLE]

and, for a suitably sized vector $\mathbf{v}$ ,

[TABLE]

This gives an explicit expression for the likelihood (12).

In order to perform the Bayesian inference a prior must be chosen. This is a subjective choice reflecting expected prior knowledge regarding the parameters under consideration (the elements of $\boldsymbol{\theta}$ ) and, except in limiting cases of large data, the result of the inference is dependent upon the choice of prior. The priors for the angles $\phi$ and $\phi_{\mathsf{K}}$ set equal to the uniform distribution, and the priors for the remaining parameters are uniform in the ranges $U\in\left[0,10~{}\mathrm{m}~{}\mathrm{s}^{-1}\right]$ and $\gamma_{1},\gamma_{2}\in\left[1~{}\mathrm{m}^{2}~{}\mathrm{s}^{-1},10^{5}~{}\mathrm{m}^{2}~{}\mathrm{s}^{-1}\right]$ , and zero elsewhere.

The posterior is evaluated, up to some unknown proportionality constant, as the product of the likelihood and the prior, noting that the proportionality constant is not required by the Metropolis–Hastings algorithm. In total $3$ independent set of $100,000$ samples $\boldsymbol{\theta}^{(k)}$ are drawn, and it is verified that the Gelman–Rubin diagnostic criterion (see A) is satisfied.

3.4 Results

The posterior mean velocity components (not shown) show little variability with sampling interval and agree excellently with the background flow. The posterior mean diffusivity components are shown in figure 1, and show much greater variability. For example, over short time scales the particles experience only local small-scale dynamics, and hence short sampling intervals are associated with low values of inferred diffusivity. The diffusivity components increase with increasing sampling interval and approach a stable value. As the sampling interval increases, the number of particle positions used in the inference decreases (since the same length of particle trajectory is considered in all cases). As a result, the uncertainty of the inference increases, leading to a widening of the posterior distribution.

For reference the effective diffusivity $\mathsf{K}$ of homogenisation theory is computed by solving the elliptic “cell problem” (e.g. Eq. (2.2) of Cotter and Pavliotis, 2009). The equations are solved using degree-one continuous Lagrange finite elements using the FEniCS system (Logg et al., 2012; Alnæs et al., 2015) version 2018.1.0. A finite element mesh is formed via a $512\times 512$ structured and uniform square mesh, with each square divided along the lower-left to upper-right diagonal to form a triangle mesh. The results are shown with dashed lines in figure 1. The larger sampling interval posterior mean diffusivity components, obtained using Bayesian inference, agree well with the computed effective diffusivity.

4 Quasigeostrophic double gyre

The Bayesian inference machinery, illustrated in the preceding section for a highly idealised example, is now applied in a more oceanographically relevant context by considering Lagrangian particle trajectories in a quasigeostrophic double-gyre calculation.

4.1 Numerical model

The three-layer quasigeostrophic double gyre configuration of Maddison et al. (2015) is considered (see also Berloff et al., 2007; Karabasov et al., 2009; Marshall et al., 2012). The three-layer quasigeostrophic equations (see Maddison et al., 2015, section 3.1) are discretised using finite differencing, with a mesh with $513\times 513$ nodes uniformly spaced on a square grid, in a $3840\mathrm{~{}km}\times 3840\mathrm{~{}km}$ square horizontal domain. The advection term in the quasigeostrophic potential vorticity equation is discretised using the Arakawa (1966) Jacobian, and Laplace operators are discretised using second order centered differencing. The elliptic problem for potential vorticity inversion is solved via projection onto discrete baroclinic modes, and the resulting Poisson or modified Helmholtz problems are solved using a Fast Poisson Solver (e.g. Strang, 1986, section 5.5), with the decoupled tri-diagonal systems arrived at using a Discrete Sine Transform using FFTPACK 5.1. The system is integrated in time using a third-order Adams–Bashforth scheme with uniform timestep $\Delta t_{QG}=1800$ s. Physical parameters are as in Table 1 of Maddison et al. (2015).

4.2 Particle advection

Particles are advected using the geometric integration approach described in Ham et al. (2006) and Ham (2006). A piecewise linear streamfunction is constructed from the finite-difference grid point values by dividing each square cell corner-to-corner to yield four isosceles triangles, bi-linearly interpolating to yield a value at the centre vertex, and then linearly interpolating within the triangles. The time-dependent streamfunction is further linearly interpolated in time. Initial starting cells are determined using a quad-tree based search (Samet, 1984) using code derived from libsupermesh (Panourgias and Maddison, 2016), after which they are advected along contours of the discrete streamfunction. Note that care needs to be taken to ensure that the particle advection – which is a two-dimensional computational geometry problem – is solved in a precision-robust manner. A useful property of the particle advection scheme is that, given a streamfunction which is constant on the boundary, particles are guaranteed to never leave the bounding domain (see Ham et al., 2006). Hence the particle advection scheme requires no further consideration of boundary condition.

We consider only particle advection, with no explicit small-scale diffusivity, within the middle layer of the model. This layer experiences no direct wind forcing or bottom linear drag. After a $100$ year spinup222Julian years are used throughout. $625$ particles are distributed uniformly across the square domain. This number is chosen so as to resemble the typical number of ARGO drifters available in the North Atlantic (Argo, 2000). The particles are then advected for a further $10$ years, and their positions are recorded daily. The resulting trajectories for $60$ arbitrarily selected particles are shown in figure 2.

4.3 Bayesian inference

The domain is partitioned into a $16\times 16$ array of square cells with $240$ km side lengths. Within each cell the velocity is represented as a linearly varying non-divergent field, and the diffusivity as a constant symmetric positive definite tensor,

[TABLE]

where

[TABLE]

Here $\mathsf{R}$ is a rotation matrix as in (17) and $\mathbf{x}_{0}$ is the centre of the cell. Thus the parameters to infer in each cell are

[TABLE]

The linear velocity field introduces additional degrees of freedom compared with the uniform velocity field used in section 3. It is motivated by the large shears that are present in the jet region of the simulation.

4.4 Posterior evaluation

The Fokker–Planck equation can be solved analytically for the velocity and diffusivity (22), yielding the Gaussian transition probability density

[TABLE]

where

[TABLE]

and $\left\|\mathbf{v}\right\|_{\Sigma_{s}^{-1}}$ is defined in (21). This gives an explicit expression for the likelihood (12).

We take again simple uniform priors for $p(\boldsymbol{\theta})$ : the angles $\phi_{0}$ , $\phi_{\mathsf{A}}$ , and $\phi_{\mathsf{K}}$ are uniform, and remaining parameters are uniformly distributed in the ranges $U_{0}\in\left[0,10~{}\mathrm{m}~{}\mathrm{s}^{-1}\right]$ , $\nu_{1},\nu_{2}\in\left[-10^{-5}~{}\mathrm{s}^{-1},10^{-5}~{}\mathrm{s}^{-1}\right]$ , $\gamma_{1},\gamma_{2}\in\left[1~{}\mathrm{m}^{2}~{}\mathrm{s}^{-1},10^{5}~{}\mathrm{m}^{2}~{}\mathrm{s}^{-1}\right]$ and are zero elsewhere. It has been verified that the results would be unaffected if these ranges were extended.

The posterior is evaluated, up to some unknown proportionality constant, as the product of the likelihood and the prior. In total $10$ independent chains of $4\times 10^{5}$ samples $\boldsymbol{\theta}^{(k)}$ are then drawn using the Metropolis–Hastings algorithm and the Gelman and Rubin (1992) diagnostic (also in A and section 11.4 of Gelman et al. (2013)) to test convergence. This process is performed separately for each cell of the $16\times 16$ array covering the model domain. We consider the sampling intervals $s=1,\,2,\,4,\,8,16,\,32,\,48,\,\cdots,\,128$ days. The samples of each of the independent chains are combined to approximate the posterior distribution.

4.5 Results

Figure 3 shows the maximum a posteriori estimate (MAP) for the middle-layer velocity field, together with the Eulerian mean flow computed over the $10$ -year data collection window. The MAP estimate of $\boldsymbol{\theta}$ is the maximiser for the posterior $p\left(\boldsymbol{\theta}|R\right)$ and indicates the most likely combination of mean flow and diffusivity fields to recover the trajectory data. In all cases described here the MAP estimate is approximated by the sample $\boldsymbol{\theta}^{(k)}$ that maximises the posterior $p(\boldsymbol{\theta}^{(k)}|R)$ over all MCMC steps $k$ and over all chains. For a short sampling interval $s=1$ day, the MAP flow velocity is comparable to the Eulerian mean velocity.

Figure 4 shows the fraction of particles which are found in their cell of origin or in one of the eight surrounding cells at the end of sampling interval (regardless of the intermediate trajectory). This provides an indication of the validity of the locality assumption inherent in the local inference approach. For short sampling intervals this fraction is high, but as expected it drops as the sampling interval increases; in particular, it drops to very low values in the jet and on the western boundary. There is therefore potential misattribution of the spatial location of flow properties in these regions. This is a significant issue on the western boundary, where particles flow rapidly from the boundary into the jet, and rapidly change direction from a northward or southward flow, to an eastward flow.

At short sampling intervals $(\lesssim 16~{}\mathrm{days})$ , strong shears are inferred along the jet and on the northern, western, and southern boundaries. This is indicated by the large local vorticity $\omega=\partial_{x}v-\partial_{y}u$ , corresponding to the off-diagonal elements of $\mathsf{A}-\mathsf{A}^{\mathrm{T}}$ , shown in figure 5. The inferred diffusivity in these areas is significantly reduced (not shown) when the spatial gradients of the mean flow are resolved, by permitting a non-zero linear shear. For the large sampling intervals the inferred shear tensor is smaller, as may be expected for a Lagrangian average of the flow over these time scales. Hence for the large sampling intervals the inferred diffusivity is largely unaffected by the inclusion of shear in $\mathbf{U}(\mathbf{x};\boldsymbol{\theta})$ , and an inference with a locally constant velocity would yield similar results.

Figure 6 visualises the MAP estimate for the middle layer diffusivity tensor for differing sampling intervals. The “diffusivity ellipses” in figure 6 outline the orientations of contours of a passive tracer if it undergoes pure diffusion with a Dirac-delta initial profile, characterising the directions of the anisotropy of the eddy diffusion tensor. The diffusivity magnitude, defined as the half trace of the diffusivity tensor, is visualised using the colour scale. The inferred diffusivity is largest in the jet region, and strengthens with increasing sampling interval. There is a region of very weak inferred diffusivity in the eastern part of the southern half of the domain. At large sampling interval the anisotropic diffusion has a preferential east-west orientation in the gyres and the core of the jet. Near the western boundary the anisotropic diffusivity is tilted towards the direction of the jet – this is attributed to non-local effects, as particles are rapidly transported into the jet from this region.

The Metropolis–Hastings algorithm samples the joint posterior distribution of the velocity and diffusivity and so makes it possible to infer quantities that depend on both fields. In particular, we can construct distributions for the cross-stream and along-stream diffusivity components $K^{\bot}$ and $K^{\parallel}$ by projecting for each sample $k$ , the sample diffusivity $\mathsf{K}^{(k)}$ , in directions perpendicular to and parallel to the sample velocity $\mathbf{U}^{(k)}$ . The resulting MAP estimates are shown in figure 7. For comparison, the cross-stream and along-stream Davis (1987) diffusivity, defined with respect to the $10$ year Eulerian mean flow at the cell centre, are shown in figure 8.

The two diagnostic approaches generally agree well in order of magnitude and spatial structure, with increased diffusivity in the region of the jet and reduced diffusivity on the eastern boundary and in the region south of the jet, as indicated by their relative differences in figure 9. There is some disagreement in detail, for example near the northern and southern boundaries. Note that the Davis (1987) diffusivity as computed here is not a symmetric positive definite (or even symmetric) quantity in general, leading to some regions of missing data indicated in white in figures 8 and 9.

To analyse our results in more detail, we now focus on the 9 cells highlighted in figure 2 and labelled (i)–(iii) with increasing $x$ coordinate, and (a)–(c) with increasing $y$ coordinate. Figures 10 and 11 show the MAP of the middle layer cross-stream and along-stream diffusivity in these cells as functions of the sampling interval $s$ . The Davis (1987) diffusivity is shown for comparison; the time lag $\tau$ and sampling interval $s$ are shown on a common scale even though the two parameters are not strictly comparable. The MAP diffusivities do demonstrate a degree of convergence at larger sampling intervals, and agree in order of magnitude, at larger sampling intervals, with the large time-lag Davis (1987) diffusivity. The MAP diffusivities are never negative, as a consequence of the choice of prior, and while some variation is observed with sampling interval, the Bayesian diffusivity estimates are generally more stable in magnitude than the Davis (1987) diffusivity values.

One of the advantages of the Bayesian approach is that it provides a probability distribution, rather than single estimates for $\boldsymbol{U}$ and $\mathsf{K}$ , and hence allows for a quantification of the uncertainty. This is illustrated in Figures 10 and 11 which also show the (marginal) posterior probability density for the two diffusivity components at each sampling interval $s$ . The probability densities are shown as shading and normalised by their maximum value at each value of $s$ . Broadly speaking, the figures suggest that the range of plausible values is reasonably well constrained, with low probabilities for values more than a factor of, say 2, away from the MAP. Nevertheless, relatively long tails of the posterior distribution indicate that there is a significant probability of diffusivities of much larger magnitude that the MAP values. There are cases of multi-modality, for example in the lower right panel of figure 10 and figure 11, with in this cases a MAP value which switches between the two local maxima. We attribute this to weakness of the flow in these regions which leads to an ambiguity in the flow direction and hence in the decomposition between along-stream and cross-stream diffusivity.

5 Conclusions and future work

This article introduces the application of Bayesian inference to the diagnosis of eddy diffusivities from Lagrangian trajectory data. Assuming that the trajectories are governed by a stochastic differential equation involving a number of parameters, the Bayesian inference machinery provides an objective way of incorporating all available data so as to yield a full multidimensional posterior probability distribution for the parameters, which quantifies their plausibility. We utilise this to estimate both an anisotropic diffusivity tensor and a linearly varying non-divergent velocity, and to quantify the uncertainty of the estimates.

Note that the posterior distribution has a very specific interpretation: it is a probability density for the parameters, assuming a perfect model, and given the data and prior information. The posterior can exhibit spread due to lack of data, as weighted against the prior, but not due to error in the underlying model. Further, while we may anticipate convergence with increasing particle number or sampling interval, such limits may in practical oceanographic cases not be achievable.

An idealised experiment, consisting of Taylor–Green vortices embedded within a background flow, is considered. Here, with sufficiently long sampling intervals, the inferred diffusivity components agree well with the predictions from homogenisation theory. In a more complex quasigeostrophic model of a three-layer oceanic gyre system, a “local” approach is applied to infer the middle layer mean flows and diffusivities independently in each of $16\times 16$ cells partitioning the domain. The results show that the data of 625 trajectories over 10 years constrain the diffusivity within a factor of about 2 in most of the domain. The values found become relatively insensitive to the sampling time when this exceeds 30 days or so and are roughly comparable to the Davis (1987) diffusivity.

We emphasise that the Bayesian approach provides a general framework for the inference of diffusivity which extends well beyond the simple implementation presented in this paper. A crucial limitation of this implementation is the assumption of locality, which supposes that particles observed from a given cell are advected by the same flow velocity and experience the same diffusivity over the entire sampling interval. Even with the relatively large size of cells considered here (240 km), this assumption is problematic, especially near the western boundary and in the region of the separated jet, where the trajectories of many particles straddle several cells. This limitation is not inherent to the Bayesian framework and can in principle be overcome by considering a discretisation of the velocity and diffusivity over the entire domain, and inferring all associated degrees of freedom simultaneously. Two challenges need to be addressed in this more general case: first, the MCMC sampling of the posterior distribution needs to be performed over a space of much higher dimension; second, the transition probability, which solves a Fokker–Planck (i.e., advection–diffusion) equation with spatially varying velocity and diffusivity, cannot be evaluated in closed form. The first challenge is not necessarily a major one: theoretical results (Roberts et al., 1997) suggests that the complexity of the simultaneous sampling of all the parameters need not be markedly different from that of the combined sampling of the (independent) parameters associated with a single cell. The second challenge requires efficient methods to compute, likely in an approximated form, the transition probability from the Fokker–Planck equation. This is the subject of ongoing work.

In addition to offering a systematic method to make best use of all available data to estimate diffusivities, the Bayesian approach has the advantage of providing a quantification of the uncertainty of these estimates by means of a complete probability density function. This is important when the amount of data is limited, e.g. for estimates based on real drifters as opposed to simulated trajectories, and could be used prior to measurement campaigns to help assessing how many drifters are needed. Beyond this, we also note that a Bayesian approach can be employed for model selection and determine whether stochastic differential equations more sophisticated than (6) (e.g., as in Berloff and McWilliams, 2002) are necessary to explain observed trajectories. This is another direction of future work.

Acknowledgments

YKY was supported by the Principal’s Career Development PhD Scholarships and Edinburgh Global Scholarships. YKY acknowledges advice from Alexa Griesel regarding the Davis (1987) diffusivity. The authors are grateful to Luis Zavala Sansón for useful comments on the manuscript.

Appendix A Metropolis–Hastings algorithm

A.1 Algorithm outline

The Metropolis–Hastings Algorithm (e.g. section 11.2 of Gelman et al., 2013) is outlined as follows.

Set $k=0$ . Choose a proposal density $P(\cdot|\cdot)$ and take an initial value for the parameter $\boldsymbol{\theta}^{(0)}$ . 2. 2.

Iterate:

(a)

randomly draw a candidate parameter $\boldsymbol{\vartheta}$ with probability $P(\boldsymbol{\vartheta}|\boldsymbol{\theta}^{(k)})$ , 2. (b)

compute $P(\boldsymbol{\vartheta}|\boldsymbol{\theta}^{(k)})$ and $P(\boldsymbol{\theta}^{(k)}|\boldsymbol{\vartheta})$ , 3. (c)

compute $p(\boldsymbol{\theta}^{(k)}|R)$ and $p\left(\boldsymbol{\vartheta}|R\right)$ (up to an irrelevant proportionality constant) from Bayes’ formula (9), using the fields $(\mathbf{u}(\mathbf{x};{\boldsymbol{\theta}^{(k)}}),\mathsf{K}(\mathbf{x};{\boldsymbol{\theta}^{(k)}}))$ or $(\mathbf{u}(\mathbf{x};\boldsymbol{\vartheta}),\mathsf{K}(\mathbf{x};\boldsymbol{\vartheta}))$ for the transition probability $\pi$ , 4. (d)

let

[TABLE]

where

[TABLE] 5. (e)

increment $k\mapsto k+1$ .

The proposal density $P(\boldsymbol{\vartheta}|\boldsymbol{\theta}^{(k)})$ should be easy to compute. In this paper, we take it such that all the components of $\boldsymbol{\vartheta}$ but one are the same as the components of $\boldsymbol{\theta}^{(k)}$ – a technique known as the “Gibbs sampler” (Geman and Geman, 1984). Specifically, we take it as the Gaussian

[TABLE]

where $j=1,\cdots,J$ labels the components of $\boldsymbol{\theta}$ and the variances $V_{j}$ are tuned for efficient sampling (see below). Note that $P(\boldsymbol{\vartheta}|\boldsymbol{\theta}^{(k)})=P(\boldsymbol{\theta}^{(k)}|\boldsymbol{\vartheta})$ , which simplifies the form of $\alpha$ in (28) and eliminates the need for step 2(b).

It should be noted that it is only the distribution of $\boldsymbol{\theta}^{(k)}$ (the stationary distribution) that converges to the target posterior $p(\boldsymbol{\theta}|R)$ . Hence initial samples of the Markov chain should be treated as ‘burn-in’, that is, only the distribution of $\boldsymbol{\theta}^{(k)}$ for $k$ exceeding a threshold should be considered. In this article, we discard the first half of the $\boldsymbol{\theta}^{(k)}$ for this reason.

To determine the number of MCMC steps needed for the sample distribution of $\boldsymbol{\theta}^{(k)}$ to converge to the target posterior $p(\boldsymbol{\theta}|R)$ , the Gelman–Rubin convergence test (Gelman and Rubin (1992); Brooks and Gelman (1998), also section 11.4 of Gelman et al. (2013)), which compares multiple chains of $\boldsymbol{\theta}^{(k)}$ under different initial conditions $\boldsymbol{\theta}^{(0)}$ , is performed. In this article the convergence of the sample distribution to the target is said to have achieved when $\hat{R}$ (as defined in (11.4) of Gelman et al. (2013)) corresponding to each component of $\boldsymbol{\theta}$ falls below $1.2$ .

A.2 Tuning

To sample the distribution of $p(\boldsymbol{\theta}|R)$ efficiently, the variance of the proposal distribution $V_{j}$ needs to be tuned. A small variance $V_{j}$ leads to successive $\boldsymbol{\theta}^{(k)}$ that are very close to one another, while a large $V_{j}$ leads to numerous rejections; in both cases the support of $p(\boldsymbol{\theta}|R)$ is explored too slowly. For an optimal algorithmic efficiency, a common practice is to maintain the fraction of the candidates $\boldsymbol{\vartheta}$ being accepted to be approximately $0.25$ (Roberts et al., 1997). Note that this is measured only after the ‘burn-in’ phase. A table listing the initial values for the parameter $\boldsymbol{\theta}^{(0)}$ and the proposal standard deviation $\sqrt{V_{j}}$ (before tuning) is given in table 2.

The parameter $\boldsymbol{\theta}_{\mathrm{MAP}}$ that maximises the posterior density $p(\boldsymbol{\theta}|R)$ is used as the initial conditions for tuning $V_{j}$ and post-‘burn-in’ sampling. To tune $V_{j}$ , the algorithm is re-run with an additional $8000$ steps, during which the fraction of $\boldsymbol{\theta}^{(k)}$ accepted is recorded. If the acceptance fraction exceeds $0.35$ , $V_{j}$ is multiplied by $4/3$ ; if it is lower than $0.15$ , $V_{j}$ is multiplied by $2/3$ . The tuning process is repeated for up to 20 times and stops once the acceptance fraction falls in the range of $[0.15,0.35]$ , in the neighbourhood of the advised value $0.25$ (Roberts et al., 1997). With the tuned variance $V_{j}$ the Metropolis–Hasting algorithm is re-run with initial condition $\boldsymbol{\theta}_{\mathrm{MAP}}$ and the samples of $\boldsymbol{\theta}^{(k)}$ are used for inference.

Appendix B Calculating the Davis diffusivity

The along-stream and cross-stream Davis (1987) diffusivities are calculated using the $10$ -year Eulerian mean flow at the centre of each cell to define the mean velocity $\bar{\mathbf{u}}\left(\mathbf{x}\right)$ appearing in equation (4). Evaluating the integral in (4) requires high temporal resolution; we use particle locations observed every $3$ hours over $10$ years, for $10,000$ particles initially deployed uniformly across the domain. We adopt the method of Griesel et al. (2010) to evaluate the two diffusivities in each of the $16\times 16$ cells partitioning the domain. The position of each particle every $3$ hours is treated as a new independent starting point, to generate a set of particle trajectories each with time lag $\tau$ . The conditional averaging operator $\left<\cdot\right>_{\{\mathbf{X}_{i}\left(t\right)=\mathbf{x}\}}$ in (4) is then modified to include all particle trajectories that end in a given cell, and the time integral is computed using the trapezoidal rule. Note that, while this formally computes a diffusivity tensor, this tensor need not be symmetric positive definite (or even symmetric) and hence corresponding diffusivity ellipses cannot be shown without further processing. Projecting the diffusivity tensor onto directions parallel to and perpendicular to the Eulerian mean flow yields the along-stream and cross-stream diffusivities shown in figure 8.

Bibliography49

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Abernathey et al. (2013) Abernathey, R., Ferreira, D., Klocker, A., 2013. Diagnostics of isopycnal mixing in a circumpolar channel. Ocean Modelling 72, 1–16.
2Alnæs et al. (2015) Alnæs, M. S., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M. E., Wells, G. N., 2015. The F Eni CS Project Version 1.5. Archive of Numerical Software 3 (100).
3Arakawa (1966) Arakawa, A., 1966. Computational design for long-term numerical integration of theequations of fluid motion: Two-dimensional incompressible flow. Part I. Journal of Computational Physics 1 (1), 119–143.
4Argo (2000) Argo, 2000. Argo float data and metadata from Global Data Assembly Centre (Argo GDAC). SEANOE.
5Bachman and Fox-Kemper (2013) Bachman, S., Fox-Kemper, B., 2013. Eddy parameterization challenge suite I: Eady spindown. Ocean Modelling 64, 12–28.
6Berloff et al. (2007) Berloff, P., Hogg, A. M. C., Dewar, W., 2007. The turbulent oscillator: A mechanism of low-frequency variability of the wind-driven ocean gyres. Journal of Physical Oceanography 37 (9), 2363–2386.
7Berloff and Mc Williams (2002) Berloff, P. S., Mc Williams, J. C., 2002. Material Transport in Oceanic Gyres. Part II: Hierarchy of Stochastic Models. Journal of Physical Oceanography 32 (3), 797–830.
8Brooks and Gelman (1998) Brooks, S. P., Gelman, A., 1998. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7 (4), 434–455.