Codifference can detect ergodicity breaking and non-Gaussianity

Jakub Slezak; Ralf Metzler; and Marcin Magdziarz

arXiv:1903.11905·cond-mat.stat-mech·July 25, 2019


TL;DR

This paper demonstrates that the codifference is an effective measure for detecting ergodicity breaking and non-Gaussianity in stochastic time series, extending its applicability to complex models in physics, biology, and finance.

Contribution

It extends the use of codifference beyond stable processes to random-parameter and diffusing-diffusivity models, revealing dependence and ergodicity breaking not seen with traditional covariance analysis.

Findings

1. Codifference detects dependence and ergodicity breaking.
2. It applies to models in physics, biology, and finance.
3. It reveals non-Gaussian properties not visible through covariance.

Abstract

We show that the codifference is a useful tool in studying the ergodicity breaking and non-Gaussianity properties of stochastic time series. While the codifference is a measure of dependence that was previously studied mainly in the context of stable processes, we here extend its range of applicability to random-parameter and diffusing-diffusivity models which are important in contemporary physics, biology and financial engineering. We prove that the codifference detects forms of dependence and ergodicity breaking which are not visible from analysing the covariance and correlation functions. We also discuss a related measure of dispersion, which is a non-linear analogue of the mean squared displacement.

Tables

Table 1: Formulae for the codifference and the LCF of $X_{t}=\sqrt{D}\,Y_{t}$ corresponding to common models of $D$: gamma, one-sided stable, Gaussian and uniform.

law of $D$	codifference $τ_{X}^{θ} (t)$	LCF $ζ_{X}^{θ} (t)$
$𝒢 (α, β)$	$\frac{α}{θ^{2}} \ln \frac{{(1 + θ^{2} / (2 β))}^{2}}{1 + θ^{2} (1 - r_{Y} (t)) / β}$	$\frac{2 α}{θ^{2}} \ln (\frac{θ^{2}}{2 β} δ_{Y}^{2} (t) + 1)$
$𝒮 (α, c)$	$c^{α} θ^{2 α - 2} (2^{1 - α} - {(1 - r_{Y} (t))}^{α})$	$2^{1 - α} c^{α} θ^{2 α - 2} {(δ_{Y}^{2} (t))}^{α}$
$𝒩 (μ, σ^{2})$	$μ r_{Y} (t) + \frac{{(θ σ)}^{2}}{2} {(1 - r_{Y} (t))}^{2} - \frac{{(θ σ)}^{2}}{4}$	$μ δ_{Y}^{2} (t) - \frac{{(θ σ)}^{2}}{4} {(δ_{Y}^{2} (t))}^{2}$
$𝒰 (a, b)$	$a r_{Y} (t) + \frac{1}{θ^{2}} \ln (\frac{θ^{2} (b - a)}{4 (1 - r_{Y} (t))} \frac{1 - \mathrm{e}^{- θ^{2} (b - a) (1 - r_{Y} (t))}}{{(1 - \mathrm{e}^{- θ^{2} (b - a) / 2})}^{2}})$	$a δ_{Y}^{2} (t) - \frac{2}{θ^{2}} \ln (\frac{2 (1 - \mathrm{e}^{- θ^{2} (b - a) δ_{Y}^{2} (t) / 2})}{θ^{2} δ_{Y}^{2} (t) (b - a)})$

Equations

$\mu_{X}(t):=\mathbb{E}[X_{t}],\quad \delta_{X}^{2}(t):=\mathbb{E}[(X_{t}-\mu_{X}(t))^{2}],\quad r_{X}(t):=\mathbb{E}[(X_{s+t}-\mu_{X}(s+t))(X_{s}-\mu_{X}(s))].$

$\tau_{X}^{\theta}(t):=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]}{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}]}.$

$\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}]=\left|\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s}}]\right|^{2}<1$

$\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]=\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}].$

$\tau_{X}^{\theta}(t)=\tau_{Y}^{\theta}(t)+\tau_{Z}^{\theta}(t).$

$\lim_{\theta\to 0}\tau_{X}^{\theta}(t)=r_{X}(t),$

$\tau_{X}^{\theta}(t)=r_{X}(t),$

$\tau_{X}^{\theta}(t):=\frac{1}{2\theta^{2}}\ln\frac{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]}{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}+X_{s})}]}$

$\zeta_{X}^{\theta}(t):=-\frac{2}{\theta^{2}}\ln\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{t}-\mu_{X}(t))}].$

$\zeta_{X}^{\theta}(t)=\zeta_{Y}^{\theta}(t)+\zeta_{Z}^{\theta}(t).$

$\zeta_{X}^{\theta}(t)=\delta_{X}^{2}(t).$

$\lim_{c\to\infty}\zeta_{cX}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln\lim_{c\to\infty}\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta cX_{t}}]=-\frac{2}{\theta^{2}}\ln 0^{+}=\infty.$

$\zeta_{X}^{\theta}(t)=\frac{2\left(1-\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta J}]\right)}{\theta^{2}\,\mathbb{E}[T]}\,t,$

$\delta_{X}^{2}(t)=\frac{\mathbb{E}[J^{2}]}{\mathbb{E}[T]}\,t,$

$\zeta_{L_{\alpha}^{H}}^{\theta}(t)=C_{\theta}\,t^{\alpha H},$

$\zeta_{B(S_{\alpha})}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln E_{\alpha}\left(-\frac{\theta^{2}}{2}t^{\alpha}\right),$

$B_{2H,\beta}(t)=\sqrt{D_{\beta}}\,B_{H}(t).$

$\zeta_{B_{2H,\beta}}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln E_{\beta}\left(-\frac{\theta^{2}}{2}t^{2H}\right)\sim\frac{t^{2H}}{\Gamma(\beta+1)},\quad t\to 0^{+},$

$\zeta_{B_{2H,\beta}}^{\theta}(t)=\frac{4H}{\theta^{2}}\ln t+\frac{2}{\theta^{2}}\ln\left(\frac{\theta^{2}\,\Gamma(1-\beta)}{2}\right)+o(1),\quad \beta\neq 1,\ t\to\infty.$

$\tau_{\Delta B_{2H,\beta}}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{E_{\beta}\left(-\theta^{2}\left(\Delta t^{2H}+t^{2H}-\left(|t+\Delta t|^{2H}+|t-\Delta t|^{2H}\right)/2\right)\right)}{E_{\beta}\left(-\theta^{2}\Delta t^{2H}/2\right)^{2}}.$

$\tau_{\Delta B_{2H,\beta}}^{\theta}(\infty)=\frac{1}{\theta^{2}}\ln\frac{E_{\beta}\left(-\theta^{2}\Delta t^{2H}\right)}{E_{\beta}\left(-\theta^{2}\Delta t^{2H}/2\right)^{2}}.$

$\tau_{X}^{\theta}(\infty)=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}[\mathrm{e}^{-\theta^{2}D}]}{\mathbb{E}[\mathrm{e}^{-\theta^{2}D/2}]^{2}}\geq 0,$

$X_{t}=\sqrt{D}\,Y_{t}.$

$r_{X}(t)=\mathbb{E}[D]\,r_{Y}(t),$

$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}[\mathrm{e}^{-\theta^{2}D(1-r_{Y}(t))}]}{\mathbb{E}[\mathrm{e}^{-\theta^{2}D/2}]^{2}}$

$\zeta_{X}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln\mathbb{E}[\mathrm{e}^{-\theta^{2}D\delta_{Y}^{2}(t)/2}].$

$\tau_{X}^{\theta}(t)-\tau_{X}^{\theta}(\infty)\sim\frac{\mathbb{E}[D\,\mathrm{e}^{-\theta^{2}D}]}{\mathbb{E}[\mathrm{e}^{-\theta^{2}D}]}\,r_{Y}(t),\quad t\to\infty,$

$\tau_{X}^{\theta}(t)=\tau_{X'}^{\theta}(t)+\tau_{X''}^{\theta}(t),$

$r_{X}(0|\mathcal{C})=\lim_{T\to\infty}\frac{1}{T}\int_{0}^{T}X(t)^{2}\,\mathrm{d}t.$

$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\mathbb{E}[\mathrm{e}^{(\theta\sigma)^{2}\mathrm{e}^{-t\Lambda}}],$


Full text


Jakub Ślęzak*†‡, Ralf Metzler♯, and Marcin Magdziarz‡*

†Department of Physics, Bar Ilan University

‡Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology

♯Institute of Physics and Astronomy, Potsdam University

Corresponding author: Ralf Metzler


A Introduction

A.1 Statistical measures in modelling of diffusion

The analysis of stochastic systems has three important and partially distinct aspects: models, properties and estimation. These roughly correspond to the physical, mathematical and statistical aspects of research. Modelling is concerned with explaining the nature of a system according to the underlying theory (e.g. "the particle undergoes Brownian motion because it rapidly exchanges momenta with the molecules of the liquid"). The analysis of statistical properties (also called "measures")¹ relates these models to observable quantities ("Brownian motion has a linear mean squared displacement"). By using suitable estimators we link these quantities to the experimental data ("the mean squared displacement can be efficiently estimated by an arithmetic average over squared displacements").

¹ Strictly speaking, these are sufficiently regular functionals acting on the space of observations. In quantum mechanics each such linear functional corresponds to an observable. In statistical mechanics a similar rôle is fulfilled by $\mathbb{E}[f(X)]$ for bounded continuous functions $f$. In statistics linearity is usually not required and various measures have the form $g(\mathbb{E}[f(X)])$. This is the case also in the present work.

This work is motivated by our conviction that the range of statistical measures in common use is too narrow for contemporary needs, as the scope and number of models have increased considerably [1]. The classical models based on the Langevin equation [2] and the generalised Langevin equation [3, 4], as well as short- [5] and long- [6, 7] memory random walks, were complemented by motions on fractals [8], motions in complex energy landscapes [9], random walks in random environments [10, 11], random walks with correlated steps and waiting times [12, 13, 14, 15, 16] and Lévy walks [17], spatially heterogeneous diffusion processes [18], diffusing-diffusivity models [19] and more. Distinguishing between different models from this wide class of course depends crucially on the physical understanding of the system, but this requirement does not lessen the importance of empirical verification based on various measures and the corresponding estimators. From an experimental point of view, the large range of different stochastic processes is called for by ever more detailed insights gained in highly complex environments such as living biological cells or membranes, for instance by single particle tracking of individual sub-micron tracers or even fluorescently labelled single molecules [20, 21, 22].

Traditionally, in the study of diffusion phenomena, the three most basic and popular statistical measures in use are: the mean as a measure of location, the mean squared displacement (MSD) as a measure of dispersion and the covariance as a measure of dependence, respectively

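$\mu_{X}(t):=\mathbb{E}[X_{t}],\quad \delta_{X}^{2}(t):=\mathbb{E}[(X_{t}-\mu_{X}(t))^{2}],\quad r_{X}(t):=\mathbb{E}[(X_{s+t}-\mu_{X}(s+t))(X_{s}-\mu_{X}(s))].$   (A1)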

Other, alternative choices of measures could be, for example: the median for the location [23], entropy [24] or quantile ranges [23] for the dispersion, the rank correlation [23, 25] or the mutual information [26] for the dependence.

The covariance as defined above should not depend on the choice of $s$, which is true for stationary (also called "non-ageing") processes. We will assume stationarity whenever we study memory. In practical applications this condition is fulfilled by many types of confined motion or by increments of free diffusion. The more general non-stationary case will only be briefly mentioned in Eqs. (C2) and (C4). Many of the arguments presented here could be extended to non-stationary models, but this would require a case-by-case study. Conversely, measures of dispersion and location are interesting mostly for non-stationary (ageing) processes, since for stationary processes they are constant; the cases discussed for these measures fall into the non-stationary category.

The range of measures typically used to study diffusion is indeed quite limited, and the need for a wider range of methods has been acknowledged for many years. Various papers proposed, e.g., studying higher order moments and ratios of moments [27], the running maximum [28], p-variation [29], or time averages and ensemble averages of time averages [30]. A prominent example of the last kind of measure is the ergodicity breaking parameter [18, 31, 32, 33]. Recently, single-trajectory power spectral methods were also proposed [34, 35]. These techniques are steadily gaining recognition, but their range of application is often still narrow. Moreover, a large part of this important research is limited to properties that are "not very different" from second-order ones. For example, any power function $x^{\alpha}$ with $\alpha>1$ behaves similarly to $x^{2}$ (i.e., it is an increasing, convex function), and parameters based on it are usually not far from the classical ones.² All such measures strongly emphasise the tails of the distribution: a change of the distribution at large values of the observations has a larger influence than a change at small values. This connection is very helpful in making comparisons, but an important part of the total information is lost; it could be extracted using more distinct measures.

² This similarity is what causes the "strong anomalous diffusion" property, for which the power-law dependence $\mathbb{E}|X_{t}|^{q}\propto t^{qv(q)}$ is observed with a non-constant function $v$ [36].

A.2 Overview of the codifference

Our main subject of interest, the codifference, is an example of a measure that is not based on moments. It was initially proposed as a tool to measure dependence for $\alpha$-stable processes, for which the second moment is infinite [37, 38, 39, 40, 41, 42]. However, in many systems a divergent second moment is not a physically expected property, which limits the range of possible applications of stable processes. It was already noticed, e.g., in [43] that the codifference may be useful for models both with and without a finite second moment. In the present work we study applications of the codifference to a class of models based on Gaussian distributions, which we call conditionally Gaussian processes; as we will demonstrate, many useful and widely used models fit into this category.

The definition of the codifference which we will use is as follows: for any stationary process $X$ it is given by the formula

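$\tau_{X}^{\theta}(t):=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]}{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}]}.$   (A2)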

The sample codifference is introduced in a standard way, by replacing the three ensemble averages $\mathbb{E}[\boldsymbol{\cdot}]$ in the above expression by arithmetic averages $\frac{1}{n}\sum_{j=1}^{n}(\boldsymbol{\cdot})$ . Similarly, one can consider a time-averaged codifference. For all symmetric distributions the considered averages should be real-valued, so in most of the practical applications one can average over $\cos(\theta(\boldsymbol{\cdot}))$ instead of $\exp(\mathrm{i}\theta(\boldsymbol{\cdot}))$ ; this was used for the Monte Carlo simulations which will be presented further on.
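The estimator just described can be sketched in a few lines of Python with NumPy (our choice of language; the function names are hypothetical). The sketch assumes that the values $X_s$ and $X_{s+t}$ are available for $n$ independent trajectories and replaces the three expectations in (A2) by arithmetic means; the second function is the cosine variant suggested for symmetric distributions.

```python
import numpy as np


def sample_codifference(x_s, x_st, theta):
    """Sample codifference: the three expectations in (A2) replaced by arithmetic means.

    x_s, x_st -- values X_s and X_{s+t} observed on n independent trajectories
    theta     -- frequency parameter, theta > 0
    """
    x_s = np.asarray(x_s, dtype=float)
    x_st = np.asarray(x_st, dtype=float)
    num = np.mean(np.exp(1j * theta * (x_st - x_s)))
    den = np.mean(np.exp(1j * theta * x_st)) * np.mean(np.exp(-1j * theta * x_s))
    # For symmetric laws the ratio is real and positive up to sampling noise,
    # so we keep the real part of the complex logarithm.
    return float(np.real(np.log(num / den))) / theta**2


def sample_codifference_cos(x_s, x_st, theta):
    """Variant for symmetric distributions: average cosines instead of exponentials."""
    x_s = np.asarray(x_s, dtype=float)
    x_st = np.asarray(x_st, dtype=float)
    num = np.mean(np.cos(theta * (x_st - x_s)))
    den = np.mean(np.cos(theta * x_st)) * np.mean(np.cos(theta * x_s))
    return float(np.log(num / den)) / theta**2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Zero-mean Gaussian pairs with unit variance and covariance 0.5.
    pairs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=200_000)
    print(sample_codifference(pairs[:, 0], pairs[:, 1], theta=1.0))
    print(sample_codifference_cos(pairs[:, 0], pairs[:, 1], theta=1.0))
```

For Gaussian pairs both variants should return approximately the covariance (here 0.5), in line with property d) below.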

Note that the so-called generalised codifference, in which $X_{s+t}$ and $X_{s}$ are multiplied by $\theta_{1}$ and $\theta_{2}$ respectively, contains even more information [37]. For the models we consider, this additional flexibility does not seem meaningful, so it would not justify the cost of more complicated formulae.

Conversely, the basic formula for the codifference in the classical book of Samorodnitsky and Taqqu [37] is similar to ours, but with $\theta=1$. In the mathematical study of stable processes this is sufficient, but in broader physical applications fixing a dimensional constant to unity is not desirable. With our choice of definition the codifference has the unit of $X^{2}$ due to the factor $1/\theta^{2}$. This factor makes the codifference comparable to the covariance and allows us to show both on the same plots; when this is not important, the factor $1/\theta^{2}$ can be omitted. There exists an even simpler object, the dynamical functional [44], which is just the numerator minus the denominator in (A2) with $\theta=1$; it is used to study ergodicity breaking [30, 45].

Instead of moments such as the covariance, the codifference depends on sines and cosines of $\theta X_{s+t}$ and $\theta X_{s}$. Expanding these functions into Taylor series around zero up to the first two terms, and using the fact that $\mathbb{E}[X_{t}]=\mathrm{const.}$ for a stationary process, shows that the codifference approximately agrees with the covariance for distributions concentrated around the origin. The essential difference is that the codifference measures mainly the dependence determined by the bulk of the probability density, whereas the covariance puts much larger emphasis on the tails. This is caused by the cancellation of highly oscillatory terms in the tails of the PDF, as stated by the Riemann-Lebesgue lemma; in the covariance, by contrast, the tails have a huge influence because of the quadratic factor in the probabilistic integral $\mathbb{E}[X_{s}X_{s+t}]$.
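The bulk-versus-tails contrast can be illustrated numerically. The Python/NumPy sketch below uses a hypothetical contamination scheme that is not taken from the paper: Gaussian pairs with covariance 0.5, of which a fraction 0.001 is replaced by a large common value $\pm 100$ shared by $X_s$ and $X_{s+t}$. These rare outliers inflate the sample covariance by roughly $0.001\times 100^{2}=10$, while the sample codifference is barely affected, because the oscillating factors average them out.

```python
import numpy as np


def sample_codifference(x_s, x_st, theta):
    # Cosine form of the sample codifference (adequate for symmetric distributions).
    num = np.mean(np.cos(theta * (x_st - x_s)))
    den = np.mean(np.cos(theta * x_st)) * np.mean(np.cos(theta * x_s))
    return float(np.log(num / den)) / theta**2


def sample_covariance(x_s, x_st):
    return float(np.mean(x_s * x_st) - np.mean(x_s) * np.mean(x_st))


def contaminated_pairs(n=200_000, rho=0.5, frac=0.001, size=100.0, seed=2):
    """Gaussian pairs with covariance rho; a fraction `frac` of them is replaced
    by a hypothetical rare large event of magnitude `size`, common to both values."""
    rng = np.random.default_rng(seed)
    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    k = int(frac * n)
    signs = rng.choice([-1.0, 1.0], size=k)
    xy[:k, 0] = signs * size
    xy[:k, 1] = signs * size
    return xy[:, 0], xy[:, 1]


if __name__ == "__main__":
    x_s, x_st = contaminated_pairs()
    print("covariance:  ", sample_covariance(x_s, x_st))         # about 0.5 + 10
    print("codifference:", sample_codifference(x_s, x_st, 1.0))  # stays close to 0.5
```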

Because of the presence of two highly non-linear transformations, sine/cosine and logarithm, definition (A2) may initially not seem very intuitive. It becomes more natural if we interpret it as a conveniently transformed Fourier transform of the distribution (that is, the probabilistic characteristic function). In its full, multidimensional form, the characteristic function contains all information about the dependence. Moreover, for a centred Gaussian variable with variance $\sigma^{2}$ it has the very simple form $\exp(-(\theta\sigma)^{2}/2)$, so it seems reasonable to use it as a dependence measure for models related to the Gaussian distribution. Still, it is not obvious that the codifference behaves as we would require from a memory function. Fortunately, simple arguments show that this is the case:

a)

When $X_{s+t}=X_{s}$ (the case of total positive dependence) the codifference is a positive constant $\tau_{X}^{\theta}(t)=\tau_{X}^{\theta}(0)>0$ . If the values $X_{s+t}$ and $X_{s}$ become independent, the codifference converges to 0. Both facts are immediate consequences of the definition together with

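$\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}]=\left|\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s}}]\right|^{2}<1$   (A3)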

and, for $X_{s+t}$ independent of $X_{s}$ ,

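$\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]=\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}].$   (A4)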

b)

If the process is a sum of independent components $X_{t}=Y_{t}+Z_{t}$ then the respective codifferences are additive

$\tau_{X}^{\theta}(t)=\tau_{Y}^{\theta}(t)+\tau_{Z}^{\theta}(t).$   (A5)

This property is important in common applications, where the observed process usually is at least to some degree disturbed by noise, which can most often be assumed to be additive and independent of the basic motion.

c)

If $\mathbb{E}[X_{t}^{2}]<\infty$ , the covariance can be viewed as a limit of the codifference,

$\lim_{\theta\to 0}\tau_{X}^{\theta}(t)=r_{X}(t),$   (A6)

which follows from expanding the complex exponentials in definition (A2) into a Taylor series up to the second term and noting that the result is the logarithm of $(1+\theta^{2}r_{X}(t)+o(\theta^{2}))^{\theta^{-2}}$, which tends to $\ln\mathrm{e}^{r_{X}(t)}=r_{X}(t)$. This justifies treating the codifference as a generalisation of the covariance.

d)

For a Gaussian process the codifference equals the covariance for any $\theta$

$\tau_{X}^{\theta}(t)=r_{X}(t),$   (A7)

which follows immediately from a short calculation, see Eq. (C6). Therefore comparing the codifference and the covariance can be used to measure non-Gaussianity.
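Properties c) and d) can be checked on a simple non-Gaussian example. The sketch below takes a Gaussian process with a random, gamma-distributed variance, $X_t=\sqrt{D}\,Y_t$, anticipating the model of Sec. B.1.1, and uses the gamma row of Table 1; we read $\mathcal{G}(\alpha,\beta)$ as shape $\alpha$ and rate $\beta$ with $r_Y(0)=1$, which is the parametrisation consistent with that row. The function names are our own. As $\theta\to 0$ the codifference approaches the covariance $\mathbb{E}[D]\,r_Y(t)$, whereas at finite $\theta$ the gap between the two reflects the non-Gaussianity of $X$.

```python
import math


def tau_gamma_mixture(r_y, theta, alpha, beta):
    """Codifference of X = sqrt(D) Y, with Y stationary Gaussian, r_Y(0) = 1,
    and D ~ Gamma(shape=alpha, rate=beta) (gamma row of Table 1)."""
    num = (1.0 + theta**2 / (2.0 * beta)) ** 2
    den = 1.0 + theta**2 * (1.0 - r_y) / beta
    return alpha / theta**2 * math.log(num / den)


def cov_gamma_mixture(r_y, alpha, beta):
    # r_X(t) = E[D] r_Y(t), with E[D] = alpha / beta for the gamma law.
    return alpha / beta * r_y


if __name__ == "__main__":
    alpha, beta, r_y = 2.0, 1.0, 0.5
    print("covariance:", cov_gamma_mixture(r_y, alpha, beta))
    for theta in (1.0, 0.3, 0.1, 0.01):
        # Tends to the covariance as theta -> 0 (property c); the gap at
        # finite theta signals non-Gaussianity (property d).
        print(theta, tau_gamma_mixture(r_y, theta, alpha, beta))
```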

One intuitive property that the codifference lacks is symmetry. If we fix the first of two variables and negate the second ($x\mapsto-x$), we expect the strength of dependence to stay the same and only its sign to change. This is the case for the covariance, but not for the codifference, which is non-linear by design. Even in the borderline case $X_{s+t}=-X_{s}$ there is no guarantee that $\tau_{X}^{\theta}(t)<0$; counterexamples can be given even for the otherwise well-behaved class of processes considered later. This sometimes inconvenient feature can be removed by introducing the symmetrised codifference

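$\tau_{X}^{\theta}(t):=\frac{1}{2\theta^{2}}\ln\frac{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]}{\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}+X_{s})}]}$   (A8)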

which for all symmetric distributions changes sign under the reflection $X_{s+t}\mapsto-X_{s+t}$. This quantity can be useful if one wants to compare the strength of positive and negative dependencies, but there is a cost: the symmetrised codifference is "linear enough" to ignore many types of non-linear ergodicity breaking, similarly to the covariance, see Eq. (C2). For this reason we will henceforth use the non-symmetrised codifference and study systems with a positive type of dependence, at least in some suitable limit such as $t\to\infty$.

Since the codifference is a generalisation of the covariance, one may reasonably expect a generalisation of the MSD defined in a similar spirit. Indeed, let us consider the formula

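$\zeta_{X}^{\theta}(t):=-\frac{2}{\theta^{2}}\ln\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{t}-\mu_{X}(t))}].$   (A9)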

This quantity may seem trivial, because studying the distribution in Fourier space is a classical method of basic probability theory. What distinguishes this definition, however, is that the result is treated primarily as a function of time and is conveniently transformed, so that it can be interpreted as a measure of dispersion with the same unit as $X^{2}$. Up to a rescaling it can be considered a cumulant generating function evaluated at an imaginary argument, but such a quantity does not seem to have an established name in the literature, so we will call it by the straightforward term "log characteristic function", LCF for short. In analogy to the codifference, the LCF measures mainly the spread of the bulk of the probability density and is much less influenced by the tails of the distribution than the MSD. As before, the prefactor, here $2/\theta^{2}$, is optional and only needed when one wants to compare the LCF to the MSD.
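A sample LCF follows the same recipe as the sample codifference. In the NumPy sketch below (hypothetical function names) the expectation in the definition is replaced by an arithmetic average over an ensemble of values $X_t$, and $\mu_X(t)$ by the sample mean. For Student-t data with 3 degrees of freedom (variance 3) the LCF at $\theta=1$ is $-2\ln[(1+\sqrt{3})\mathrm{e}^{-\sqrt{3}}]\approx 1.45$, noticeably below the MSD, which illustrates the reduced weight of the tails; for Gaussian data the two coincide, as stated in property b) below.

```python
import numpy as np


def sample_lcf(x_t, theta):
    """Sample LCF: -2/theta^2 times the log of the empirical characteristic function
    of the centred values X_t - mean."""
    x_t = np.asarray(x_t, dtype=float)
    centred = x_t - x_t.mean()  # sample estimate of X_t - mu_X(t)
    phi = np.mean(np.exp(1j * theta * centred))
    # The real part equals -2/theta^2 * log|phi|; for symmetric laws phi is real anyway.
    return float(np.real(-2.0 / theta**2 * np.log(phi)))


def sample_msd(x_t):
    x_t = np.asarray(x_t, dtype=float)
    return float(np.mean((x_t - x_t.mean()) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    heavy = rng.standard_t(3, size=200_000)  # Student-t, 3 degrees of freedom
    print("LCF:", sample_lcf(heavy, theta=1.0))  # theory: about 1.454
    print("MSD:", sample_msd(heavy))             # theory: 3, but a noisy estimate
```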

The LCF is indeed a reasonable measure of dispersion, as shown by the following properties:

a)

For independent $Y_{t},Z_{t}$ and $X_{t}=Y_{t}+Z_{t}$ ,

$\zeta_{X}^{\theta}(t)=\zeta_{Y}^{\theta}(t)+\zeta_{Z}^{\theta}(t).$   (A10)

b)

For any Gaussian process the LCF equals the MSD,

$\zeta_{X}^{\theta}(t)=\delta_{X}^{2}(t).$   (A11)

c)

As we stretch the probability density of $X_{t}$ , the LCF diverges, that is,

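$\lim_{c\to\infty}\zeta_{cX}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln\lim_{c\to\infty}\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta cX_{t}}]=-\frac{2}{\theta^{2}}\ln 0^{+}=\infty.$   (A12)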

The first two facts are analogues of the corresponding properties of the codifference, which allow one to trace the influence of noise and to detect non-Gaussianity. Property c) is just the Riemann-Lebesgue lemma in disguise: it corresponds to the intuition that a stretched process should have a larger spread. It should be mentioned that in general the LCF can be negative or complex-valued, which is highly undesirable. However, for the considered models, which are based on internal Gaussian dynamics, this will never be the case, as proved in Proposition 2.

Decomposing any process with independent and stationary increments into a sum of its jumps shows that in this case $\zeta_{X}^{\theta}(t)$ is a linear function of $t$. In particular, this holds for Lévy flights [37]. It also holds for continuous time random walks with exponential waiting times [5], for which

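$\zeta_{X}^{\theta}(t)=\frac{2\left(1-\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta J}]\right)}{\theta^{2}\,\mathbb{E}[T]}\,t,$   (A13)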

where $J$ denotes a single jump and $T$ a single waiting time of the process $X$. The dependence on $T$ is the same as for the MSD,

$\delta_{X}^{2}(t)=\frac{\mathbb{E}[J^{2}]}{\mathbb{E}[T]}\,t,$   (A14)

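only the factor depending on the distribution of $J$ changes: the second moment $\mathbb{E}[J^{2}]$ in the MSD is replaced by $2(1-\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta J}])/\theta^{2}$ in the LCF.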
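The linear LCF of such a walk can be checked by simulation. The sketch below assumes zero-mean Gaussian jumps $J\sim\mathcal{N}(0,s^{2})$, for which $\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta J}]=\mathrm{e}^{-(\theta s)^{2}/2}$, and uses the fact that with exponential waiting times the number of jumps up to $t$ is Poisson distributed with mean $t/\mathbb{E}[T]$. The walk starts at time 0 without ageing; the function names and parameters are illustrative.

```python
import math
import numpy as np


def ctrw_lcf_theory(t, theta, mean_wait, jump_std):
    """LCF 2 (1 - E[exp(i theta J)]) t / (theta^2 E[T]) for N(0, jump_std^2) jumps."""
    one_minus_phi = 1.0 - math.exp(-0.5 * (theta * jump_std) ** 2)
    return 2.0 * one_minus_phi / (theta**2 * mean_wait) * t


def ctrw_msd_theory(t, mean_wait, jump_std):
    # MSD = E[J^2] t / E[T] for zero-mean jumps.
    return jump_std**2 * t / mean_wait


def simulate_ctrw(t, mean_wait, jump_std, n, seed=0):
    """Positions X_t of n walkers; the number of jumps is Poisson(t / mean_wait)
    and a sum of k i.i.d. N(0, s^2) jumps is N(0, k s^2)."""
    rng = np.random.default_rng(seed)
    n_jumps = rng.poisson(t / mean_wait, size=n)
    return jump_std * np.sqrt(n_jumps) * rng.standard_normal(n)


def sample_lcf_symmetric(x, theta):
    # Symmetric, zero-mean law: average cosines, no centring needed.
    return -2.0 / theta**2 * math.log(float(np.mean(np.cos(theta * x))))


if __name__ == "__main__":
    x = simulate_ctrw(t=2.0, mean_wait=1.0, jump_std=1.0, n=200_000, seed=4)
    print("sample LCF:", sample_lcf_symmetric(x, 1.0))
    print("theory LCF:", ctrw_lcf_theory(2.0, 1.0, 1.0, 1.0))
    print("theory MSD:", ctrw_msd_theory(2.0, 1.0, 1.0))
```

Both the LCF and the MSD grow linearly in $t$; only their prefactors differ.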

The LCF can also be used for finite- or infinite-variance models which are "anomalous" in some sense. A basic example is fractional Lévy stable motion $L_{\alpha}^{H}$ [46]. It is stable and self-similar, which implies that

$\zeta_{L_{\alpha}^{H}}^{\theta}(t)=C_{\theta}\,t^{\alpha H},$   (A15)

for some constant $C_{\theta}$ , which depends on the chosen normalisation. This formula agrees with the intuition that a measure of the spread in this case should behave like a power law. Somewhat surprisingly, the situation is different for continuous time random walks with power-law waiting times, which are used to model subdiffusion. Such processes after rescaling converge to subordinated Brownian motion $B(S_{\alpha}(t))$ , for which the LCF can be calculated directly, using the well-known properties of the inverse $\alpha$ -stable subordinator $S_{\alpha}$ [47],

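$\zeta_{B(S_{\alpha})}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln E_{\alpha}\left(-\frac{\theta^{2}}{2}t^{\alpha}\right),$   (A16)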

where $E_{\alpha}$ is the Mittag-Leffler function [48]. This LCF diverges only logarithmically at long times; the exact asymptotics is shown in Eq. (B3). The difference between these two models of anomalous diffusion is that $L_{\alpha}^{H}$ is self-similar, so its PDF spreads uniformly, whereas for $B(S_{\alpha})$ the bulk is much more constrained than the tails.
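The logarithmic growth can be checked numerically for $\alpha=1/2$, where the Mittag-Leffler function has the standard closed form $E_{1/2}(-z)=\mathrm{e}^{z^{2}}\operatorname{erfc}(z)$. The Python sketch below covers only this case, and its direct evaluation is valid only for moderate $z$; the function names are hypothetical. It compares the LCF with the MSD $\mathbb{E}[S_{\alpha}(t)]=t^{\alpha}/\Gamma(1+\alpha)$ and with the long-time asymptotics (B3), written with $t^{2H}$ replaced by $t^{\alpha}$, so that the leading term becomes $(2\alpha/\theta^{2})\ln t$.

```python
import math


def mittag_leffler_half(z):
    """E_{1/2}(-z) = exp(z**2) * erfc(z), z >= 0.

    Direct evaluation; exp(z**2) overflows beyond z of about 26.
    """
    return math.exp(z * z) * math.erfc(z)


def lcf_subordinated_bm(t, theta):
    """LCF of subordinated Brownian motion B(S_alpha(t)) for alpha = 1/2."""
    return -2.0 / theta**2 * math.log(mittag_leffler_half(theta**2 * math.sqrt(t) / 2.0))


def msd_subordinated_bm(t):
    # MSD = E[S_alpha(t)] = t^alpha / Gamma(1 + alpha) for alpha = 1/2.
    return math.sqrt(t) / math.gamma(1.5)


def lcf_long_time(t, theta):
    # (2 alpha / theta^2) ln t + (2 / theta^2) ln(theta^2 Gamma(1 - alpha) / 2), alpha = 1/2.
    return math.log(t) / theta**2 + 2.0 / theta**2 * math.log(theta**2 * math.gamma(0.5) / 2.0)


if __name__ == "__main__":
    theta = 0.5
    for t in (1e-4, 1.0, 1e2, 1e4):
        print(t, lcf_subordinated_bm(t, theta), msd_subordinated_bm(t), lcf_long_time(t, theta))
```

At short times the LCF follows the MSD, while at long times it stays close to the logarithmic asymptote and far below the power-law MSD.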

After this brief discussion of the general properties of the codifference and related notions, we will study its behaviour in more detail for models based on random parameters of motion and for models based on a random and time-varying diffusion coefficient. The next section (B) provides a general physical overview and concrete examples useful for modelling. The third and last section (C) is dedicated to mathematical results and calculation techniques. The paper is written such that, if the reader prefers, the physical and mathematical sections B and C can be read independently.

B Modelling

B.1 Gaussian diffusion governed by random parameters

One of the core concepts behind ergodicity and ergodicity breaking is the idea of looking at the information contained in a single trajectory. We speak of ergodicity if the information that can be gained by analysing one sufficiently long series of observations is the same as that gained by analysing all possible trajectories in the ensemble [49]. Conversely, if this amount of information is smaller, we speak of ergodicity breaking. In other words, each trajectory carries some information of its own, and by using only a single trajectory we miss the information contained in the others. This is sometimes also rephrased as confinement in phase space, but this language must be used carefully, as the said space has a subtle structure.³

³ Even for classical Brownian motion it is the infinite-dimensional Wiener space [50].

From a different perspective, modelling based on information content often leads to an intuitive description, because the differences between trajectories often stem from differences between the diffusing particles and between their local surroundings. Both may occur, e.g., in biological systems. The latter case requires the additional assumption that the inhomogeneity of the surroundings varies on the length scale of the mean distance between trajectories, but does not vary much on the scale of the trajectories themselves. That is, distinct trajectories have distinct surroundings, but each particle is sufficiently localised so that the state of the medium around it does not change significantly. This is reasonable, for example, when the particles are trapped or the measurement time is sufficiently short; compare, e.g., the absolute spread of the traced particles in [33].

In any case, this information can be parametrised, which leads to so-called hierarchical or multilevel modelling [51]; in physics this approach is also called "superstatistics" (short for "superposition of statistics") [52]. Deterministic parameters of the basic model become random on an additional statistical layer.

B.1.1 Random diffusion coefficient.

For diffusion, the simplest example of a hierarchical model is motion with a random diffusion coefficient, which describes the situation in which different trajectories show movements with different average mobilities. A typical model of such observations is grey Brownian motion [53, 54, 55]

$B_{2H,\beta}(t)=\sqrt{D_{\beta}}\,B_{H}(t).$   (B1)

Here $B_{H}$ is fractional Brownian motion [56] and the diffusion coefficient $D_{\beta}$ is an independent random variable with the so-called $\beta$ M-Wright distribution [57]. The moments of grey Brownian motion are the same as those of fractional Brownian motion up to a multiplicative constant; therefore the MSD still grows as $t^{2H}$ and the process models anomalous diffusion. Nevertheless, a straightforward calculation shows that the LCF can be expressed using the Mittag-Leffler function,

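$\zeta_{B_{2H,\beta}}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln E_{\beta}\left(-\frac{\theta^{2}}{2}t^{2H}\right)\sim\frac{t^{2H}}{\Gamma(\beta+1)},\quad t\to 0^{+},$   (B2)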

which also yields

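$\zeta_{B_{2H,\beta}}^{\theta}(t)=\frac{4H}{\theta^{2}}\ln t+\frac{2}{\theta^{2}}\ln\left(\frac{\theta^{2}\,\Gamma(1-\beta)}{2}\right)+o(1),\quad \beta\neq 1,\ t\to\infty.$   (B3)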

Here the asymptotic ‘$+o(1)$’ is pointwise, which is stronger than the asymptotic proportionality ‘$\sim$’; in the sense of ‘$\sim$’ the term $(4H/\theta^{2})\ln t$ dominates, and this logarithmic behaviour clearly distinguishes the LCF from the power-law MSD at long times. This crossover behaviour can be used to distinguish grey Brownian motion from fractional Brownian motion (the case $\beta=1$ [53]) and from the diffusing-diffusivity model (Eqs. (B27) and (B28)). The very slow logarithmic increase of the LCF is not surprising: the diffusion coefficient is random but fixed along each trajectory, which constrains the relaxation of the probability density. This effect is detected by the LCF but ignored by the MSD; for a more general result see Proposition 7 d).

Grey Brownian motion models free, unconfined movement and is therefore not stationary. Still, the codifference can be used for its increments $\Delta B_{2H,\beta}(t):=B_{2H,\beta}(t+\Delta t)-B_{2H,\beta}(t)$. The calculation is again straightforward and yields

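$\tau_{\Delta B_{2H,\beta}}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{E_{\beta}\left(-\theta^{2}\left(\Delta t^{2H}+t^{2H}-\left(|t+\Delta t|^{2H}+|t-\Delta t|^{2H}\right)/2\right)\right)}{E_{\beta}\left(-\theta^{2}\Delta t^{2H}/2\right)^{2}}.$   (B4)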

For $H\neq 1/2$ the covariance decays to zero like the power law $t^{2H-2}$, but the function above converges to the non-zero constant

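$\tau_{\Delta B_{2H,\beta}}^{\theta}(\infty)=\frac{1}{\theta^{2}}\ln\frac{E_{\beta}\left(-\theta^{2}\Delta t^{2H}\right)}{E_{\beta}\left(-\theta^{2}\Delta t^{2H}/2\right)^{2}}.$   (B5)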

This means that some dependence remains even in the limit $t\to\infty$; the covariance does not detect it, but the codifference does. It can be interpreted as the common dependence of both increments on the random diffusion coefficient $D_{\beta}$, which is fixed along each trajectory.
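The residual dependence can be evaluated directly. Since the difference of two increments has conditional variance $2D_{\beta}(\Delta t^{2H}-r(t))$, with the fractional Gaussian noise covariance $r(t)=(|t+\Delta t|^{2H}+|t-\Delta t|^{2H}-2t^{2H})/2$, definition (A2) gives $\tau^{\theta}(t)=\theta^{-2}\ln[E_{\beta}(-\theta^{2}(\Delta t^{2H}-r(t)))/E_{\beta}(-\theta^{2}\Delta t^{2H}/2)^{2}]$. The Python sketch below evaluates this for $\beta=1/2$, using the closed form $E_{1/2}(-z)=\mathrm{e}^{z^{2}}\operatorname{erfc}(z)$, and for $\beta=1$, i.e. fractional Brownian motion with $E_{1}(-z)=\mathrm{e}^{-z}$. Other values of $\beta$ are deliberately not implemented, and the names and parameter values are illustrative.

```python
import math


def ml(beta, z):
    """E_beta(-z) for z >= 0; only beta = 1/2 and beta = 1 (closed forms) are covered."""
    if beta == 1.0:
        return math.exp(-z)
    if beta == 0.5:
        return math.exp(z * z) * math.erfc(z)  # moderate z only
    raise ValueError("only beta = 1/2 or beta = 1 are implemented in this sketch")


def fgn_covariance(t, dt, H):
    # Covariance of fractional Gaussian noise increments at lag t.
    return (abs(t + dt) ** (2 * H) + abs(t - dt) ** (2 * H) - 2.0 * t ** (2 * H)) / 2.0


def tau_increments(t, dt, H, theta, beta):
    """Codifference of grey Brownian motion increments at lag t."""
    gap = dt ** (2 * H) - fgn_covariance(t, dt, H)  # half the conditional variance of the difference
    den = ml(beta, theta**2 * dt ** (2 * H) / 2.0) ** 2
    return math.log(ml(beta, theta**2 * gap) / den) / theta**2


def tau_increments_limit(dt, H, theta, beta):
    """Limit t -> infinity of the increment codifference."""
    den = ml(beta, theta**2 * dt ** (2 * H) / 2.0) ** 2
    return math.log(ml(beta, theta**2 * dt ** (2 * H)) / den) / theta**2


if __name__ == "__main__":
    dt, H, theta = 1.0, 0.3, 1.0
    for t in (0.0, 1.0, 10.0, 1e3, 1e6):
        print(t, fgn_covariance(t, dt, H), tau_increments(t, dt, H, theta, 0.5))
    print("limit, beta = 1/2:", tau_increments_limit(dt, H, theta, 0.5))
    print("limit, beta = 1:  ", tau_increments_limit(dt, H, theta, 1.0))
```

For $\beta=1/2$ the codifference settles at a strictly positive constant while the covariance becomes negligible; for $\beta=1$ the limit is zero up to rounding, as expected for a mixing Gaussian process.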

The above simple example shows that the codifference does not detect non-ergodicity directly; rather, it detects dependence. The notion of mixing is useful to describe this idea. Mixing means that the future evolution of the process after a long delay becomes independent of its past values. Formally, the process is mixing if any statistic calculated on a finite time interval starting at $s$ and any other statistic calculated on a finite time interval starting at $s+t$ become independent as $t\to\infty$ [58]. Therefore, analysing the codifference, which measures the dependence between $\exp(-\mathrm{i}\theta X_{s})$ and $\exp(\mathrm{i}\theta X_{s+t})$, allows one to exclude mixing, i.e. to indicate the presence of a non-vanishing dependence. The latter means that the motion is constrained in phase space, which in turn implies ergodicity breaking.⁴

⁴ The remaining class of processes which are ergodic but non-mixing is complicated, and such processes do not seem to appear in applications. For a mathematically constructed example of such a process and a discussion see [59].

Thus, for a very large class of systems one does not need to study time averages to detect non-ergodicity: it is sufficient to find a proper memory function which indicates non-mixing. As we demonstrate, the covariance fails in this role for the considered models, but the codifference works.

These detection capabilities of the codifference work under quite general circumstances. If we observe any ensemble of mixing, zero-mean Gaussian trajectories, the covariance will converge to zero. This happens because for Gaussian processes mixing is equivalent to a decay of the covariance [58, 59], and a mixture of decaying covariance functions also decays. However, the ensemble of trajectories as a whole will in general not be ergodic, which the covariance does not detect. Let $\mathcal{C}$ be some parametrisation of this mixture and let $D=\mathbb{E}[X_{t}^{2}|\mathcal{C}]$ be the resulting, possibly random, conditional variance. We call it $D$ because if the data $X$ correspond to the velocity or to increments of displacement, it is proportional to the diffusion coefficient. Under these assumptions the codifference converges to the constant

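$$\tau_{X}^{\theta}(\infty)=\lim_{t\to\infty}\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D}\big]}{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D/2}\big]^{2}}$$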

as proven in Proposition 5. This quantity is related to the coefficient of variation defined as the standard deviation divided by the mean [23]. Denoting it by $\mathrm{CV}[X]$ , the formula above can be expressed as $\theta^{-2}\ln(\mathrm{CV}[\exp(-\theta^{2}D/2)]^{2}+1)$ which is an increasing function of $\mathrm{CV}[\exp(-\theta^{2}D/2)]$ and asymptotically quadratic for small $\mathrm{CV}$ . The coefficient of variation is a measure of dispersion, hence so is $\tau_{X}^{\theta}(\infty)$ which reflects the randomness of $D$ . This behaviour is also equivalent to detecting a residual dependence and the resulting non-mixing/non-ergodicity.
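As a numerical sanity check of this limit, the following minimal sketch (Python with NumPy; the gamma law of $D$ and all parameter values are illustrative assumptions) evaluates the Table 1 gamma formula at $r_Y=0$ and compares it with the coefficient-of-variation form estimated from Monte Carlo samples of $D$.

```python
import numpy as np

def tau_inf_gamma(alpha, beta, theta):
    """Limit of the codifference for D ~ Gamma(shape alpha, rate beta):
    the Table 1 gamma formula evaluated at r_Y = 0."""
    return alpha / theta**2 * np.log((1 + theta**2 / (2 * beta))**2 / (1 + theta**2 / beta))

def tau_inf_from_cv(d_samples, theta):
    """theta^{-2} ln(CV[exp(-theta^2 D / 2)]^2 + 1) estimated from samples of D."""
    w = np.exp(-theta**2 * d_samples / 2)
    cv = w.std() / w.mean()
    return np.log(cv**2 + 1) / theta**2

rng = np.random.default_rng(0)
alpha, beta, theta = 2.0, 1.5, 1.0
d = rng.gamma(shape=alpha, scale=1 / beta, size=1_000_000)
print(tau_inf_gamma(alpha, beta, theta))          # closed form, approx. 0.129
print(tau_inf_from_cv(d, theta))                  # Monte Carlo, close to the line above
print(tau_inf_from_cv(np.full(10, 2.0), theta))   # deterministic D: CV = 0, limit 0 up to rounding
```

A deterministic $D$ gives a vanishing limit, in line with the statement that $\tau_X^\theta(\infty)$ reflects only the randomness of $D$.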

Outside the limit $t=\infty$ not much can be said about the properties of the codifference in such a wide and general class. The situation changes if we consider a more specific model. The idea behind grey Brownian motion and many works on superstatistics [52] is that the trajectories differ mainly by their diffusion coefficient, while other properties are not significantly distinct. A simple model of such a system can be written as

$$X_{t}=\sqrt{D}\,Y_{t}$$

We assume that the process $Y$ describes the joint form of dependence common for all trajectories. We consider a Gaussian $Y$ , which for grey Brownian motion would be fractional Brownian motion. Another reasonable choice would be, e.g., a solution of the Langevin equation. In this case, as long as $Y$ is stationary (i.e., for free diffusion we consider increments or the velocity process), the covariance is

$$r_{X}(t)=\mathbb{E}[D]\,r_{Y}(t),$$

of course as long as $\mathbb{E}[D]<\infty$. If the memory of $Y$ is short compared with the considered time scale, so that $r_{Y}(t)\approx 0$, then also $r_{X}(t)\approx 0$: the covariance does not detect the additional dependence introduced by the random $D$.

At the same time the codifference can be expressed as a function of the covariance of $Y$ , precisely as

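$$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}(1-r_{Y}(t))D}\big]}{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D/2}\big]^{2}}$$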

for any $D$, regardless of whether $\mathbb{E}[D]<\infty$. It clearly converges to the constant (B6) as $r_{Y}(t)\to 0$ and thus detects the additional non-linear dependence.
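The contrast between covariance and codifference can be checked on simulated ensembles. The sketch below assumes the ratio form of the codifference consistent with Table 1, estimates it by replacing expectations with sample means of cosines (the data are symmetric, so the characteristic functions are real), and uses a gamma-distributed $D$ with a pair $(Y_s, Y_{s+t})$ of correlation $r_Y$; all names and parameter values are illustrative.

```python
import numpy as np

def codifference_mc(xs, xt, theta):
    """Empirical theta^{-2} ln(E[e^{i theta (X_t - X_s)}] / (E[e^{i theta X_t}] E[e^{-i theta X_s}]))
    for symmetric data, where every characteristic function is real."""
    num = np.mean(np.cos(theta * (xt - xs)))
    den = np.mean(np.cos(theta * xt)) * np.mean(np.cos(theta * xs))
    return np.log(num / den) / theta**2

def codifference_gamma(r_y, alpha, beta, theta):
    """Table 1 formula for X = sqrt(D) Y with D ~ Gamma(alpha, rate beta)."""
    return alpha / theta**2 * np.log((1 + theta**2 / (2 * beta))**2
                                     / (1 + theta**2 * (1 - r_y) / beta))

def sample_pair(r_y, alpha, beta, n, rng):
    """Draw (X_s, X_{s+t}) with corr(Y_s, Y_{s+t}) = r_y and one random D per trajectory."""
    y1 = rng.standard_normal(n)
    y2 = r_y * y1 + np.sqrt(1 - r_y**2) * rng.standard_normal(n)
    d = rng.gamma(alpha, 1 / beta, n)
    return np.sqrt(d) * y1, np.sqrt(d) * y2

rng = np.random.default_rng(1)
alpha, beta, theta, n = 1.0, 1.0, 1.0, 1_000_000
results = {}
for r_y in (0.8, 0.3, 0.0):
    xs, xt = sample_pair(r_y, alpha, beta, n, rng)
    results[r_y] = (np.mean(xs * xt),                              # covariance estimate
                    codifference_mc(xs, xt, theta),                # empirical codifference
                    codifference_gamma(r_y, alpha, beta, theta))   # closed form
    print(r_y, results[r_y])
```

At $r_Y=0$ the estimated covariance is close to zero while the codifference stays near $\ln(9/8)\approx 0.118$, which is the residual dependence introduced by the random $D$.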

For a general, possibly non-stationary $Y$ with $\mathbb{E}[Y_{t}^{2}]=\delta_{Y}^{2}(t)$ , the representation of the LCF is

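$$\zeta_{X}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D\,\delta_{Y}^{2}(t)/2}\big].$$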

Given some model of $D$ these formulae can be made completely explicit, examples are given in Table 1. The first example is the gamma distribution $D\overset{d}{=}\mathcal{G}(\alpha,\beta)$ in which the coefficient $\alpha$ describes the power-law behaviour of the PDF near 0 and $\beta$ is the rate of exponential decay of the tails (the specific case $\mathcal{G}(1,\beta)$ is the exponential distribution); it models common types of experiments in which the distribution of diffusion coefficients resembles a bump concentrated around some finite constant and high values of $D$ become exponentially less probable. This case is also illustrated in Figure 1.

Diffusion coefficients with a heavy-tailed distribution result in a motion that itself exhibits heavy tails of the PDF, a phenomenon actively investigated in transport, finance, turbulence and many other systems [6, 60, 61]. A classical model of this case is the one-sided $\alpha$-stable subordinator $\mathcal{S}(\alpha,c)$, determined by its Laplace transform $\exp(-(cs)^{\alpha})$. The resulting type of process was thoroughly studied in the literature concerned with stable distributions [37]. This process is called sub-Gaussian, which is arguably a confusing term. In this case the process $X$ has no finite second moment, so attempts to estimate its covariance will lead to a diverging result. This is visible in the formulae for the codifference and the LCF, which diverge as $\theta\to 0$. However, for any $\theta>0$ the codifference and the LCF are finite and can be estimated in the standard way, and from the result, if desired, the covariance and the MSD of $Y$ can be reconstructed.
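To make the divergence as $\theta\to 0$ concrete, the following sketch evaluates the codifference and the LCF for a one-sided stable $D$ directly from its Laplace transform $\mathbb{E}[\mathrm{e}^{-sD}]=\exp(-(cs)^\alpha)$. The closed forms in the code are our own derivation from the Laplace-transform representations of both quantities, and the parameter values are illustrative.

```python
import math

def codifference_stable(r_y, alpha, c, theta):
    """Codifference of X = sqrt(D) Y for one-sided stable D with
    E[exp(-s D)] = exp(-(c s)^alpha), from
    theta^{-2} [ln E e^{-theta^2 (1 - r_Y) D} - 2 ln E e^{-theta^2 D / 2}]."""
    return (2 * (c * theta**2 / 2)**alpha - (c * theta**2 * (1 - r_y))**alpha) / theta**2

def lcf_stable(delta2, alpha, c, theta):
    """LCF -2 theta^{-2} ln E[exp(-theta^2 D delta_Y^2 / 2)] for the same D."""
    return 2 * (c * theta**2 * delta2 / 2)**alpha / theta**2

alpha, c = 0.5, 1.0
thetas = (1.0, 0.1, 0.01)
tau_vals = [codifference_stable(0.0, alpha, c, th) for th in thetas]
lcf_vals = [lcf_stable(1.0, alpha, c, th) for th in thetas]
for th, tv, lv in zip(thetas, tau_vals, lcf_vals):
    print(th, tv, lv)   # both grow like theta^(2 alpha - 2) as theta -> 0
```

Both quantities are finite for every fixed $\theta>0$ but scale as $\theta^{2\alpha-2}$, which diverges for $\alpha<1$.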

For a distribution concentrated around its mean value one can use the Gaussian $\mathcal{N}(\mu,\sigma^{2})$ or the uniform $\mathcal{U}(a,b)$ distribution; however, the former is a valid model only for $\sigma\ll\mu$, when the probability that $D<0$ can be neglected.

Even if the precise model of $D$ is not known, quite a lot can be said about the behaviour of the codifference. In Proposition 6 we show that

a)

The codifference is a monotonic function of the covariance: when one increases, so does the other, and likewise for decreases.

b)

If $\mathbb{E}[D]<\infty$ the codifference is smaller than the covariance for strong positive correlation, but larger for weak or negative correlations.

c)

The approach to the value $\tau_{X}^{\theta}(\infty)$ has the same asymptotics as the decay of the covariance,

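$$\tau_{X}^{\theta}(t)-\tau_{X}^{\theta}(\infty)\sim\frac{\mathbb{E}\big[D\,\mathrm{e}^{-\theta^{2}D}\big]}{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D}\big]}\,r_{Y}(t),\qquad t\to\infty,$$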

assuming $r_{Y}(t)\to 0$ , which is a typical case.

These are all desirable properties: the memory structure of the internal process $Y$ is reflected in a straightforward manner by the codifference. For small values of the covariance their relation is even linear, as stated in c), and the proportionality constant is finite for any distribution of $D$ , due to the truncating factor $\exp(-\theta^{2}D)$ .
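For a gamma-distributed $D\overset{d}{=}\mathcal{G}(\alpha,\beta)$ the proportionality constant in c) is $\mathbb{E}[D\mathrm{e}^{-\theta^2D}]/\mathbb{E}[\mathrm{e}^{-\theta^2D}]=\alpha/(\beta+\theta^2)$, which is smaller than $\mathbb{E}[D]=\alpha/\beta$ because of the truncating factor. The sketch below (illustrative parameters) checks this constant in two ways: as the slope of the Table 1 formula at $r_Y=0$, and as a Monte Carlo ratio over samples of $D$.

```python
import numpy as np

def codifference_gamma(r_y, alpha, beta, theta):
    """Table 1 formula for X = sqrt(D) Y with D ~ Gamma(alpha, rate beta)."""
    return alpha / theta**2 * np.log((1 + theta**2 / (2 * beta))**2
                                     / (1 + theta**2 * (1 - r_y) / beta))

alpha, beta, theta = 1.5, 2.0, 1.0
r_small = 1e-6
# Slope of tau_X^theta with respect to r_Y near r_Y = 0 (finite difference).
slope_exact = (codifference_gamma(r_small, alpha, beta, theta)
               - codifference_gamma(0.0, alpha, beta, theta)) / r_small
# Predicted constant E[D e^{-theta^2 D}] / E[e^{-theta^2 D}] for the gamma law.
slope_gamma = alpha / (beta + theta**2)
# The same constant estimated by Monte Carlo.
rng = np.random.default_rng(2)
d = rng.gamma(alpha, 1 / beta, 1_000_000)
w = np.exp(-theta**2 * d)
slope_mc = np.mean(d * w) / np.mean(w)
print(slope_exact, slope_gamma, slope_mc, alpha / beta)
```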

Another property is that the codifference depends additively on $D$. Precisely, if we decompose $D=D^{\prime}+D^{\prime\prime}$ for some independent $D^{\prime}$ and $D^{\prime\prime}$, the codifference decomposes as

$$\tau_{X}^{\theta}(t)=\tau_{X^{\prime}}^{\theta}(t)+\tau_{X^{\prime\prime}}^{\theta}(t),$$

where $X^{\prime}$ and $X^{\prime\prime}$ are processes with diffusion coefficients $D^{\prime}$ and $D^{\prime\prime}$, respectively. Therefore, subtracting the codifferences estimated from different samples may be used to analyse different sources of diffusivity. The derivation is given in Proposition 6.
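Additivity follows from the factorisation of the Laplace transform of a sum of independent variables. The sketch below writes the codifference through $\mathbb{E}[\mathrm{e}^{-sD}]$, takes a hypothetical uniform $D^{\prime}\sim\mathcal{U}(0,1)$ and exponential $D^{\prime\prime}$ with rate 2, and compares the Monte Carlo codifference of the sum with the sum of the two closed-form codifferences.

```python
import numpy as np

def tau_from_laplace(lap, r_y, theta):
    """Codifference of X = sqrt(D) Y written through lap(s) = E[exp(-s D)]."""
    return (np.log(lap(theta**2 * (1 - r_y))) - 2 * np.log(lap(theta**2 / 2))) / theta**2

def lap_uniform(s):
    """Laplace transform of D' ~ U(0, 1)."""
    return -np.expm1(-s) / s

def lap_expon(s):
    """Laplace transform of D'' ~ Exp(rate 2)."""
    return 2.0 / (2.0 + s)

rng = np.random.default_rng(3)
n = 1_000_000
d_sum = rng.uniform(0, 1, n) + rng.exponential(0.5, n)   # independent D' + D'' (scale 0.5 = rate 2)

def lap_sum_mc(s):
    """Monte Carlo Laplace transform of D = D' + D''."""
    return np.mean(np.exp(-s * d_sum))

theta, r_y = 1.0, 0.2
lhs = tau_from_laplace(lap_sum_mc, r_y, theta)
rhs = tau_from_laplace(lap_uniform, r_y, theta) + tau_from_laplace(lap_expon, r_y, theta)
print(lhs, rhs)
```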

Analogous features can also be checked for the LCF (Proposition 7), which can also be decomposed for $D=D^{\prime}+D^{\prime\prime}$ and is a monotonic function of the MSD, but is always smaller than the MSD, therefore detecting the additional constraints of the motion introduced by a random $D$ .

At the end of the discussion of random diffusion coefficients we note that the behaviour of the codifference near $t=0$ can also provide valuable information. In Proposition 8 we prove that in the typical case $\mathbb{E}[D]<\infty$ its asymptotics reflect those of the covariance. However, if $\mathbb{E}[D]=\infty$ and $D$ has power-law tails, corresponding to the presence of high-volatility trajectories, the asymptotics of the codifference acquire an additional power law. Just as for Gaussian processes the behaviour of the covariance near $t=0$ is determined by their fractal dimension [62, chapter 8.8], the same is true for the codifference, which can also be applied to processes with no finite moments.

B.1.2 Random memory decay rate.

Another interesting class of models consists of ensembles of particles for which the time dependence may vary from trajectory to trajectory. The simplest model of a time-varying dependence is the exponential decay $\exp(-t\Lambda)$, which is the (normalised) covariance of the Ornstein-Uhlenbeck process [63]. It models many kinds of linear relaxation disturbed by additive noise. It was also studied as a model of the additive measurement noise itself [64, 65]. In the hierarchical model the decay rate $\Lambda$ may be random. The covariance of the resulting mixture of Ornstein-Uhlenbeck type trajectories was studied in [66] in the context of a randomly parametrised Langevin equation.

The coefficient $\Lambda$ has a different physical interpretation depending on the details of the studied phenomenon. For the velocity of a Brownian particle it is proportional to the friction coefficient and its randomness is related to local changes of the viscosity and/or different shapes of the diffusing particles [67]; in this system the fluctuation-dissipation relation also links the scaling to the temperature. For trapped particles $\Lambda$ is proportional to the stiffness of the confining harmonic potential (the prominent example being optical tweezers [21, 68]), therefore the randomness of $\Lambda$ is equivalent to an ensemble of traps with varying sizes, which are proportional to $\Lambda^{-1}$ .

Another case worth mentioning is that of viscoelastic anomalous diffusion [69], for which the velocity (or increments) have power-law dependence $\propto t^{2H-1}$. This function can be expressed as $\exp(-\ln(t)(1-2H))$. Therefore it is enough to replace $t$ with $\ln t$, and the results below also follow for the ensemble of power-law memory trajectories characterised by the random parameter $(1-2H)$. It is worth noting that variability of the Hurst index $H$ seems to be the rule rather than the exception in biological systems [70, 71, 72].

We do not want to make the discussion overly technical, so below we will analyse only the case of deterministic scaling and random decay rate, $r_{X}(t|\mathcal{C})=\sigma^{2}\exp(-t\Lambda)$ . Results for more general $Df(\Lambda)\exp(-t\Lambda)$ are presented in Propositions 10, 11 and 12, which prove that the randomness of the scaling is not essential for most of the properties discussed below. We also note that sometimes one can remove the random scaling and normalise the trajectories using the estimate of scaling obtained from the Birkhoff ergodic theorem [58],

[TABLE]

However, this procedure requires having access to sufficiently long trajectories.
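A minimal sketch of this normalisation, assuming for simplicity a fixed decay rate $\Lambda=1$, a hypothetical gamma-distributed scale $D$ per trajectory, and an exact Ornstein-Uhlenbeck discretisation: the Birkhoff estimate of the scale is the time average of $X_t^2$ along a single trajectory, and the trajectory would then be normalised as $X_t/\sqrt{\hat D}$.

```python
import numpy as np

def ou_path(lam, n_steps, dt, rng):
    """Exact discretisation of a stationary unit-variance OU process
    with covariance exp(-lam * t)."""
    rho = np.exp(-lam * dt)
    noise = np.sqrt(1 - rho**2) * rng.standard_normal(n_steps)
    x = np.empty(n_steps)
    x[0] = rng.standard_normal()
    for k in range(1, n_steps):
        x[k] = rho * x[k - 1] + noise[k]
    return x

rng = np.random.default_rng(5)
d_true = rng.gamma(2.0, 1.0, size=5)           # hypothetical random scales D
estimates = []
for d in d_true:
    x = np.sqrt(d) * ou_path(lam=1.0, n_steps=100_000, dt=0.1, rng=rng)
    d_hat = np.mean(x**2)                       # Birkhoff time average of X_t^2
    estimates.append(d_hat)
    print(f"D = {d:.3f}, time-average estimate = {d_hat:.3f}")
```

The relative error of the estimate shrinks with the trajectory length measured in units of the correlation time, which is why sufficiently long trajectories are needed.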

A particular property of ensembles with fixed scaling is that every marginal distribution is Gaussian, i.e., all variables $X_{t}$ have a Gaussian distribution with variance $\sigma^{2}$. However, the codifference is

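$$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\mathbb{E}\Big[\exp\big(\theta^{2}\sigma^{2}\mathrm{e}^{-t\Lambda}\big)\Big]$$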

and because it does not equal the covariance, the process as a whole is not Gaussian. The codifference indicates the presence of subtle non-Gaussianity of the memory structure. This formula can also be used to derive useful bounds between the codifference and the covariance, see Proposition 9.
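One elementary bound follows from Jensen's inequality, $\ln\mathbb{E}[\mathrm{e}^{Z}]\geq\mathbb{E}[Z]$, which gives $\tau_X^\theta(t)\geq r_X(t)$ with equality at $t=0$. The sketch below evaluates both sides for a hypothetical exponentially distributed $\Lambda$ (rate $\beta$), computing the average over $\Lambda$ with Gauss-Laguerre quadrature from NumPy.

```python
import numpy as np

# Gauss-Laguerre nodes/weights: sum(w * f(x)) approximates the integral of exp(-x) f(x) over (0, inf).
nodes, weights = np.polynomial.laguerre.laggauss(80)

def expect_lambda(f, beta):
    """E[f(Lambda)] for Lambda ~ Exp(rate beta), via the substitution Lambda = x / beta."""
    return np.sum(weights * f(nodes / beta))

sigma, theta, beta = 1.0, 1.2, 1.0

def tau_random_rate(t):
    """tau_X^theta(t) = theta^{-2} ln E[exp(theta^2 sigma^2 e^{-t Lambda})]."""
    def integrand(lam):
        return np.exp((theta * sigma)**2 * np.exp(-t * lam))
    return np.log(expect_lambda(integrand, beta)) / theta**2

def cov_random_rate(t):
    """r_X(t) = sigma^2 E[e^{-t Lambda}] = sigma^2 beta / (beta + t)."""
    return sigma**2 * beta / (beta + t)

for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t, tau_random_rate(t), cov_random_rate(t))
```

The two curves coincide at $t=0$ (the Gaussian marginal) and separate for $t>0$, which is the non-Gaussianity of the memory structure that the covariance alone cannot reveal.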

Expanding the exponential in (B14) in a Taylor series leads to

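$$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\Big(1+\sum_{k=1}^{\infty}\frac{(\theta\sigma)^{2k}}{k!}\,\mathbb{E}\big[\mathrm{e}^{-kt\Lambda}\big]\Big).$$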

Note that $\sigma^{2}\mathbb{E}[\mathrm{e}^{-kt\Lambda}]=r_{X}(kt)$, so the result is a type of average over the values $r_{X}(kt)$. When the distribution of $\Lambda$ is not sufficiently concentrated near $0$ and the covariance decays fast (strictly speaking, is rapidly varying [73, 74]), the term $k=1$ dominates the $t\to\infty$ asymptotics. This is the case, e.g., for the one-sided stable variable $\Lambda\overset{d}{=}\mathcal{S}(\alpha,c)$, for which

$$\tau_{X}^{\theta}(t)\sim\sigma^{2}\mathrm{e}^{-(ct)^{\alpha}}=r_{X}(t),\qquad t\to\infty,$$

that is, we observe a stretched exponential type of dependence.
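The dominance of the $k=1$ term can be checked numerically by truncating the series, using $\mathbb{E}[\mathrm{e}^{-s\Lambda}]=\exp(-(cs)^\alpha)$ for the one-sided stable law. This is a sketch with illustrative parameters; the truncation order `k_max` is our own choice.

```python
import math

def tau_stable_rate(t, alpha, c, sigma, theta, k_max=60):
    """theta^{-2} ln(1 + sum_k (theta sigma)^{2k} / k! * E[e^{-k t Lambda}])
    with E[e^{-s Lambda}] = exp(-(c s)^alpha) for one-sided stable Lambda."""
    a = (theta * sigma)**2
    s = sum(a**k / math.factorial(k) * math.exp(-(c * k * t)**alpha)
            for k in range(1, k_max + 1))
    return math.log1p(s) / theta**2

alpha, c, sigma, theta = 0.5, 1.0, 1.0, 1.0
for t in (5.0, 50.0, 200.0):
    tau = tau_stable_rate(t, alpha, c, sigma, theta)
    stretched = sigma**2 * math.exp(-(c * t)**alpha)   # the k = 1 term, equal to r_X(t)
    print(t, tau / stretched)                          # ratio approaches 1
```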

When $\Lambda$ is more concentrated around 0 the situation differs. A basic example would again be the gamma distribution $\Lambda\overset{d}{=}\mathcal{G}(\alpha,\beta)$ , for which

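$$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\Big(1+\sum_{k=1}^{\infty}\frac{(\theta\sigma)^{2k}}{k!}\Big(1+\frac{kt}{\beta}\Big)^{-\alpha}\Big).$$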

When $\alpha=1$ (i.e., $\Lambda$ has an exponential distribution) the above can also be written using the incomplete gamma function. For any $\alpha$ all terms in the sum decay like $t^{-\alpha}$ and are comparable. Because of this, the codifference also decays with the same power law, but the proportionality constant is non-trivial:

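$$\tau_{X}^{\theta}(t)\sim\frac{1}{\theta^{2}}\Big(\frac{\beta}{t}\Big)^{\alpha}\sum_{k=1}^{\infty}\frac{(\theta\sigma)^{2k}}{k!\,k^{\alpha}},\qquad t\to\infty.$$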

It is not surprising that this behaviour is not specific to the gamma distribution and can be observed for any $\Lambda$ with a power-law PDF near $0^{+}$, see Proposition 10. Similarly, if the PDF of $\Lambda$ decays fast near $0^{+}$, the codifference also decays fast. All these properties are analogous to those of the covariance [66], so here the two can be used interchangeably or simultaneously, as a means to obtain stronger statistical verification.
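The non-trivial constant can be seen numerically for a gamma-distributed $\Lambda$. The sketch below compares the truncated series for $\tau_X^\theta(t)$ with its power-law asymptote and with the covariance $r_X(t)=\sigma^2(1+t/\beta)^{-\alpha}$; for $\alpha=\beta=\sigma=\theta=1$ the ratio $\tau_X^\theta/r_X$ tends to $\sum_{k\ge1}1/(k\,k!)\approx 1.318$ rather than 1. Parameters and the truncation order are illustrative.

```python
import math

def tau_gamma_rate(t, alpha, beta, sigma, theta, k_max=60):
    """Truncated series theta^{-2} ln(1 + sum_k (theta sigma)^{2k} / k! (1 + k t / beta)^{-alpha})."""
    a = (theta * sigma)**2
    s = sum(a**k / math.factorial(k) * (1 + k * t / beta)**(-alpha)
            for k in range(1, k_max + 1))
    return math.log1p(s) / theta**2

def tau_gamma_rate_asymptotic(t, alpha, beta, sigma, theta, k_max=60):
    """Power-law asymptote theta^{-2} (beta / t)^alpha sum_k (theta sigma)^{2k} / (k! k^alpha)."""
    a = (theta * sigma)**2
    const = sum(a**k / (math.factorial(k) * k**alpha) for k in range(1, k_max + 1))
    return (beta / t)**alpha * const / theta**2

alpha, beta, sigma, theta = 1.0, 1.0, 1.0, 1.0
t = 1e6
exact = tau_gamma_rate(t, alpha, beta, sigma, theta)
approx = tau_gamma_rate_asymptotic(t, alpha, beta, sigma, theta)
cov = sigma**2 * (1 + t / beta)**(-alpha)
print(exact / approx)   # close to 1
print(exact / cov)      # approx. 1.318, not 1
```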

The two are also similar in that neither detects the non-ergodicity, more precisely the non-mixing, of this system. As already demonstrated for the covariance, this is a common occurrence resulting from its linearity. The codifference fails because it measures only a reduced form of mixing. Mixing means that any two sets of multiple disjoint measurements must become asymptotically independent, i.e., the vectors $[X_{s_{1}},X_{s_{2}},\ldots,X_{s_{n}}]$ and $[X_{s_{1}+t},X_{s_{2}+t},\ldots,X_{s_{n}+t}]$ have to become independent as $t\to\infty$. The codifference (and, for that matter, also the covariance) measures only the dependence between two values $X_{s}$ and $X_{s+t}$.

For a process with a random decay rate these two values are asymptotically independent and the one-point distributions relax. Therefore, in order to detect non-ergodicity, we need to analyse the dependence between at least three values. A practical choice is to use four values divided into two pairs $[X_{s},X_{s+\Delta t}]$ and $[X_{s+t},X_{s+\Delta t+t}]$. Within each trajectory the values in the first pair are correlated as $\mathrm{e}^{-\Delta t\Lambda}$, and the same holds for the second pair. This correlation is random across the ensemble but fixed along each trajectory, i.e., it is a constant of motion, which can be detected. Probably the simplest method to achieve this is to calculate the increments

$$\Delta X_{t}=X_{t+\Delta t}-X_{t}$$

and study the codifference of those. A short calculation given in Proposition 11 shows that this method indeed works and

[TABLE]

The result depends on $\Lambda$ in a complex manner, but it can be easily estimated numerically. We can also use the fact that for small $\Delta t$ the conditional covariance of increments is

[TABLE]

and normalise the process, $\Delta\widetilde{X}_{t}\mathrel{\mathop{:}}=\Delta X_{t}/\sqrt{\Delta t}$ . The result then simplifies and becomes independent of $\Delta t$ ,

[TABLE]

We stress that this method cannot be applied using the covariance, which, calculated from increments, decays to $0$ and does not detect this specific memory structure. Its decay is even faster than for the original process, following the power law $t^{-\alpha-2}$ [66]. Intuitively, the decay is faster by a factor $t^{-2}$ because the scale of $\Delta X$ depends on $\Lambda$ as $\Lambda^{2}$, so trajectories with stronger correlation have smaller amplitude and contribute less to the average. This property has its analogue for the codifference, for which $\tau_{\Delta X}^{\theta}(t)-\tau_{\Delta X}^{\theta}(\infty)$ also decays like $t^{-\alpha-2}$ (see Proposition 12 for a more general result). This time, however, the faster decay actually helps in detecting ergodicity breaking, making the limit $\tau_{\Delta X}^{\theta}(\infty)$ visible even at short times. A numerical illustration of this behaviour is shown in Figure 2.
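The following Monte Carlo sketch illustrates the increment trick. It assumes Ornstein-Uhlenbeck trajectories with covariance $\sigma^2\mathrm{e}^{-t\Lambda}$, a hypothetical uniform law of $\Lambda$ on $[0.05,3]$, and exact OU transitions to sample the four values $X_s, X_{s+\Delta t}, X_{s+t}, X_{s+t+\Delta t}$. The predicted limit is our own application of Proposition 5 to the increments, whose conditional variance is $2\sigma^2(1-\mathrm{e}^{-\Delta t\Lambda})$.

```python
import numpy as np

def codifference_mc(xs, xt, theta):
    """Empirical codifference from cosines (symmetric data, real characteristic functions)."""
    num = np.mean(np.cos(theta * (xt - xs)))
    den = np.mean(np.cos(theta * xt)) * np.mean(np.cos(theta * xs))
    return np.log(num / den) / theta**2

def ou_step(x, lam, h, sigma, rng):
    """Exact transition over time h of a stationary OU process with covariance sigma^2 exp(-lam t)."""
    rho = np.exp(-lam * h)
    return rho * x + sigma * np.sqrt(1 - rho**2) * rng.standard_normal(x.shape)

rng = np.random.default_rng(4)
n, sigma, theta, dt, lag = 400_000, 1.0, 1.5, 1.0, 100.0
lam = rng.uniform(0.05, 3.0, n)              # hypothetical law of the random decay rate
x0 = sigma * rng.standard_normal(n)          # X_s (stationary start)
x1 = ou_step(x0, lam, dt, sigma, rng)        # X_{s+dt}
x2 = ou_step(x1, lam, lag - dt, sigma, rng)  # X_{s+t}
x3 = ou_step(x2, lam, dt, sigma, rng)        # X_{s+t+dt}
dx_s, dx_t = x1 - x0, x3 - x2                # increments at s and s+t

tau_x = codifference_mc(x0, x2, theta)       # process itself: close to 0
tau_dx = codifference_mc(dx_s, dx_t, theta)  # increments: residual dependence
cov_dx = np.mean(dx_s * dx_t)                # covariance of increments: close to 0
d = 2 * sigma**2 * (1 - np.exp(-dt * lam))   # conditional variance of the increments
tau_dx_pred = np.log(np.mean(np.exp(-theta**2 * d))
                     / np.mean(np.exp(-theta**2 * d / 2))**2) / theta**2
print(tau_x, tau_dx, tau_dx_pred, cov_dx)
```

At this long lag the codifference of $X$ and the covariance of $\Delta X$ are both near zero, while the codifference of $\Delta X$ stays at a clearly positive plateau.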

B.2 Diffusing-diffusivity

In the preceding sections we considered models which were non-Gaussian and non-ergodic. For non-Gaussian but ergodic models the codifference can also be a useful measure of dependence. In particular we show that it can be successfully used to analyse diffusing-diffusivity models. We now assume that the increments of $X_{t}$ are Brownian fluctuations, but rescaled by a time-dependent random diffusivity $D_{t}$ ,

$$\mathrm{d}X_{t}=\sqrt{D_{t}}\,\mathrm{d}B_{t}.$$

This is a generalisation of the random-parameter model, for which $D_{t}=\mathrm{const.}$ Because we modified the dynamical equation by replacing the previously constant parameter with a stochastic process, models of this class are sometimes called "doubly stochastic" [75]. Before their application in physics, they were extensively used in financial engineering, where it is natural to assume that parameters of the market, such as the volatility, vary in time. In 1985 Cox, Ingersoll and Ross [76] proposed a model of the interest rate (now commonly named CIR), which describes a non-negative stochastic process with a linear mean-reverting property. In 2012 Chubynsky and Slater independently proposed a special case of the CIR process as a model of non-Gaussian diffusion [19, 77]. This paved the way for a wider range of models based on a fluctuating diffusion coefficient with short-time memory [78, 79, 80, 81, 82]. The evolution of the diffusion coefficient in the CIR model is defined by the stochastic equation

[TABLE]

where $a>0$ describes the speed of return to the mean $b>0$, and $\sigma>0$ regulates the amplitude of the fluctuations. In this equation, as $D_{t}\to 0$ the drift term $a(b-D_{t})\mathrm{d}t\approx ab\,\mathrm{d}t>0$ starts to dominate the fluctuations, whose mean-squared amplitude is $\mathbb{E}[(\sqrt{D_{t}}\mathrm{d}B_{t})^{2}]=D_{t}\mathrm{d}t$; consequently $\mathrm{d}D_{t}>0$, which keeps $D_{t}$ positive. We assume that the system evolved for a long time before the start of the measurement and has reached the stationary gamma distribution $D_{0}\overset{d}{=}\mathcal{G}(2ab/\sigma^{2},2a/\sigma^{2})$ [83]. Because of the non-Gaussianity, the LCF should differ from the MSD. Conditioning on the trajectory of $D_{t}$, it can be expressed by the formula

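$$\zeta_{X}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln\mathbb{E}\Big[\exp\Big(-\frac{\theta^{2}}{2}\int_{0}^{t}D_{s}\,\mathrm{d}s\Big)\Big].$$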

Expanding the above in powers of $\theta^{2}$ shows that again $\zeta_{X}^{\theta}(t)\to\delta_{X}^{2}(t)$ as $\theta\to 0$ .

The average in (B25) appears in the calculation of the expected price of a zero-coupon bond and was calculated in the original paper of Cox, Ingersoll and Ross [76], who derived the differential equation it satisfies and then solved it; a more general result is also available in [83]. The calculation was performed for the case when $D_{0}$ is fixed and deterministic; however, their result can easily be extended to stationary $D$ by averaging over the equilibrium $\mathcal{G}(2ab/\sigma^{2},2a/\sigma^{2})$ distribution of $D_{0}$. Then the formula for the LCF reads

[TABLE]

with $\gamma_{\theta}=\sqrt{a^{2}+(\theta\sigma)^{2}}$ . From that a brief calculation proves that the motion is Fickian for long times

$$\zeta_{X}^{\theta}(t)\sim\frac{2ab(\gamma_{\theta}-a)}{\theta^{2}\sigma^{2}}\,t,\qquad t\to\infty,$$

and also at short times, where, in contrast, the diffusion scale agrees with the MSD,

$$\zeta_{X}^{\theta}(t)\sim b\,t=\delta_{X}^{2}(t),\qquad t\to 0,$$

which should come as no surprise. For an illustration of these formulae see Figure 3, where we present results of Monte Carlo simulations compared to the theoretical predictions. See also the crossover behaviour of the MSD in the random diffusivity model in [81].
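The LCF of the stationary CIR model can be evaluated in closed form. The sketch below is our own assembly, not a transcription of the formula above: it assumes $\mathrm{d}X_t=\sqrt{D_t}\,\mathrm{d}B_t$ (consistent with $\gamma_\theta=\sqrt{a^2+(\theta\sigma)^2}$), uses the standard CIR bond-price formula with discount weight $\theta^2/2$, and averages over $D_0\sim\mathcal{G}(2ab/\sigma^2,2a/\sigma^2)$ via $\mathbb{E}[\mathrm{e}^{-BD_0}]=(1+B\sigma^2/(2a))^{-2ab/\sigma^2}$. It then checks the short-time slope $b$ and the long-time slope $2ab(\gamma_\theta-a)/(\theta^2\sigma^2)$; the parameters are illustrative.

```python
import numpy as np

def lcf_cir(t, theta, a, b, sigma):
    """LCF -2/theta^2 ln E[exp(i theta X_t)] for dX = sqrt(D) dB with stationary CIR D,
    written in a form that avoids overflow of exp(gamma t)."""
    nu = 2 * a * b / sigma**2                        # shape of the stationary gamma law
    g = np.sqrt(a**2 + (theta * sigma)**2)           # gamma_theta
    one_minus_e = -np.expm1(-g * t)                  # 1 - exp(-gamma t)
    den = (g + a) * one_minus_e + 2 * g * np.exp(-g * t)
    log_a = nu * (np.log(2 * g) + (a - g) * t / 2 - np.log(den))
    b_t = theta**2 * one_minus_e / den               # coefficient of D_0 in the bond formula
    log_cf = log_a - nu * np.log1p(b_t * sigma**2 / (2 * a))
    return -2 * log_cf / theta**2

a, b, sigma, theta = 1.0, 0.5, 0.8, 1.0
g = np.sqrt(a**2 + (theta * sigma)**2)
long_slope = 2 * a * b * (g - a) / (theta**2 * sigma**2)
short = lcf_cir(1e-4, theta, a, b, sigma) / 1e-4
long = lcf_cir(1e4, theta, a, b, sigma) / 1e4
print(short, b)              # short times: slope equal to the MSD coefficient b
print(long, long_slope)      # long times: smaller Fickian slope
```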

If we want to analyse the codifference of the CIR model, we need to study the memory of the velocity $V_{t}=\sqrt{D_{t}}\mathrm{d}B_{t}/\mathrm{d}t$. But the white noise $\mathrm{d}B_{t}/\mathrm{d}t$ is not well defined in the classical sense. It can be interpreted as a generalised function (distribution), which leads to an analogous redefinition of the covariance as the familiar Dirac delta. The codifference, however, is non-linear, and this approach fails. The solution is to consider only well-defined velocity processes $V_{t}=\sqrt{D_{t}}Y_{t}$, with $Y_{t}$ being some classical process which models the velocity undisturbed by the fluctuations of the diffusivity. The behaviour of the white noise can then be studied by considering $t$ large enough that $r_{Y}(t)=0$ strictly or approximately. It is natural to assume that $Y_{t}$ is Gaussian, while choosing the model of $D_{t}$ is more subtle.

The CIR process for $ab\in\mathbb{N}$ can be proved to be a sum of squared independent Ornstein-Uhlenbeck processes, which follows directly from writing the stochastic differential equation of such a sum [83]. Thus, a natural generalisation is to consider $D_{t}$ as the square of a Gaussian process [80, 81]. We will assume that the velocity can be decomposed as

[TABLE]

where both $Z_{t}$ and $Y_{t}$ are Gaussian with unit variance. In this model we have ample freedom in describing a wide range of memory types, because any covariances $r_{Z}$ and $r_{Y}$ can be used. By choosing $r_{Y}$ we model the internal dynamics; if $r_{Y}(t)=0$ on the considered time scale we arrive back at (B23). By choosing $r_{Z}$ we model the memory structure of $D_{t}$: exponential, power law, oscillating, etc. The one-dimensional distributions are more rigid, as we limit ourselves to $D_{t}$ having the PDF of a squared Gaussian, that is, the $\chi^{2}_{1}$ distribution (a special case of the gamma distribution). A rather technical derivation (Proposition 13) then shows that the exact form of the codifference is

[TABLE]

where

[TABLE]

This formula looks complicated, but is composed only of elementary functions. It is illustrated in Figure 4, where we plot the codifference $\tau_{V}^{\theta}$ as a function of $r_{Z}$ and $r_{Y}$ for four different values of $\theta$. Having calculated the codifference for at least two values of $\theta$, one can solve the system of equations resulting from (B30) and calculate $r_{Z}$ and $r_{Y}$. This procedure may be considered simpler than using covariances to obtain $r_{Z}$, which requires calculating the average of $|Z_{s}Z_{s+t}|$, given by a hard-to-evaluate integral. The covariance $r_{V}$ can also be obtained by taking the limit $\theta\to 0$ of the codifference.

More importantly, when $r_{Y}(t)=0$ the codifference is clearly non-zero, so it detects the dependence introduced by $D_{t}=Z_{t}^{2}$. Its asymptotics for small $r_{Z}(t)$ (e.g., at long times) in this case are given by the simple relation

$$\tau_{V}^{\theta}(t)\sim\frac{\theta^{2}}{2(1+\theta^{2})^{2}}\,r_{Z}(t)^{2}.$$

Thus the codifference detects the memory structure of the time-varying diffusion coefficient $D_{t}=Z_{t}^{2}$ even in the regime $r_{Y}(t)=0$ in which the covariance $r_{V}(t)$ is zero and does not contain any important information. This is also true when $r_{Z}(t)=0$ but $r_{Y}(t)\neq 0$ , this time the codifference is asymptotically proportional to $r_{Y}(t)$ ; the proportionality constant depends only on the one-dimensional distributions of $D$ , the exact form of the dynamics does not matter, see Proposition 14.
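In the regime $r_Y(t)=0$ the codifference of $V_t=\sqrt{D_t}Y_t$ with $D_t=Z_t^2$ follows from a bivariate Gaussian integral, $\mathbb{E}[\exp(-\tfrac{\theta^2}{2}(Z_s^2+Z_{s+t}^2))]=((1+\theta^2)^2-\theta^4 r_Z^2)^{-1/2}$. The closed form in the sketch below is our own derivation for this special case only (not the general formula (B30)); the Monte Carlo part, with illustrative parameters, shows a vanishing covariance next to a clearly non-zero codifference.

```python
import numpy as np

def codifference_mc(xs, xt, theta):
    """Empirical codifference from cosines (symmetric data, real characteristic functions)."""
    num = np.mean(np.cos(theta * (xt - xs)))
    den = np.mean(np.cos(theta * xt)) * np.mean(np.cos(theta * xs))
    return np.log(num / den) / theta**2

def tau_squared_gaussian(rho, theta):
    """Codifference for D = Z^2 with r_Z(t) = rho and r_Y(t) = 0."""
    return -np.log1p(-(theta**2 * rho / (1 + theta**2))**2) / (2 * theta**2)

rng = np.random.default_rng(6)
n, rho, theta = 1_000_000, 0.9, 1.0
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
y1, y2 = rng.standard_normal(n), rng.standard_normal(n)   # r_Y(t) = 0: independent
v1, v2 = z1 * y1, z2 * y2
cov_v = np.mean(v1 * v2)                                  # covariance, close to 0
tau_mc = codifference_mc(v1, v2, theta)
tau_th = tau_squared_gaussian(rho, theta)
print(cov_v, tau_mc, tau_th)
# Small-r_Z check against theta^2 r_Z^2 / (2 (1 + theta^2)^2).
small = 1e-2
small_ratio = tau_squared_gaussian(small, theta) / (theta**2 * small**2 / (2 * (1 + theta**2)**2))
print(small_ratio)   # close to 1
```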

For some systems different models of $D_{t}$ may be more suitable. When $D_{t}$ is strongly concentrated around its mean value a possible choice is a simple Gaussian centred around some $b$ , $V_{t}=(\sigma Z_{t}+b)Y_{t}$ . This model permits the unphysical situation when $D_{t}<0$ , but when $\sigma\ll b$ the probability of this event is negligible. In this case an elementary formula for the codifference also can be given (see (C.3)) and again even for $r_{Y}(t)=0$ the internal dependence of $D_{t}$ is still detected, this time with asymptotic

$$\tau_{V}^{\theta}(t)\sim\frac{\theta^{2}\sigma^{2}b^{2}}{(1+\theta^{2}\sigma^{2})^{2}}\,r_{Z}(t).$$

B.3 Discussion

The aim of this work was to provide the theoretical background for using the codifference as a dependence measure suited to the study of various non-Gaussian and ergodicity breaking models. This goal was achieved in a few steps. First, we proved that the codifference has the intuitive properties that one would expect from a reasonable memory function, such as additivity, positivity for the case of complete dependence and vanishing for the case of independence. Second, we showed that it can be calculated using fairly straightforward methods for typical random-parameter and diffusing-diffusivity models, which represents a significant extension of the previously established results for stable and infinitely divisible processes. Finally, we analysed how the codifference detects forms of dependence and ergodicity breaking which cannot easily be studied using solely covariance-based methods.

We also showed one example of undetected ergodicity breaking, the case of a Langevin equation with a random return rate. In this case we offer an easy fix: the codifference works well for the increments of this process. We note that in this paper we did not analyse ergodicity breaking caused by ageing. In principle, the codifference should work there as well, but the analytical treatment will be challenging for many of these phenomena.

In addition to the codifference, we also discussed a related quantity, the logarithm of the characteristic function (LCF), which was interpreted as a measure of dispersion. Our contribution is an extension of the Fourier methods and a distinct view based on ideas previously developed only for heavy tailed $\alpha$ -stable distributions. The codifference is also very closely related to the theory of the dynamical functional, which was already successfully used for real data, and should be considered a part of the same framework.

The cost of using this technique is the loss of linearity, which is a powerful analytical tool, especially for complicated models. The more complicated defining formula may also lead to more complicated expressions (e.g., see Table 1). However, the codifference is a clear application of the characteristic function, which does not seem to be commonly acknowledged, and Fourier-based techniques by themselves are widely used by the scientific community. Thus, it has the advantage of offering a wide choice of established analytical methods and estimation techniques. In some cases (e.g., (B30)) the codifference even has a simpler form than the covariance.

We believe that the most important example considered was also the simplest: deterministic motion with its scale (diffusion coefficient) varying from trajectory to trajectory. The observed asymptotic behaviour of the codifference contains a lot of useful information and lays the foundation for possible future applications to more complex and realistic models, some of which we discussed. At the same time we stress that even this initial, highly simplified model is commonly used, especially for biophysical systems.

We are confident that the obtained results are interesting in their own right, but we also promote their additional value by indicating the limitations of the methodology based on the MSD and the covariance. Both are, without a doubt, essential parts of the scientific language related to diffusion and complex phenomena, but their limitations are becoming more and more evident, as contemporary research starts to concentrate around non-Gaussian systems with complicated memory structure; the change is stimulated by increasing experimental evidence. These complex and non-linear phenomena require new complex and non-linear methods.

C Derivations

C.1 Basic definitions and properties

All processes considered in this work can be labelled as "conditionally Gaussian". In practical applications these processes are Gaussian locally, in the temporal or spatial sense. The formal definition is more general.

Definition 1.

We call a process conditionally Gaussian when any of its finite-dimensional distributions is a Gaussian distribution under some conditioning by $\sigma$ -algebra $\mathcal{C}$ . That is, any finite dimensional distribution $\boldsymbol{X}\mathrel{\mathop{:}}=[X_{t_{1}},\ldots,X_{t_{n}}]$ can be written as

$$\boldsymbol{X}\overset{d}{=}A\boldsymbol{Y}+\boldsymbol{\mu},$$

where $A$ and $\boldsymbol{\mu}$ are a $\mathcal{C}$ -measurable $n\times n$ random matrix and an $n$ -dimensional random vector. Both may depend on $t_{1},\ldots,t_{n}$ . The vector $\boldsymbol{Y}$ is i.i.d $\mathcal{N}(0,1)$ and is independent of $A$ and $\boldsymbol{\mu}$ .

If $\boldsymbol{\mu}=0$ for any $t_{1},\ldots,t_{n}$ we call a process conditionally centred Gaussian. Further on we will consider only this class. Similarly, we call a process conditionally stationary Gaussian, if the distribution of $A$ and $\boldsymbol{\mu}$ does not depend on time translation $t_{1},\ldots,t_{n}\mapsto t_{1}+t,\ldots,t_{n}+t$ .

Proposition 1.

The distribution of a conditionally Gaussian process is completely determined by the knowledge of $\mathcal{C}$ , the conditional mean and the conditional covariance

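$$\mu_{X}(t|\mathcal{C})=\mathbb{E}[X_{t}|\mathcal{C}],\qquad r_{X}(s,t|\mathcal{C})=\mathbb{E}\big[(X_{s}-\mu_{X}(s|\mathcal{C}))(X_{t}-\mu_{X}(t|\mathcal{C}))\,\big|\,\mathcal{C}\big].$$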

The process is conditionally centred if and only if $\mu_{X}(t|\mathcal{C})=0$ . The process is conditionally stationary if and only if $\mu_{X}(t|\mathcal{C})=\mathrm{const.}$ and $r_{X}(s,t|\mathcal{C})$ is a function of $t-s$ , denoted $r_{X}(t-s|\mathcal{C})$ .

Proof.

This is a direct consequence of the equality

[TABLE]

The conditional probability on the right is a Gaussian integral and a function of $\mu_{X}(t|\mathcal{C})$ and $r_{X}(s,t|\mathcal{C})$. The representations of conditionally centred and stationary processes are just a reflection of the analogous representations for Gaussian processes. ∎

Definition 2.

We define the codifference function as

[TABLE]

For stationary process it is a function of $t-s$ , which we denote as $\tau_{X}^{\theta}(t)$ , similarly as for the covariance, see also Eq. (A2).

Additionally, we define the log characteristic function (LCF) as

[TABLE]

All expected values in the above definitions are finite, but they may be complex and the denominator may be 0. This is however not the case in the class of processes considered herein.

Proposition 2.

For any conditionally centred Gaussian process the codifference and the LCF are well-defined real-valued functions.

Proof.

The characteristic function of a centred Gaussian variable is real and positive, so a mixture of such functions, being an average of them, is also real and positive. Therefore all expected values in Definition 2 are real numbers larger than 0 and less than or equal to 1, and the logarithms are real. ∎

We also note that for conditionally centred Gaussian processes a reduced formula for the codifference is available,

[TABLE]

which is very useful for calculations. For non-centred processes the additional term

[TABLE]

appears. Here all averages are finite, but they can generally be complex values, moreover in particular cases the averages in the denominator can be 0. This strongly suggests the codifference should be used carefully in this case (the same applies to the LCF).
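For conditionally centred processes, conditioning on $\mathcal{C}$ turns every characteristic function in the codifference into an average of Gaussian factors. The sketch below assumes the ratio form of the codifference used throughout this document, so that it can be computed from samples of the conditional (co)variances $r_X(s,s|\mathcal{C})$, $r_X(t,t|\mathcal{C})$ and $r_X(s,t|\mathcal{C})$; it is checked against the Table 1 gamma formula in the stationary case $r_X(s,t|\mathcal{C})=D\,r_Y(t-s)$. The function name and parameters are illustrative.

```python
import numpy as np

def codifference_reduced(var_s, var_t, cov_st, theta):
    """Codifference of a conditionally centred Gaussian process from samples of
    r_X(s,s|C), r_X(t,t|C) and r_X(s,t|C)."""
    num = np.mean(np.exp(-theta**2 * (var_s + var_t - 2 * cov_st) / 2))
    den = np.mean(np.exp(-theta**2 * var_t / 2)) * np.mean(np.exp(-theta**2 * var_s / 2))
    return np.log(num / den) / theta**2

rng = np.random.default_rng(7)
alpha, beta, theta, r_y = 2.0, 1.0, 1.0, 0.4
d = rng.gamma(alpha, 1 / beta, 1_000_000)        # C generated by D (stationary case)
tau_reduced = codifference_reduced(d, d, d * r_y, theta)
tau_table1 = alpha / theta**2 * np.log((1 + theta**2 / (2 * beta))**2
                                       / (1 + theta**2 * (1 - r_y) / beta))
print(tau_reduced, tau_table1)
```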

Additionally, representation (C6) yields another desirable property of the codifference:

Proposition 3.

For a conditionally centred Gaussian process with positive covariance $r_{X}(s,t|\mathcal{C})$ the codifference $\tau_{X}^{\theta}(s,t)$ is also positive, and a negative conditional covariance implies a negative codifference.

If the support of $r_{X}(s,t|\mathcal{C})$ is on both positive and negative half-axes, the sign of the codifference may vary, but it is worth noting that with $r_{X}(t,t|\mathcal{C})$ and $r_{X}(s,s|\mathcal{C})$ fixed, it depends monotonically on $r_{X}(s,t|\mathcal{C})$ , so if the conditional covariance is smaller in the sense of stochastic dominance, the codifference will also be smaller.

Now, a simple fact follows only from the expansion $\ln(x)=x-1+o(x-1)$ as $x\to 1$.

Proposition 4.

For any stationary process $X$ with asymptotically independent values

[TABLE]

Proof.

We assume that $X_{s+t}$ and $X_{s}$ are asymptotically independent as $t\to\infty$ (note that this property is not sufficient to imply that $X$ is mixing). Therefore

[TABLE]

and the ratio of expected values under the logarithm converges to 1 so we can use the expansion $\ln(x)\approx x-1$ . ∎

This simple fact is a prototype for the later results, which describe cases when it is possible to remove the non-linear logarithmic function if the process can be somehow decomposed as a transformation of some weakly dependent variables.

If the process $X$ does not have asymptotically independent values the non-linearity cannot be removed at $t\to\infty$ , but if it is an ensemble of such processes (i.e., the conditioned process is mixing), it can be shown that the codifference converges to a positive constant, non-linearly dependent on the law of $D$ .

C.2 Random parameter models

Proposition 5.

If the process $X$ is an ensemble of mixing stationary centred Gaussian processes, then, denoting $D=\mathbb{E}\left[X_{t}^{2}|\mathcal{C}\right]$ ,

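$$\lim_{t\to\infty}\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D}\big]}{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D/2}\big]^{2}}\geq 0,$$

and the limit equals 0 only for deterministic $D$.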

Proof.

The calculation is simple. Because $r_{X}(t|\mathcal{C})\leq D$ almost surely, the random variable $\mathrm{e}^{\theta^{2}(r_{X}(t|\mathcal{C})-D)}$ is positive and bounded by 1 for every $t$. By the dominated convergence theorem we can interchange the limit with the averaging and, by continuity, with the logarithm, getting

[TABLE]

The non-negativity of the above stems from Jensen’s inequality applied to the function $x\mapsto x^{2}$ and the variable $\mathrm{e}^{-\theta^{2}D/2}$.

∎

Remark. A similar calculation repeated for the symmetrised codifference (A8) shows that it does not exhibit this behaviour. Under the same assumptions

[TABLE]

i.e., it cannot detect this form of residual dependence and ergodicity breaking.

Proposition 6.

Let the process $X$ have the form

$$X_{t}=\sqrt{D}\,Y_{t},$$

where $Y$ is a stationary Gaussian process, $\mathbb{E}[Y_{t}^{2}]=1$ , and $D>0$ is a random variable independent of $Y$ . Then the codifference has the form

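$$\tau_{X}^{\theta}(t)=\frac{1}{\theta^{2}}\ln\frac{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}(1-r_{Y}(t))D}\big]}{\mathbb{E}\big[\mathrm{e}^{-\theta^{2}D/2}\big]^{2}}.$$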

a)

It is additive with respect to $D$ , that is if $D=D^{\prime}+D^{\prime\prime}$ for independent $D^{\prime}$ and $D^{\prime\prime}$ , then

[TABLE]

where $X^{\prime}_{t}=\sqrt{D^{\prime}}Y_{t}$ and $X^{\prime\prime}_{t}=\sqrt{D^{\prime\prime}}Y_{t}$ .

b)

It is an increasing function of the covariance $r_{Y}(t)$; it is smaller than $r_{X}(t)$ for $r_{Y}(t)$ close to $1$ and larger than $r_{X}(t)$ when the latter is close to 0. If $\mathbb{E}[D]<\infty$, the difference $\tau_{X}^{\theta}(t)-r_{X}(t)$ decreases as a function of $r_{Y}(t)$.

c)

For any mixing $Y$, the difference $\tau_{X}^{\theta}(t)-\tau_{X}^{\theta}(\infty)$ has the same asymptotic behaviour as the covariance $r_{Y}(t)$, that is

[TABLE]

Proof.

Let us start by writing the conditional covariance,

[TABLE]

which implies that

[TABLE]

If we substitute $D=D^{\prime}+D^{\prime\prime}$, both the numerator and the denominator factorise into products of expectations, because $D^{\prime}$ and $D^{\prime\prime}$ are independent. The formula

[TABLE]

follows.
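Part a) can be checked numerically using the fact that the Laplace transform of a sum of independent variables is the product of their Laplace transforms. The sketch assumes the Laplace-transform form of the codifference used in the proof of b), $\tau_{X}^{\theta}(t)=\theta^{-2}\ln(\mathcal{L}_{D}(\theta^{2}(1-r_{Y}(t)))/\mathcal{L}_{D}(\theta^{2}/2)^{2})$ with $\mathcal{L}_{D}(s)=\mathbb{E}[\mathrm{e}^{-sD}]$, and takes the gamma law with shape $\alpha$ and rate $\beta$, which reproduces the gamma row of Table 1. The helper names and the two-point law for $D^{\prime\prime}$ are hypothetical.

```python
import math


def laplace_gamma(alpha, beta):
    # E[exp(-s D)] for D ~ Gamma(shape=alpha, rate=beta)
    return lambda s: (1.0 + s / beta) ** (-alpha)


def laplace_two_point(d1, d2, p):
    # E[exp(-s D)] for a hypothetical two-point law: D = d1 w.p. p, D = d2 w.p. 1 - p
    return lambda s: p * math.exp(-s * d1) + (1.0 - p) * math.exp(-s * d2)


def tau(r_y, theta, laplace):
    # codifference of X = sqrt(D) Y written through the Laplace transform of D
    return math.log(laplace(theta**2 * (1.0 - r_y)) / laplace(theta**2 / 2.0) ** 2) / theta**2


def tau_gamma_table(r_y, theta, alpha, beta):
    # closed form from the gamma row of Table 1
    return alpha / theta**2 * math.log(
        (1.0 + theta**2 / (2.0 * beta)) ** 2 / (1.0 + theta**2 * (1.0 - r_y) / beta)
    )


theta, r_y = 1.3, 0.4
L1 = laplace_gamma(0.7, 2.0)            # law of D'
L2 = laplace_two_point(0.5, 3.0, 0.6)   # law of D'', independent of D'
L_sum = lambda s: L1(s) * L2(s)         # law of D = D' + D''

print(tau(r_y, theta, L_sum), tau(r_y, theta, L1) + tau(r_y, theta, L2))
print(tau(r_y, theta, L1), tau_gamma_table(r_y, theta, 0.7, 2.0))
```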

In point b), the monotonicity is a consequence of the fact that only the numerator of the fraction in (C14) depends on $r_{Y}(t)$. This numerator is the Laplace transform of $D$ evaluated at the point $\theta^{2}(1-r_{Y}(t))$; a Laplace transform decreases as its argument increases, so the numerator, and with it the codifference, is an increasing function of $r_{Y}(t)$. This dependence is continuous. When $r_{Y}(t)=1$, e.g., always for $t=0$, formula (C14) simplifies and we can apply Jensen’s inequality,

[TABLE]

For $r_{Y}(t)$ close to 0, Proposition 5 (together with continuity) shows that the codifference is positive, while $r_{X}(t)$ is close to 0. For the last property listed in b), let us write the difference $\tau_{X}^{\theta}(t)-r_{X}(t)$ as a function of $r=r_{Y}(t)$,

[TABLE]

By the dominated convergence theorem, the derivative of the numerator exists and determines the sign of $f^{\prime}$. Denoting $F_{r}:=\theta^{2}(D-\mathbb{E}[D])(1-r)$, we have

[TABLE]

where we used the fact that $\mathbb{E}[F_{r}]=0$ and $x(\rme^{-x}-1)\leq 0$ .
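The three claims of b) can be checked on a grid of $r_{Y}$ values for a hypothetical two-point law of $D$, using the Laplace-transform form of the codifference from the proof and $r_{X}(t)=\mathbb{E}[D]\,r_{Y}(t)$ for $X=\sqrt{D}Y$. This is an illustration under these assumptions, not a proof.

```python
import math

d_vals, d_probs = (0.5, 2.0), (0.5, 0.5)   # hypothetical two-point law of D
theta = 1.0
mean_d = sum(d * p for d, p in zip(d_vals, d_probs))


def laplace(s):
    # E[exp(-s D)] for the two-point law above
    return sum(p * math.exp(-s * d) for d, p in zip(d_vals, d_probs))


def tau(r_y):
    # codifference of X = sqrt(D) Y as a function of r_Y
    return math.log(laplace(theta**2 * (1.0 - r_y)) / laplace(theta**2 / 2.0) ** 2) / theta**2


grid = [k / 100 for k in range(101)]                    # r_Y from 0 to 1
taus = [tau(r) for r in grid]
gaps = [tv - mean_d * r for tv, r in zip(taus, grid)]   # tau - r_X with r_X = E[D] r_Y

increasing = all(b > a for a, b in zip(taus, taus[1:]))
gap_decreasing = all(b <= a + 1e-15 for a, b in zip(gaps, gaps[1:]))
print(increasing, gap_decreasing)
print("r_Y = 1:", taus[-1], "vs r_X =", mean_d)   # codifference below the covariance
print("r_Y = 0:", taus[0], "vs r_X = 0")          # codifference above the covariance
```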

For c), consider $\tau_{X}^{\theta}(t)-\tau_{X}^{\theta}(\infty)$ and use the expansion $\ln(x)\approx x-1$:

[TABLE]

Rearranging the right-hand side of the above equation, we get

[TABLE]

∎
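For the gamma law (shape $\alpha$, rate $\beta$), a first-order expansion of the gamma row of Table 1 gives $\tau_{X}^{\theta}(t)-\tau_{X}^{\theta}(\infty)\approx\frac{\alpha}{\beta+\theta^{2}}\,r_{Y}(t)$ as $r_{Y}(t)\to 0$. For a general law, the same expansion $\ln(x)\approx x-1$ gives the prefactor $\mathbb{E}[D\mathrm{e}^{-\theta^{2}D}]/\mathbb{E}[\mathrm{e}^{-\theta^{2}D}]$. This prefactor is our own computation rather than a formula quoted from the text; the sketch compares it with finite differences for arbitrary parameter values.

```python
import math

alpha, beta, theta = 1.5, 2.0, 1.0   # illustrative parameters


def tau_gamma(r_y):
    # gamma row of Table 1 (shape alpha, rate beta)
    return alpha / theta**2 * math.log(
        (1.0 + theta**2 / (2.0 * beta)) ** 2 / (1.0 + theta**2 * (1.0 - r_y) / beta)
    )


# E[D e^{-theta^2 D}] / E[e^{-theta^2 D}] is the mean of Gamma(alpha, rate beta + theta^2)
prefactor = alpha / (beta + theta**2)

for r in (1e-1, 1e-3, 1e-6):
    ratio = (tau_gamma(r) - tau_gamma(0.0)) / r   # tau(infinity) corresponds to r_Y = 0
    print(f"r_Y = {r:g}: ratio = {ratio:.8f}, prefactor = {prefactor:.8f}")
```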

Analogues of a) and b) also hold for the LCF; the derivation is very similar to that of Proposition 6, so we only state the result.

Proposition 7.

Let the process $X$ have the form

$$X_{t}=\sqrt{D}\,Y_{t},$$

where $Y$ is a centred Gaussian process and $D>0$ is a random variable independent of $Y$ .

Then the LCF has the form

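$$\zeta_{X}^{\theta}(t)=-\frac{2}{\theta^{2}}\ln\mathbb{E}\left[\mathrm{e}^{-\theta^{2}D\delta_{Y}^{2}(t)/2}\right]$$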

and:

a)

If $\mathbb{E}[D]<\infty$ then

[TABLE]

b)

It is additive with respect to $D$ , that is if $D=D^{\prime}+D^{\prime\prime}$ for independent $D^{\prime}$ and $D^{\prime\prime}$ , then

[TABLE]

where $X^{\prime}_{t}=\sqrt{D^{\prime}}Y_{t}$ and $X^{\prime\prime}_{t}=\sqrt{D^{\prime\prime}}Y_{t}$ .

c)

It is an increasing function of the MSD $\delta_{Y}^{2}(t)$ .

d)

For $\mathbb{E}[D]<\infty$ the difference $\delta_{X}^{2}(t)-\zeta_{X}^{\theta}(t)$ is non-negative and increases as $\delta_{X}^{2}(t)$ increases.
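For gamma-distributed $D$, the LCF is given explicitly in Table 1, $\zeta_{X}^{\theta}(t)=\frac{2\alpha}{\theta^{2}}\ln(\frac{\theta^{2}}{2\beta}\delta_{Y}^{2}(t)+1)$, while $\delta_{X}^{2}(t)=\mathbb{E}[D]\,\delta_{Y}^{2}(t)=\frac{\alpha}{\beta}\delta_{Y}^{2}(t)$. The sketch below checks c) and d) on a grid of MSD values, and b) through the additivity in the shape parameter (independent gamma variables with a common rate add their shapes). Parameter values are arbitrary.

```python
import math

beta, theta = 1.5, 1.2   # illustrative rate of the gamma law and argument theta


def lcf_gamma(msd_y, alpha):
    # gamma row of Table 1: (2 alpha / theta^2) ln(theta^2 msd_y / (2 beta) + 1)
    return 2.0 * alpha / theta**2 * math.log(theta**2 * msd_y / (2.0 * beta) + 1.0)


alpha = 0.8
msd_grid = [0.05 * k for k in range(1, 201)]     # values of delta_Y^2
lcf = [lcf_gamma(m, alpha) for m in msd_grid]
msd_x = [alpha / beta * m for m in msd_grid]     # delta_X^2 = E[D] delta_Y^2
gap = [mx - z for mx, z in zip(msd_x, lcf)]      # delta_X^2 - zeta

print("c) increasing:", all(b > a for a, b in zip(lcf, lcf[1:])))
print("d) non-negative:", all(g >= 0 for g in gap))
print("d) increasing:", all(b > a for a, b in zip(gap, gap[1:])))
# b) Gamma(a1, beta) + Gamma(a2, beta) ~ Gamma(a1 + a2, beta) for independent summands
print(lcf_gamma(2.0, 0.3) + lcf_gamma(2.0, 0.5), lcf_gamma(2.0, 0.8))
```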

The behaviour of the codifference near $t=0$ depends on the tail behaviour of $p_{D}$ and can therefore be used to study it. The following result makes this statement precise.

Proposition 8.

If the stationary Gaussian process $Y$ is mean-square continuous and $X_{t}=\sqrt{D}Y_{t}$ , then

a)

for $\mathbb{E}[D]<\infty$

[TABLE]

and

[TABLE]

b)

If

[TABLE]

for some slowly varying function $L$ , then

[TABLE]

Proof.

For a mean-square continuous $Y$, the covariance $r_{Y}$ is a continuous function. The codifference is therefore also continuous, and the expansion $\ln(x)\approx x-1$ implies that

[TABLE]

Because

[TABLE]

The derivation for $\zeta_{X}^{\theta}$ is similar. For point b), we write the asymptotic form of $\tau_{X}^{\theta}(0)-\tau_{X}^{\theta}(t)$ as the integral

[TABLE]

and simplify the ratio under investigation

[TABLE]

∎

We now turn from a random $D$ to the class of processes for which the shape of the covariance function varies from trajectory to trajectory:

Proposition 9.

For a mixture of stationary Gaussian processes with a fixed, non-random scale $D=\sigma^{2}$,

[TABLE]

The above formula also implies that

[TABLE]

Proof.

The assumption of a fixed variance means that $\mathbb{E}[X_{t}^{2}|\mathcal{C}]=\sigma^{2}$ for some deterministic $\sigma^{2}$. Conditioning on $\mathcal{C}$, it follows that

[TABLE]

The left inequality is simply Jensen’s inequality applied to the concave function $\ln$. The right inequality follows from two elementary bounds: $\ln x\leq x-1$, and the chord bound $\exp(x)\leq L^{-1}\sinh(L)x+\cosh(L)$ for $-L\leq x\leq L$, which holds because $\exp$ is convex. ∎
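The two elementary bounds used for the right inequality are easy to confirm numerically. The chord bound states that the convex function $\exp$ lies below the chord joining $(-L,\mathrm{e}^{-L})$ and $(L,\mathrm{e}^{L})$. The grid check below is our own sketch; it uses a small relative tolerance because equality holds at the endpoints $x=\pm L$.

```python
import math


def chord_bound_holds(L, n=10_001):
    # exp(x) <= sinh(L)/L * x + cosh(L) for all x in [-L, L]; equality at x = +-L
    slope, intercept = math.sinh(L) / L, math.cosh(L)
    tol = 1e-12 * math.cosh(L)   # allow for rounding at the endpoints
    for k in range(n):
        x = -L + 2.0 * L * k / (n - 1)
        if math.exp(x) > slope * x + intercept + tol:
            return False
    return True


def log_bound_holds(n=10_000, x_max=50.0):
    # ln(x) <= x - 1 for x > 0, with equality only at x = 1
    return all(math.log(x) <= x - 1.0 for x in (x_max * (k + 1) / n for k in range(n)))


print([chord_bound_holds(L) for L in (0.1, 1.0, 5.0, 20.0)])
print(log_bound_holds())
```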

For an exponentially decaying conditional covariance, stronger results are available:

Proposition 10.

For a mixture of stationary centred Gaussian processes with conditional covariance $r_{X}(t|\Lambda,D)=D\mathrm{e}^{-t\Lambda}$, with $\Lambda$ and $D$ independent, we observe the following asymptotic properties.

a)

Power-law behaviour: if $p_{\Lambda}(\lambda)\sim L(\lambda)\lambda^{\alpha-1}$ as $\lambda\to 0^{+}$ for a slowly varying $L$, then

[TABLE]

where the constant $C_{\alpha,\theta}$ is

[TABLE]

b)

Rapid decay behaviour: if $p_{\Lambda}(\lambda)\in\mathcal{O}(\lambda^{\infty})$ as $\lambda\to 0^{+}$, then

[TABLE]

c)

Truncation: if $\Lambda=\lambda_{0}+\widetilde{\Lambda}$ for a deterministic $\lambda_{0}>0$, then

[TABLE]

where $\widetilde{X}$ is a solution of the Langevin equation with viscosity $\widetilde{\Lambda}$ and the same $D$ .

Proof.

For a), we first apply the expansion $\ln(x)\approx x-1$ to $\tau_{X}^{\theta}(t)-\tau_{X}^{\theta}(\infty)$:

[TABLE]

Therefore

[TABLE]

Note that the sum inside the expectation consists of positive terms, so interchanging the expectation and the sum is justified.

Knowing the asymptotic behaviour $p_{\Lambda}(\lambda)\sim L(\lambda)\lambda^{\alpha-1}$ as $\lambda\to 0^{+}$, we can apply the Tauberian theorem

[TABLE]
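The Tauberian step can be illustrated on a simple density with the required behaviour at $0^{+}$. As an assumption made only for this illustration, take $p_{\Lambda}(\lambda)=\alpha\lambda^{\alpha-1}$ on $(0,1]$, so that $L(\lambda)\equiv\alpha$; the theorem then predicts $\mathbb{E}[\mathrm{e}^{-t\Lambda}]\sim\Gamma(\alpha)\,\alpha\,t^{-\alpha}=\Gamma(\alpha+1)\,t^{-\alpha}$ as $t\to\infty$. The sketch computes the Laplace transform with a midpoint rule after the substitution $u=\lambda^{\alpha}$ and is restricted to $0<\alpha\leq 1$, where this grid resolves the integrand.

```python
import math


def laplace_power_density(t, alpha, n=100_000):
    # E[exp(-t Lambda)] for p(lam) = alpha * lam**(alpha - 1) on (0, 1]
    # the substitution u = lam**alpha turns p(lam) d lam into du on (0, 1]
    h = 1.0 / n
    return h * sum(math.exp(-t * ((k + 0.5) * h) ** (1.0 / alpha)) for k in range(n))


def tauberian_prediction(t, alpha):
    # Gamma(alpha) * L(1/t) * t**(-alpha) with L identically equal to alpha
    return math.gamma(alpha + 1.0) * t ** (-alpha)


for alpha in (0.5, 0.75, 1.0):
    for t in (10.0, 100.0, 1000.0):
        ratio = laplace_power_density(t, alpha) / tauberian_prediction(t, alpha)
        print(f"alpha = {alpha}, t = {t:g}: ratio = {ratio:.6f}")
```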

The sum (C.2) consists of positive terms, so let us study its asymptotic behaviour

[TABLE]

where interchanging the limit and the sum is justified by the inequality

[TABLE]

The right term converges as $t\to\infty$ and is therefore bounded, so the left term is bounded uniformly in $k$ and we can use the dominated convergence theorem.

Note that the resulting sum is also bounded uniformly in $\alpha$,

[TABLE]

This concludes the derivation of a). We now prove b). We fix an integer $N>0$ and make the estimate

[TABLE]

to obtain

[TABLE]

The limit follows because it is a convergent sum of positive terms.

To prove the last point c), note that for $t>0$ we have $\mathrm{e}^{-t\lambda_{0}}<1$ and $\mathrm{e}^{t\lambda_{0}}>1$, so $x\mapsto x^{\mathrm{e}^{-t\lambda_{0}}}$ is a concave function and $\mathrm{e}^{-\theta^{2}D\mathrm{e}^{t\lambda_{0}}}\leq\mathrm{e}^{-\theta^{2}D}$. Therefore

[TABLE]

∎

In the next proposition, we study the properties of the increment process

[TABLE]

and use it to detect non-ergodicity.

Proposition 11.

For the same process as in Proposition 10, the codifference of the increments $\Delta X_{t}$ converges to a constant

[TABLE]

which equals 0 only when both $D$ and $\Lambda$ are deterministic. After the rescaling $\Delta\widetilde{X}_{t}:=\Delta X_{t}/\sqrt{\Delta t}$, the limit becomes independent of $\Delta t$,

[TABLE]

Proof.

The reasoning is similar to that in the proof of Proposition 6 b). The increment process $\Delta X_{t}$ is stationary and conditionally Gaussian. We can calculate its conditional variance

[TABLE]

and the variance of the difference

[TABLE]

The limit of the codifference is

[TABLE]

Applying Jensen’s inequality to the variable $\mathrm{e}^{2\theta^{2}D\mathrm{e}^{-\Delta t\Lambda}}$ and the function $x\mapsto x^{2}$ yields the inequality.

For the rescaled process it is straightforward to calculate that

[TABLE]

∎
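Assuming the increments are $\Delta X_{t}=X_{t+\Delta t}-X_{t}$, their conditional variance for the covariance $D\mathrm{e}^{-t\Lambda}$ is $V=2D(1-\mathrm{e}^{-\Delta t\Lambda})$, and the conditioned increment process is mixing. Proposition 5 then suggests evaluating the limit as $\theta^{-2}\ln(\mathbb{E}[\mathrm{e}^{-\theta^{2}V}]/\mathbb{E}[\mathrm{e}^{-\theta^{2}V/2}]^{2})$. The sketch below does this for hypothetical discrete laws of the independent parameters $D$ and $\Lambda$, and illustrates that the limit vanishes when both are deterministic but not when either one is random.

```python
import itertools
import math


def increment_codifference_limit(d_law, lam_law, dt, theta):
    """Large-t codifference limit of the increments X_{t+dt} - X_t.

    d_law, lam_law: lists of (value, probability) for independent D and Lambda
    (hypothetical discrete laws). Uses Proposition 5 with the conditional
    variance V = 2 D (1 - exp(-dt Lambda)).
    """
    num = half = 0.0
    for (d, pd), (lam, pl) in itertools.product(d_law, lam_law):
        v = 2.0 * d * (1.0 - math.exp(-dt * lam))
        num += pd * pl * math.exp(-theta**2 * v)
        half += pd * pl * math.exp(-theta**2 * v / 2.0)
    return math.log(num / half**2) / theta**2


dt, theta = 0.5, 1.0
fixed_d, random_d = [(1.0, 1.0)], [(0.5, 0.5), (2.0, 0.5)]
fixed_lam, random_lam = [(1.0, 1.0)], [(0.2, 0.5), (3.0, 0.5)]

print(increment_codifference_limit(fixed_d, fixed_lam, dt, theta))    # both deterministic
print(increment_codifference_limit(fixed_d, random_lam, dt, theta))   # random Lambda only
print(increment_codifference_limit(random_d, fixed_lam, dt, theta))   # random D only
```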

The last class of covariance functions we consider is $Df(\Lambda)\exp(-t\Lambda)$. The increment process from Proposition 11 fits this class with $f(\Lambda)=1-\exp(-\Delta t\Lambda)$; higher-order increments and other similar transformations correspond to more complex $f$, but their behaviour at $0^{+}$ can be easily traced. Note that the proposition below is not a straightforward generalisation of Proposition 10. The statements and the methods of derivation are similar, but the assumptions do not coincide: allowing a scaling $f(\Lambda)$ with a power law at $0^{+}$ comes at the cost of the strong requirement that the tails of $D$ decay fast, $\mathbb{E}[\exp(\theta^{2}D)]<\infty$:

Proposition 12.

Let us consider the stationary, conditionally Gaussian process characterised by the conditional covariance

[TABLE]

Assume further that $D$ and $\Lambda$ are independent, that $\mathbb{E}\left[\mathrm{e}^{\theta^{2}D}\right]<\infty$, and that the PDF of $\Lambda$ has the form

[TABLE]

for a slowly varying function $L$. Then, for this class of processes,

[TABLE]

Proof.

We start from the formula

[TABLE]

which has the asymptotic form

[TABLE]

We thus need to study the tail behaviour of

[TABLE]

We analyse it bottom-up, starting with the long-time asymptotics of the conditional expectation $\mathbb{E}[\boldsymbol{\cdot}|D]$ of a single term,

[TABLE]

Now take $\delta>0$ such that $f(\lambda)<1$ for all $0\leq\lambda<\delta$ and $\epsilon>0$ such that $L(1/t)>t^{-1/2}$ for sufficiently large $t$

[TABLE]

where we additionally used the inequality $x^{k}\mathrm{e}^{-x}\leq k^{k}\mathrm{e}^{-k}$, valid for $x\geq 0$. For the left term above, observe that

[TABLE]

so it is bounded uniformly in $t$ by some constant $c_{1}$,

[TABLE]

For the right term, Stirling’s formula shows that

[TABLE]

Moreover, a straightforward calculation yields

[TABLE]

so the whole series behaves like $k^{-1-\alpha-\gamma}$ and is summable.

We have thus shown that the dominated convergence theorem can be applied to series (C65) multiplied by $t^{\alpha+\gamma}/L(1/t)$. According to (C.2), the term $k=1$ converges to $\mathbb{E}[D]\Gamma(\alpha+\gamma)$, and all terms with $k>1$ decay like $t^{-k\gamma}$. Only the first term survives in the limit $t\to\infty$, and

[TABLE]

∎

Remark. Propositions 10 and 12 above can be generalised by replacing $t$ with $g(t)$ in the formula for the covariance; the only requirement is that $g(t)\to\infty$ as $t\to\infty$. This allows one to consider more general types of dependence, e.g., the power law $t^{-2H}$ corresponds to $\Lambda=2H$ and $g(t)=\ln(t)$.
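With $g(t)=\ln t$, the conditional covariance $D\mathrm{e}^{-\Lambda g(t)}$ equals $Dt^{-\Lambda}$, so a deterministic $\Lambda=2H$ gives the power law $t^{-2H}$. If instead $\Lambda$ is random and independent of $D$, averaging gives $\mathbb{E}[D]\,\mathbb{E}[t^{-\Lambda}]$; for a gamma-distributed $\Lambda$ with shape $a$ and rate $b$ (an assumption made only for this illustration) this equals $\mathbb{E}[D](1+\ln t/b)^{-a}$, a logarithmically slow decay. The sketch checks the first statement directly and the second by seeded Monte Carlo.

```python
import math
import random

t, H = 50.0, 0.3
# deterministic Lambda = 2H with g(t) = ln t reproduces the power law t**(-2H)
print(math.exp(-2 * H * math.log(t)), t ** (-2 * H))

a, b = 2.0, 1.0          # hypothetical gamma law of Lambda: shape a, rate b
rng = random.Random(12345)
n = 100_000
# random.gammavariate takes (shape, scale), so the scale is 1 / b
mc = sum(math.exp(-rng.gammavariate(a, 1.0 / b) * math.log(t)) for _ in range(n)) / n
exact = (1.0 + math.log(t) / b) ** (-a)   # E[t**(-Lambda)] = E[exp(-Lambda ln t)]
print(mc, exact)
```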

C.3 Diffusing diffusivity

Proposition 13.

Assume that $Y$ and $Z$ are centred stationary Gaussian processes. Without loss of generality, we assume $\mathbb{E}[Y_{t}^{2}]=\mathbb{E}[Z_{t}^{2}]=1$. Let $X$ be given by

a)

[TABLE]

b)

[TABLE]

with deterministic $\sigma,d>0$. Then the codifference of $X$ is given by elementary formulae, stated at the end of the corresponding derivations in the proof below.

Proof.

We begin by conditioning on $Z$; the averages then become Gaussian averages rescaled by the values of $Z_{t}$. Next, we calculate the denominator of the codifference

[TABLE]

The last equality amounts to evaluating a Gaussian integral, which can also be interpreted as the Laplace transform of the $\chi^{2}(1)$ distribution. The numerator is more complicated,

[TABLE]

This expectation can be calculated by decomposing $[Z_{s+t},Z_{s}]\overset{d}{=}[A_{+}+A_{-},A_{+}-A_{-}]$, where $A_{+},A_{-}$ are independent centred Gaussian variables with variances $\mathbb{E}[A_{\pm}^{2}]=(1\pm r_{Z}(t))/2$. After this substitution, the exponent in (C76) factorises into

[TABLE]

Both resulting terms are Gaussian integrals, which are easily evaluated. Combining them and taking the logarithm, we obtain

[TABLE]
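Two facts used in part a) can be checked numerically: the Gaussian integral $\mathbb{E}[\mathrm{e}^{-sZ^{2}}]=(1+2s)^{-1/2}$ for $Z\sim\mathcal{N}(0,1)$, which is the Laplace transform of the $\chi^{2}(1)$ law, and the decomposition $[Z_{s+t},Z_{s}]\overset{d}{=}[A_{+}+A_{-},A_{+}-A_{-}]$, which reproduces unit variances and covariance $r_{Z}(t)$. The helpers below are hypothetical illustrations: a midpoint rule for the first identity, and exact moments plus a seeded Monte Carlo correlation for the second.

```python
import math
import random


def gaussian_laplace(s, n=20_000, z_max=12.0):
    # midpoint-rule approximation of E[exp(-s Z^2)] for Z ~ N(0, 1) and s >= 0
    h = 2.0 * z_max / n
    total = 0.0
    for k in range(n):
        z = -z_max + (k + 0.5) * h
        total += math.exp(-(0.5 + s) * z * z)   # unnormalised density times exp(-s z^2)
    return total * h / math.sqrt(2.0 * math.pi)


def decomposition_moments(r):
    # A_+ and A_- independent and centred, with variances (1 + r)/2 and (1 - r)/2
    var_p, var_m = (1.0 + r) / 2.0, (1.0 - r) / 2.0
    var_sum = var_p + var_m    # Var(A_+ + A_-), the variance of Z_{s+t}
    var_diff = var_p + var_m   # Var(A_+ - A_-), the variance of Z_s
    cov = var_p - var_m        # Cov(A_+ + A_-, A_+ - A_-)
    return var_sum, var_diff, cov


def sample_correlation(r, n=100_000, seed=7):
    # seeded Monte Carlo estimate of corr(A_+ + A_-, A_+ - A_-), means known to be zero
    rng = random.Random(seed)
    sd_p, sd_m = math.sqrt((1.0 + r) / 2.0), math.sqrt((1.0 - r) / 2.0)
    sxy = sxx = syy = 0.0
    for _ in range(n):
        a_p, a_m = rng.gauss(0.0, sd_p), rng.gauss(0.0, sd_m)
        x, y = a_p + a_m, a_p - a_m
        sxy += x * y
        sxx += x * x
        syy += y * y
    return sxy / math.sqrt(sxx * syy)


for s in (0.0, 0.3, 2.0):
    print(s, gaussian_laplace(s), (1.0 + 2.0 * s) ** -0.5)
print(decomposition_moments(0.6), sample_correlation(0.6))
```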

For b), the denominator is simple and yields $(1+(\theta\sigma)^{2})^{-1}$. The numerator can be expressed as

[TABLE]

Using the formula for the two-dimensional density of $[Z_{s+t},Z_{s}]$, the term under the logarithm in the formula for the codifference can be expressed as the integral of the function

[TABLE]

where

[TABLE]

The integral of this function over $\mathbb{R}^{2}$ can be rewritten as an integral over $\mathbb{R}_{+}^{2}$

[TABLE]

Now, the codifference is $\tau_{X}^{\theta}(t)=\theta^{-2}\ln I$ . ∎

When $r_{Y}(t)=0$, the above formulae simplify significantly and simple asymptotic expressions can be derived by direct computation; see Eqs. (B31) and (B32). The case $r_{Z}(t)=0$ also leads to a simplification and can be considered in a more general setting.

Proposition 14.

If $Y$ is a stationary Gaussian process with $\mathbb{E}[Y_{t}^{2}]=1$ and, for large enough $t$, the values $D_{s}$ and $D_{s+t}$ are i.i.d. and independent of $Y$, then for $X_{t}=\sqrt{D_{t}}Y_{t}$

[TABLE]

where $D$ has the same distribution as $D_{s}$ or $D_{s+t}$ .

Proof.

We take $t$ large enough that the values of $X$ can be represented as $X_{s}=\sqrt{D_{1}}Y_{s}$ and $X_{s+t}=\sqrt{D_{2}}Y_{s+t}$ for i.i.d. $D_{1}$ and $D_{2}$. Conditioning on $D_{1},D_{2}$, the codifference can be expressed as

[TABLE]

We now divide the numerator of the above expression by $r_{Y}(t)$ and, using dominated convergence as in the previous propositions, obtain

[TABLE]

The result follows. ∎
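For a hypothetical discrete law of $D$, the conditioning step can be carried out exactly: given $D_{1},D_{2}$, the difference $\sqrt{D_{2}}Y_{s+t}-\sqrt{D_{1}}Y_{s}$ is Gaussian with variance $D_{1}+D_{2}-2\sqrt{D_{1}D_{2}}\,r_{Y}(t)$. Expanding to first order in $r_{Y}(t)$ (our own computation) gives $\tau_{X}^{\theta}(t)\approx r_{Y}(t)\,\big(\mathbb{E}[\sqrt{D}\mathrm{e}^{-\theta^{2}D/2}]/\mathbb{E}[\mathrm{e}^{-\theta^{2}D/2}]\big)^{2}$. The sketch compares the exact expression with this prefactor for small $r_{Y}$.

```python
import math

d_vals, d_probs = (0.3, 1.0, 4.0), (0.3, 0.5, 0.2)   # hypothetical law of D
theta = 1.0


def tau_iid(r_y):
    # exact codifference when D_s, D_{s+t} are i.i.d. and independent of Y
    num = 0.0
    for d1, p1 in zip(d_vals, d_probs):
        for d2, p2 in zip(d_vals, d_probs):
            var = d1 + d2 - 2.0 * math.sqrt(d1 * d2) * r_y   # conditional variance of the difference
            num += p1 * p2 * math.exp(-theta**2 * var / 2.0)
    den = sum(p * math.exp(-theta**2 * d / 2.0) for d, p in zip(d_vals, d_probs)) ** 2
    return math.log(num / den) / theta**2


weight = sum(p * math.exp(-theta**2 * d / 2.0) for d, p in zip(d_vals, d_probs))
tilted = sum(p * math.sqrt(d) * math.exp(-theta**2 * d / 2.0) for d, p in zip(d_vals, d_probs))
prefactor = (tilted / weight) ** 2

for r in (1e-1, 1e-3, 1e-6):
    print(f"r_Y = {r:g}: tau / r_Y = {tau_iid(r) / r:.8f}, prefactor = {prefactor:.8f}")
```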

Appendix A Sample size dependence of codifference and covariance

As a supplement to figure 2, figure 5 shows that significant differences between the covariance and the codifference of increments remain visible even for smaller sample sizes, such as $10^{4}$, $10^{3}$ and 500.
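To see how sample size affects such estimates, one can estimate both quantities from an ensemble of independent pairs $(X_{s},X_{s+t})$. The sketch below is our own illustration, not the procedure behind figure 5: it uses the random-diffusivity model $X=\sqrt{D}Y$ with gamma-distributed $D$ (shape 1, rate 1), pair correlation $r_{Y}=0.5$ and $\theta=1$, for which Table 1 gives $\tau_{X}^{\theta}=\ln 1.5\approx 0.405$ while the covariance is $r_{X}=\mathbb{E}[D]\,r_{Y}=0.5$. It assumes the codifference $\tau_{X}^{\theta}(t)=\theta^{-2}\ln\big(\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta(X_{s+t}-X_{s})}]/(\mathbb{E}[\mathrm{e}^{\mathrm{i}\theta X_{s+t}}]\,\mathbb{E}[\mathrm{e}^{-\mathrm{i}\theta X_{s}}])\big)$, the normalisation that reproduces Table 1, and replaces each expectation by a sample mean.

```python
import cmath
import math
import random


def sample_pairs(n, r_y=0.5, shape=1.0, rate=1.0, seed=2019):
    # pairs (X_s, X_{s+t}) = sqrt(D) (Y_s, Y_{s+t}) with corr(Y_s, Y_{s+t}) = r_y
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        d = rng.gammavariate(shape, 1.0 / rate)   # gammavariate takes (shape, scale)
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1, y2 = g1, r_y * g1 + math.sqrt(1.0 - r_y**2) * g2
        pairs.append((math.sqrt(d) * y1, math.sqrt(d) * y2))
    return pairs


def codifference_estimate(pairs, theta=1.0):
    # sample version of the codifference; the ratio is real up to sampling noise
    n = len(pairs)
    joint = sum(cmath.exp(1j * theta * (x2 - x1)) for x1, x2 in pairs) / n
    m2 = sum(cmath.exp(1j * theta * x2) for _, x2 in pairs) / n
    m1 = sum(cmath.exp(-1j * theta * x1) for x1, _ in pairs) / n
    return math.log((joint / (m2 * m1)).real) / theta**2


def covariance_estimate(pairs):
    # plain sample covariance of X_s and X_{s+t}
    n = len(pairs)
    mx1 = sum(x1 for x1, _ in pairs) / n
    mx2 = sum(x2 for _, x2 in pairs) / n
    return sum((x1 - mx1) * (x2 - mx2) for x1, x2 in pairs) / n


for n in (500, 1_000, 10_000):
    pairs = sample_pairs(n)
    print(n, codifference_estimate(pairs), covariance_estimate(pairs))
print("reference:", math.log(1.5), 0.5)
```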

We acknowledge funding from the Polish National Science Centre, HARMONIA 8 grant no. UMO-2016/22/M/ST1/00233, and from Deutsche Forschungsgemeinschaft, grants ME1535/6-1 and ME1535/7-1. RM was supported by an Alexander von Humboldt Polish Honorary Research Scholarship from the Foundation for Polish Science (Fundacja na rzecz Nauki Polskiej).

References

[1] R. Metzler, J.-H. Jeon, A. G. Cherstvy, and E. Barkai, Anomalous diffusion models and their properties: non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking, Phys. Chem. Chem. Phys. 16, 24128 (2014).
[2] W. T. Coffey, Y. P. Kalmykov, and J. T. Waldron, The Langevin Equation (World Scientific, Singapore, 1996).
[3] E. Lutz, Fractional Langevin equation, Phys. Rev. E 64, 051106 (2001).
[4] S. C. Kou, Stochastic modelling in nanoscale physics: Subdiffusion within proteins, Ann. Appl. Stat. 2, 501 (2008).
[5] E. A. Codling, M. J. Plank, and S. Benhamou, Random walk models in biology, J. R. Soc. Interface 5, 813 (2008).
[6] R. Metzler and J. Klafter, The restaurant at the end of the random walk: recent developments in the description of anomalous transport by fractional dynamics, J. Phys. A 37, R161 (2004).
[7] J. H. P. Schulz, E. Barkai, and R. Metzler, Aging renewal theory and application to random walks, Phys. Rev. X 4, 011028 (2014).
[8] B. O’Shaughnessy and I. Procaccia, Diffusion on fractals, Phys. Rev. A 32, 3073 (1985).
[9] C. J. Camacho, Z. Weng, S. Vajda, and C. DeLisi, Free energy landscapes of encounter complexes in protein-protein association, Biophys. J. 76, 1166 (1999).
[10] A. Comtet and D. S. Dean, Exact results on Sinai’s diffusion, J. Phys. A 31, 8595 (1998).
[11] J. Bouchaud and A. Georges, Anomalous diffusion in disordered media: Statistical mechanisms, models and physical applications, Phys. Rep. 195, 127 (1990).
[12] E. Renshaw and R. Henderson, The correlated random walk, J. Appl. Prob. 18, 403 (1981).
[13] P. Bovet and S. Benhamou, Spatial analysis of animals’ movements using a correlated random walk model, J. Theor. Biol. 131, 419 (1988).
[14] V. Tejedor and R. Metzler, Anomalous diffusion in correlated continuous time random walks, J. Phys. A 43, 082002 (2010).
[15] M. Magdziarz, R. Metzler, W. Szczotka, and P. Zebrowski, Correlated Continuous Time Random Walks in External Force Fields, Phys. Rev. E 85, 051103 (2012).
[16] J. H. P. Schulz, A. V. Chechkin, and R. Metzler, Correlated continuous-time random walks: combining scale-invariance with long-range memory for spatial and temporal dynamics, J. Phys. A 46, 475001 (2013).
[17] V. Zaburdaev, S. Denisov, and J. Klafter, Lévy walks, Rev. Mod. Phys. 87, 483 (2015).
[18] A. G. Cherstvy, A. V. Chechkin, and R. Metzler, Anomalous diffusion and ergodicity breaking in heterogeneous diffusion processes, New J. Phys. 15, 083039 (2013).
[19] M. V. Chubynsky and G. W. Slater, Diffusing diffusivity: A model for anomalous, yet Brownian, diffusion, Phys. Rev. Lett. 113, 098302 (2014).
[20] F. Höfling and T. Franosch, Anomalous transport in the crowded world of biological cells, Rep. Prog. Phys. 76, 046602 (2013).
[21] K. Nørregaard, R. Metzler, C. M. Ritter, K. Berg-Sørensen, and L. B. Oddershede, Manipulation and motion of organelles and single molecules in living cells, Chem. Rev. 117, 4342 (2017).
[22] R. Metzler, J.-H. Jeon, and A. G. Cherstvy, Non-Brownian diffusion in lipid membranes: experiments and simulations, Biochimica et Biophysica Acta (BBA) - Biomembranes 1858, 2451 (2016).
[23] B. Everitt and A. Skrondal, The Cambridge Dictionary of Statistics (Cambridge University Press, Cambridge UK, 2010).
[24] T. Downarowicz, Entropy, Scholarpedia 2, 3901 (2007), revision #126991.
[25] M. G. Kendall, Rank Correlation Methods (Griffin, London UK, 1970).
[26] P. E. Latham and Y. Roudi, Mutual information, Scholarpedia, 4, 1658 (2009), revision #122173.
[27] M. M. de Oliveira and R. Dickman, Moment ratios for the pair-contact process with diffusion, Phys. Rev. E 74, 011124 (2006).
[28] V. Tejedor, O. Bénichou, R. Voituriez, R. Jungmann, F. Simmel, C. Selhuber-Unkel, L. B. Oddershede, and R. Metzler, Quantitative analysis of single particle trajectories: Mean maximal excursion method, Biophys. J. 98, 1364 (2010).
[29] M. Magdziarz and J. Klafter, Detecting origins of subdiffusion: $p$ -variation test for confined systems, Phys. Rev. E 82, 011129 (2010).
[30] A. Weron, K. Burnecki, E. J. Akin, L. Solé, M. Balcerek, M. M. Tamkun, and D. Krapf, Ergodicity breaking on the neuronal surface emerges from random switching between diffusive states, Sci. Rep. 7 (2017).
[31] R. Metzler, Weak ergodicity breaking and ageing in anomalous diffusion, Int. J. Mod. Phys. Conf. Ser. 36, 1560007 (2015).
[32] Y. He, S. Burov, R. Metzler, and E. Barkai, Random time-scale invariant diffusion and transport coefficients, Phys. Rev. Lett. 101, 058101 (2008).
[33] J.-H. Jeon, V. Tejedor, S. Burov, E. Barkai, C. Selhuber-Unkel, K. Berg-Sørensen, L. Oddershede, and R. Metzler, In vivo anomalous diffusion and weak ergodicity breaking of lipid granules, Phys. Rev. Lett. 106, 048103 (2011).
[34] D. Krapf, E. Marinari, R. Metzler, G. Oshanin, A. Squarcini, and X. Xu, Power spectral density of a single Brownian trajectory: What one can and cannot learn from it, New J. Phys. 20, 023029 (2018).
[35] D. Krapf, N. Lukat, E. Marinari, R. Metzler, G. Oshanin, C. Selhuber-Unkel, A. Squarcini, L. Stadler, M. Weiss, and X. Xu, Spectral Content of a Single Non-Brownian Trajectory, Phys. Rev. X 9, 011019 (2019).
[36] P. Castiglione, A. Mazzino, P. Muratore-Ginanneschi, and A. Vulpiani, On strong anomalous diffusion, Physica D 134, 75 (1999).
[37] G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes (Chapman & Hall, London UK, 1994).
[38] P. S. Kokoszka and M. S. Taqqu, Fractional ARIMA with stable innovations, Stoch. Proc. Applic. 60, 19 (1995).
[39] P. S. Kokoszka and M. S. Taqqu, Infinite variance stable moving averages with long memory, J. Econom. 73, 79 (1996).
[40] M. Magdziarz, Short and long memory fractional Ornstein-Uhlenbeck $\alpha$ -stable processes, Stoch. Models 23, 451 (2007).
[41] M. Magdziarz, Fractional Langevin equation with $\alpha$-stable noise. A link to fractional ARIMA time series, Studia Mathematica 181, 47 (2007).
[42] K. Burnecki, J. Klafter, M. Magdziarz, and A. Weron, From solar flare time series to fractional dynamics, Physica A 387, 1077 (2008).
[43] A. Wyłomańska, A. Chechkin, J. Gajda, and I. Sokolov, Codifference as a practical tool to measure interdependence, Physica A 421, 412 (2015).
[44] M. Magdziarz and A. Weron, Anomalous diffusion: Testing ergodicity breaking in experimental data, Phys. Rev. E 84, 051138 (2011).
[45] H. Loch-Olszewska and J. Szwabiński, Detection of $\epsilon$ -ergodicity breaking in experimental data-a study of the dynamical functional sensibility, J. Chem. Phys 148, 204105 (2018).
[46] A. Weron, K. Burnecki, S. Mercik, and K. Weron, Complete description of all self-similar models driven by Lévy stable noise, Phys. Rev. E 71, 016113 (2005).
[47] M. Magdziarz, Stochastic representation of subdiffusion processes with time-dependent drift, Stoch. Proc. Applic. 119, 3238 (2009).
[48] H. Haubold, A. Mathai, and R. Saxena, Mittag-Leffler functions and their applications, J. Appl. Math. 2011, 298628 (2011).
[49] J. L. Lebowitz and O. Penrose, Modern ergodic theory, Phys. Today 26, 23 (1973).
[50] S. Janson, Gaussian Hilbert Spaces, Cambridge Tracts in Mathematics (Cambridge University Press, Cambridge UK, 1997).
[51] H. Goldstein, Multilevel Statistical Models (Wiley, London UK, 2011).
[52] C. Beck and E. G. Cohen, Superstatistics, Physica A 322, 267 (2003).
[53] W. Schneider, Grey Noise (World Scientific, Singapore, 1990).
[54] F. Mainardi, Fundamental solutions for the fractional diffusion-wave equation, Appl. Math. Lett. 9, 23 (1996).
[55] J. L. da Silva and M. Erraoui, Grey Brownian motion local time: Existence and weak-approximation, Stochastics 87, 347 (2015).
[56] B. B. Mandelbrot and J. W. van Ness, Fractional Brownian motions, fractional noises and applications, SIAM Review 10, 422 (1968).
[57] G. Pagnini, The M-Wright function as a generalization of the Gaussian density for fractional diffusion processes, Frac. Calc. Appl. Anal. 16, 436 (2013).
[58] I. Cornfeld, S. Fomin, and Y. Sinai, Ergodic Theory (Springer-Verlag, Heidelberg, 1982).
[59] J. Ślęzak, Asymptotic behaviour of time averages for non-ergodic Gaussian processes, Ann. Phys. 383, 285 (2017).
[60] J. Klafter, M. F. Shlesinger, and G. Zumofen, Beyond Brownian motion, Phys. Today 49(2), 33 (1996).
[61] P. Barthelemy, J. Bertolotti, and D. S. Wiersma, A Lévy flight for light, Nature 453, 495 (2008).
[62] R. J. Adler, Random Fields and Geometry (Springer-Verlag, Berlin, 2007).
[63] G. E. Uhlenbeck and L. S. Ornstein, On the theory of the Brownian motion, Phys. Rev. 36, 823 (1930).
[64] J. A. E. Bryson and L. J. Henrikson, Estimation using sampled data containing sequentially correlated noise, J. Spacecraft Rockets 5, 662 (1968).
[65] J.-H. Jeon, E. Barkai, and R. Metzler, Noisy continuous time random walks, J. Chem. Phys. 139, 121916 (2013).
[66] J. Ślęzak, R. Metzler, and M. Magdziarz, Superstatistical generalised Langevin equation: non-Gaussian viscoelastic anomalous diffusion, New J. Phys. 20, 023026 (2018).
[67] R. Zwanzig, Nonequilibrium Statistical Mechanics (Oxford University Press, Oxford UK, 2001).
[68] A. Ashkin, Acceleration and trapping of particles by radiation pressure, Phys. Rev. Lett. 24, 156 (1970).
[69] I. Goychuk, Viscoelastic subdiffusion: generalized Langevin equation approach, Adv. Chem. Phys. 150, 187 (2012).
[70] T. Sungkaworn, M.-L. Jobin, K. Burnecki, A. Weron, M. J. Lohse, and D. Calebiro, Single-molecule imaging reveals receptor-G protein interactions at cell surface hot spots, Nature 550, 543 (2017).
[71] S. Thapa, M. A. Lomholt, J. Krog, A. G. Cherstvy, and R. Metzler, Bayesian nested sampling analysis of single particle tracking data: maximum likelihood model selection applied to stochastic diffusivity data, Phys. Chem. Chem. Phys. 20, 29018 (2018).
[72] A. G. Cherstvy, S. Thapa, C. E. Wagner, and R. Metzler, Non-Gaussian, non-ergodic, and non-Fickian diffusion of tracers in mucin hydrogels, Soft Matter, at press; DOI: 10.1039/C8SM02096E.
[73] L. de Haan, On Regular Variation and its Applications to the Weak Convergence of Sample Extremes, in Mathematical Centre tracts, vol 32 (Mathematisch Centrum, Amsterdam, 1970).
[74] T. Mikosch, Regular variation, subexponentiality and their applications in probability theory, Report Eurandom vol 99013 (Eurandom, Eindhoven, 1999).
[75] D. Tjøstheim, Some doubly stochastic time series models, J. Time Ser. Anal. 7, 51 (1986).
[76] J. C. Cox, J. E. Ingersoll, and S. A. Ross, A theory of the term structure of interest rates, Econometrica 53, 385 (1985).
[77] R. Jain and K. L. Sebastian, Diffusing diffusivity: a new derivation and comparison with simulations, J. Chem. Sci. 129, 929 (2017).
[78] R. Jain and K. L. Sebastian, Diffusion in a crowded, rearranging environment, J. Phys. Chem. B 120, 3988 (2016).
[79] N. Tyagi and B. J. Cherayil, Non-Gaussian Brownian diffusion in dynamically disordered thermal environments, J. Phys. Chem. B 121, 7204 (2017).
[80] A. V. Chechkin, F. Seno, R. Metzler, and I. M. Sokolov, Brownian yet non-Gaussian diffusion: From superstatistics to subordination of diffusing diffusivities, Phys. Rev. X 7, 021002 (2017).
[81] V. Sposini, A. Chechkin, F. Seno, G. Pagnini, and R. Metzler, Random diffusivity from stochastic equations: comparison of two models for Brownian yet non-Gaussian diffusion, New J. Phys. 20, 043044 (2018).
[82] Y. Lanoiselée, N. Moutal, and D. S. Grebenkov, Diffusion-limited reactions in dynamic heterogeneous media, Nature Comm. 9, 4398 (2018).
[83] M. Jeanblanc, M. Yor, and M. Chesney, Mathematical Methods for Financial Markets (Springer-Verlag, Berlin, 2009).
