Introduction to Dynamic Linear Models for Time Series Analysis

Marko Laine

arXiv:1903.11309·stat.ME·August 20, 2019

Introduction to Dynamic Linear Models for Time Series Analysis

Marko Laine

PDF

TL;DR

Dynamic linear models (DLM) provide a flexible, hierarchical framework for analyzing diverse time series data, accommodating non-stationarity, missing data, and varying observation accuracy, with efficient estimation methods.

Contribution

This paper introduces the DLM framework, demonstrating its ability to unify classical models and handle complex real-world time series analysis tasks.

Findings

01

DLM can model non-stationary processes effectively.

02

Efficient estimation via Kalman and MCMC methods is feasible.

03

DLM accommodates missing data and irregular sampling.

Abstract

Dynamic linear models (DLM) offer a very generic framework to analyse time series data. Many classical time series models can be formulated as DLMs, including ARMA models and standard multiple linear regression models. The models can be seen as general regression models where the coefficients can vary in time. In addition, they allow for a state space representation and a formulation as hierarchical statistical models, which in turn is the key for efficient estimation by Kalman formulas and by Markov chain Monte Carlo (MCMC) methods. A dynamic linear model can handle non-stationary processes, missing values and non-uniform sampling as well as observations with varying accuracies. This chapter gives an introduction to DLM and shows how to build various useful models for analysing trends and other sources of variability in geodetic time series.

Tables2

Table 1. Table 3.1: Conditional DLM distributions and the corresponding algorithms. The variables usead are: x t subscript 𝑥 𝑡 x_{t} for the time varying state of the system (e.g. trend), y t subscript 𝑦 𝑡 y_{t} for the observations at each time t 𝑡 t , and θ 𝜃 \theta for structural parameters used in the model and covariance matrices. Notation x 1 : n subscript 𝑥 : 1 𝑛 x_{1:n} means all time instances for 1 , … , n 1 … 𝑛 1,\dots,n .

distribution	meaning	algorithm
$p (x_{t} \| x_{t - 1}, y_{1 : t - 1}, θ)$	one step prediction	Kalman filter
$p (x_{t} \| y_{1 : t}, θ)$	filter solution	Kalman filter
$p (x_{t} \| y_{1 : n}, θ)$	smoother solution	Kalman smoother
$p (x_{1 : n} \| y_{1 : n}, θ)$	full state given parameters	simulation smoother
$p (y_{1 : t} \| θ)$	marginal likelihood for parameters	Kalman filter likelihood
$p (x_{1 : n}, θ \| y_{1 : n})$	full state and parameter	MCMC
$p (θ \| y_{1 : n})$	marginal for parameter	MCMC
$p (x_{1 : n} \| y_{1 : n})$	marginal for full state	MCMC

Table 2. Table 3.2: Parameter estimates from DLM/MCMC estimation for the synthetic GNSS time series example. The uncertainty value is one-sigma posterior standard deviation. The true values for trends were 12.59, 17.64, and 2.778 mm/yr. The true seasonal amplitude was 1 mm.

data	trend [mm/yr]	seasonal [mm]	$σ_{level}$	$σ_{AR}$	$ρ_{AR}$
East	$12.62 \pm 0.61$	$0.93 \pm 0.15$	$0.20 \pm 0.024$	$0.85 \pm 0.024$	$0.62 \pm 0.03$
North	$17.76 \pm 0.69$	$1.19 \pm 0.16$	$0.22 \pm 0.02$	$0.86 \pm 0.024$	$0.64 \pm 0.29$
Up	$2.22 \pm 1.00$	$0.74 \pm 0.29$	$0.34 \pm 0.07$	$2.00 \pm 0.075$	$0.87 \pm 0.016$

Equations56

y_{t}

y_{t}

x_{t}

y_{t}

y_{t}

μ_{t}

α_{t}

\quad x_{t}=\begin{bmatrix}\mu_{t}&\alpha_{t}\end{bmatrix}^{T},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}=\begin{bmatrix}1&0\\ \end{bmatrix},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}=\begin{bmatrix}1&1\\ 0&1\\ \end{bmatrix},

\quad x_{t}=\begin{bmatrix}\mu_{t}&\alpha_{t}\end{bmatrix}^{T},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}=\begin{bmatrix}1&0\\ \end{bmatrix},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}=\begin{bmatrix}1&1\\ 0&1\\ \end{bmatrix},

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Q}$}}{\mbox{\boldmath$\mathrm{\textstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Q}$}}=\begin{bmatrix}\sigma^{2}_{\mathrm{level}}&0\\ 0&\sigma^{2}_{\mathrm{trend}}\\ \end{bmatrix},\quad\text{and}\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle R}$}}{\mbox{\boldmath$\mathrm{\textstyle R}$}}{\mbox{\boldmath$\mathrm{\scriptstyle R}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle R}$}}=\mathrm{diag}\left(\begin{bmatrix}\sigma^{2}_{\mathrm{obs}}&\dots&\sigma^{2}_{\mathrm{obs}}\end{bmatrix}\right),

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Q}$}}{\mbox{\boldmath$\mathrm{\textstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Q}$}}=\begin{bmatrix}\sigma^{2}_{\mathrm{level}}&0\\ 0&\sigma^{2}_{\mathrm{trend}}\\ \end{bmatrix},\quad\text{and}\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle R}$}}{\mbox{\boldmath$\mathrm{\textstyle R}$}}{\mbox{\boldmath$\mathrm{\scriptstyle R}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle R}$}}=\mathrm{diag}\left(\begin{bmatrix}\sigma^{2}_{\mathrm{obs}}&\dots&\sigma^{2}_{\mathrm{obs}}\end{bmatrix}\right),

θ = [σ_{obs}^{2} σ_{level}^{2} σ_{trend}^{2}]^{T} .

θ = [σ_{obs}^{2} σ_{level}^{2} σ_{trend}^{2}]^{T} .

Δ^{2} μ_{t} = μ_{t - 1} - 2 μ_{t} + μ_{t + 1} = Δ α_{t} = ϵ_{trend},

Δ^{2} μ_{t} = μ_{t - 1} - 2 μ_{t} + μ_{t + 1} = Δ α_{t} = ϵ_{trend},

p (x_{t}, θ ∣ y_{t}) \propto p (y_{t} ∣ x_{t}, θ) p (x_{t} ∣ θ) p (θ),

p (x_{t}, θ ∣ y_{t}) \propto p (y_{t} ∣ x_{t}, θ) p (x_{t} ∣ θ) p (θ),

\displaystyle\widehat{\mathchoice{\mbox{\boldmath$\displaystyle x$}}{\mbox{\boldmath$\textstyle x$}}{\mbox{\boldmath$\scriptstyle x$}}{\mbox{\boldmath$\scriptscriptstyle x$}}}_{t}

\displaystyle\widehat{\mathchoice{\mbox{\boldmath$\displaystyle x$}}{\mbox{\boldmath$\textstyle x$}}{\mbox{\boldmath$\scriptstyle x$}}{\mbox{\boldmath$\scriptscriptstyle x$}}}_{t}

\displaystyle\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{t}

\displaystyle\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle G}$}}{\mbox{\boldmath$\mathrm{\textstyle G}$}}{\mbox{\boldmath$\mathrm{\scriptstyle G}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle G}$}}_{t}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle G}$}}{\mbox{\boldmath$\mathrm{\textstyle G}$}}{\mbox{\boldmath$\mathrm{\scriptstyle G}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle G}$}}_{t}

\displaystyle\mathchoice{\mbox{\boldmath$\displaystyle r$}}{\mbox{\boldmath$\textstyle r$}}{\mbox{\boldmath$\scriptstyle r$}}{\mbox{\boldmath$\scriptscriptstyle r$}}_{t}

\displaystyle\overline{\mathchoice{\mbox{\boldmath$\displaystyle x$}}{\mbox{\boldmath$\textstyle x$}}{\mbox{\boldmath$\scriptstyle x$}}{\mbox{\boldmath$\scriptscriptstyle x$}}}_{t}

\displaystyle\overline{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{t}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}_{t}^{+}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}_{t}^{+}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle G}$}}{\mbox{\boldmath$\mathrm{\textstyle G}$}}{\mbox{\boldmath$\mathrm{\scriptstyle G}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle G}$}}_{t}

x_{t - 1}

\displaystyle\widetilde{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{t-1}

x_{1 : n}^{*} = \overset{x}{ˇ}_{1 : n} - \overset{x}{˘}_{1 : n} + x_{1 : n} .

x_{1 : n}^{*} = \overset{x}{ˇ}_{1 : n} - \overset{x}{˘}_{1 : n} + x_{1 : n} .

-2\log\left(p(y_{1:n}|\theta)\right)=\mathrm{constant}+\sum_{t=1}^{n}\left[(y_{t}-\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}\widehat{x}_{t})^{T}\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}^{-1}(y_{t}-\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}\widehat{x}_{t})+\log(|\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}|)\right].

-2\log\left(p(y_{1:n}|\theta)\right)=\mathrm{constant}+\sum_{t=1}^{n}\left[(y_{t}-\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}\widehat{x}_{t})^{T}\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}^{-1}(y_{t}-\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}\widehat{x}_{t})+\log(|\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}|)\right].

r^{*}_{t}=\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}^{-1/2}(y_{t}-\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}\widehat{x}_{t})

r^{*}_{t}=\widehat{\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle C}$}}{\mbox{\boldmath$\mathrm{\textstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptstyle C}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle C}$}}}_{y,t}^{-1/2}(y_{t}-\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}\widehat{x}_{t})

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}_{\mathrm{seas}}=\begin{bmatrix}\cos(\pi/6)&\sin(\pi/6)&0&0\\ -\sin(\pi/6)&\cos(\pi/6)&0&0\\ 0&0&\cos(\pi/3)&\sin(\pi/3)\\ 0&0&-\sin(\pi/3)&\cos(\pi/3)\\ \end{bmatrix}

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}_{\mathrm{seas}}=\begin{bmatrix}\cos(\pi/6)&\sin(\pi/6)&0&0\\ -\sin(\pi/6)&\cos(\pi/6)&0&0\\ 0&0&\cos(\pi/3)&\sin(\pi/3)\\ 0&0&-\sin(\pi/3)&\cos(\pi/3)\\ \end{bmatrix}

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{\mathrm{seas}}=\begin{bmatrix}1&0&1&0\\ \end{bmatrix}.

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{\mathrm{seas}}=\begin{bmatrix}1&0&1&0\\ \end{bmatrix}.

y_{t} = ρ_{1} y_{t - 1} + ρ_{2} y_{t - 2} + \dots + ρ_{p} y_{t - p} + ϵ, ϵ \sim N (0, σ_{AR}^{2})

y_{t} = ρ_{1} y_{t - 1} + ρ_{2} y_{t - 2} + \dots + ρ_{p} y_{t - p} + ϵ, ϵ \sim N (0, σ_{AR}^{2})

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}_{\mathrm{AR}}=\begin{bmatrix}\rho_{1}&1&0\\ \rho_{2}&0&1\\ \rho_{3}&0&0\\ \end{bmatrix},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{\mathrm{AR}}=\begin{bmatrix}1&0&0\\ \end{bmatrix},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Q}$}}{\mbox{\boldmath$\mathrm{\textstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Q}$}}_{\mathrm{AR}}=\begin{bmatrix}\sigma^{2}_{\mathrm{AR}}&0&0\\ 0&0&0\\ 0&0&0\end{bmatrix}.

\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}_{\mathrm{AR}}=\begin{bmatrix}\rho_{1}&1&0\\ \rho_{2}&0&1\\ \rho_{3}&0&0\\ \end{bmatrix},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{\mathrm{AR}}=\begin{bmatrix}1&0&0\\ \end{bmatrix},\quad\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Q}$}}{\mbox{\boldmath$\mathrm{\textstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Q}$}}_{\mathrm{AR}}=\begin{bmatrix}\sigma^{2}_{\mathrm{AR}}&0&0\\ 0&0&0\\ 0&0&0\end{bmatrix}.

y_{t}=\mu_{t}+\gamma_{t}+\beta_{t}\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Z}$}}{\mbox{\boldmath$\mathrm{\textstyle Z}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Z}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Z}$}}_{t}+\epsilon_{\mathrm{obs}},

y_{t}=\mu_{t}+\gamma_{t}+\beta_{t}\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Z}$}}{\mbox{\boldmath$\mathrm{\textstyle Z}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Z}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Z}$}}_{t}+\epsilon_{\mathrm{obs}},

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{\mathrm{proxy}(t)}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{\mathrm{proxy}(t)}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}_{\mathrm{proxy}}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Q}$}}{\mbox{\boldmath$\mathrm{\textstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Q}$}}_{\mathrm{proxy}}

x_{t}

x_{t}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle M}$}}{\mbox{\boldmath$\mathrm{\textstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptstyle M}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle M}$}}_{t}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle H}$}}{\mbox{\boldmath$\mathrm{\textstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptstyle H}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle H}$}}_{t}

\displaystyle\mathchoice{\mbox{\boldmath$\mathrm{\displaystyle Q}$}}{\mbox{\boldmath$\mathrm{\textstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptstyle Q}$}}{\mbox{\boldmath$\mathrm{\scriptscriptstyle Q}$}}_{t}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

\newunicodechar

°∘

11institutetext: Marko Laine 22institutetext: Finnish Meteorological Institute, Helsinki, Finland, 22email: [email protected]

Introduction to Dynamic Linear Models for Time Series Analysis

Marko Laine

Abstract

Dynamic linear models (DLM) offer a very generic framework to analyse time series data. Many classical time series models can be formulated as DLMs, including ARMA models and standard multiple linear regression models. The models can be seen as general regression models where the coefficients can vary in time. In addition, they allow for a state space representation and a formulation as hierarchical statistical models, which in turn is the key for efficient estimation by Kalman formulas and by Markov chain Monte Carlo (MCMC) methods. A dynamic linear model can handle non-stationary processes, missing values and non-uniform sampling as well as observations with varying accuracies. This chapter gives an introduction to DLM and shows how to build various useful models for analysing trends and other sources of variability in geodetic time series.

3.1 Introduction to dynamic linear models

Statistical analysis of time series data is usually faced with the fact that we have only one realization of a process whose properties might not be fully understood. We need to assume that some distributional properties of the process that generate the observations do not change with time. In linear trend analysis, for example, we assume that there is an underlying change in the background mean that stays approximately constant over time. Dynamic regression avoids this by explicitly allowing temporal variability in the regression coefficients and by letting some of the system properties to change in time. Furthermore, the use of unobservable state variables allows direct modelling of the processes that are driving the observed variability, such as seasonal variation or external forcing, and we can explicitly allow some modelling error.

Dynamic regression can be formulated in very general terms by using a state space representation of the observations and the hidden state of the system. With sequential definition of the processes, having conditional dependence only on the previous time step, the classical recursive Kalman filter algorithms can be used to estimate the model states given the observations. When the operators involved in the definition of the system are linear we have so called dynamic linear model (DLM).

A basic model for time series in geodetic or more general environmental applications consists of four elements: a slowly varying background level, a seasonal component, external forcing from known processes modelled by proxy variables, and stochastic noise. The noise component might contain an autoregressive structure to account for temporally correlated model residuals. As we see, the basic components have some physical justification and we might be interested in their contribution to the overall variability and their temporal changes. These components are hidden in the sense that we do not observe them directly and each individual component is masked by various other sources of variability in the observations.

Below, we briefly describe the use of dynamic linear models in time series analysis. The examples deal with univariate time series, i.e. the observation at a singe time instance is a scalar, but the framework and the computer code can handle multivariate data, too. All the model equations are written in way that support multivariate observations. In the presented applications we are mostly interested in extracting the components related to the trends and using these to infer about their magnitude and the uncertainties involved. However, these models might not be so good for produce predictions about the behaviour of the system in the future, although understanding the system is a first step to be able to make predictions.

The use of DLMs in time series analysis is well documented in statistical literature, but they might go by different terminology and notation. In Harvey (1991) they are called structural time series, Durbin and Koopman (2012) uses the state space approach, and the acronym DLM is used in Petris et al (2009).

3.2 State space description

The state space description offers a unified formulation for the analysis of dynamic regression models. The same formulation is used extensively in signal processing and geophysical data assimilation studies, for example. A general dynamic linear model with an observation equation and a model equation is

[TABLE]

Above $y_{t}$ is a vector of length $k$ of observations at time $t$ , with $t=1,\dots,n$ . Vector $x_{t}$ of length $m$ contains the unobserved states of the system that evolve in time according to a linear system operator $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle M} $}}{\mbox{\boldmath$ \mathrm{\textstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle M} $}}_{t}$ (a $m\times m$ matrix). In time series settings $x_{t}$ will have elements corresponding to various components of the time series process, like trend, seasonality, etc. We observe a linear combination of the states with noise $\epsilon_{t}$ , and matrix $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle H} $}}{\mbox{\boldmath$ \mathrm{\textstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle H} $}}_{t}$ ( $m\times k$ ) is the observation operator that transforms the model states into observations. Both observations and the system states can have additive Gaussian errors with covariance matrices $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle R} $}}{\mbox{\boldmath$ \mathrm{\textstyle R} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle R} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle R} $}}_{t}$ ( $k\times k$ ) and $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{t}$ ( $m\times m$ ), respectively. In univariate time series analysis we will have $k=1$ . With multivariate data, the system matrices $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle M} $}}{\mbox{\boldmath$ \mathrm{\textstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle M} $}}_{t}$ , $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle H} $}}{\mbox{\boldmath$ \mathrm{\textstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle H} $}}_{t}$ , $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle R} $}}{\mbox{\boldmath$ \mathrm{\textstyle R} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle R} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle R} $}}_{t}$ and $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{t}$ can be used to define correlations between the observed components.

This formulation is quite general and flexible as it allows handling of many time series analysis problems in a single framework. Moreover, a unified computational tool can be used, i.e. a single DLM computer code can be used for various purposes. Below we give examples of different analyses. As we are dealing with linear models, we assume that the operators $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle M} $}}{\mbox{\boldmath$ \mathrm{\textstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle M} $}}_{t}$ and $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle H} $}}{\mbox{\boldmath$ \mathrm{\textstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle H} $}}_{t}$ are linear. However, they can change with the time index $t$ and we will drop the time index in the cases where the matrices are assumed static in time. The state space framework can be extended to non-linear model and non-Gaussian errors, and to spatial-temporal analyses as well, see, e.g., Cressie and Wikle (2011); Särkkä (2013). However, as can be seen in the following example, already the dynamic linear Gaussian formulation provides a large class of models for time series trend analyses.

3.2.1 Example: spline smoothing

A simple local level and local trend model can be used as a basis for many trend related studies. Consider a mean level process $\mu_{t}$ which is changing smoothly in time and which we observe with additive Gaussian noise. We assume that the change in the mean, $\mu_{t+1}-\mu_{t}$ , is controlled by a trend process $\alpha_{t}$ and the temporal change in these processes is assumed to be Gaussian with given variances $\sigma^{2}_{\mathrm{level}}$ and $\sigma^{2}_{\mathrm{trend}}$ . This can be written as

[TABLE]

which in state space representation transfers simply into

[TABLE]

with three parameters for the error variances

[TABLE]

We have dropped the time index $t$ from those elements that do not depend on time.

It is interesting to note, that if we set $\sigma^{2}_{\mathrm{level}}=0$ , we have a second difference process for $\mu_{t}$ as

[TABLE]

and it can be shown (Durbin and Koopman, 2012) that this is equivalent to cubic spline smoothing with smoothing parameter $\lambda=\sigma^{2}_{\mathrm{trend}}/\sigma^{2}_{\mathrm{obs}}>0$ .

Figure 3.1 shows simulated observations with a true piecewise trend and the fitted mean process $\mu_{t}$ , $t=1,\dots,n$ together with its 95% uncertainty limits. In this example, the observation uncertainty standard deviation ( $\sigma_{\mathrm{obs}}=0.3$ ) as well as the level and trend variability standard deviations ( $\sigma_{\mathrm{level}}=0.0$ , $\sigma_{\mathrm{trend}}=0.01$ ) are assumed to be known. In the later examples these values are estimated from the data.

3.3 DLM as hierarchical statistical model

The DLM formulation can be seen as a special case of a general hierarchical statistical model with three levels: data, process and parameters (see e.g. Cressie and Wikle (2011)), with corresponding conditional statistical distributions. First, the observation uncertainty $p(y_{t}|x_{t},\theta)$ described by the observation equation and forming the statistical likelihood function, second, the process uncertainty of the unknown states $x_{t}$ and their evolution given by the process equations as $p(x_{t}|\theta)$ or $p(x_{t}|x_{t-1},\theta)$ , and third, the unconditional prior uncertainty for the model parameters $p(\theta)$ . This formulation allows both an efficient description of the system and computational tools to estimate the components. It also combines different statistical approaches, as we can have full prior probabilities for the unknowns (the Bayesian approach), estimate them by maximum likelihood and plug them back (frequentistic approach), or even fix the model parameters by expert knowledge (a non-statistical approach). By the Bayes formula, we can write the state and parameter posterior distributions as a product of the conditional distributions

[TABLE]

which is the basis for full Bayesian estimation procedures. Next we will describe the steps needed for Bayesian DLM estimation of model states, parameters and their uncertainties.

3.4 State and parameter estimation

To recall the notation, $y_{t}$ are the observations and $x_{t}$ are the hidden system states for time indexes $t=1,\dots,n$ . In addition, we have a static vector $\theta$ that contains auxiliary parameters needed in defining the system matrices $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle M} $}}{\mbox{\boldmath$ \mathrm{\textstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle M} $}}_{t}$ and $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle H} $}}{\mbox{\boldmath$ \mathrm{\textstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle H} $}}_{t}$ and the model and observation error covariance matrices $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{t}$ and $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle R} $}}{\mbox{\boldmath$ \mathrm{\textstyle R} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle R} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle R} $}}_{t}$ . For dynamic linear models we have efficient and well founded computational tools for all relevant statistical distributions of interest. For the state estimation assuming a known parameter vector $\theta$ the assumptions on linearity and Gaussian errors allows us to estimate the model states by classical recursive Kalman formulas. The variance and other structural parameters appear in non-linear way and their estimation can be done either by numerical optimization or by Markov chain Monte Carlo (MCMC) methods. MCMC allows for a full Bayesian statistical analysis for the joint uncertainty in the dynamic model states and the static structural parameters (Gamerman, 2006). Table 3.1 relates the different statistical distributions to the algorithms, which are outlined later. The notation $y_{1:t}$ , $x_{1:t}$ , etc. means the collection of observations or states from time 1 to time $t$ .

3.5 Recursive Kalman formulas

Below we give the relevant parts of the recursive formulas for Kalman filter and smoother to estimate the conditional distributions of DLM states given the observations and static parameters. For more details, see Rodgers (2000); Laine et al (2014). A notable feature of the linear Gaussian case is that the formulas below are exact and easily implemented in computer as long as the model state dimension or the number of observations at one time is not too large.

To start the calculations, we assume that the initial distribution of $x_{0}$ at $t=0$ is available. The first step in estimating the states is to use Kalman filter forward recursion to calculate the distribution of the state vector $\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}_{t}$ given the observations up to time $t$ , $p(\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}_{t}|\mathchoice{\mbox{\boldmath$ \displaystyle y $}}{\mbox{\boldmath$ \textstyle y $}}{\mbox{\boldmath$ \scriptstyle y $}}{\mbox{\boldmath$ \scriptscriptstyle y $}}_{1:t},\mathchoice{\mbox{\boldmath$ \displaystyle\theta $}}{\mbox{\boldmath$ \textstyle\theta $}}{\mbox{\boldmath$ \scriptstyle\theta $}}{\mbox{\boldmath$ \scriptscriptstyle\theta $}})=N(\overline{\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}}_{t},\overline{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{t})$ , which is Gaussian by the linearity assumptions. At each time $t$ this step consists of first calculating, as prior, the mean and covariance matrix of one-step-ahead predicted states $p(\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}_{t}|\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}_{t-1},\mathchoice{\mbox{\boldmath$ \displaystyle y $}}{\mbox{\boldmath$ \textstyle y $}}{\mbox{\boldmath$ \scriptstyle y $}}{\mbox{\boldmath$ \scriptscriptstyle y $}}_{1:t-1},\mathchoice{\mbox{\boldmath$ \displaystyle\theta $}}{\mbox{\boldmath$ \textstyle\theta $}}{\mbox{\boldmath$ \scriptstyle\theta $}}{\mbox{\boldmath$ \scriptscriptstyle\theta $}})=N(\widehat{\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}}_{t},\widehat{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{t})$ and the covariance matrix of the predicted observations $\widehat{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{y,t}$ as

[TABLE]

Then the posterior state mean and its covariance are calculated using the Kalman gain matrix $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle G} $}}{\mbox{\boldmath$ \mathrm{\textstyle G} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle G} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle G} $}}_{t}$ as

[TABLE]

These equations are iterated for $t=1,\dots,n$ and the values of $\overline{{x}}_{t}$ and $\overline{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{t}$ are stored for further calculations. As initial values, we can use $\overline{\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}}_{0}=\mathchoice{\mbox{\boldmath$ \displaystyle 0 $}}{\mbox{\boldmath$ \textstyle 0 $}}{\mbox{\boldmath$ \scriptstyle 0 $}}{\mbox{\boldmath$ \scriptscriptstyle 0 $}}$ and $\overline{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{0}=\kappa\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle I} $}}{\mbox{\boldmath$ \mathrm{\textstyle I} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle I} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle I} $}}$ , i.e. a vector of zeros and a diagonal matrix with some large value $\kappa$ in the diagonal. Note that the only matrix inversion required in the above formulas is the one related to the observation prediction covariance matrix $\widehat{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{y,t}$ , which is of size $1\times 1$ when we analyse univariate time series.

The Kalman filter provides distributions of the states at each time $t$ given the observations up to the current time. As we want to do retrospective time series analysis that accounts for all of the observations, we need to have the distributions of the states for each time, given all the observations $\mathchoice{\mbox{\boldmath$ \displaystyle y $}}{\mbox{\boldmath$ \textstyle y $}}{\mbox{\boldmath$ \scriptstyle y $}}{\mbox{\boldmath$ \scriptscriptstyle y $}}_{1:n}$ . By the linearity of the model, these distributions are again Gaussian, $p(\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}_{t}|\mathchoice{\mbox{\boldmath$ \displaystyle y $}}{\mbox{\boldmath$ \textstyle y $}}{\mbox{\boldmath$ \scriptstyle y $}}{\mbox{\boldmath$ \scriptscriptstyle y $}}_{1:n},\mathchoice{\mbox{\boldmath$ \displaystyle\theta $}}{\mbox{\boldmath$ \textstyle\theta $}}{\mbox{\boldmath$ \scriptstyle\theta $}}{\mbox{\boldmath$ \scriptscriptstyle\theta $}})=N(\widetilde{\mathchoice{\mbox{\boldmath$ \displaystyle x $}}{\mbox{\boldmath$ \textstyle x $}}{\mbox{\boldmath$ \scriptstyle x $}}{\mbox{\boldmath$ \scriptscriptstyle x $}}}_{t},\widetilde{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{t})$ . Using the matrices generated by the Kalman forward recursion, the Kalman smoother backward recursion gives us the smoothed states for $t=n,n-1,\dots,1$ . There are several equivalent versions of the backward recursion algorithm. Below we show the Rauch-Tung-Striebel recursion (Särkkä, 2013) for illustration. For alternatives, see Durbin and Koopman (2012):

[TABLE]

In smoother recursion we are dealing with several matrix vector computations and one matrix inversion of size $m\times m$ and these formulas can be implemented quite efficiently in any general numerical analysis software. As a note, we see that the algorithms work with missing observations, too. If some observations at a time $t$ are missing, the corresponding columns of the gain matrix Eq. (3.14) will be zero. If all are missing, the filter posterior will be equal to the prior. Note that the above smoother recursion does not refer to the observations. All the Kalman formulas given above are for observations with uniform sampling in time, for non-uniform temporal sampling, the propagation of uncertainty to the next observation time has to be handled differently, see Harvey (1991); Durbin and Koopman (2012).

3.6 Simulation smoother

The Kalman smoother algorithm provides the distributions $p(x_{t}|y_{1:n},\theta)$ for each $t$ , which are all Gaussian. However, for studying trends and other dynamic features in the system, we are interested in the joint distribution spanning the whole time range $p(x_{1:n}|y_{1:n},\theta)$ . Note that we are still conditioning on the unknown parameter vector $\theta$ and will account for it later. This high dimensional joint distribution is not easily accessible directly. As in many cases, instead of analytic expressions, it is more important to be able to draw realizations from the distribution and use the sampling distribution for statistical analysis. This has several benefits. One important is that by comparing simulated realizations to the observations, we see how realistic the model predictions are, which can reveal if the modelling assumptions are not valid. Also, we can study the distributions of model outputs directly from the samples and do not need to resolve to approximate statistics.

A simple simulation algorithm by Durbin and Koopman (2012) is the following. The state space system equations provide a direct way to recursively sample realizations of both the states $x_{1:n}$ and the observations $y_{1:n}$ , but the generated states will be independent of the original observations. However, it can be shown (Durbin and Koopman, 2012, Section 4.9) that the distribution of the residual process of generated against smoothed state does not depend on ${y}_{1:n}$ . This means that if we add simulated residuals over the original smoothed state $\widetilde{x}_{1:n}$ , we get a new realization that is conditional on the original observations $y_{1:n}$ . A procedure to sample a realization ${x}^{*}_{1:n}\sim p(x_{1:n}|y_{1:n},\theta)$ is thus:

Generate a sample using the state space system equations, Eqs. (3.1) and (3.2) to get $\check{x}_{1:n}$ and $\check{y}_{1:n}$ . 2. 2.

Smooth $\check{y}_{1:n}$ to get ${\breve{x}}_{1:n}$ according to formulas in Section 3.5. 3. 3.

Add the residuals from step 2 to the original smoothed state:

[TABLE]

This simulation smoother can be used in trend studies and as a part of more general MCMC simulation algorithm that will sample from the joint posterior distribution $p(x_{1:n},\theta|y_{1:n})$ and by marginalization argument also from $p(x_{1:n}|y_{1:n})$ where the uncertainty in $\theta$ has been integrated out (Laine et al, 2014).

3.7 Estimating the static structural parameters

In the first examples, the variance parameters defining the model error covariance matrix $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{t}$ were assumed to be known. In practice we need some estimation methodology for them. Basically there are three alternatives. The first one uses subject level knowledge with trial and error to fix the parameters without any algorithmic tuning. The second one use the marginal likelihood function with a numerical optimization routine to find the maximum likelihood estimate of the parameter $\theta$ and plug the estimate back to the equations and re-fit the DLM model. The third one use MCMC algorithm to sample from the posterior distribution of the parameters to estimate the parameters and to integrate out their uncertainty.

To estimate the free parameters $\theta$ in the model formulation by optimization or by MCMC we need the marginal likelihood function $p(y_{1:n}|\theta)$ . By the assumed Markov properties of the system, this can be obtained sequentially as a byproduct of the Kalman filter recursion (Särkkä, 2013),

[TABLE]

On the right hand side, the parameter $\theta$ will appear in the model predictions $\widehat{x}_{t}$ as they depend on the matrix $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle M} $}}{\mbox{\boldmath$ \mathrm{\textstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle M} $}}_{t}$ as well as on the model error $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{t}$ . For the same reason we need the determinant of the model prediction covariance matrix $|\widehat{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{y,t}|$ . A fortunate property is that this likelihood can be calculated along the DLM filter recursion without much extra effort.

The scaled one-step prediction residuals

[TABLE]

can be used to check the goodness of fit of the model. In order of the DLM model to be consistent with the observations these residuals should be approximately independent, $N(0,\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle I} $}}{\mbox{\boldmath$ \mathrm{\textstyle I} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle I} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle I} $}})$ Gaussian and without serial autocorrelation. Later in the GNSS time series example, we will do model diagnostics by residual quantile-quantile and autocorrelation function plots.

3.8 Analysing trends

In general terms, trend is a change in the distributional properties, such as in the mean, of the process generating the observations. We are typically interested in slowly varying changes in the background level, i.e. in the mean process after the known sources of variability, such as seasonality, has been accounted for. A common way to explore trends is to fit some kind of a smoother, such as a moving average, over the time series. However, many standard smoothing methods do not provide statistical estimates of the smoothness parameters or asses the uncertainty related to the level of smoothing.

In typical DLM trend analyses, a slowly varying (relative to the time scale we are interested in) background level of the system is modelled as a first or higher order random walk process with variance parameters that determine the time wise smoothness of the level. These variance parameters must be estimated and their uncertainty accounted for proper uncertainty quantification. In an optimal case, the data provides information on the smoothness of the trend component, but typically we need to use subject level prior information to decide the time scale of the changes we want to extract. In the GNSS application example in Section 3.10 we assume a global linear trend and the local non-stationary fluctuations are modelled using a local random walk model with autocorrelated residuals. A Bayesian DLM model offers means to provide qualitative prior information in the form of the model equations and quantitative information by prior distributions on the variance parameters, see e.g. Gamerman (2006).

For statistical analysis we need to estimate the full state as either $p(x_{1:n}|y_{1:n},\hat{\theta})$ , where we plug in some estimates of the auxiliary parameters $\hat{\theta}$ , (the maximum likelihood approach) or by $p(x_{1:n}|y_{1:n})=\int p(x_{1:n},\theta|y_{1:n})\,d\theta$ where the uncertainty of auxiliary parameter $\theta$ is integrated out. The latter is the Bayesian approach and calculations can be done, e.g., by Markov chain Monte Carlo (MCMC) simulation (Gamerman, 2006; Laine et al, 2014). A procedure to account the uncertainty in a DLM model and its structural parameters and to study DLM output will contain the following steps:

Formulate the DLM model and its marginal likelihood $p(y_{1:n}|\theta)$ by Kalman filter. 2. 2.

Use MCMC to sample from the posterior distribution $p(\theta|y_{1:n})$ with a suitable prior distribution $p(\theta)$ for the structural parameters and with the likelihood of step 1. 3. 3.

Generate a sample from the marginal posterior $p(x_{1:n}|y_{1:n})$ using the simulation smoother (Section 3.6) and for each sample use a different $\theta$ from the MCMC chain from the previous step. 4. 4.

For each state realization, $x^{*}_{1:n}\sim p(x_{1:n}|y_{1:n})$ , from step 3., calculate a trend related or any other statistics of interest and use this sample for the estimates and their uncertainties.

3.9 Examples of different DLM models

In the following, we give several useful DLM formulations for model components that are typically used in geodetic or in more general environmental analyses. They have been used in existing applications for stratospheric ozone (Laine et al, 2014), ionosonde analysis (Roininen et al, 2015) and for station temperature records (Mikkonen et al, 2015). In Section 3.10, we will show analysis for synthetic GNSS station positioning time series.

3.9.1 The effect of level and trend variance parameters

In the first example in Section 3.2.1 the variance $\sigma^{2}_{\mathrm{trend}}$ was assumed to be known and fixed. Altering the variance affects the smoothness of the fit. In Figure 3.2 the effect of different variance parameters are shown for the same data. Note that by setting both $\sigma^{2}_{\mathrm{level}}$ and $\sigma^{2}_{\mathrm{trend}}$ to zero results in classical linear regression without dynamical evolution of the regression components. In this case, the 95% probability limits for the level obtained from the smoother covariance matrix $\widetilde{\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle C} $}}{\mbox{\boldmath$ \mathrm{\textstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle C} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle C} $}}}_{t}$ coincide with the classical confidence intervals for the mean. In classical non-dynamic linear regression the modelling error is included in the residual term, whereas in DLM we can include it in the model definition by allowing temporal change in model parameters.

If we estimate the parameters by the likelihood approach and MCMC outlined in Section 3.7, we get the values in the last panel of Figure 3.2 corresponding to the posterior mean. Figure 3.3 shows MCMC chain histograms together with estimated marginal posterior densities. It also has the point values obtained by likelihood optimization. Note by optimization we get an estimate for $\sigma_{\mathrm{level}}$ which is very close to zero relatively to the MCMC solution, which tries to find all values of the parameter that are consistent with the data.

3.9.2 Seasonal component

Seasonal variability can be modelled by adding extra state components for the effect of each season. A common description of seasonality uses trigonometric functions and is achieved by using two model states for each harmonic component. Monthly data with annual and semiannual cycles would use four state components and the following model and observation matrices

[TABLE]

and

[TABLE]

In addition, a corresponding part or the model error covariance matrix $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{\mathrm{seas}}$ has to be set up to define the allowed variability in the seasonal amplitudes. A simple approach is to use a diagonal matrix with equal values for each component as $\mathrm{diag}(\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Q} $}}{\mbox{\boldmath$ \mathrm{\textstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Q} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Q} $}}_{\mathrm{seas}})=[\sigma^{2}_{\mathrm{seas}},\sigma^{2}_{\mathrm{seas}},\sigma^{2}_{\mathrm{seas}},\sigma^{2}_{\mathrm{seas}}]^{T}$ . If we set these variances to zero, the DLM algorithm will fit a temporally fixed seasonal amplitude.

For illustration we use a simulated monthly data with yearly variation that has some randomness in the amplitude. The observations have a piecewise linear trend similar to example in Section 3.2.1 and some values as missing to see the effect on the uncertainties. We fit a seasonal component with one harmonic function, but we allow some variability in the amplitude and trend, with $\sigma_{\mathrm{trend}}=0.005$ and $\sigma_{\mathrm{seas}}=0.4$ . Figure 3.4 shows the generated data together with both the fitted mean process and the fitted seasonal component. A similar example was also used in Roininen et al (2015).

3.9.3 Autoregressive process

Autoregressive processes have serial dependence between the observations. A general AR( $p$ ) process is defined by $p$ coefficients $[\rho_{1},\dots,\rho_{p}]$ and an independent innovation term $\epsilon$ as

[TABLE]

For including an autoregressive component into the state space formulation we need to use state variables that "remember" their previous values. This can be achieved by suitable evolution operator $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle M} $}}{\mbox{\boldmath$ \mathrm{\textstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle M} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle M} $}}_{\mathrm{AR}}$ . For example, AR(3) process with coefficients $[\rho_{1},\rho_{2},\rho_{3}]$ , will have three extra states with

[TABLE]

A pure AR(3) process would then be obtained by setting the observation error $\sigma^{2}_{\mathrm{obs}}$ in Eq. (3.1) to zero and the model error component equal to the innovation variance $\sigma^{2}_{\mathrm{AR}}$ . If we, in addition, have $\sigma^{2}_{\mathrm{obs}}>0$ , it will result to an ARMA. In fact all ARMA and ARIMA models can be represented as DLM models (Petris et al, 2009, Section 3.2.5) and many ARIMA estimation software implementations use the Kalman filter likelihood Eq. (3.23) to formulate the cost function for estimation.

3.9.4 Regression covariates and proxy variables

In many applications the variability in the observations is affected by some known external factors, such as temperature, air pressure or solar activity. Sometimes these variables can be measured directly, as for the temperature, and sometimes their effect is modelled via a proxy, such as a radio fluxes for the solar effect. As an example, assume an observations model

[TABLE]

where $\mu_{t}$ and $\gamma_{t}$ are the mean level and the seasonal components, $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle Z} $}}{\mbox{\boldmath$ \mathrm{\textstyle Z} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle Z} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle Z} $}}_{t}$ is a row matrix of the values of the regression variables at time $t$ , and $\beta_{t}$ is a vector of time-varying regression coefficients. The effect of the covariates can be formulated by having the coefficients as extra states, $x_{\mathrm{proxy},t}=\beta_{t}$ , using an identity model operator, and by adding the covariate values to the observation operator $\mathchoice{\mbox{\boldmath$ \mathrm{\displaystyle H} $}}{\mbox{\boldmath$ \mathrm{\textstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptstyle H} $}}{\mbox{\boldmath$ \mathrm{\scriptscriptstyle H} $}}_{t}$ as

[TABLE]

The DLM model for equation Eq. (3.29) is then build up as diagonal block matrix combination of the components:

[TABLE]

The covariate variances $\sigma^{2}_{\mathrm{proxy}}$ control the allowed temporal variability in the coefficients $\beta_{t}$ and their values can be estimated or set to some prior value. By setting the variances to zero, turns this model into classical multiple linear regression.

3.10 Synthetic GNSS example

Next we estimate trends in synthetic GNSS time series provided by Machiel Bos and Jean-Philippe Montillet. In this application, the trend estimated in the GNSS time series represents the tectonic rate on the East and North components and the vertical land motion on the Up coordinate. The characteristics of the GNSS time series are discussed in details in Chapter 1 and 2. We select data for one of the stations (labeled station n:o 3 in the figures) with the three components (East, North, Up) shown in Figure 3.5, top left panel. The time series are simulated using linear trend, yearly seasonal variation and a combination of coloured and i.i.d Gaussian noise. We assume that we do not know the noise structure a priori. We are interested in the (non-local) linear trend and we need a model component for the local fluctuations seen in the data. This chosen data sets does not contain any sudden jumps in the measured position. Modelling offset changes would require a different strategy, with some iterative estimate of the jump locations, which we will not consider here. We use a DLM approach, where we assume that the non-stationary part can be modelled by local polynomials and the stochastic stationary part can be described as an AR or ARMA process in addition to the i.i.d. Gaussian observation uncertainty. See Dmitrieva et al (2015) for a somewhat similar approach, which uses state space representation and Kalman filter likelihood to model flicker and random walk type noise in several stations at the same time.

So, in contrast to the spline smoothing example in Section 3.2.1, which had $\sigma^{2}_{\mathrm{level}}=0$ and $\sigma^{2}_{\mathrm{trend}}>0$ , we will extract a non-local linear trend, $\sigma^{2}_{\mathrm{trend}}=0$ , and model the local non-stationary fluctuations as a local level model with $\sigma^{2}_{\mathrm{level}}>0$ . In addition, we use a yearly seasonal component for the daily observations and an autoregressive AR(1) noise component to account for the possible residual correlation. The observation error is assumed Gaussian and to have known standard deviation, $\sigma_{\mathrm{obs}}=1\mathrm{mm}$ for components "East" and "North" and $\sigma_{\mathrm{obs}}=4\mathrm{mm}$ for the "Up" component. The AR(1) innovation variance $\sigma_{\mathrm{AR}}$ as well as the AR coefficient $\rho_{\mathrm{AR}}$ will be estimated from the data. We use Kalman filter likelihood to estimate the 2 variance parameters and the AR(1) coefficient by MCMC. We analyse the three components (East, North, Up) separately.

The true trend coefficients used in the simulation for the three data sets were give as 12.59, 17.64, and 2.778 mm/yr. The estimates obtained for them were $12.62\pm 0.61$ , $17.76\pm 0.69$ and $2.22\pm 1.00$ mm/yr, with one-sigma posterior standard deviations after $\pm$ . Table 3.2 shows the parameter estimates obtained by combination of Kalman simulation smoother for the linear slope and seasonal amplitude, and MCMC for $\theta=[\sigma_{\mathrm{level}},\sigma_{\mathrm{AR}},\rho_{\mathrm{AR}}]^{T}$ . Figures 3.5 and 3.6 visualise the results graphically. There is a hint of negative autocorrelation in the ACF plot for the East components in Figure 3.5, but otherwise the residuals, obtained from the scaled prediction residuals, equation Eq. (3.24), look very Gaussian. In overall, the selected DLM model seems to provide statistically consistent fit and reproduce the true trends within the estimated uncertainty.

3.11 Computer implementation

The examples and code to fit DLM models described here are available from a Github repository at https://github.com/mjlaine/dlm. The code is written in Matlab and it contains a reference implementation of Kalman filter, smoother and simulator algorithms as well as optimization and MCMC for the structural parameters. Other software implementations for DLM include state space models toolbox for Matlab described in Peng and Aston (2011), a R package dlm described in Petris et al (2009) and python implementations in the statsmodes package (Seabold and Perktold, 2010).

3.12 Conclusions

DLM provides a general framework for modelling many kinds of environmental time series, including geodetic ones. Some features of GNSS time series, such as the often assumed flicker noise and handling of offsets and data jumps might still require more special treatments. However, the DLM approach provides a very useful generalization to the ordinary linear regression model. Its strengths include the ability to model non-stationary processes by allowing temporal change in the model coefficients and the direct modelling of the processes that generate the observed variability. By guiding the analysis in terms of the generating processes and their uncertainties it provides a good basis for Bayesian statistical inference. If there is prior knowledge about the changes, such as known change points in the data, they can be included in the model. By using simulation based Bayesian DLM analysis, your prior and posterior model simulations can be checked to be consistent with physical constrains and the observations.

References

Cressie and Wikle (2011)

Cressie N, Wikle CK (2011) Statistics for Spatio-Temporal Data. Wiley

Dmitrieva et al (2015)

Dmitrieva K, Segall P, DeMets C (2015) Network-based estimation of time-dependent noise in gps position time series. Journal of Geodesy 89(6):591–606, DOI 10.1007/s00190-015-0801-9

Durbin and Koopman (2012)

Durbin T, Koopman S (2012) Time Series Analysis by State Space Methods, 2nd edn. Oxford Statistical Science Series, Oxford University Press, DOI 10.1093/acprof:oso/9780199641178.001.0001

Gamerman (2006)

Gamerman D (2006) Markov chain Monte Carlo – Stochastic simulation for Bayesian inference, 2nd edn. Chapman & Hall

Harvey (1991)

Harvey AC (1991) Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, DOI 10.1017/CBO9781107049994

Laine et al (2014)

Laine M, Latva-Pukkila N, Kyrölä E (2014) Analysing time-varying trends in stratospheric ozone time series using the state space approach. Atmospheric Chemistry and Physics 14(18):9707–9725, DOI 10.5194/acp-14-9707-2014

Mikkonen et al (2015)

Mikkonen S, Laine M, Mäkelä HM, Gregow H, Tuomenvirta H, Lahtinen M, Laaksonen A (2015) Trends in the average temperature in Finland, 1847–2013. Stochastic Environmental Research and Risk Assessment 29(6):1521–1529, DOI 10.1007/s00477-014-0992-2

Peng and Aston (2011)

Peng JY, Aston J (2011) The state space models toolbox for MATLAB. Journal of Statistical Software 41(6):1–26, DOI 10.18637/jss.v041.i06

Petris et al (2009)

Petris G, Petrone S, Campagnoli P (2009) Dynamic Linear Models with R. Use R!, Springer

Rodgers (2000)

Rodgers CD (2000) Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific

Roininen et al (2015)

Roininen L, Laine M, Ulich T (2015) Time-varying ionosonde trend: Case study of Sodankylä hmF2 data 1957–2014. Journal of Geophysical Research: Space Physics 120(8):6851–6859, DOI 10.1002/2015JA021176

Särkkä (2013)

Särkkä S (2013) Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks, Cambridge University Press

Seabold and Perktold (2010)

Seabold S, Perktold J (2010) Statsmodels: Econometric and statistical modeling with python. In: Proceedings of the 9th Python in Science Conference

Bibliography13

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Cressie and Wikle (2011) Cressie N, Wikle CK (2011) Statistics for Spatio-Temporal Data. Wiley
2Dmitrieva et al (2015) Dmitrieva K, Segall P, De Mets C (2015) Network-based estimation of time-dependent noise in gps position time series. Journal of Geodesy 89(6):591–606, DOI 10.1007/s 00190-015-0801-9
3Durbin and Koopman (2012) Durbin T, Koopman S (2012) Time Series Analysis by State Space Methods, 2nd edn. Oxford Statistical Science Series, Oxford University Press, DOI 10.1093/acprof:oso/9780199641178.001.0001
4Gamerman (2006) Gamerman D (2006) Markov chain Monte Carlo – Stochastic simulation for Bayesian inference, 2nd edn. Chapman & Hall
5Harvey (1991) Harvey AC (1991) Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, DOI 10.1017/CBO 9781107049994
6Laine et al (2014) Laine M, Latva-Pukkila N, Kyrölä E (2014) Analysing time-varying trends in stratospheric ozone time series using the state space approach. Atmospheric Chemistry and Physics 14(18):9707–9725, DOI 10.5194/acp-14-9707-2014
7Mikkonen et al (2015) Mikkonen S, Laine M, Mäkelä HM, Gregow H, Tuomenvirta H, Lahtinen M, Laaksonen A (2015) Trends in the average temperature in Finland, 1847–2013. Stochastic Environmental Research and Risk Assessment 29(6):1521–1529, DOI 10.1007/s 00477-014-0992-2
8Peng and Aston (2011) Peng JY, Aston J (2011) The state space models toolbox for MATLAB. Journal of Statistical Software 41(6):1–26, DOI 10.18637/jss.v 041.i 06