Identification of Markov Jump Autoregressive Processes from Large Noisy Data Sets

Sarah Hojjatinia; Constantino M. Lagoa

arXiv:1903.11058·eess.SP·March 28, 2019


TL;DR

This paper presents a new method for identifying Markov jump autoregressive models from large, noisy datasets, accurately estimating system dynamics, switching behavior, and noise parameters even with high noise levels.

Contribution

It introduces a novel identification approach that handles large measurement noise and Markov switching, improving accuracy in complex noisy environments.

Findings

1. Effective even with high noise-to-output ratios
2. Performs well with large datasets
3. Accurately estimates switching dynamics and noise parameters



Table 1: Identifying the probability transition matrix (PTM) for different values of noise variance and different system runs.

Experiment #	True PTM	Estimated PTM	Normalized Frobenius norm	$\gamma$	$\sigma^{2}$
1	$(\begin{matrix} 0.1837 & 0.8163 \\ 0.3424 & 0.6576 \end{matrix})$	$(\begin{matrix} 0.2116 & 0.7884 \\ 0.3472 & 0.6528 \end{matrix})$	0.035810	0.0888	0.01
2	$(\begin{matrix} 0.4286 & 0.5714 \\ 0.1412 & 0.8588 \end{matrix})$	$(\begin{matrix} 0.3897 & 0.6103 \\ 0.1776 & 0.8224 \end{matrix})$	0.066922	0.1439	0.03
3	$(\begin{matrix} 0.1748 & 0.8252 \\ 0.4921 & 0.5079 \end{matrix})$	$(\begin{matrix} 0.2439 & 0.7561 \\ 0.4802 & 0.5198 \end{matrix})$	0.090075	0.1967	0.05
4	$(\begin{matrix} 0.5056 & 0.4944 \\ 0.6885 & 0.3115 \end{matrix})$	$(\begin{matrix} 0.5087 & 0.4913 \\ 0.6412 & 0.3588 \end{matrix})$	0.064687	0.2273	0.07
5	$(\begin{matrix} 0.2587 & 0.7413 \\ 0.4536 & 0.5464 \end{matrix})$	$(\begin{matrix} 0.3200 & 0.6800 \\ 0.4474 & 0.5526 \end{matrix})$	0.082236	0.2650	0.09
6	$(\begin{matrix} 0.3991 & 0.6009 \\ 0.1811 & 0.8189 \end{matrix})$	$(\begin{matrix} 0.3661 & 0.6339 \\ 0.2651 & 0.7349 \end{matrix})$	0.115335	0.5223	0.27
7	$(\begin{matrix} 0.5350 & 0.4650 \\ 0.6467 & 0.3533 \end{matrix})$	$(\begin{matrix} 0.5180 & 0.4819 \\ 0.5788 & 0.4212 \end{matrix})$	0.096751	0.4603	0.29




Sarah Hojjatinia¹, Constantino M. Lagoa²

¹Sarah Hojjatinia is with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, USA.
²Constantino M. Lagoa is with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, USA.

This work was partially supported by National Institutes of Health (NIH) Grant R01 HL142732, and National Science Foundation (NSF) Grant #1808266.

Abstract

This paper introduces a novel methodology for the identification of switching dynamics for switched autoregressive linear models. Switching behavior is assumed to follow a Markov model. The system’s outputs are contaminated by possibly large values of measurement noise. Although the procedure provided can handle other noise distributions, for simplicity, it is assumed that the distribution is Normal with unknown variance. Given noisy input-output data, we aim at identifying the switched system coefficients, the parameters of the noise distribution, the dynamics of switching, and the probability transition matrix of the Markov model. System dynamics are estimated using previous results that exploit algebraic constraints which system trajectories have to satisfy. Switching dynamics are computed by solving a maximum likelihood estimation problem. The effectiveness of the proposed approach is shown with several academic examples. Although the noise-to-output ratio can be high, the method is shown to be extremely effective in situations where a large number of measurements is available.

1 Introduction

While identification of linear time-invariant systems is by now a well-understood problem, identification of switched and hybrid systems is considerably less developed, even in the piecewise affine case. Existing methods exploit a number of algebraic and optimization-based techniques to find subsystem dynamics and switching surfaces [10]. A common feature is the computational complexity entailed in dealing with noisy measurements: in this case, algebraic procedures lead to nonconvex optimization problems, while optimization methods lead to mixed-integer linear programming [11].

Similarly, methods relying on probabilistic priors [5] lead to combinatorial problems. This can be avoided by using clustering-based methods [7]. However, these require “fair sampling” of each cluster, which constrains the data that can be used. In [9, 8], sparsification-based techniques that allow for several types of noise have been developed for the identification of affine switched models.

This paper develops effective methods for identifying switching dynamics from large noisy data sets, for a broad class of systems described by switching autoregressive models. These systems can be considered a generalization of piecewise affine models, and they bridge the gap between linear and nonlinear models, retaining many of the tractability properties of the former while providing descriptions that more accurately capture the features of practical problems over broader scenarios.

Development of the proposed framework is motivated by health care applications, specifically, smartphone-based interventions for increasing light physical activity [6]. More precisely, activity tracking devices make it possible to gather large amounts of data, such as the physical activity of an individual. Physical activity is a dynamic behavior, so it can be modeled as a dynamical system. Furthermore, its characteristics may change markedly with the time of day, weekdays versus weekends, location, etc., which motivated modeling it as a switching system [1].

In identifying the parameters of switched models, the dynamics of switching play an important role. Interest in Markovian jump systems, that is, switched systems whose switching is governed by a Markov chain, has been growing, since they have a broad range of applications in areas such as economic systems, power systems, and networked control systems. Compared to the large literature on analysis and control of Markovian jump systems, the identification problem seems to have received very little attention.

In [3], a new method for the identification of the parameters of Markovian jump systems is provided. The probability transition matrix is estimated using a suitable convex optimization problem. However, due to computational complexity, the number of measurements that the approach can handle is limited, and only process noise was considered. In this paper, we focus on cases involving a very large number of measurements, possibly affected by large values of noise. In this case, polynomial/moment-based approaches become ineffective, and different methodologies need to be devised. The approach we propose builds upon the same premises as [4].

More precisely, we start by assuming that the output measurements are corrupted by random Normal measurement noise with unknown variance. Then, we exploit the availability of a large number of measurements and the results in [4] to determine high-confidence estimates of the system parameters and of the variance of the measurement noise. Finally, using a maximum likelihood approach, we estimate the probability transition matrix and the dynamics of switching. The approach can be easily extended to other noise distributions, as long as the number of unknown parameters of the distribution is “low.”

1.1 Paper Organization

The paper is structured as follows: after this introduction, the problem statement is given in Section 2. The identification of system coefficients and noise parameters is reviewed in Section 3. In Section 4, the method for identifying the probability transition matrix is described. Numerical results are shown in Section 5. Finally, Section 6 concludes the paper, highlighting some possible future research directions.

2 Problem Statement

A precise description of the problem addressed is provided in this section. Assumptions needed to solve the problem are also introduced.

2.1 System Model

We consider switched autoregressive (SAR) linear models of the form

$$x_{k}=\sum_{j=1}^{n_{a}}a_{j\delta_{k}}\,x_{k-j}+\sum_{j=1}^{n_{c}}c_{j\delta_{k}}\,u_{k-j}\tag{1}$$

where $x_{k}\in\mathbb{R}$ is the output at time $k$ and $u_{k}\in\mathbb{R}$ is the input at time $k$. The variable $\delta_{k}\in\{1,\dots,n\}$ denotes the sub-system active at time $k$, where $n$ is the total number of sub-systems. Furthermore, $a_{j\delta_{k}}$ and $c_{j\delta_{k}}$ denote the unknown coefficients corresponding to mode $\delta_{k}$. Time $k$ takes values over the non-negative integers. The latent discrete state $\delta_{k}$ evolves according to a Markov chain with transition probability matrix $P$, whose $ij$ entry is

$$P_{ij}=\mathbb{P}\{\delta_{k+1}=j\mid\delta_{k}=i\}\tag{2}$$

The output is assumed to be contaminated by (possibly large) noise, i.e., the observations are of the form

$$y_{k}=x_{k}+\eta_{k}\tag{3}$$

where $\eta_{k}$ denotes measurement noise.

The following assumptions are made on the system model and noise.

Assumption 1

Throughout this paper it is assumed that:

• Upper bounds on $n_{a}$ and $n_{c}$ are available.

• An upper bound on the number of sub-systems $n$ is available.

• The measurement noise $\eta_{k}$ has a zero-mean Normal distribution with unknown variance.

• The noise $\eta_{k}$ is independent of $\eta_{l}$ for $k\neq l$, and identically distributed.

• The input sequence $u_{k}$ applied to the system is known and bounded.

• There exists a finite constant $L$ such that $|x_{k}|\leq L$ for all positive integers $k$.

• The switching sequence is generated by a Markov process.

Note that the approach in this paper can be extended to any noise distribution, as long as the number of unknown distribution parameters is “small.”
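For readers who want to experiment with the model, here is a minimal simulation sketch of equations (1) to (3) for a first-order case ($n_{a}=n_{c}=1$) satisfying Assumption 1. The function name `simulate_mjar`, the input drawn uniformly on $[-1,1]$, the uniformly drawn initial mode, and the zero initial state are illustrative assumptions rather than details from the paper; modes are 0-indexed in the code, whereas the paper uses $1,\dots,n$.

```python
import numpy as np

def simulate_mjar(N, P, a, c, sigma, rng):
    """Simulate a first-order Markov jump AR model (n_a = n_c = 1).

    P is the n x n transition matrix, a[m] and c[m] are the AR and input
    coefficients of mode m (0-indexed), and sigma is the noise standard deviation.
    Returns the input u, the noise-free output x, the measurements y and the modes.
    """
    P = np.asarray(P, dtype=float)
    a, c = np.asarray(a, dtype=float), np.asarray(c, dtype=float)
    n = P.shape[0]
    u = rng.uniform(-1.0, 1.0, N)            # known, bounded input (Assumption 1)
    x = np.zeros(N)
    delta = np.zeros(N, dtype=int)
    delta[0] = rng.integers(n)
    for k in range(1, N):
        # Markov switching: the next mode is drawn from row delta[k-1] of P, as in (2)
        delta[k] = rng.choice(n, p=P[delta[k - 1]])
        # switched AR dynamics, equation (1)
        x[k] = a[delta[k]] * x[k - 1] + c[delta[k]] * u[k - 1]
    y = x + sigma * rng.standard_normal(N)   # iid Normal measurement noise, equation (3)
    return u, x, y, delta

rng = np.random.default_rng(0)
u, x, y, delta = simulate_mjar(1000, [[0.2, 0.8], [0.3, 0.7]], [0.3, -0.5], [1.0, -1.0], 0.1, rng)
```

Drawing the whole input up front keeps it independent of the switching and of the noise, which is what the moment-based arguments in Section 3 rely on.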

2.2 Problem Definition

The main objective of this paper is to develop algorithms to identify the parameters of SAR systems, noise parameters, and dynamics of switching from noisy observations. More precisely, we aim at solving the following problem:

Problem 1

Given Assumption 1, an input sequence $u_{k}$, $k=-n_{c}+1,\dots,N-1$, and noisy output measurements $y_{k}$, $k=-n_{a}+1,\dots,N$, determine

1. the coefficients of the SAR model $a_{i,j}$, $i=1,2,\ldots,n_{a}$, $j=1,2,\ldots,n$, and $c_{i,j}$, $i=1,2,\ldots,n_{c}$, $j=1,2,\ldots,n$;
2. the noise distribution parameters;
3. the switching sequence $\delta_{k}$, $k=1,2,\ldots,N$, which is generated by a Markov process.

3 Review: Identification of System Coefficients and Noise Parameters

To identify the coefficients of the SAR system with measurement noise from a large amount of data, we adopt the approach developed in [4]. For the sake of completeness, we briefly summarize the approach in this section and refer the reader to [4] for more details.

First, we review earlier results on an algebraic reformulation of the SAR identification problem for the case where no noise is present. Details on the algebraic approach to switched system identification can be found in [12].

Equation (1) is equivalent to

$$b_{\delta_{k}}^{T}r_{k}=0$$

where

$$r_{k}=[x_{k},\,x_{k-1},\,\dots,\,x_{k-n_{a}},\,u_{k-1},\,\dots,\,u_{k-n_{c}}]^{T}$$

is the known regressor at time $k$ , and

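$$b_{\delta_{k}}=[-1,\,a_{1\delta_{k}},\,\dots,\,a_{n_{a}\delta_{k}},\,c_{1\delta_{k}},\,\dots,\,c_{n_{c}\delta_{k}}]^{T}$$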

is the vector of unknown coefficients of the mode active at time $k$. Hence, independently of which of the $n$ submodels is active at time $k$, we have

$$\Upsilon_{n}(r_{k})=\prod_{i=1}^{n}b_{i}^{T}r_{k}=c_{n}^{T}\nu_{n}(r_{k})=0,$$

where the vector of parameters corresponding to the $i$-th submodel is denoted by $b_{i}\in\mathbb{R}^{n_{a}+n_{c}+1}$, and $\nu_{n}(\cdot)$ is the Veronese map of degree $n$ [2]

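$$\nu_{n}([x_{1},\dots,x_{s}]^{T})=[\dots,\,x_{1}^{n_{1}}x_{2}^{n_{2}}\cdots x_{s}^{n_{s}},\,\dots]^{T},\qquad n_{1}+\dots+n_{s}=n,$$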

which contains all monomials of order $n$ in lexicographic order, and $c_{n}$ is a vector whose entries are polynomial functions of the unknown parameters $b_{i}$ (see [13] for an explicit definition). The Veronese map is also known as the polynomial embedding in machine learning [13].
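The Veronese map can be sketched in a few lines. This is a minimal illustration assuming the regressor is a NumPy vector; `veronese` is a hypothetical helper name. Enumerating index multisets with `itertools.combinations_with_replacement` yields the $\binom{n+s-1}{n}$ monomials of degree $n$ in $s$ variables in lexicographic order of their indices, which for $s=n_{a}+n_{c}+1$ equals $\binom{n+n_{a}+n_{c}}{n}$.

```python
from itertools import combinations_with_replacement
import numpy as np

def veronese(r, n):
    """Degree-n Veronese map nu_n(r): all monomials of total degree n in the entries
    of r, ordered lexicographically by the indices of the variables involved."""
    r = np.asarray(r, dtype=float)
    return np.array([np.prod(r[list(idx)])
                     for idx in combinations_with_replacement(range(len(r)), n)])

# For r = [x1, x2, x3] and n = 2: [x1^2, x1 x2, x1 x3, x2^2, x2 x3, x3^2]
print(veronese([1.0, 2.0, 3.0], 2))   # [1. 2. 3. 4. 6. 9.]
```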

Note that the number of rows of the Veronese matrix $V_{n}$, whose $k$-th row is $\nu_{n}^{T}(r_{k})$, equals the number $N$ of available regressor measurements, so the matrix grows with the data set. Therefore, the previous results are reformulated as follows to address identification from very large data sets [4].

For the noiseless case, a reformulation of the hybrid decoupling constraint shows that identifying the coefficients of the submodels is equivalent to finding the singular vector $c_{n}$ associated with the minimum singular value of the matrix

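$$\overline{\mathcal{M}}_{N}=\frac{1}{N}\sum_{k=1}^{N}\nu_{n}(r_{k})\,\nu_{n}^{T}(r_{k})\doteq\frac{1}{N}\sum_{k=1}^{N}\mathcal{M}_{k}\tag{6}$$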

where the matrices are of size $\binom{n+n_{a}+n_{c}}{n}\times\binom{n+n_{a}+n_{c}}{n}$; this size does not depend on the number of measurements, which is especially important for very large data sets.

Identifying the parameters of the SAR model is equivalent to finding a vector in the null space of the matrix $\overline{\mathcal{M}}_{N}$. Under mild conditions, the null space of this matrix has dimension one if and only if the data is compatible with the assumed model. However, when the output is corrupted by noise, $x_{k}$ is not known and, therefore, this matrix cannot be computed. Nevertheless, we can use the available information on the statistics of the noise and the collected measurements to compute approximations of the matrix $\overline{\mathcal{M}}_{N}$ and, consequently, approximations of vectors in its null space.
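The null-space property can be checked numerically on noiseless data. The sketch below is illustrative only: it assumes a first-order model with two modes, the coefficients later used in Section 5, an input uniform on $[-1,1]$, and modes drawn independently at random (the noiseless algebraic argument does not need Markov switching). For $n=2$, the true $c_{2}$ holds the coefficients of $(b_{1}^{T}r_{k})(b_{2}^{T}r_{k})$ with $b_{1}=[-1,\,0.3,\,1]^{T}$ and $b_{2}=[-1,\,-0.5,\,-1]^{T}$, which expand to $[1,\,0.2,\,0,\,-0.15,\,-0.8,\,-1]$ in the Veronese order.

```python
from itertools import combinations_with_replacement
import numpy as np

def veronese(r, n):
    return np.array([np.prod(r[list(i)])
                     for i in combinations_with_replacement(range(len(r)), n)])

rng = np.random.default_rng(0)
a, c = np.array([0.3, -0.5]), np.array([1.0, -1.0])
N = 2000
u = rng.uniform(-1.0, 1.0, N)
mode = rng.integers(0, 2, N)
x = np.zeros(N)
for k in range(1, N):
    x[k] = a[mode[k]] * x[k - 1] + c[mode[k]] * u[k - 1]

R = np.column_stack([x[1:], x[:-1], u[:-1]])    # rows r_k = [x_k, x_{k-1}, u_{k-1}]
V = np.array([veronese(r, 2) for r in R])       # Veronese matrix V_2, one row per sample
M_bar = V.T @ V / len(R)                        # (1/N) sum_k nu_2(r_k) nu_2(r_k)^T, 6 x 6
_, s, Vt = np.linalg.svd(M_bar)
c_hat = Vt[-1]                                  # singular vector of the smallest singular value

c_true = np.array([1.0, 0.2, 0.0, -0.15, -0.8, -1.0])
print(s[-1], abs(c_hat @ c_true) / np.linalg.norm(c_true))   # close to 0 and close to 1
```

The matrix is $6\times 6$ regardless of $N$, and its smallest singular value is zero up to rounding, while the second smallest stays clearly positive, consistent with a one-dimensional null space.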

We start by noting that although $x_{k}$ are unknown, the following holds

$$E[x_{k}^{h}]=E[y_{k}^{h}]-\sum_{d=1}^{h}\binom{h}{d}E[x_{k}^{h-d}]\,m_{d},\qquad\forall k=1,2,\dots,N.$$

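where $E(\cdot)$ denotes expectation and $m_{d}$ is the $d^{th}$ moment of the noise. This follows from expanding $E[(x_{k}+\eta_{k})^{h}]$ with the binomial theorem and using the independence of $x_{k}$ and the measurement noise $\eta_{k}$.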
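The recursion can be evaluated directly. The sketch below assumes zero-mean Normal noise, whose moments are $m_{d}=0$ for odd $d$ and $m_{d}=(d-1)!!\,\sigma^{d}$ for even $d$; the helper names are hypothetical. Given $E[y_{k}^{h}]$ for $h=0,\dots,H$, it returns $E[x_{k}^{h}]$ by working upward in $h$, since each step only needs lower-order moments of $x_{k}$.

```python
from math import comb

def normal_moment(d, sigma):
    """d-th moment m_d of a zero-mean Normal variable with standard deviation sigma."""
    if d % 2:
        return 0.0
    double_factorial = 1
    for i in range(d - 1, 0, -2):     # (d-1)!! = (d-1)(d-3)...1
        double_factorial *= i
    return double_factorial * sigma ** d

def signal_moments(y_moments, sigma):
    """E[x^h] for h = 0..H from E[y^h], using
    E[x^h] = E[y^h] - sum_{d=1}^{h} C(h, d) E[x^(h-d)] m_d."""
    x_moments = []
    for h, ey in enumerate(y_moments):
        correction = sum(comb(h, d) * x_moments[h - d] * normal_moment(d, sigma)
                         for d in range(1, h + 1))
        x_moments.append(ey - correction)
    return x_moments

# y = 2 + eta with sigma = 0.5: E[y^h] = 1, 2, 4.25, 9.5, 22.1875 for h = 0..4
print(signal_moments([1.0, 2.0, 4.25, 9.5, 22.1875], 0.5))   # [1.0, 2.0, 4.0, 8.0, 16.0]
```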

Hence, assuming that the distribution of the noise and the input signal are given and fixed, there exists an affine function $M(\cdot)$ such that

$$\mathcal{M}_{k}=M\{E[mon_{n}(y_{k},\dots,y_{k-n_{a}})]\},$$

where $mon_{n}(\cdot)$ denotes a function that returns a vector with all monomials up to order $n$ of its argument.

This can be exploited to identify the parameters of the SAR system. The only thing needed is an estimate of the matrix $\overline{\mathcal{M}}_{N}$ in (6), and it turns out that this estimate can be obtained from the available noisy measurements. More precisely, we can construct the matrix

$$\widehat{\overline{\mathcal{M}}}_{N}\doteq\frac{1}{N}\sum_{k=1}^{N}M[mon_{n}(y_{k},\dots,y_{k-n_{a}})]$$

and it is shown in [4] that this matrix converges almost surely to $\overline{\mathcal{M}}_{N}$ in (6) as $N\rightarrow\infty$. Hence, for a large number of measurements $N$, the null space of the matrix $\widehat{\overline{\mathcal{M}}}_{N}$ can be used to determine the coefficients of the subsystems.

The above assumes knowledge of the moments of the noise. However, this knowledge is not required. For simplicity of exposition, assume that the measurement noise has a Normal distribution with zero mean and unknown variance $\sigma^{2}$. Then, since the moments are known functions of the variance, $\widehat{\overline{\mathcal{M}}}_{N}$ is a known function of $\sigma$, and the variance can be estimated by minimizing the minimum singular value of this matrix over the allowable values of $\sigma$. More precisely, the parameters of the submodels and the variance of the noise can be identified using the following algorithm. Let $n_{a}$, $n_{c}$, $n$, some parameters of the noise, and $\sigma_{\max}$ be given.

Step 1. Compute the matrix $\widehat{\overline{\mathcal{M}}}_{N}$ as a function of the unknown noise parameter $\sigma$.

Step 2. Find the value $\sigma^{*}\in[0,\,\sigma_{\max}]$ that minimizes the minimum singular value of $\widehat{\overline{\mathcal{M}}}_{N}$.

Step 3. Let $c_{n}$ be the associated singular vector.

Step 4. Determine the coefficients of the subsystems from the vector $c_{n}$.
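Steps 1 to 3 can be sketched for the first-order, two-mode case ($n=2$, $n_{a}=n_{c}=1$). Two ingredients are assumptions of this sketch rather than details taken from [4]. First, the affine correction $M(\cdot)$ is implemented with scaled probabilists' Hermite polynomials: for $y=x+\eta$ with $\eta\sim\mathcal{N}(0,\sigma^{2})$, the polynomial $H_{p}(y)=\sigma^{p}\mathrm{He}_{p}(y/\sigma)$ satisfies $E[H_{p}(y)]=x^{p}$, which is one equivalent way of applying the moment correction for Normal noise. Second, $\sigma^{*}$ is chosen with the threshold rule mentioned in the text, using a hypothetical $\epsilon$ equal to 20% of the minimum singular value at $\sigma=0$. The simulated data (uniform input on $[-1,1]$, the transition matrix, $\sigma^{2}=0.03$) are also illustrative.

```python
from itertools import combinations_with_replacement
import numpy as np

def simulate(N, P, a, c, sigma, rng):
    """Illustrative two-mode, first-order Markov jump AR data (modes 0-indexed)."""
    u, v = rng.uniform(-1.0, 1.0, N), rng.random(N)
    mode, x = np.zeros(N, dtype=int), np.zeros(N)
    for k in range(1, N):
        mode[k] = 0 if v[k] < P[mode[k - 1], 0] else 1
        x[k] = a[mode[k]] * x[k - 1] + c[mode[k]] * u[k - 1]
    return u, x + sigma * rng.standard_normal(N)

def scaled_hermite(y, sigma, deg):
    """[H_0(y), ..., H_deg(y)] with H_p(y) = sigma^p He_p(y / sigma), so E[H_p(y)] = x^p."""
    H = [np.ones_like(y), y]
    for p in range(1, deg):
        H.append(y * H[p] - p * sigma**2 * H[p - 1])
    return H

def corrected_matrix(y, u, sigma, n=2):
    """Step 1: noise-corrected moment matrix for the regressor r_k = [x_k, x_{k-1}, u_{k-1}]."""
    Hk, Hk1 = scaled_hermite(y[1:], sigma, 2 * n), scaled_hermite(y[:-1], sigma, 2 * n)
    Up = [u[:-1] ** p for p in range(2 * n + 1)]
    exps = [np.bincount(i, minlength=3) for i in combinations_with_replacement(range(3), n)]
    m = len(exps)
    M = np.empty((m, m))
    for i in range(m):
        for j in range(i, m):
            e = exps[i] + exps[j]   # exponents of the degree-2n monomial nu_i * nu_j
            M[i, j] = M[j, i] = np.mean(Hk[e[0]] * Hk1[e[1]] * Up[e[2]])
    return M

def min_singular_value(y, u, sigma):
    return np.linalg.svd(corrected_matrix(y, u, sigma), compute_uv=False)[-1]

rng = np.random.default_rng(0)
P = np.array([[0.2, 0.8], [0.3, 0.7]])
sigma_true = np.sqrt(0.03)
u, y = simulate(100_000, P, [0.3, -0.5], [1.0, -1.0], sigma_true, rng)

# Step 2: scan sigma on [0, sigma_max]; sigma_max = 1.5 * sigma_true is arbitrary here
grid = np.linspace(0.0, 1.5 * sigma_true, 61)
sv = np.array([min_singular_value(y, u, s) for s in grid])
below = np.flatnonzero(sv < 0.2 * sv[0])      # hypothetical threshold eps = 0.2 * sv[0]
sigma_star = grid[below[0]] if below.size else grid[np.argmin(sv)]

# Step 3: singular vector associated with the smallest singular value at sigma_star
c_n = np.linalg.svd(corrected_matrix(y, u, sigma_star))[2][-1]
print(f"sigma* = {sigma_star:.3f} (true {sigma_true:.3f})")
```

Because the noise samples at times $k$ and $k-1$ are independent, the correction of each matrix entry factorizes over the time indices, so the construction stays linear in $N$. The threshold and the grid spacing control how close $\sigma^{*}$ lands to the true value; with a coarse threshold it tends to be slightly below it.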

To perform Step 4 of the algorithm, we adopt the polynomial differentiation algorithm for mixtures of hyperplanes introduced by Vidal [14, pp. 69–70]. In practice, for sufficiently large $N$, the above algorithm provides good estimates of both the system coefficients and the noise parameters, especially if we take $\sigma^{*}$ to be the smallest value of $\sigma$ for which the minimum singular value of $\widehat{\overline{\mathcal{M}}}_{N}$ is below a given threshold $\epsilon$. The previous work does not address the identification of switching dynamics or the estimation of the probability transition matrix of Markov jump models; this problem is addressed explicitly in the following section.
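For $n=2$, the idea behind polynomial differentiation can be sketched as follows, assuming $c_{2}$ is ordered as in the Veronese map and that a regressor $r$ lying on one of the two hyperplanes is available (how such points are selected from noisy data is not shown). Since $\nabla\big[(b_{1}^{T}r)(b_{2}^{T}r)\big]=(b_{2}^{T}r)\,b_{1}+(b_{1}^{T}r)\,b_{2}$, the gradient at a point with $b_{1}^{T}r=0$ is parallel to $b_{1}$; rescaling it so that the first entry equals $-1$ recovers $b_{1}$ in the form of $b_{\delta_{k}}$. The function names are hypothetical.

```python
from itertools import combinations_with_replacement
import numpy as np

def product_coefficients(b1, b2):
    """c_2 such that (b1^T r)(b2^T r) = c_2^T nu_2(r), in the Veronese ordering."""
    return np.array([b1[i] * b2[i] if i == j else b1[i] * b2[j] + b1[j] * b2[i]
                     for i, j in combinations_with_replacement(range(len(b1)), 2)])

def quadratic_form(c, s):
    """Symmetric Q such that r^T Q r = c^T nu_2(r) for r in R^s."""
    Q = np.zeros((s, s))
    for coef, (i, j) in zip(c, combinations_with_replacement(range(s), 2)):
        if i == j:
            Q[i, i] = coef
        else:
            Q[i, j] = Q[j, i] = coef / 2.0
    return Q

def coefficients_at(c, r):
    """Gradient 2 Q r of the fitted polynomial at r, rescaled so its first entry is -1."""
    g = 2.0 * quadratic_form(c, len(r)) @ np.asarray(r, dtype=float)
    return -g / g[0]

b1, b2 = np.array([-1.0, 0.3, 1.0]), np.array([-1.0, -0.5, -1.0])
c2 = product_coefficients(b1, b2)        # [1, 0.2, 0, -0.15, -0.8, -1]
r = np.array([-0.19, 0.7, -0.4])         # x_k = 0.3 * 0.7 - 0.4 = -0.19, so b1^T r = 0
print(coefficients_at(c2, r))            # approximately [-1, 0.3, 1], i.e. b1
```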

4 Identification of Probability Transition Matrix

In Section 3, the procedure for identifying the noise parameters and system coefficients was presented. In this section, the switching behavior and the dynamics of switching are considered. This is done in two steps. The first step identifies the switches that have the highest probability of occurrence. The second step treats these switches as a good estimate of the switching sequence and estimates the transition probabilities from them.

4.1 Maximum likelihood switch sequence

Assume that the noise variance and system coefficients have been identified. For the first step in the identification of switching dynamics, i.e., determining the switches with the highest probability, we start by building the following sequence from the available data and the identified coefficients and parameters. Consider equations (1) and (3):

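$$x_{k}=\sum_{j=1}^{n_{a}}a_{j\delta_{k}}\,x_{k-j}+\sum_{j=1}^{n_{c}}c_{j\delta_{k}}\,u_{k-j},\qquad y_{k}=x_{k}+\eta_{k}.$$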

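Since $x_{k}=y_{k}-\eta_{k}$, we have

$$\eta_{k}-\sum_{j=1}^{n_{a}}a_{j\delta_{k}}\,\eta_{k-j}=y_{k}-\sum_{j=1}^{n_{a}}a_{j\delta_{k}}\,y_{k-j}-\sum_{j=1}^{n_{c}}c_{j\delta_{k}}\,u_{k-j}\tag{8}$$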

Since we have identified the coefficients $a_{j\delta_{k}}$ and $c_{j\delta_{k}}$, and the input-output data ($u$, $y$) are available, we can compute the realization of the random variable on the right-hand side of equation (8) for all possible values of the switching sequence $\delta_{k}$. Define

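$$z_{k}(\delta_{k})=y_{k}-\sum_{j=1}^{n_{a}}a_{j\delta_{k}}\,y_{k-j}-\sum_{j=1}^{n_{c}}c_{j\delta_{k}}\,u_{k-j}$$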
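Computing $z_{k}(\delta_{k})$ for every candidate mode is a simple filtering operation on the data. A minimal NumPy sketch follows; the array layout (`a` of shape $(n, n_{a})$, `c` of shape $(n, n_{c})$, 0-indexed modes and time) and the name `residuals` are assumptions. With noise-free data, the entry of the true mode is zero; with noisy data, by (8) it equals $\eta_{k}-\sum_{j}a_{j\delta_{k}}\eta_{k-j}$.

```python
import numpy as np

def residuals(y, u, a, c):
    """z_k(m) for every mode m (rows) and every k >= max(n_a, n_c) (columns).

    a has shape (n, n_a) and c has shape (n, n_c); row m holds the coefficients of
    mode m (0-indexed). Column i of the result corresponds to time k = max(n_a, n_c) + i.
    y and u are 0-indexed arrays of the same length."""
    y, u = np.asarray(y, dtype=float), np.asarray(u, dtype=float)
    a, c = np.asarray(a, dtype=float), np.asarray(c, dtype=float)
    n_a, n_c = a.shape[1], c.shape[1]
    start, T = max(n_a, n_c), len(y)
    Z = np.tile(y[start:], (a.shape[0], 1))
    for j in range(1, n_a + 1):
        Z -= np.outer(a[:, j - 1], y[start - j:T - j])   # subtract a_{j,m} y_{k-j}
    for j in range(1, n_c + 1):
        Z -= np.outer(c[:, j - 1], u[start - j:T - j])   # subtract c_{j,m} u_{k-j}
    return Z
```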

To identify the most probable realization of the switches, we can use the values of $z_{k}(\delta_{k})$ for each possible active system at time $k$ ($\delta_{k}\in\{1,\dots,n\}$) and determine the maximum likelihood sequence $\delta_{k}$, $k=1,2,\ldots,N$. However, for a fixed switching sequence, $z_{k}(\delta_{k})$ is a sequence of correlated random variables: even though the measurement noise is iid, $z_{k}(\delta_{k})$ depends on $\eta_{k-l}$, $l=0,1,\ldots,n_{a}$. Therefore, determining the values of $\delta_{k}$, $k=1,2,\ldots,N$, that lead to the highest likelihood is a complex combinatorial problem.

To circumvent this, we note that $z_{k}(\delta_{k})$ is independent of $z_{l}(\delta_{l})$ if $l>k+n_{a}$, since the two then depend on disjoint sets of noise samples. Therefore, if enough data is available, we can use only independent “snippets” of data, short enough that i) the maximum likelihood sequence can be easily computed, and ii) given their independence, the likelihood can be computed separately for each snippet.

Hence, in the proposed identification procedure, we only consider snippets of data of length $n_{l}$ that are separated in time by at least $n_{a}$ sample periods. More precisely, we take a snippet of data of length $n_{l}$, compute its joint distribution as a function of the switching sequence, determine the maximum likelihood switches for this snippet, skip the next $n_{a}$ data points, and repeat the process until we run out of data.
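Following this description, a snippet of length $n_{l}$ starts at $k=(n_{a}+1)+(n_{a}+n_{l})\,l$, so consecutive snippets are separated by $n_{a}$ skipped samples. The helper below (hypothetical name, 1-based time as in the paper) enumerates these start times using the range $l=0,1,\dots,\mathrm{int}\big(N/(n_{a}+n_{l})\big)-(n_{a}+n_{l}+1)$ stated in the text, and checks that consecutive snippets depend on disjoint noise samples, since $Z_{k}$ involves $\eta_{k-n_{a}},\dots,\eta_{k+n_{l}-1}$.

```python
def snippet_starts(N, n_a, n_l):
    """Start times k = (n_a + 1) + (n_a + n_l) * l, for
    l = 0, 1, ..., int(N / (n_a + n_l)) - (n_a + n_l + 1)."""
    step = n_a + n_l
    l_max = N // step - (n_a + n_l + 1)
    return [(n_a + 1) + step * l for l in range(l_max + 1)]

starts = snippet_starts(20, n_a=1, n_l=2)
print(starts)   # [2, 5, 8]

# Snippet k uses noise samples k - n_a, ..., k + n_l - 1; these sets must not overlap.
n_a, n_l = 1, 2
for k1, k2 in zip(starts[:-1], starts[1:]):
    assert k2 - n_a > k1 + n_l - 1
```

The stated range of $l$ is conservative and leaves a few samples at the end of the record unused.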

We now elaborate on this. Take snippets of data of length $n_{l}$ , denoted as a vector $Z_{k}$ defined as:

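$$Z_{k}=[z_{k}(\delta_{k}),\,\dots,\,z_{k+n_{l}-1}(\delta_{k+n_{l}-1})]^{T},\qquad\forall\,k=(n_{a}+1)+(n_{a}+n_{l})\times l,\quad l=0,1,2,\dots,\mathrm{int}\Big(\frac{N}{n_{a}+n_{l}}\Big)-(n_{a}+n_{l}+1)$$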

where $\mathrm{int}(\cdot)$ denotes the integer part (rounding towards zero) of its argument. In this way, each snippet $Z_{k}$ is independent of the other snippets, and each has a multivariate Normal distribution whose covariance matrix is a function of the switching sequence. As a reminder, an $n_{l}$-dimensional zero-mean multivariate Normal distribution has density function

$$f(x)=\frac{1}{(2\pi)^{n_{l}/2}\,|\Sigma|^{1/2}}\exp\Big\{-\frac{1}{2}\,x^{T}\Sigma^{-1}x\Big\}$$

where $\Sigma$ is the $n_{l}\times n_{l}$ covariance matrix and $|\Sigma|$ is its determinant.
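When a candidate switching sequence is the true one, (8) gives $Z_{k}=A\,e$ with $e=[\eta_{k-n_{a}},\dots,\eta_{k+n_{l}-1}]^{T}$ and a banded matrix $A$ built from the AR coefficients of the hypothesized modes, so $\Sigma=\sigma^{2}AA^{T}$. The sketch below (hypothetical helper names, NumPy only, modes 0-indexed) builds this covariance and evaluates $\log f$; it is one way to make the dependence of $\Sigma$ on the switching sequence explicit, not necessarily the authors' implementation.

```python
import numpy as np

def snippet_covariance(modes, a, sigma2):
    """Covariance of Z_k under the hypothesis that `modes` (length n_l) are active.

    a has shape (n, n_a): row m holds [a_1, ..., a_{n_a}] of mode m. Under that
    hypothesis z_{k+i} = eta_{k+i} - sum_j a_{j,delta_{k+i}} eta_{k+i-j}, so
    Z_k = A e with e = [eta_{k-n_a}, ..., eta_{k+n_l-1}] and Sigma = sigma2 * A A^T."""
    a = np.asarray(a, dtype=float)
    n_a, n_l = a.shape[1], len(modes)
    A = np.zeros((n_l, n_a + n_l))
    for i, m in enumerate(modes):
        A[i, n_a + i] = 1.0                  # coefficient of eta_{k+i}
        for j in range(1, n_a + 1):
            A[i, n_a + i - j] = -a[m, j - 1]  # coefficient of eta_{k+i-j}
    return sigma2 * A @ A.T

def gaussian_logpdf(z, Sigma):
    """log f(z) for the zero-mean multivariate Normal density with covariance Sigma."""
    z = np.asarray(z, dtype=float)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = z @ np.linalg.solve(Sigma, z)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + quad)

Sigma = snippet_covariance((0, 1), [[0.3], [-0.5]], 0.03)
print(gaussian_logpdf([0.1, -0.2], Sigma))
```

Using `slogdet` and `solve` avoids forming $\Sigma^{-1}$ and $|\Sigma|$ explicitly, which is numerically safer when $\sigma^{2}$ is small.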

Note that, for each snippet $Z_{k}$, there are $n^{n_{l}}$ possible switching sequences, since $\delta_{k}\in\{1,\dots,n\}$. Therefore, if $n_{l}$ is small enough, we can compute the likelihood for each of the $n^{n_{l}}$ choices and take the most likely sequence of subsystems to be the one with the highest likelihood. Hence, given the independence of the $Z_{k}$, the most likely switching sequence in the snippets used can be estimated by solving the following problem

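$$\begin{aligned}\max_{Z_{k},\,\delta_{k}}\quad&\sum_{k}\log[f(Z_{k})]\\\text{s.t.}\quad&Z_{k}=[z_{k}(\delta_{k}),\dots,z_{k+n_{l}-1}(\delta_{k+n_{l}-1})]^{T},\\&\delta_{k}\in\{1,\dots,n\},\\&\forall\,k=(n_{a}+1)+(n_{a}+n_{l})\times l,\\&\forall\,l=0,1,2,\dots,\mathrm{int}\Big(\frac{N}{n_{a}+n_{l}}\Big)-(n_{a}+n_{l}+1)\end{aligned}\tag{11}$$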

whose optimization can be done separately for each term of the sum. Therefore, its complexity is exponential in $n_{l}$ but linear in the number of snippets, and it can be solved efficiently if $n_{l}$ is “not too large.”
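Because Problem (11) separates over snippets, each snippet can be decoded by brute force over its $n^{n_{l}}$ candidate sequences. A minimal sketch for the first-order case ($n_{a}=n_{c}=1$) follows, assuming the coefficients `a`, `c` and the variance `sigma2` have already been identified; the function name and the 0-indexed time and modes are assumptions. It combines the residuals $z_{t}(\delta_{t})$, the hypothesis-dependent covariance $\sigma^{2}AA^{T}$, and the Normal log-density $\log f$.

```python
from itertools import product
import numpy as np

def decode_snippet(y, u, k, n_l, a, c, sigma2):
    """Maximum likelihood mode sequence (0-indexed) for the snippet starting at time k,
    for a first-order model (n_a = n_c = 1), by enumerating all n**n_l sequences."""
    t = np.arange(k, k + n_l)                       # times k, ..., k + n_l - 1
    rows = np.arange(n_l)
    best, best_ll = None, -np.inf
    for modes in product(range(len(a)), repeat=n_l):
        m = np.array(modes)
        z = y[t] - a[m] * y[t - 1] - c[m] * u[t - 1]   # z_t(delta_t) under this hypothesis
        A = np.zeros((n_l, n_l + 1))                # maps [eta_{k-1}, ..., eta_{k+n_l-1}] to Z_k
        A[rows, rows + 1] = 1.0
        A[rows, rows] = -a[m]
        Sigma = sigma2 * A @ A.T
        _, logdet = np.linalg.slogdet(Sigma)
        ll = -0.5 * (n_l * np.log(2 * np.pi) + logdet + z @ np.linalg.solve(Sigma, z))
        if ll > best_ll:
            best, best_ll = modes, ll
    return best, best_ll

# Noise-free data from (18), paper modes 1, 2, 1 (0, 1, 0 here) at times 1, 2, 3 with u = 1
a, c = np.array([0.3, -0.5]), np.array([1.0, -1.0])
y, u = np.array([0.0, 1.0, -1.5, 0.55]), np.ones(4)
print(decode_snippet(y, u, 1, 2, a, c, 1e-4)[0])   # (0, 1)
```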

Remark 1

As previously mentioned, the above formulation does not use all the available data when computing high-likelihood switches. More precisely, we only use a fraction $n_{l}/(n_{l}+n_{a})$ of the data. Hence, any choice of $n_{l}$ is a compromise between computational complexity and the fraction of data used, and the “right” choice should take into account how many measurements are available.

Solving Problem (11) allows us to determine how many times a specific “jump” occurs in this high-likelihood sequence of switches. This can be done as follows:

Step 1. Solve Problem (11). Recall that this can be done by solving the problem separately for each $k$.

Step 2. For each $k$, let $n^{(k)}_{ij}$ be the number of times the transition from system $i$ to system $j$ occurs in the maximum likelihood switch sequence for snippet $k$.

Step 3. Compute the total number of transitions from system $i$ to system $j$ observed in all snippets:

$$n_{ij}=\sum_{k}n_{ij}^{(k)}$$
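Steps 2 and 3 amount to counting consecutive pairs within each decoded snippet. A small sketch, assuming the decoded sequences are stored as 0-indexed tuples (the function name is hypothetical); pairs that straddle two snippets are not counted, since the samples between snippets were skipped.

```python
import numpy as np

def count_transitions(decoded, n):
    """n_ij = sum_k n_ij^(k): transitions i -> j inside each decoded snippet."""
    counts = np.zeros((n, n), dtype=int)
    for seq in decoded:                      # one maximum likelihood sequence per snippet
        for i, j in zip(seq[:-1], seq[1:]):
            counts[i, j] += 1
    return counts

print(count_transitions([(0, 1), (1, 1), (0, 0, 1)], 2))   # n_00 = 1, n_01 = 2, n_10 = 0, n_11 = 1
```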

Given this high likelihood estimate of how often a transition occurs in the snippets, we can estimate the probability transition matrix. This can be done by solving the “traditional” maximum likelihood problem for Markov chains

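$$\begin{aligned}\max_{P}\quad&\prod_{i=1,j=1}^{n}P_{ij}^{n_{ij}}\\\text{s.t.}\quad&\sum_{j=1}^{n}P_{ij}=1,\quad i=1,\dots,n,\\&P_{ij}\geq 0,\quad i=1,\dots,n,\;j=1,\dots,n\end{aligned}$$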

or, equivalently, the convex optimization problem

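$$\begin{aligned}\max_{P}\quad&\sum_{i=1,j=1}^{n}n_{ij}\log(P_{ij})\\\text{s.t.}\quad&\sum_{j=1}^{n}P_{ij}=1,\quad i=1,\dots,n,\\&P_{ij}\geq 0,\quad i=1,\dots,n,\;j=1,\dots,n\end{aligned}$$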
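This convex problem has a well-known closed-form solution: introducing one Lagrange multiplier per row constraint gives $\hat{P}_{ij}=n_{ij}/\sum_{l=1}^{n}n_{il}$ for every row with at least one observed transition. A short sketch follows; the function name is hypothetical, and setting rows without observations to uniform is an arbitrary convention not discussed in the text.

```python
import numpy as np

def estimate_ptm(counts):
    """Maximizer of sum_ij n_ij log P_ij subject to unit row sums and P_ij >= 0."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    n = counts.shape[1]
    # closed form P_ij = n_ij / sum_j n_ij; rows with no observed transitions become uniform
    return np.where(totals > 0, counts / np.where(totals > 0, totals, 1.0), 1.0 / n)

print(estimate_ptm([[2, 8], [3, 7]]))   # rows [0.2, 0.8] and [0.3, 0.7]
```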

As the number of observations goes to infinity and the noise variance goes to zero, the solution of this problem converges to the true probability transition matrix. More precisely, we have the following result.

Theorem 1

Assume that the Markov model for switching is aperiodic and let $\hat{P}_{N}$ and $P_{true}$ be the estimated and true transition probabilities respectively. Then,

$$\lim_{N\to\infty}\,\lim_{\sigma^{2}\to 0}\,\big\|\hat{P}_{N}-P_{true}\big\|=0$$

where $\sigma^{2}$ is the variance of the measurement noise.

Sketch of proof: From the results in [4], we know that the estimates of the system parameters converge to the true parameters as $\sigma^{2}\to 0$ for large enough finite $N$. Hence, under mild assumptions on the subsystems, we can identify the true transitions within snippets. The fact that the Markov chain is aperiodic implies that, in an infinite sequence, transitions will occur with equal probability in each snippet. This implies that, as $N\to\infty$, the relative frequency of the transitions converges to the true transition probabilities.

4.2 An Example

To better illustrate the approach, we provide an example of the maximum likelihood estimation of the probabilities required for identifying the probability transition matrix. Consider the problem of identifying the switching dynamics of an SAR system with $n=2$ subsystems of the form

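$$\begin{aligned}\text{subsystem 1:}\quad&x_{k}=a_{1}x_{k-1}+b_{1}u_{k-1},\\\text{subsystem 2:}\quad&x_{k}=a_{2}x_{k-1}+b_{2}u_{k-1}\end{aligned}\tag{15}$$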

and noisy measurements

$$y_{k}=x_{k}+\eta_{k}$$

where $\eta_{k}$ has a zero-mean Normal distribution, and $n_{a}=1$. We consider snippets of data of length $n_{l}=2$ and skip $n_{a}=1$ measurement between snippets; i.e., for subsystem 1, two consecutive snippets of data are

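$$\begin{aligned}\eta_{k}-a_{1}\eta_{k-1}&=y_{k}-(a_{1}y_{k-1}+b_{1}u_{k-1}),\\\eta_{k+1}-a_{1}\eta_{k}&=y_{k+1}-(a_{1}y_{k}+b_{1}u_{k}),\end{aligned}$$

$$\begin{aligned}\eta_{k+3}-a_{1}\eta_{k+2}&=y_{k+3}-(a_{1}y_{k+2}+b_{1}u_{k+2}),\\\eta_{k+4}-a_{1}\eta_{k+3}&=y_{k+4}-(a_{1}y_{k+3}+b_{1}u_{k+3}),\end{aligned}$$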

while the equation in between,

$$\eta_{k+2}-a_{1}\eta_{k+1}=y_{k+2}-(a_{1}y_{k+1}+b_{1}u_{k+1}),$$

is skipped. For this example

[TABLE]

and,

[TABLE]

Therefore, the possible sequences of active subsystems in a snippet are

$$(\delta_{k},\delta_{k+1})\in\{(1,1),\,(1,2),\,(2,1),\,(2,2)\}.$$

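For each snippet, there are $n^{n_{l}}=4$ possible cases for $Z_{k}$. For the system in equation (15), these cases are

$$Z_{k}\in\left\{\begin{bmatrix}z_{k}(1)\\z_{k+1}(1)\end{bmatrix},\,\begin{bmatrix}z_{k}(1)\\z_{k+1}(2)\end{bmatrix},\,\begin{bmatrix}z_{k}(2)\\z_{k+1}(1)\end{bmatrix},\,\begin{bmatrix}z_{k}(2)\\z_{k+1}(2)\end{bmatrix}\right\}.$$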

The multivariate Normal likelihood of each of the $n^{n_{l}}$ candidate vectors $Z_{k}$ is computed, and the sequence with the maximum likelihood is taken as the set of active subsystems for that snippet.

5 Numerical Results

In this section, we address the problem of identifying switching dynamics in Markovian jump systems. The values of the true coefficients in this example are taken from the example in [4]: $a_{1}=0.3,\ b_{1}=1,\ a_{2}=-0.5$, and $b_{2}=-1$. The measurement noise is assumed to have a zero-mean Normal distribution. The SAR system in this example, with $n=2$ subsystems, $n_{a}=1$, and $n_{c}=1$, is as follows:

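$$\begin{aligned}\text{subsystem 1:}\quad&x_{k}=0.3\,x_{k-1}+u_{k-1},\\\text{subsystem 2:}\quad&x_{k}=-0.5\,x_{k-1}-u_{k-1}\end{aligned}\tag{18}$$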

A total of $N=10^{6}$ input-output samples is available. The output is corrupted by random Normal measurement noise with zero mean and different values of the variance. The proposed algorithm was implemented and run in Python.
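To illustrate Section 4 end to end on data generated from (18), the following sketch simulates a smaller data set ($N=3\times10^{4}$ instead of $10^{6}$), decodes snippets of length $n_{l}=2$ separated by $n_{a}=1$ skipped sample, counts transitions, and normalizes the counts. It is not the authors' code: it assumes the coefficients and the noise variance are already known (that is, it skips the estimation of Section 3), uses an input uniform on $[-1,1]$ (an assumption, since the input distribution is not specified here), and takes the true PTM of experiment 1 in Table 1 with $\sigma^{2}=0.01$.

```python
from itertools import product
import numpy as np

def simulate(N, P, a, c, sigma, rng):
    """Two-mode, first-order Markov jump AR data (modes 0-indexed)."""
    u, v = rng.uniform(-1.0, 1.0, N), rng.random(N)
    mode, x = np.zeros(N, dtype=int), np.zeros(N)
    for k in range(1, N):
        mode[k] = 0 if v[k] < P[mode[k - 1], 0] else 1
        x[k] = a[mode[k]] * x[k - 1] + c[mode[k]] * u[k - 1]
    return u, x + sigma * rng.standard_normal(N)

def estimate_ptm_from_data(y, u, a, c, sigma2, n_l=2):
    """Decode all snippets (n_a = n_c = 1), count transitions, and normalize the rows."""
    n = len(a)
    starts = np.arange(1, len(y) - n_l + 1, 1 + n_l)   # skip n_a = 1 sample between snippets
    t = starts[:, None] + np.arange(n_l)               # one row of times per snippet
    rows = np.arange(n_l)
    hyps = list(product(range(n), repeat=n_l))
    ll = np.empty((len(hyps), len(starts)))
    for h, modes in enumerate(hyps):
        m = np.array(modes)
        z = y[t] - a[m] * y[t - 1] - c[m] * u[t - 1]   # residuals z_t(delta_t) per snippet
        A = np.zeros((n_l, n_l + 1))
        A[rows, rows + 1], A[rows, rows] = 1.0, -a[m]
        Sigma = sigma2 * A @ A.T
        _, logdet = np.linalg.slogdet(Sigma)
        # log f(Z_k) up to the constant -n_l/2 log(2 pi), shared by all hypotheses
        ll[h] = -0.5 * (np.einsum("si,ij,sj->s", z, np.linalg.inv(Sigma), z) + logdet)
    counts = np.zeros((n, n))
    for b in ll.argmax(axis=0):                        # best hypothesis for each snippet
        seq = hyps[b]
        for i, j in zip(seq[:-1], seq[1:]):
            counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
P_true = np.array([[0.1837, 0.8163], [0.3424, 0.6576]])   # experiment 1 in Table 1
a, c, sigma2 = np.array([0.3, -0.5]), np.array([1.0, -1.0]), 0.01
u, y = simulate(30_000, P_true, a, c, np.sqrt(sigma2), rng)
P_hat = estimate_ptm_from_data(y, u, a, c, sigma2)
err = np.linalg.norm(P_hat - P_true) / np.linalg.norm(P_true)   # normalized Frobenius norm
print(np.round(P_hat, 3), round(err, 4))
```

All snippets are scored against all $n^{n_{l}}$ hypotheses in vectorized form, so the cost grows linearly with the number of snippets, as noted after Problem (11).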

The noise-to-output ratio ($\gamma$) is defined as

[TABLE]

Simulation results for several experiments are shown in Table 1. For each experiment, a random probability transition matrix was generated; it is shown in column 2 of the table. Using the algorithms described in this paper, the probability transition matrix was then estimated from noisy measurements for each experiment; the estimate is shown in column 3. The normalized Frobenius norm of the difference between the true and estimated probability transition matrices, $\left\lVert\hat{P}-P_{true}\right\rVert_{F}/\left\lVert P_{true}\right\rVert_{F}$, is shown in column 4. The noise-to-output ratio and the noise variance of each experiment are shown in columns 5 and 6. As the table shows, the entries of the estimated probability transition matrix are very close to the true values, even when the noise variance is high, with a noise magnitude of around 30% of the signal magnitude on average.
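The error measure in column 4 can be reproduced from the tabulated matrices. The sketch below (hypothetical function name) applies it to experiment 1; the result agrees with the reported value 0.035810 to four decimal places.

```python
import numpy as np

def normalized_frobenius_error(P_hat, P_true):
    """||P_hat - P_true||_F / ||P_true||_F, as in column 4 of Table 1."""
    P_hat, P_true = np.asarray(P_hat, dtype=float), np.asarray(P_true, dtype=float)
    return np.linalg.norm(P_hat - P_true, "fro") / np.linalg.norm(P_true, "fro")

# Experiment 1 of Table 1
P_true = [[0.1837, 0.8163], [0.3424, 0.6576]]
P_hat = [[0.2116, 0.7884], [0.3472, 0.6528]]
print(f"{normalized_frobenius_error(P_hat, P_true):.4f}")   # 0.0358
```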

For example, in experiment 6, $\gamma=0.5223$ corresponds to a noise-to-output ratio of approximately $52\%$; even with this large corruption by measurement noise, the proposed method works well, and the normalized Frobenius norm of the difference between the true and estimated probability transition matrices is only 0.1153. As expected, and as Table 1 shows, for smaller values of the noise-to-output ratio ($\gamma$), the estimated probability transition matrix is generally closer to the true one, and the normalized Frobenius norm of their difference is smaller. Nevertheless, with the proposed approach, this difference remains low even for large values of the noise-to-output ratio.

Figure 1 shows the convergence of the estimated probability transition matrix as the number of measurements grows. The figure is based on a single random experiment with the coefficients in equation (18), a fixed noise variance $\sigma^{2}=0.03$, and a noise-to-output ratio $\gamma=0.15$. The normalized Frobenius norm of the difference between the true and estimated probability transition matrices decreases as the number of measurements increases, from $0.2419$ at $k=100$ to $0.0659$ at $k=10^{6}$. This shows that even with a $15\%$ noise-to-output ratio, the estimated switching dynamics and probability transition matrix are close to the true ones.
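The trend in Figure 1, with the error shrinking as more samples become available, is the usual behavior of Markov chain estimators. The sketch below illustrates it in a deliberately simplified setting and is not the estimator of this paper: it assumes the mode sequence is observed without noise and uses the standard transition-count maximum likelihood estimate, $\hat{P}_{ij}=\#(i\to j)/\#(i\to\cdot)$, whereas the paper works from noisy outputs. The transition matrix, random seed, sample sizes, and function names are all hypothetical.

```python
import numpy as np


def simulate_modes(P, N, rng):
    """Sample modes s_0, ..., s_{N-1} from a Markov chain with transition matrix P, starting in mode 0."""
    n_modes = P.shape[0]
    cum_P = np.cumsum(P, axis=1)
    r = rng.random(N)
    s = np.zeros(N, dtype=int)
    for k in range(1, N):
        nxt = int(np.searchsorted(cum_P[s[k - 1]], r[k], side="right"))
        s[k] = min(nxt, n_modes - 1)
    return s


def ptm_from_observed_modes(s, n_modes):
    """Transition-count MLE: P_hat[i, j] = #(i -> j) / #(transitions out of i)."""
    counts = np.zeros((n_modes, n_modes))
    np.add.at(counts, (s[:-1], s[1:]), 1.0)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows of modes that are never left are set to uniform to avoid dividing by zero.
    return np.divide(counts, row_totals,
                     out=np.full_like(counts, 1.0 / n_modes),
                     where=row_totals > 0)


rng = np.random.default_rng(1)
P_true = np.array([[0.9, 0.1], [0.3, 0.7]])  # hypothetical transition matrix
s = simulate_modes(P_true, 100_000, rng)
errors = {}
for k in (100, 1_000, 10_000, 100_000):
    P_hat = ptm_from_observed_modes(s[:k], 2)
    errors[k] = np.linalg.norm(P_hat - P_true, "fro") / np.linalg.norm(P_true, "fro")
    print(k, round(errors[k], 4))
```

In this idealized setting the error decays roughly like $1/\sqrt{k}$; measurement noise adds a further source of error, which is why the errors reported for the noisy case are larger.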

6 Conclusion and Future Work

In this paper, we have proposed a methodology for identifying switching dynamics in Markovian jump SAR models. Given large noisy input-output data sets, we first use previously developed procedures for identifying switched systems from large noisy data sets to estimate the parameters of the noise and then identify the coefficients of each submodel. Next, using the novel procedure presented in this paper for estimating the probability transition matrix, we identify the switching dynamics and compute the probability transition matrix of the Markov chain. Numerical simulations show that the estimation error, measured by the Frobenius norm of the difference between the estimated and true probability transition matrices, remains small even for large noise-to-output ratios. As future work, we will consider identifying switched ARX models and their switching dynamics from large noisy data sets in the presence of process noise. We will also test the effectiveness of the proposed approaches in “real” applications, with emphasis on estimating individual responses to treatments aimed at improving light physical activity.

Bibliography


[1] David E. Conroy, Sarah Hojjatinia, Constantino M. Lagoa, Chih-Hsiang Yang, Stephanie T. Lanza, and Joshua M. Smyth. Personalized models of physical activity responses to text message micro-interventions: A proof-of-concept application of control systems engineering methods. Psychology of Sport and Exercise, 41:172–180, 2019.
[2] Joe Harris. Algebraic Geometry: A First Course, volume 133. Springer Science & Business Media, 2013.
[3] Sarah Hojjatinia, Constantino M. Lagoa, and Fabrizio Dabbene. A method for identification of Markovian jump ARX processes. IFAC-PapersOnLine, 50(1):14088–14093, 2017. 20th IFAC World Congress.
[4] Sarah Hojjatinia, Constantino M. Lagoa, and Fabrizio Dabbene. Identification of switched ARX systems from large noisy data sets. arXiv preprint arXiv:1804.07411, 2018.
[5] Aleksandar Lj. Juloski, Siep Weiland, and W. P. M. H. Heemels. A Bayesian approach to identification of hybrid systems. IEEE Transactions on Automatic Control, 50(10):1520–1533, 2005.
[6] Constantino M. Lagoa, David E. Conroy, Sarah Hojjatinia, and Chih-Hsiang Yang. Modeling subject response to interventions aimed at increasing physical activity: A control systems approach. Poster presented at the 5th International Conference on Ambulatory Monitoring of Physical Activity and Movement, 2017.
[7] Hayato Nakada, Kiyotsugu Takaba, and Tohru Katayama. Identification of piecewise affine systems based on statistical clustering technique. Automatica, 41(5):905–913, 2005.
[8] Necmiye Ozay, Constantino Lagoa, and Mario Sznaier. Set membership identification of switched linear systems with known number of subsystems. Automatica, 51:180–191, 2015.