Identification of Markov Jump Autoregressive Processes from Large Noisy Data Sets
Sarah Hojjatinia, Constantino M. Lagoa

TL;DR
This paper presents a new method for identifying Markov jump autoregressive models from large, noisy datasets, accurately estimating system dynamics, switching behavior, and noise parameters even with high noise levels.
Contribution
It introduces a novel identification approach that handles large measurement noise and Markov switching, improving accuracy in complex noisy environments.
Findings
Effective even with high noise-to-output ratios
Performs well with large datasets
Accurately estimates switching dynamics and noise parameters
Abstract
This paper introduces a novel methodology for the identification of switching dynamics for switched autoregressive linear models. Switching behavior is assumed to follow a Markov model. The system's outputs are contaminated by possibly large values of measurement noise. Although the procedure provided can handle other noise distributions, for simplicity, it is assumed that the distribution is Normal with unknown variance. Given noisy input-output data, we aim at identifying switched system coefficients, parameters of the noise distribution, dynamics of switching and probability transition matrix of Markovian model. System dynamics are estimated using previous results which exploit algebraic constraints that system trajectories have to satisfy. Switching dynamics are computed with solving a maximum likelihood estimation problem. The efficiency of proposed approach is shown with several…
Click any figure to enlarge with its caption.
Figure 1| experiment | True | Estimated | Normalized | ||
|---|---|---|---|---|---|
| # | PTM | PTM | Ferobenius norm | ||
| 1 | 0.035810 | 0.0888 | 0.01 | ||
| 2 | 0.066922 | 0.1439 | 0.03 | ||
| 3 | 0.090075 | 0.1967 | 0.05 | ||
| 4 | 0.064687 | 0.2273 | 0.07 | ||
| 5 | 0.082236 | 0.2650 | 0.09 | ||
| 6 | 0.115335 | 0.5223 | 0.27 | ||
| 7 | 0.096751 | 0.4603 | 0.29 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Identification of Markov Jump Autoregressive Processes
from Large Noisy Data Sets
Sarah Hojjatinia1, Constantino M. Lagoa2 1Sarah Hojjatinia is with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, USA, [email protected]2Constantino M. Lagoa is with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, USA, [email protected]
This work was partially supported by National Institutes of Health (NIH) Grant R01 HL142732, and National Science Foundation (NSF) Grant #1808266.
Abstract
This paper introduces a novel methodology for the identification of switching dynamics for switched autoregressive linear models. Switching behavior is assumed to follow a Markov model. The system’s outputs are contaminated by possibly large values of measurement noise. Although the procedure provided can handle other noise distributions, for simplicity, it is assumed that the distribution is Normal with unknown variance. Given noisy input-output data, we aim at identifying switched system coefficients, parameters of the noise distribution, dynamics of switching and probability transition matrix of Markovian model. System dynamics are estimated using previous results which exploit algebraic constraints that system trajectories have to satisfy. Switching dynamics are computed with solving a maximum likelihood estimation problem. The efficiency of proposed approach is shown with several academic examples. Although the noise to output ratio can be high, the method is shown to be extremely effective in the situations where a large number of measurements is available.
1 Introduction
While identification of linear time invariant systems is by now a well understood problem, identification of switched and hybrid systems is considerably less developed, even in the piecewise affine case. Existing methods exploit a number of algebraic, optimization-based techniques to find subsystem dynamics and switching surfaces [10]. A common feature is the computational complexity entailed in dealing with noisy measurements: in this case algebraic procedures lead to nonconvex optimization problems, while optimization methods lead to mixed integer/linear programming [11].
Similarly, methods relying on probabilistic priors [5] lead to combinatorial problems. This can be avoided by using clustering-based methods [7]. However, these require “fair sampling” of each cluster, which constrains the data that can be used. In [9, 8], some sparsification based-techniques for identification of affine switched models have been developed that allow for several types of noise.
This paper develops effective methods for identifying switching dynamics from large noisy data sets, for a broad class of systems described by switching autoregressive models. These systems can be considered a generalization of piecewise affine models, and breach the gap between linear and nonlinear models, retaining many of the tractability properties of the former, while providing descriptions that more accurately capture the features of practical problems over broader scenarios.
Development of the proposed framework is motivated by health care applications: Specifically, smartphone-based interventions for increasing light physical activity [6]. More precisely, the availability of activity tracking devices allows gathering of large amount of data such as physical activity of an individual. Physical activity is a dynamic behavior, so it can be modeled as a dynamical system. Furthermore, its characteristics may remarkably change based on the time in a day, weekdays or weekends, location, etc; which, motivated the approach of modeling it as a switching system [1].
In identifying the parameters of switched models, the dynamics of switching play an important role. The interest in Markovian jump systems, switched system with switching dynamics based on a Markov chain, has been growing since they have a broad range of application in different areas and real world problems such as economic systems, power systems, and networked control systems. In comparison to the large amount of literature on analysis and control of Markovian jump systems, the identification problem seems to have received very little attention.
In [3] a new method for the identification of parameters of Markovian jump system is provided. The probability transition matrix is estimated using a suitable convex optimization problem. However, due to computational complexity, the number of measurement that the proposed approach is able to handle is limited and only process noise was considered. In this paper, we focus on cases involving a very large number of measurements, possibly affected by large values of noise. In this case, polynomial/moments based approaches become ineffective, and different methodologies need to be devised. The approach we propose builds upon the same premises as [4].
More precisely we start by assuming that the output measurements are corrupted by random Normal measurement noise with unknown variance. Then, we exploit the availability of a large number of measurements and the results in [4] to determine high confidence estimates of the systems parameters and the variance of the measurement noise. Finally, by using a maximum likelihood approach, we estimate the probability transition matrix and dynamics of switching. The approach can be easily extended to other noise distributions, as long as the number of unknown parameters of the distribution is ”low.”
1.1 Paper Organization
The paper is structured as follows: after this introduction, problem statement is defined in Section 2. Identification of system coefficients and noise parameters are reviewed in Section 3. In Section 4, the method for identification of probability transition matrix is described. Numerical results are shown in Section 5. Finally, Section 6 concludes the paper highlighting some possible future research directions.
2 Problem Statement
A precise description of the problem addressed is provided in this section. Assumptions needed to solve the problem are also introduced.
2.1 System Model
We consider switched autoregressive (SAR) linear models of the form
[TABLE]
where is the output at time and is input at time . The variable denotes the sub-system active at time , where is the total number of sub-systems. Furthermore, and denote unknown coefficients corresponding to mode . Time takes values over the non-negative integers. The latent discrete state evolves according to a Markov chain with transition probability matrix , whose entry is
[TABLE]
Output is assumed to be contaminated by (possibly large) noise; i.e. observations are of the form:
[TABLE]
where , denotes measurement noise.
The following assumptions are made on the system model and noise.
Assumption 1
Throughout this paper it is assumed that:
- •
Upper bounds on and are available.
- •
Upper bound on the number of sub-systems is available.
- •
Measurement noise has zero mean Normal distribution with unknown variance.
- •
Noise is independent from for , and identically distributed.
- •
Input sequence applied to the system is known and bounded.
- •
There exists a finite constant so that for all positive integers .
- •
Switching sequence is based on a Markov process.
Note that, the approach in this paper can be extended for any noise distribution as long as the number of unknown distribution parameters is “small.”
2.2 Problem Definition
The main objective of this paper is to develop algorithms to identify the parameters of SAR systems, noise parameters, and dynamics of switching from noisy observations. More precisely, we aim at solving the following problem:
Problem 1
Given Assumption 1, an input sequence , and noisy output measurements , , determine
Coefficients of the SAR model , , , , , , 2. 2.
Noise distribution parameters, 3. 3.
Switching sequence , which is based on a Markov process.
3 Review: Identification of System Coefficients and Noise Parameters
To identify the coefficients of SAR system with measurement noise from large amount of data, we adopt the approach developed in [4]. For the sake of completeness, we briefly summarize the approach in this section. We refer the reader to [4] for more details.
First, we review earlier results on an algebraic reformulation of the SAR identification problem for the case where no noise is present. Details on the algebraic approach to switched system identification can be found in [12].
The equation (1) is equivalent to
[TABLE]
where
[TABLE]
is the known regressor at time , and
[TABLE]
is the vector of unknown coefficients at time . Hence, independently of which of the submodels is active at time , we have
[TABLE]
where the vector of parameters corresponding to the -th submodel is denoted by , and is Veronese map of degree [2]
[TABLE]
which contains all monomials of order in lexicographical order, and is a vector whose entries are polynomial functions of unknown parameters (see [13] for explicit definition). The Veronese map above is also known as polynomial embedding in machine learning [13].
Note that the number of rows of the Veronese matrix is equal to the number of measurements available for the regressor . Therefore, a reformulation of the previous results to address the problem of identification from very large data sets is as follows [4].
For the noiseless case, a reformulation of the hybrid decoupling constraint shows identifying the coefficients of the sub-models is equivalent to finding the singular vector associated with the minimum singular value of the matrix
[TABLE]
where, matrices are of size , and size does not depend on the number of measurements, which is especially important in the case of very large data sets.
Identifying the parameters of the SAR model is equivalent to finding a vector in the null space of the matrix . Under mild conditions, the null space of the matrix above has dimension one if and only if the data is compatible with the assumed model. However, when output is corrupted by noise, is not known and, therefore, this matrix cannot be computed. However, we can use available information on the statistics of the noise and the measurements collected to compute approximations of the matrix and, consequently, approximations of vectors in its null space.
We start by noting that although are unknown, the following holds
[TABLE]
where denotes expectation and is the moment of noise.
Hence, assuming that distribution of the noise, and the input signal are given and fixed, there exists an affine function so that
[TABLE]
where denote a function that returns a vector with all monomials up to order of its argument.
This can be exploited to identify the parameters of the SAR system. The only thing needed is an estimation of the matrix in (6). It turns out that this can be done using the available noisy measurements. More precisely, we can construct the matrix
[TABLE]
and it is shown in [4] that this matrix converges to in (6) as almost surely. Hence, for large number of measurements , the null space of the matrix can be used to determine the coefficients of the subsystems.
The above assumes knowledge of the moments of the noise. However, this does not need to be the case. For simplicity of exposition, assume that measurement noise has a Normal distribution with zero mean and unknown variance . Then, since the moments are known functions of the variance, is a known function of and estimation of variance can be performed by minimizing the minimum singular value of matrix above over the allowable values of . More precisely, the parameters of the submodels and the variance of the noise can be identified using the following algorithm: Let , , , some parameters of the noise and be given.
- Step 1.
Compute matrix as a function of the unknown noise parameter . 2. Step 2.
Find the value that minimizes the minimum singular value of . 3. Step 3.
Let be associated singular vector. 4. Step 4.
Determine the coefficients of the subsystems from the vector .
In order to perform Step 3 in Algorithm, we adopt polynomial differention algorithm for mixtures of hyperplanes, introduced by Vidal [14, pp. 69–70]. In practice for sufficiently large , the above algorithm provides both a good estimate of the systems coefficients and noise parameters, especially if we take to be the smallest value of for which the minimum singular value of is below a given threshold Previous work cannot address the identification of switching dynamics and estimating the probability transition matrix of Markov jump models, this problem is explicitly addressed in the following section.
4 Identification of Probability Transition Matrix
In Section 3, the algorithms and procedure of identifying noise parameters and system coefficients have been presented. In this section the switching behavior and dynamics of switching are considered. This is done in two steps: The first step is to identify switches that have the highest probability of occurrence. Then, in the second step, by considering these switches as a good estimate of switching sequence, we estimate the transition probabilities.
4.1 Maximum likelihood switch sequence
Assume that the noise variance and system coefficients have been identified. To do the first step in identification of switching dynamics, i.e., determine the switches with highest probability, we start by building the following sequence based on available data and identified coefficients and parameters. Considering equations (1) and (3):
[TABLE]
Since . we have
[TABLE]
Since we have identified the coefficients and , and input output (, ) are available, we are able to determine the realization of the random variable in the right hand side of equation (8) for the all possible values of the switching sequenc . Define
[TABLE]
To identify the most probable realization of the switches, we can use the values of for each possible active system at time (), and determine the sequence , of maximum likelihood. However, for a fixed switching sequence, is a sequence of correlated random variables. Even though the measurement noise is iid, depends on , leading to a correlated sequence of random variables. Therefore, determining the values of , that lead to the highest likelihood is a complex combinatorial problem.
To circumvent this, we start by noting that is independent of if . Therefore, if enough data is available, we can use only independent “snippets” of data of low enough length for which: i) maximum likelihood sequence can be easily computed and ii) given that they are independent, likelihood can be computed individually for each snippet.
Hence, in the identification procedure proposed in this paper, we only consider snippets of data of the length that are separated in time by at least sample periods. More precisely, we consider snippets of data of length , compute its joint distribution as a function of the switching sequence, determine the maximum likelihood switches for this snippet, skip the next data points, and repeat the process until we run out of data.
We now elaborate on this. Take snippets of data of length , denoted as a vector defined as:
[TABLE]
where int refers to integer part (round towards zero) of its argument. In this way, each snippet is independent from other snippets and each of these has a mutltivariable Normal distribution whose covariance matrix is a function of the switching sequence. As a reminder, an dimension multivariate Normal distribution has density function
[TABLE]
where is the covariance matrix of dimension .
Note that, at each time , there are possible switching sequences for the snippet , since \delta_{k}\in\big{\{}1,\cdots,\,n\big{\}}. Therefore, if is small enough, we can compute the likelihood value for each of the choices and take the most likely sequence of subsystems as the one leading to the highest likelihood. Hence, given the independence of , estimating the most likely switching sequence in the used snippets can be done by solving the following problem
[TABLE]
whose optimization can be done separately for each term of the sum. Therefore, its complexity is exponential in but linear in the number of snippets and can be efficiently solved if is “not too large.”
Remark 1
As previously mentioned, in the above formulation, we do not use all available data when computing high likelihood switchings. More precisely, we only use of the data. Hence, any choice of is a compromise between computational complexity and fraction of the data used and the “right” choice should be done by taking into account how many measurements are available.
Solving Problem (11), allows us to determine how many times a specific ”jump” occurs in this high likelihood sequence of switches. This can be done in the following way:
- Step 1.
Solve problem (11). Recall that this can be done by solving the problem separately for each . 2. Step 2.
For each , let be the number of times the transition from system to system occurs in the maximum likelihood switch sequence for snippet . 3. Step 3.
Compute the total number of transitions from system to system observed in all snippets
[TABLE]
Given this high likelihood estimate of how often a transition occurs in the snippets, we can estimate the probability transition matrix. This can be done by solving the “traditional” maximum likelihood problem for Markov chains
[TABLE]
or equivalently, solve the equivalent convex optimization problem
[TABLE]
As the number of observations goes to infinity, the solution of this problem converges to the true probability transition matrix. More precisely, we have the following result
Theorem 1
Assume that the Markov model for switching is aperiodic and let and be the estimated and true transition probabilities respectively. Then,
[TABLE]
where is the variance of the measurement noise.
Sketch of proof: From the results in [4], we know the estimates of parameters of the system converge to the true parameters as for large enough finite . Hence, under mild assumptions on the subsystems, we can identify the true transitions within snippets. The fact that the Markov chain is not periodic, implies that, in an infinite sequence, transitions will occur with equal probability in each snippet. This implies that, as relative frequency of the transitions converges to the true transition probabilities.
4.2 An Example
To better illustrate the approach, we provide an example of how to do the maximum likelihood estimation of probabilities required for identification of probability transition matrix. Therefore, consider the problem of identifying switching dynamics for SAR system with subsystems of the form
[TABLE]
and noisy measurements
[TABLE]
where has zero mean Normal distribution, and . We consider snippets of data of length , and skip sample measurement in between the snippets of data, i.e. two snippets of data for subsystem 1 are like:
[TABLE]
and we have skipped this one:
For this example
[TABLE]
and,
[TABLE]
Therefore, the set of possible active sequences can be:
[TABLE]
At each time , there are possible cases. For system shown in equation (15), cases are as follows:
[TABLE]
The multivariate Normal Probability at each of sequences will be computed, and the one which has the maximum value of the likelihood will be considered as the set of active subsystems at that point.
5 Numerical Results
In this section, we will address the problem of identifying switching dynamics in Markovian jump systems. The values of true coefficients in this example has taken from the example in [4], which are and . Measurement noise is assumed to have zero-mean Normal distribution. So, AR system with subsystems, , and in this example are as follows:
[TABLE]
Total number of input-output data is available. Output is corrupted with random measurement noise which is Normal with zero mean and different values of variance. The proposed algorithm is coded and run in Python.
Noise to output ratio () is defined as
[TABLE]
Simulation results for several experiments are shown in Table 1. For each experiment, a random probability transition matrix has been generated, which is shown in column 2 of the table. By using the algorithms mentioned in the paper, for each experiment probability transition matrix has been estimated from noisy measurements, which is shown in column 3 of the table. The normalized Frobenius norm between true and estimated values of probability transition matrix has been computed and shown in column 4 of the table . For each experiment noise to output ratio and variance of noise are shown in columns 5 and 6 of the table. As we see in this table, the value of entries in probability transition matrix are very close to the true values, even when the noise variance is high with noise magnitude in average around 30% of the signal magnitude.
For example, in experiment 6 the value of shows that noise to output ratio of approximately ; even with this very large value of corruption with measurement noise, the proposed method works well and the normalized Frobenius norm between true and estimated values of probability transition matrix is only 0.1153. As expected and in the Table 1 is shown, for smaller values of noise to output ratio (), the estimated values for probability transition matrix are closer to the true probability transition matrix and normalized Frobenius norm of their difference has smaller values. However, with the prposed approach even for large values of noise to output ratio, the difference in Frobenius norm is still low.
Figure 1 demonstrates the convergence of probability transition matrix as number of measurements grows. This figure is based on a random experiment, for the values of coefficients in equation (18), fixed variance of noise and noise to output ratio . As we see in this figure the values of normalized Frobenius norm between true and estimated probability transition matrix decreases, when number of measurement increases. As we observe in Figure 1, the value of difference between true and estimated probability transition matrix decreases from at to at . It shows even for the case of having noise to output ratio, the approximated switching dynamics and transition probability matrix are close to the true ones.
6 conclusion and future work
In this paper we have proposed a methodology for identification of switching dynamics in Markovian jump SAR models. Given large noisy input-output data, by using previously developed procedures for identification of switched system from large noisy data sets, we estimate the parameters of the noise, and then, identify the coefficients of each submodel. Then, by using the novel procedure presented in this paper for estimation of probability transition matrix, we identify the switching dynamics and computed the probability transition matrix of Markov chain. Even for large values of measurement noise, numerical simulations show a low estimation error. The Frobenius norm between estimated and true probability transition matrix is small even in the case of large noise to output ratio. For future work, we can consider the problem of identifying switched ARX models and switching dynamics form large noisy data sets, but with the process noise. We will also test the effectiveness of the proposed approaches in “real” applications with emphasis on estimating individual response to treatments aimed at improving light physical activity.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] David E Conroy, Sarah Hojjatinia, Constantino M Lagoa, Chih-Hsiang Yang, Stephanie T Lanza, and Joshua M Smyth. Personalized models of physical activity responses to text message micro-interventions: A proof-of-concept application of control systems engineering methods. Psychology of Sport and Exercise , 41:172–180, 2019.
- 2[2] Joe Harris. Algebraic geometry: a first course , volume 133. Springer Science & Business Media, 2013.
- 3[3] Sarah Hojjatinia, Constantino M. Lagoa, and Fabrizio Dabbene. A method for identification of markovian jump arx processes. IFAC-Papers On Line , 50(1):14088 – 14093, 2017. 20th IFAC World Congress.
- 4[4] Sarah Hojjatinia, Constantino M Lagoa, and Fabrizio Dabbene. Identification of switched arx systems from large noisy data sets. ar Xiv preprint ar Xiv:1804.07411 , 2018.
- 5[5] Aleksandar Lj Juloski, Siep Weiland, and WPMH Heemels. A bayesian approach to identification of hybrid systems. IEEE Transactions on Automatic Control , 50(10):1520–1533, 2005.
- 6[6] Constantino M. Lagoa, David E. Conroy, Sarah Hojjatinia, and Chih-Hsiang Yang. Modeling subject response to interventions aimed at increasing physical activity: A control systems approach. Poster Presented at 5th International Conference on Ambulatory Monitoring of Physical Activity and Movement , 2017.
- 7[7] Hayato Nakada, Kiyotsugu Takaba, and Tohru Katayama. Identification of piecewise affine systems based on statistical clustering technique. Automatica , 41(5):905–913, 2005.
- 8[8] Necmiye Ozay, Constantino Lagoa, and Mario Sznaier. Set membership identification of switched linear systems with known number of subsystems. Automatica , 51:180 – 191, 2015.
