A Robust Time Series Model with Outliers and Missing Entries
Triet M. Le

NGA Research, The National Geospatial-Intelligence Agency, Springfield, VA.
Abstract
This paper studies the problem of robustly learning the correlation function for a univariate time series in the presence of noise, outliers and missing entries. The outliers or anomalies considered here are sparse and rare events that deviate from normality, where normality is described by a correlation function and an uncertainty condition. This general formulation is applied to univariate time series of event counts (or non-negative time series), where the correlation is a log-linear function and the uncertainty condition follows the Poisson distribution. Approximations to the sparsity constraint, such as ℓ^p with 0 < p ≤ 1, are used to obtain robustness in the presence of outliers. The same constraint is also applied to the correlation function to reduce the number of active coefficients, which also makes it possible to bypass the model selection procedure. Simulated results are presented to validate the model.
1 Introduction
Forecasting (anticipating) future events or activities is an important problem in data science. A common task for a forecaster is to predict normal future events from past and current observations and to raise an alert when the observed number of events deviates significantly from the predicted value. If events and activities are purely random, there is no hope of making any meaningful prediction. However, if events are correlated, in the sense that events at time t depend on events prior to t, and the correlation function describing this dependency persists throughout the observed data, then this correlation function can be used to predict future events from past and current observations. In many real datasets, the observed data are incomplete and often contaminated with outliers and random noise. An important task is then to robustly learn, from the observed data, the correlation function that describes the underlying normal activities and patterns. We start with the following setup.
Let be a uniform discretization of some time interval of interest which we assume to be . Let be the number of observed events and be the expected number of events that occur in the time interval . Observed events are determined by the conditional probability
[TABLE]
where is some auxiliary variable, representing for instance the variance, that may depend on. In the univariate case, the goal is to learn how depends on prior and , for . In other words, we seek the function such that
[TABLE]
In this paper, we consider the case where only a partial series for some is observed, and it may contain outliers and anomalies. To tackle this problem, we propose the following minimization problem
[TABLE]
for some , and , to jointly learn and the complete series via imputation.
- Here, we consider a parametric correlation function defined as
[TABLE]
where is given by
[TABLE]
for some .
- measures the sparsity of the sequence representing anomalies. Similarly, and impose sparsity on and , which sidesteps the model selection issue (e.g., AIC, BIC). Both of these constraints are important for the recovery of .
In the presence of missing entries and outliers, Figure 1 shows the ability of model (3) to recover , having
[TABLE]
In the following, we motivate the proposed model. Many well-known time series models can be described by (1) and (2). For instance, in an ARMA model [5, 18], each is defined as
[TABLE]
where follows , a normal (Gaussian) distribution with mean 0 and variance . Let
[TABLE]
then equation (5) implies . This implies
[TABLE]
Substituting for in (6), one obtains
[TABLE]
for some real-valued and and some new positive integers and . Here we assume zero boundary conditions, that is, and are identically zero whenever . Other types of boundary conditions, such as reflection or Neumann, can also be used. Thus (5) can be transformed into (7) and (8), which are instances of (1) and (2), respectively.
Another example is the Poisson linear autoregressive model (see [20, 25], among others). Assuming is a non-negative integer, then (1) is given by
[TABLE]
and (2) is given by
[TABLE]
where and are non-negative. Consequently, this model can only capture zero or positive correlations. To overcome this drawback, the log-linear model is often used [40, 20, 25, 15], where is defined such that
[TABLE]
Here , and are real-valued and therefore can represent negative correlations.
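To make the log-linear idea concrete, here is a minimal simulation sketch. It assumes one common first-order form from the log-linear literature, ν_t = d + a ν_{t−1} + b log(1 + y_{t−1}) with λ_t = exp(ν_t); the function name and parameter values are hypothetical and are not the paper's model (22) or its simulation settings. Because d, a and b are unconstrained, a negative b produces negative dependence, which the linear model above cannot express.

```python
import numpy as np

def simulate_loglinear_poisson(n, d, a, b, nu0=0.0, y0=0, seed=None):
    """Simulate a first-order log-linear Poisson autoregression.

    Assumed form (a common variant; the paper's model may use more lags):
        nu_t     = d + a * nu_{t-1} + b * log(1 + y_{t-1})
        lambda_t = exp(nu_t)
        y_t      ~ Poisson(lambda_t)
    d, a and b are real-valued, so negative dependence is allowed.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=np.int64)
    lam = np.zeros(n)
    nu_prev, y_prev = nu0, y0
    for t in range(n):
        nu = d + a * nu_prev + b * np.log1p(y_prev)
        lam[t] = np.exp(nu)
        y[t] = rng.poisson(lam[t])
        nu_prev, y_prev = nu, y[t]
    return y, lam

# b < 0: a large count lowers the next expected count (negative correlation).
y, lam = simulate_loglinear_poisson(200, d=0.5, a=0.3, b=-0.2, seed=0)
```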
In the case of complete observations, where we are given the series and some prior knowledge of the conditional probability (1), the task is to learn the optimal that maximizes the likelihood function,
[TABLE]
It can be shown (see for instance [27]) that the joint probability in (10) is given by
[TABLE]
assuming and are independent. The maximization problem in (10) is equivalent to
[TABLE]
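For the Poisson case, maximizing the likelihood amounts to minimizing a sum of negative log-probabilities. The sketch below evaluates that sum for given conditional means; it assumes the factorized form of the joint probability and that the means λ_i have already been computed from the correlation function. The helper name is illustrative.

```python
import math
import numpy as np

def poisson_neg_log_likelihood(y, lam):
    """Negative Poisson log-likelihood: sum_i [lam_i - y_i*log(lam_i) + log(y_i!)].

    Assumes the terms factor given the conditional means lam_i > 0, with y_i >= 0.
    log(y!) is computed as lgamma(y + 1) to avoid overflow for large counts.
    """
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_fact = np.array([math.lgamma(v + 1.0) for v in y.ravel()]).reshape(y.shape)
    return float(np.sum(lam - y * np.log(lam) + log_fact))

# The same counts are more likely (lower NLL) under means close to them.
counts = [2, 0, 3, 1]
nll_close = poisson_neg_log_likelihood(counts, [2.0, 0.5, 3.0, 1.0])
nll_far = poisson_neg_log_likelihood(counts, [8.0, 8.0, 8.0, 8.0])
```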
With some knowledge about , the conditional probability provides the distribution of . For a single value prediction , we can solve
[TABLE]
When the conditional probability follows a Poisson distribution, is completely determined by and does not depend on the auxiliary variable .
In this paper, we consider the problem of parametrically learning the underlying correlation function and given a partially observed series for some , which may be contaminated by outliers and anomalies. Given some prior knowledge about the uncertainty condition (1), the problem consists of: 1) extending to the whole series (including ) via imputation such that for all , and 2) using the complete series to learn . These two steps are performed iteratively since they are interdependent. The interpretation of is as follows:
1. Suppose is normal, i.e., it can be described by and the uncertainty condition (1); then we would like to enforce .
2. On the other hand, if is anomalous, then we allow and let the model decide a normal value for .
Moreover, outliers and anomalies are interpreted as rare events that are supported sparsely on . Based on this interpretation, we would like the difference series to be sparse. This can thus be viewed as the optimization problem
[TABLE]
with the constraint that the partial series is sparse. Here, and are priors on and , respectively. Sparsity is an essential ingredient in the theory of compressed sensing [7, 8, 13]. Approximating the sparsity constraint is important because the exact sparsity problem is NP-hard. Here we consider the sparsity approximations proposed in [7, 8, 13, 10], using for . Incorporating these sparsity approximations into the energy (14), we propose the following unconstrained variational problem
[TABLE]
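The ℓ^p terms approximate a count of nonzeros. A small sketch, assuming the convention 0^p = 0, shows how Σ|x_i|^p approaches the number of nonzero entries as p decreases, while a large entry contributes roughly 1 rather than its magnitude. This is why a small p penalizes an outlier by its presence rather than its size. The function name is hypothetical.

```python
import numpy as np

def lp_penalty(x, p):
    """sum_i |x_i|**p with 0**p = 0, so p = 0 counts nonzeros (the l0 'norm')."""
    a = np.abs(np.asarray(x, dtype=float))
    a = a[a > 0]                      # zeros contribute nothing for any p
    if p == 0:
        return float(a.size)
    return float(np.sum(a ** p))

x = np.array([0.0, 0.0, 3.0, -0.5, 0.0, 10.0])   # 3 nonzeros, one large "outlier"
for p in (1.0, 0.5, 0.1, 0.01, 0.0):
    print(p, lp_penalty(x, p))
```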
Remark 1.
Predicting with the mean is optimal. However, for and , imputing with the mean is not always optimal. Indeed, suppose only depends on previous ’s, i.e. for some . Suppose also that both and are known with no outliers, that is for all and is the only non-observed entry. The task is to compute the optimal by using (15). This amounts to solving
[TABLE]
If , then it is clear that only the last term in the above sum contains . This implies
[TABLE]
On the other hand, if , then
[TABLE]
The latter case shows that is not always an optimal value for .
The paper is organized as follows. Section 2 reviews the most relevant prior work. Section 3 describes the proposed Poisson log-linear model, in which the uncertainty condition (1) follows the Poisson distribution and the correlation function (2) is log-linear. One subproblem in solving (15) is to compute
[TABLE]
for some fixed , and . Nie et al. [32] proposed a method for solving this problem by finding the zero of a strictly convex function. For completeness, Section 4 presents a similar method for computing the proximal operator for . Section 5 describes an algorithm for computing a minimizer of (15). Section 6 shows numerical results on simulated data that validate the proposed model. In Appendix A we show that the minimization problem (15) is related to
[TABLE]
for some fixed and .
2 Prior Work
In [23], Huber considered the classical least-squares problem of learning parameters from observations obeying the relation
[TABLE]
Here are known coefficients and are i.i.d. Gaussian noise. For an autoregressive model, . Estimating the parameter amounts to minimizing the sum of squares
[TABLE]
Let , and , then the relative condition number measuring the sensitivity of with respect to perturbation of is [38]
[TABLE]
where is an arbitrary matrix norm and is the pseudoinverse of , if it exists. Let and be the largest and smallest singular values of , respectively; then, using , one has . If is large, a small deviation in can cause a large deviation in the solution, and hence it is important to obtain an accurate estimate of . As noted in [23], outliers affect the accuracy of the estimates. Following Lecture 18 in [38], the condition number measuring the sensitivity of with respect to perturbations of is , where and . The more noise and outliers in the system, the closer is to , which leads to a large value of . Hence it is important to have good estimates of both and in the presence of missing data and outliers in the observations.
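A small numerical illustration of this sensitivity argument, using NumPy: with two nearly collinear columns, κ = σ_max/σ_min is large, and a tiny perturbation of the observations moves the least-squares estimate substantially. The matrix and perturbation are made up for illustration; for this particular perturbation, the relative change in the estimate is about κ times the relative change in the data.

```python
import numpy as np

# Nearly collinear design matrix: the second column almost equals the first.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                        # sigma_max / sigma_min

beta_true = np.array([1.0, 1.0])
y = A @ beta_true
dy = 1e-4 * np.array([0.0, 1.0, -1.0])      # tiny perturbation of the observations

beta_hat = np.linalg.lstsq(A, y, rcond=None)[0]
beta_pert = np.linalg.lstsq(A, y + dy, rcond=None)[0]

rel_in = np.linalg.norm(dy) / np.linalg.norm(y)
rel_out = np.linalg.norm(beta_pert - beta_hat) / np.linalg.norm(beta_hat)
amplification = rel_out / rel_in            # close to kappa for this choice of dy
```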
In [22, 23], Huber proposed a robust alternative to (17) by considering
[TABLE]
where is chosen so that it is less sensitive to large . In particular, the proposed has the form
[TABLE]
where is a chosen, data-dependent constant. From the definition of , we see that if , then (18) performs least squares, and hence the model respects the additive Gaussian noise assumption in (16). When , is no longer considered Gaussian but is assumed to follow a Laplacian distribution of the form
[TABLE]
where is a constant such that . Note that the Laplacian distribution has heavier tails than the Gaussian distribution and hence accommodates large better. Another popular robust choice for is the least absolute deviation [2], which amounts to having and likewise corresponds to a Laplacian distribution.
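The effect of Huber's loss can be sketched on the simplest case, estimating a constant level. This uses the standard textbook form of the Huber function, which may differ in scaling from (19), and an iteratively reweighted least-squares loop that is one common way to compute the M-estimate (not necessarily the procedure of [22, 23]); the names and data are illustrative.

```python
import numpy as np

def huber_rho(r, c):
    """Standard Huber loss: r**2/2 for |r| <= c and c*|r| - c**2/2 otherwise."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * a ** 2, c * a - 0.5 * c ** 2)

def huber_location(y, c, iters=200, tol=1e-12):
    """Huber M-estimate of a constant level via iteratively reweighted least squares.

    Weights w_i = min(1, c/|r_i|) keep small residuals at full weight
    (least squares) and downweight large ones (linear, Laplacian-like tail).
    """
    y = np.asarray(y, dtype=float)
    beta = float(np.median(y))
    for _ in range(iters):
        r = y - beta
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        beta_new = float(np.sum(w * y) / np.sum(w))
        if abs(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

y = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 50.0])   # one gross outlier
print(np.mean(y), huber_location(y, c=1.0))
```

Here the sample mean is pulled to 9.2 by the single outlier, while the Huber estimate stays near 1.24, because the outlier's influence is capped at c.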
The standard LASSO [37] amounts to learning a sparse parameter vector via minimizing
[TABLE]
This model is still sensitive to outliers. A modification of this model introduces an extra variable representing outliers (see [31] and references therein):
[TABLE]
Suppose . In [29], a robust nonparametric model is proposed:
[TABLE]
where is a reproducing kernel Hilbert space (RKHS), a class that includes Sobolev spaces. A slightly different approach is proposed in [17], where each is given by , and both and are learned by solving
[TABLE]
In connection with the proposed model (15), take and let with and
[TABLE]
for some fixed . Note that is completely determined by , and estimating the parameter vector with the LASSO prior amounts to minimizing
[TABLE]
Here, the sparse vector representing outliers is . Note that the proposed method (15) is more general: it accommodates types of noise that are not additive (multiplicative Gaussian, Poisson, negative binomial, etc.).
There are quite a few existing methods for estimating parameters in the presence of missing data; from a high-level perspective, they follow one of two approaches.
The first approach iteratively imputes missing and unobserved data in some manner and then uses the imputed and observed data to estimate the parameters. These methods include mean imputation, expectation maximization (EM) [12] and multiple imputation [35], among others. See [30, 33, 21] for surveys of some of these methods. Matrix completion [6, 9] is a form of imputation in which missing entries of the data matrix are imputed under the assumption that the matrix has low rank. The imputation performed in Algorithm 4 for solving (15) follows the iterative approach of the EM method [12], but instead of maximizing the expectation we maximize the likelihood.
The second approach either does not impute or imputes only the missing entries that the observed entries depend on. For instance, in the full information maximum likelihood (FIML) method [14], only complete data points are used as inputs to estimate the parameters. Suppose we are only interested in estimating the constant with ; then all observed data points are complete. However, for , if one in every consecutive points is missing, then the set of complete data points is empty and the FIML method is not applicable. The non-negative definite covariance method [26, 11] uses observed data points as inputs, as opposed to complete data points. Here the observed data points may depend on the missing data; this method sets that dependency to zero, i.e., it imposes zero boundary conditions on the unobserved entries on which some of the observed entries depend (see Section 3.2.1 in [11]). A further modification was introduced to ensure that the covariance matrix is non-negative definite. This second approach can also be applied to our problem, and it is more appropriate when the missing entries are systematic as opposed to random. To motivate the problem, consider the case where the correlation function is constant, that is, for all . Assuming that the observed series has no outliers, can be approximated by the mean and there is no need to impute for all . Similarly, suppose and ; then it is only necessary to impute , as opposed to for all . One can view as an unknown boundary condition. Thus, in this second approach, the minimization problem becomes
[TABLE]
where consists of and all of the indices such that depends on for some , and consists of all the indices such that depends on for some . Note that the difference between (15) and (20) is that in (20) the sum is only over observed indices, as opposed to all indices (observed and unobserved). It would be interesting to compare this approach with the proposed method (15); we leave this for future work.
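The FIML failure mode described above can be checked directly. The sketch below assumes an AR-type dependence on p lags and reads the elided gap as "one in every p + 1 consecutive points"; it lists the indices whose value and all p lags are observed. The helper is hypothetical.

```python
import numpy as np

def complete_points(observed, p):
    """Indices i such that y_i and its lags y_{i-1}, ..., y_{i-p} are all observed.

    `observed` is a boolean mask. Indices i < p are skipped because some lags
    fall before the start of the series.
    """
    obs = np.asarray(observed, dtype=bool)
    return [i for i in range(p, len(obs)) if obs[i - p:i + 1].all()]

n, p = 20, 3
mask = np.ones(n, dtype=bool)
mask[::p + 1] = False                    # one missing entry in every p + 1 consecutive points

print(complete_points(mask, p))          # -> []   (AR-type model with p lags)
print(len(complete_points(mask, 0)))     # -> 15   (constant model: every observed point is complete)
```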
3 Poisson Log-Linear Model
In this section, we consider the Poisson distribution for the conditional uncertainty condition and a log-linear correlation function to model in (15). In particular, we suppose that is conditioned only on and that it follows the Poisson distribution,
[TABLE]
where whenever is a nonnegative integer. Note that (21) is defined for all . We consider to be a log-linear correlation function satisfying
[TABLE]
for some with the constraint . In other words,
[TABLE]
where is defined as in (22). Since is completely determined by , and , we see that the series is completely determined by and the series .
The prior on is now given by
[TABLE]
Here we assume all the parameters are independent of each other and follow a family of exponential probability distributions
[TABLE]
where is chosen such that . corresponds to the LASSO constraint [37] and corresponds to the Bayesian bridge constraint proposed in [16, 34].
Combining (21)-(23) into (15), the proposed variational problem becomes
[TABLE]
for some , and .
Remark 2.
It is possible to minimize the energy in (24) over the set of nonnegative integer-valued series . However, this set is not convex. To overcome this non-convexity, we extend (21) to all nonnegative real-valued series and therefore use as opposed to . We remark that this extension is not the continuous version of the Poisson distribution proposed in [28, 24]. Given , the cumulative distribution function of the continuous Poisson distribution is defined as if and
[TABLE]
Here
[TABLE]
is the incomplete Gamma function. Let the probability distribution function be defined such that
[TABLE]
It can be shown that for all ,
[TABLE]
Thus instead of (21), one can use or
[TABLE]
where is the largest integer that is less than or equal to .
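Under one natural reading of Remark 2, the extension replaces y! by Γ(y + 1). The sketch below (a hypothetical helper built on Python's math.lgamma) shows that the resulting term agrees with the Poisson negative log-probability at integers and is convex in x, which is what makes the relaxed problem tractable. As the remark notes, this is not the continuous Poisson distribution of [28, 24], and the function is not a normalized density.

```python
import math

def extended_poisson_nll(x, lam):
    """-log of the Poisson term extended to real x >= 0:
        lam - x*log(lam) + log Gamma(x + 1).

    At integer x this equals -log P(X = x) for X ~ Poisson(lam); for real x it is
    smooth and convex in x (log Gamma(x + 1) has a positive second derivative).
    """
    if x < 0 or lam <= 0:
        raise ValueError("requires x >= 0 and lam > 0")
    return lam - x * math.log(lam) + math.lgamma(x + 1.0)

# Reference value computed directly from the Poisson pmf at x = 4, lam = 3.
pmf_term = -math.log(math.exp(-3.0) * 3.0 ** 4 / math.factorial(4))
```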
4 Proximal Mapping for ,
One of the ingredients for minimizing (24) is solving the subproblem
[TABLE]
for some , and . For , it was shown in [37] that is given by
[TABLE]
where . For , a common approach for solving (25) is to consider a regularized version via
[TABLE]
for some . The minimizer for is approximated by where
[TABLE]
with some initial guess . Since is nonconvex, may not be a global minimizer. Remark 3 below shows that, for a range of , with the initial guess , the above iteration converges to , which is not a global minimizer.
Explicit formulas for exist when or [41], but for general , no closed-form expression for exists. Nie et al. [32] proposed a method for computing by finding a zero of a strictly convex function with Newton's method. For completeness, we present here a method similar to the one proposed in [32].
Recall (25) and for simplicity assume and denote by the global minimizer for . First, note that . Now, for we get
[TABLE]
Define
[TABLE]
Note also that for , which shows that if and only if . Suppose . This implies that must satisfy
[TABLE]
On , we get
[TABLE]
and
[TABLE]
which is strictly greater than 0. Hence is strictly convex on and achieves its minimum value at .
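The case analysis can be reconstructed under an assumed normalization, stated explicitly because the displayed equations are not legible here: take h(x) = λx^p + ½(x − t)² on x > 0 with t > 0 and 0 < p < 1, and define g through h′(x) = x^{p−1} g(x). These formulas are a consistent sketch, not necessarily the paper's exact expressions.

```latex
% Assumed normalization: h(x) = \lambda x^p + \tfrac12 (x - t)^2, \quad x > 0,\ t > 0,\ 0 < p < 1
\begin{aligned}
h'(x) &= \lambda p\, x^{p-1} + x - t = x^{p-1}\, g(x),
  \qquad g(x) = x^{2-p} - t\, x^{1-p} + \lambda p, \\
g'(x) &= (2-p)\, x^{1-p} - (1-p)\, t\, x^{-p} = x^{-p}\big[(2-p)\,x - (1-p)\,t\big], \\
g''(x) &= (1-p)\, x^{-p-1}\big[(2-p)\,x + p\,t\big] > 0 \qquad (x > 0), \\
x^\ast &= \frac{(1-p)\,t}{2-p}, \qquad
g(x^\ast) = \lambda p - \frac{t}{2-p}\,(x^\ast)^{1-p}, \qquad
g(0^+) = g(t) = \lambda p > 0 .
\end{aligned}
```

Since x^{p−1} > 0, h′ and g have the same sign, so the sign of g(x*) separates the three cases that follow, and g(t) = λp > 0 places the larger zero of g in (x*, t).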
[TABLE]
[TABLE]
This implies the following:
1. If , that is
[TABLE]
then has no zeros on . This shows that no positive global minimizer exists.
2. If , that is
[TABLE]
then has exactly two zeros and . Since and , the function is strictly increasing near zero. This implies that the zero is not a local minimizer for , and hence .
3. If , that is
[TABLE]
then has exactly one zero at . Since is the first zero of and is strictly increasing near 0, we conclude that no positive global minimizer exists.
Remark 3.
Cases (28) and (30) correspond to being strictly increasing on , and hence is the global minimizer. Note that in case (30), is a saddle point of . In case (29), has one positive local minimizer at , and therefore the global minimizer is . It is possible that for some ; in this case the global minimizer of is not unique.
Figure 2 shows the plots of and the corresponding with and for various choices of . The green lines correspond to the value in and 0 in . Let . In the cases where , the global minimizer for is 0. However, for , the second zero of the corresponding is the global minimizer for .
From the above remark, the shrinkage operator in (25) for some is given by
[TABLE]
Computing the second zero of (assuming ) is fast and straightforward since is strictly convex. For instance, one can use Newton's method as follows:
Algorithm 1.
Newton method for computing the second zero of (assuming and ):
1. Set , and small.
2. while .
   - .
3. end while.
In our numerical simulation, the above algorithm converges in Newton iterations with -.
For general , we have
[TABLE]
where is given in (31).
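A minimal Python sketch of the resulting shrinkage operator, under the assumed normalization h(x) = λ|x|^p + ½(x − t)² (the scaling in (25) may differ). It follows the case analysis and Algorithm 1: test the sign of g at its minimizer x* = (1 − p)|t|/(2 − p), run Newton from x = |t| to the larger zero, compare with x = 0, and restore the sign of t. The name prox_lp and the tolerances are hypothetical.

```python
import math

def prox_lp(t, lam, p, tol=1e-12, max_iter=100):
    """Minimize h(x) = lam*|x|**p + 0.5*(x - t)**2 over real x, for 0 < p < 1.

    Hypothetical helper under an assumed normalization of (25). For x > 0,
    h'(x) = x**(p-1) * g(x) with the strictly convex
        g(x) = x**(2-p) - s*x**(1-p) + lam*p,   s = |t|.
    When g has two zeros, the larger one is the only positive local minimizer
    of h and is compared with x = 0; the sign of t is restored by symmetry.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch covers 0 < p < 1 only")
    s = abs(t)
    if s == 0.0:
        return 0.0

    def g(x):
        return x ** (2 - p) - s * x ** (1 - p) + lam * p

    def dg(x):
        return (2 - p) * x ** (1 - p) - (1 - p) * s * x ** (-p)

    x_star = (1 - p) * s / (2 - p)          # unique minimizer of g on (0, inf)
    if g(x_star) >= 0.0:                     # no zero, or a double zero (saddle of h)
        return 0.0

    # Newton from x = s, where g(s) = lam*p > 0: g is convex and increasing
    # to the right of x_star, so the iterates decrease monotonically to the larger zero.
    x = s
    for _ in range(max_iter):
        x_new = x - g(x) / dg(x)
        done = abs(x_new - x) < tol
        x = x_new
        if done:
            break

    def h(z):
        return lam * z ** p + 0.5 * (z - s) ** 2

    x_pos = x if h(x) < h(0.0) else 0.0      # ties resolved to 0 here
    return math.copysign(x_pos, t)

r = prox_lp(3.0, 1.0, 0.5)    # above the threshold: a nonzero, shrunk value
z = prox_lp(0.5, 1.0, 0.5)    # below the threshold: exactly 0
```

Starting Newton at |t| is a design choice: there g is positive, convex and increasing, so the iterates never overshoot below the larger zero.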
Remark 4.
can also be defined for . In this case, we get and for . This implies for , whenever . Thus we obtain the following shrinkage operator (hard thresholding)
[TABLE]
Note that if , then the minimizer is not unique. Here we choose as the minimizer, but choosing 0 is equally valid.
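The two limiting cases have closed forms. The sketch below assumes the same ½(x − t)² normalization; ties at |t| = √(2λ) go to 0 here, whereas the remark chooses the nonzero value, and as it notes either is valid. Function names are illustrative.

```python
import numpy as np

def hard_threshold(t, lam):
    """argmin_x lam*[x != 0] + 0.5*(x - t)**2, elementwise (the p = 0 case).

    Keeping x = t costs lam and setting x = 0 costs 0.5*t**2, so t is kept
    when |t| > sqrt(2*lam). At |t| == sqrt(2*lam) both are minimizers; 0 is returned.
    """
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) > np.sqrt(2.0 * lam), t, 0.0)

def soft_threshold(t, lam):
    """argmin_x lam*|x| + 0.5*(x - t)**2, elementwise (the p = 1 case)."""
    t = np.asarray(t, dtype=float)
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

t = np.array([-3.0, -1.0, 0.5, 1.5, 3.0])
print(hard_threshold(t, 1.0))   # large entries kept unchanged
print(soft_threshold(t, 1.0))   # surviving entries shrunk by lam
```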
5 Numerical Implementation
There are numerous numerical methods for computing a minimizer of (24) (see [19] and references therein). We mention in particular the FISTA algorithm [3], which provides a global rate of convergence when the energy being minimized is convex. The functional in (24) is related to blind deconvolution, which is jointly nonconvex even when . When both and are rational numbers in , numerical schemes such as PALM [4] or the Block Prox-Linear Method [39] provide global convergence. The FISTA and PALM schemes are described in Algorithms 2 and 3, respectively. Algorithm 4 combines the two by applying FISTA's time-step update rule to the proximal alternating scheme of PALM. We compare PALM and Algorithm 4 via numerical simulations.
We rewrite the energy from (24) as
[TABLE]
where
[TABLE]
Let
[TABLE]
where
[TABLE]
The proximal mappings for ’s are defined as:
[TABLE]
where is a vector in such that . In the following, denote by and the projections of onto and its complement , respectively.
Algorithm 2.
- Initialize: , , , , , small enough and (tolerance).
- Do
  1. .
  2. .
  3. .
  4. .
  5. .
  6. .
  7. .
  8. .
  9. .
  10. .
- while .
- Set .
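The individual steps of Algorithm 2 are not legible in this copy. As a generic illustration of FISTA [3] only (on the LASSO problem, not the author's Algorithm 2 applied to (24)), a minimal NumPy sketch with constant step 1/L looks like this; the names are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, iters=500):
    """FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1.

    Uses the constant step 1/L with L = ||A||_2^2, the Lipschitz constant of the
    gradient A^T (A x - b), and the momentum rule t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
    """
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0                     # z is the extrapolated point
    for _ in range(iters):
        x_new = soft_threshold(z - A.T @ (A @ z - b) / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Separable check: for diagonal A the solution is soft(a_i*b_i, lam) / a_i**2.
x_diag = fista_lasso(np.diag([1.0, 2.0, 3.0]), np.array([3.0, 1.0, -2.0]), 1.0)
```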
Algorithm 3.
- Initialize: , , , , small enough and (tolerance).
- Do
  1. .
  2. .
  3. .
  4. .
  5. .
- while .
- Set .
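Likewise, here is a generic PALM [4] sketch on a convex two-block toy problem that mirrors the outlier model discussed in Section 2: a sparse coefficient vector β and a sparse outlier vector o. It illustrates the alternating proximal-linearized steps only and is not the paper's Algorithm 3 for (24); the problem, names and γ = 1.1 are assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def palm_sparse_outliers(X, y, lam_beta, lam_out, gamma=1.1, iters=2000):
    """PALM for min_{beta,o} 0.5*||y - X beta - o||^2 + lam_beta*||beta||_1 + lam_out*||o||_1.

    The blocks are updated in turn by one proximal-gradient step each, with step
    1/(gamma*L_block) for gamma > 1, where L_block is the Lipschitz constant of
    the partial gradient of the coupling term (||X||_2^2 for beta, 1 for o).
    """
    n, d = X.shape
    beta, o = np.zeros(d), np.zeros(n)
    c_beta = gamma * np.linalg.norm(X, 2) ** 2
    c_out = gamma * 1.0
    for _ in range(iters):
        r = y - X @ beta - o                  # residual with the current blocks
        beta = soft_threshold(beta + X.T @ r / c_beta, lam_beta / c_beta)
        r = y - X @ beta - o                  # refresh with the new beta
        o = soft_threshold(o + r / c_out, lam_out / c_out)
    return beta, o

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(100)
idx = rng.choice(100, size=5, replace=False)
y[idx] += 20.0 * rng.choice([-1.0, 1.0], size=5)      # five gross outliers
beta, o = palm_sparse_outliers(X, y, lam_beta=0.01, lam_out=1.0)
```

The nonzero entries of o flag the contaminated positions, and β stays close to the true coefficients while ordinary least squares on the same data is pulled away by the outliers.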
Algorithm 4.
Computing an optimal for (24).
- Initialize: , , , , , small enough and (tolerance).
- Do
  1. .
  2. .
  3. .
  4. .
  5. .
  6. .
  7. .
  8. .
  9. .
  10. .
- while .
- Set .
Recall,
[TABLE]
The differentials of with respect to its variables are as follows.
[TABLE]
where
[TABLE]
Therefore,
[TABLE]
We have
[TABLE]
where
[TABLE]
Similarly,
[TABLE]
where
[TABLE]
Lastly,
[TABLE]
where
[TABLE]
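Differentials like these can be sanity-checked numerically. The sketch below uses a deliberately simplified, hypothetical log-linear model with no feedback through past means, log λ_i = μ + Σ_j b_j log(1 + x_{i−j}) with zero boundary values, and the Poisson energy E = Σ_i (λ_i − x_i log λ_i), dropping terms that do not depend on μ and b. It returns ∂E/∂μ = Σ_i (λ_i − x_i) and ∂E/∂b_j = Σ_i (λ_i − x_i) log(1 + x_{i−j}); derivatives with respect to the imputed series x are omitted.

```python
import numpy as np

def lagged_features(x, q):
    """Z[i, j-1] = log(1 + x[i-j]) for j = 1..q, with zeros before the series starts."""
    z = np.log1p(np.asarray(x, dtype=float))
    Z = np.zeros((len(z), q))
    for j in range(1, q + 1):
        Z[j:, j - 1] = z[:-j]
    return Z

def energy_and_grad(mu, b, x):
    """E = sum_i (lambda_i - x_i*log(lambda_i)) with log(lambda_i) = mu + Z[i] @ b.

    Returns E, dE/dmu and dE/db for this simplified (hypothetical) model.
    """
    x = np.asarray(x, dtype=float)
    Z = lagged_features(x, len(b))
    eta = mu + Z @ b                 # log of the conditional means
    lam = np.exp(eta)
    E = float(np.sum(lam - x * eta))
    resid = lam - x                  # dE/d(eta_i)
    return E, float(resid.sum()), Z.T @ resid

x = np.array([3, 0, 2, 5, 1, 4, 2, 0, 1, 3], dtype=float)
mu, b = 0.4, np.array([0.2, -0.1])
E, g_mu, g_b = energy_and_grad(mu, b, x)
```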
6 Numerical results
In this section, we validate the model (24) with simulated data. We argue that the regularization of the parameters with , , and the sparsity constraint using , , are all necessary for estimating the parameters accurately. Throughout this section, we simulate data iteratively according to conditions (21) and (22), using , and the true parameters
[TABLE]
with series size . In all the figures below, and are the simulated observed series and the true mean series for these parameters. We then feed partial series of to the model (24) to reconstruct the extended series , its mean and the parameters , . Also plotted in these figures is the vector , interpreted as follows: means is observed and means is unobserved.
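The observation mask and the contamination used in the examples below can be generated along these lines. The fractions, the anomalous value and the NaN convention for unobserved entries are placeholders rather than the paper's settings, and the helper name is hypothetical.

```python
import numpy as np

def mask_and_contaminate(y, frac_observed, frac_contaminated, anomaly_value, seed=None):
    """Return (w, y_obs, bad_idx).

    w[i] = 1 if entry i is observed and 0 otherwise; y_obs copies the observed
    entries of y (NaN where unobserved), with a fraction of the observed entries
    overwritten by anomaly_value; bad_idx lists the contaminated positions.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    obs_idx = rng.choice(n, size=int(round(frac_observed * n)), replace=False)
    w = np.zeros(n, dtype=int)
    w[obs_idx] = 1
    y_obs = np.full(n, np.nan)
    y_obs[obs_idx] = y[obs_idx]
    n_bad = int(round(frac_contaminated * len(obs_idx)))
    bad_idx = rng.choice(obs_idx, size=n_bad, replace=False)
    y_obs[bad_idx] = anomaly_value
    return w, y_obs, bad_idx

w, y_obs, bad_idx = mask_and_contaminate(np.arange(1000), 0.75, 0.10, 50.0, seed=0)
```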
Example 1.
Figure 3 compares the performance of Algorithms 3 and 4 over 100 simulations. For each simulation, only of entries are observed, and among these entries are contaminated. In both algorithms, the parameters used are and . For Algorithm 3 we use -, and for Algorithm 4 we use -. Even with a smaller , Algorithm 4 converges in 800 iterations on average, whereas Algorithm 3 takes 10,000 iterations on average to converge. The plots show that both algorithms produce similar statistics for the estimated parameters.
Example 2.
In this example, using Algorithm 4, we compare estimates of the parameters and obtained with and without regularization on . Figures 4-6 show the estimated and for three different amounts of observations (100%, 75% and 50%). In all these cases we use and . The values of change depending on the amount of missing data. As the amount of missing data increases, parameter estimation deteriorates when no regularization on is used (i.e., ). With regularization on , however, the estimated parameters are much closer to the true values.
Example 3.
In this example, using Algorithm 4, we show the importance of the sparsity constraint in model (24) for reducing the sensitivity to anomalies and outliers and thereby obtaining more accurate parameter estimates. Here we assume all data are observed, that is, . For each simulation, we randomly select a certain percentage (1%, 5% or 10%) of the data and replace them with an anomalous value (here we pick as an example). See Figure 7 for an example showing the original series and a contaminated version. To test the importance of the sparsity constraint term, we consider two scenarios:
1. Assume the observed data has no outliers and hence enforce , . This amounts to picking to be large, say .
2. Assume the observed data has outliers and hence allow for , , whenever is an anomaly. This amounts to picking to be small, say .
In all cases the remaining parameters are and . Figures 8-10 show the estimated parameters and for simulations with contamination levels of , and . We observe that enforcing (that is, ) greatly alters the reconstruction of and , and this error increases with the amount of contamination. This enforcement causes an increase in the mean value (see Figures 11-13) and a decrease in the absolute values of the correlation coefficients in the reconstructed time series (see Figures 8-10). Letting , which is small, prevents the model from fitting the anomalous . As a result, the estimated parameters and are much closer to the ground truth. The inaccuracy in parameter estimation affects the reconstruction of the mean series, and hence the prediction; these effects are clearly visible in Figures 11-13.
Example 4.
Figures 14-15 show numerical results of 100 simulated data that have both missing entries and contamination among the observed ones using Algorithm 4. In Figure 14, of the entries are observed and of these entries are contaminated with outliers. The parameters used are and . In Figure 15, of the entries are observed and of these entries are contaminated with outliers. The parameters used are and .
Example 5.
In the proposed model (24), and are given a priori, and in all previous examples we assumed that and are known. In real applications, however, these values need to be estimated. Model selection, that is, picking the right values of and , is a difficult problem. Current approaches involve running the algorithm for various choices of and and then using criteria such as AIC [1], BIC [36], etc. to pick the optimal values. We argue that the constraint on the parameters and removes the model selection task from the problem and lets the model discover an optimal sparse solution for and . In this example, we choose and . Figure 16 shows box plots of the estimated parameters and over 100 simulations. For each simulation, only of entries are observed and of the observed entries are contaminated. The parameters used are and .
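For contrast, the conventional route that this example avoids fits one model per candidate order and compares information criteria. A minimal sketch of AIC and BIC in terms of the negative log-likelihood follows; the NLL values are made up purely for illustration, and k counts free parameters under an assumed p + q + 1 parameterization.

```python
import math

def aic(nll, k):
    """Akaike information criterion, 2k + 2*NLL (NLL = negative log-likelihood)."""
    return 2.0 * k + 2.0 * nll

def bic(nll, k, n):
    """Bayesian information criterion, k*log(n) + 2*NLL."""
    return k * math.log(n) + 2.0 * nll

# Hypothetical fitted NLLs for candidate orders (p, q), made up for illustration;
# each entry would normally require a separate model fit.
fits = {(1, 0): 1510.0, (1, 1): 1490.0, (2, 2): 1488.5}
n = 1000
best_aic = min(fits, key=lambda m: aic(fits[m], sum(m) + 1))
best_bic = min(fits, key=lambda m: bic(fits[m], sum(m) + 1, n))
```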
Discussion: In this paper we present an autoregressive time series model that robustly learns the parameters (the mean and correlation coefficients) in the presence of noise, outliers and missing entries. In the presence of outliers or anomalies, we show that the nonconvex sparsity constraint reduces sensitivity to outliers, and as a result the model provides a more robust estimate of the parameters. In the presence of missing entries, we show that the constraint , , on the parameters significantly improves the accuracy of the estimated parameters. Model selection, that is, picking the right values of and , is a difficult problem in time series analysis. Current approaches involve estimating the parameters for various choices of and and then using criteria such as AIC, BIC, etc. to pick the optimal and . The constraint on the parameters and removes the model selection task from the problem and lets the model select an optimal sparse solution for and . In return, however, one needs to provide the parameters and as inputs. We also mention that the proposed model (15) can be applied to other types of noise besides additive Gaussian or Poisson. Moreover, this model can be extended to the multivariate case.
Appendix A Appendix
We remark that this is a standard technique for deriving the likelihood function; see, for instance, [27]. For completeness, we derive the likelihood function for our problem in the following proposition.
Proposition 1.
Given the observed series , the minimization problem (15) is equivalent to
[TABLE]
for some fixed and .
Proof.
Assume for all , , where is an additive random noise with
[TABLE]
This implies
[TABLE]
Using Bayes’ law, the joint probability is given by
[TABLE]
Since is completely determined by and (given), we have
[TABLE]
This implies
[TABLE]
Recursively applying the same technique to , we get
[TABLE]
As for , we first note that is completely determined by , and . This implies
[TABLE]
Since is completely determined by and , we get
[TABLE]
This implies
[TABLE]
Recursively applying the same technique to , we get
[TABLE]
Combining (39) and (40), we have
[TABLE]
where is the joint prior on and .
The maximization problem (38) is equivalent to
[TABLE]
where
[TABLE]
Treating and as fixed constants, we see that (42) is equivalent to
[TABLE]
which is (15) if we assume and are independent. ∎
Remark 5.
Using the same techniques as above, we see that the negative log-likelihood function corresponding to the energy (24) is given by
[TABLE]
[1] Hirotugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.
[2] Gilbert Bassett Jr. and Roger Koenker. Asymptotic theory of least absolute error regression. Journal of the American Statistical Association, 73(363):618–622, 1978.
[3] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[4] Jerome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1–2):459–494, 2014.
[5] George E. P. Box and Gwilym Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, 1970.
[6] Emmanuel J. Candes and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
[7] Emmanuel J. Candes, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
[8] Emmanuel J. Candes, Justin K. Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
