Estimation of high-dimensional factor models and its application in power data analysis
Xin Shi, Robert Qiu

Algorithm 1. Procedure of factor model estimation
Input: the observed data matrix.
Output: the estimated number of factors and the ratio rate.
1: for each candidate number of removed factors do
2:     obtain the real residual through Eq. (14);
3:     normalize it into the standard form through Eq. (5);
4:     calculate the covariance matrix of the standardized residual through Eq. (15);
5:     for each candidate ratio rate do
6:         calculate the modeled LSD according to the prescriptions in Section 3.3;
7:         calculate the spectral distance through Eq. (19) and save the result;
8:     end for
9: end for
10: obtain the optimal parameter set through Eq. (18).
Table I: Parameter configurations in the Monte Carlo experiment
| Parameter | Values |
|---|---|
| Sample sizes | {50, 100, 200, 300, 500} |
| Number of factors | {2, 3, 4} |
| 1/SNR | {1/10000, 1/1000, 1/100, 1/10, 1} |
| Correlations in residuals | {(0, 0, 0), (0.5, 0, 0), (0, 0.05, /10), (0.5, 0.05, /10)} |
Table II: Average estimated number of factors and ratio rate (one column pair per residual correlation structure of Section 4.2)
| Structure 1: factors | Structure 1: ratio rate | Structure 2: factors | Structure 2: ratio rate | Structure 3: factors | Structure 3: ratio rate | Structure 4: factors | Structure 4: ratio rate |
|---|---|---|---|---|---|---|---|
| 3.000 | 0.5851 | 3.000 | 0.7405 | 10.010 | 0.6395 | 2.948 | 0.7564 |
| 3.000 | 0.5910 | 2.998 | 0.7435 | 10.000 | 0.6366 | 3.000 | 0.7534 |
| 3.000 | 0.6019 | 3.010 | 0.7366 | 10.045 | 0.6494 | 3.061 | 0.7682 |
| 3.006 | 0.5930 | 3.007 | 0.7415 | 10.047 | 0.6831 | 2.924 | 0.7484 |
| 3.011 | 0.5999 | 3.033 | 0.7435 | 10.045 | 0.6702 | 3.199 | 0.7257 |
| 3.000 | 0.5772 | 3.099 | 0.7524 | 10.030 | 0.6399 | 3.017 | 0.7445 |
| 3.000 | 0.5801 | 3.031 | 0.7524 | 10.005 | 0.6380 | 3.274 | 0.7484 |
| 3.000 | 0.5811 | 2.900 | 0.7583 | 10.031 | 0.6330 | 3.101 | 0.7544 |
| 3.000 | 0.5801 | 3.000 | 0.7564 | 10.010 | 0.6399 | 3.382 | 0.7494 |
| 3.002 | 0.5891 | 3.045 | 0.7425 | 10.023 | 0.6380 | 3.300 | 0.7405 |
| 3.000 | 0.6366 | 3.000 | 0.7187 | 10.003 | 0.6633 | 3.000 | 0.7405 |
| 3.000 | 0.6247 | 2.998 | 0.7088 | 10.000 | 0.6534 | 2.996 | 0.7474 |
| 3.000 | 0.6286 | 3.002 | 0.7316 | 10.000 | 0.6435 | 3.132 | 0.7465 |
| 3.000 | 0.6207 | 2.999 | 0.7227 | 10.003 | 0.6593 | 2.946 | 0.7395 |
| 3.000 | 0.6336 | 3.000 | 0.7118 | 10.005 | 0.6583 | 3.161 | 0.7286 |
| 3.000 | 0.5841 | 3.000 | 0.7653 | 10.000 | 0.6310 | 3.000 | 0.7702 |
| 3.000 | 0.5712 | 3.000 | 0.7613 | 10.005 | 0.6310 | 3.099 | 0.7663 |
| 3.000 | 0.5782 | 2.998 | 0.7603 | 10.000 | 0.6390 | 3.099 | 0.7732 |
| 3.000 | 0.5792 | 3.010 | 0.7712 | 10.000 | 0.6320 | 3.000 | 0.7603 |
| 3.000 | 0.5722 | 3.004 | 0.7672 | 10.001 | 0.6300 | 3.099 | 0.7752 |
Table III: Assumed anomaly events (sudden changes of the active load)
| Bus | Sampling Time | Active Load (MW) |
|---|---|---|
| 20 | | |
| 30 | | |
| 60 | | |
| Others | | Unchanged |
Table IV: Assumed events with different scales
| Bus | Sampling Time | Active Power (MW) |
|---|---|---|
| 20 | | |
| Others | | Unchanged |
X. Shi and R. Qiu are with the Center for Big Data and Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200240, China.
Abstract
In dealing with high-dimensional data, factor models are often used to reduce dimensionality and extract relevant information. The spectrum of covariance matrices from power data exhibits two aspects: 1) a bulk, which arises from random noise or fluctuations, and 2) spikes, which represent factors caused by anomaly events. In this paper, we propose a new approach to the estimation of high-dimensional factor models that minimizes the distance between two spectral densities: the empirical spectral density (ESD) of the covariance matrices of the residuals, obtained by subtracting principal components from power data, and the limiting spectral density (LSD) of a multiplicative covariance structure model. Free probability theory (FPT) is used to derive the spectral density of the multiplicative covariance model, which efficiently resolves the computational difficulties. The proposed approach connects the estimation of the number of factors to the LSD of the covariance matrices of the residuals, which provides estimators of both the number of factors and the correlation structure of the residuals. Because power data contain substantial measurement noise and the residuals have a complex correlation structure, the approach approximates the ESD of the residual covariance matrices through a multiplicative covariance model, which avoids crude assumptions or simplifications about the complex structure of the data. Monte Carlo studies show that the proposed approach is robust against noise and sensitive to the presence of weak factors. Synthetic data from the IEEE 118-bus power system are used to validate the effectiveness of the approach. Furthermore, an application to real-world online monitoring data from a power grid shows that the estimators of the approach can be used to indicate the system behavior.
Index Terms:
high-dimensional data, factor model estimation, principal components, multiplicative covariance structure, free probability theory, power data
1 Introduction
Factor models are important tools for reducing the dimensionality of observed data and extracting relevant information. In many applications, they model a large number of variables through a small number of unobserved variables that must be estimated. With the emergence of big data in many fields, and especially with increasing data dimensionality, the estimation of high-dimensional factor models has been studied extensively.
Bai and Ng [1] propose information criteria for estimating the number of factors, developed in a framework where both the data dimension and the sample size are large. This differs substantially from earlier methods [2, 3, 4, 5], which assume that the data dimension is fixed or small. A critical assumption in that work is that the cumulative effect of the factors on the data grows proportionally to the data dimension. Stock and Watson [6] suggest using principal components to estimate factors in high-dimensional datasets. Kapetanios [7, 8] was the first to propose exploiting the structure of the residual terms in approximate factor models. Building on Kapetanios’s work, Onatski [9] relaxes the restrictions on the covariance structure of the residual terms and develops a new consistent estimator of the number of factors. Harding [10] imposes restrictions on the spatial-temporal correlation patterns of the residual terms and proposes an estimator of the number of factors that relates the moments of the empirical spectral density (ESD) of the covariance matrices of the observed data to the parameters governing the spatial-temporal correlations. Yeo and Papanicolaou [11] present a new approach that estimates the number of factors by connecting the factor model estimation problem to the limiting spectral density (LSD) of the covariance matrices of the residuals. It rests on two strict assumptions: first, that removing the estimated factors completely eliminates the spatial correlation of the real residuals; second, that the residuals follow an AR(1) process.
1.1 Contributions and Paper Organization
Building on this previous work, instead of modeling the structure of the residuals directly, we propose approximating the LSD of the covariance matrices of the residuals through a multiplicative covariance structure model with a controllable parameter. This avoids crude assumptions about the structure of the residuals and makes the proposed approach more flexible and practical for analyzing real-world data. Taking power flow data as an example, the classical physical model in matrix form is
[TABLE]
where the variations of the corresponding variables are related through the inverse of the Jacobian matrix. The observed data are, e.g., voltage amplitudes and phase angles; the signals are, e.g., active and reactive power; and an additive term represents small random fluctuations or measurement errors. Since this residual term contains a lot of measurement noise and the spatial-temporal correlations among its entries are complex, it is impossible to model the residuals of power data directly without assumptions and simplifications.
Inspired by the idea of decomposing observed data into systemic components (factors) and idiosyncratic components (residuals), we consider an approximate factor model for $N$ variables and $T$ observations:
$$X = LF + E, \tag{2}$$
where $X$ is an $N \times T$ observed data matrix, $L$ is an $N \times p$ factor loading matrix ($p$ is the number of factors), $F$ is a $p \times T$ matrix of factors, and $E$ is an $N \times T$ residual matrix.
One simple way to estimate the common (factor) component is to use principal components and treat the residual matrix as pure noise. Our approach, by contrast, focuses on the residual matrix: it estimates the number of factors and the ESD of the covariance matrix of the residuals simultaneously. The main advantages of the proposed approach are as follows:
- It relaxes the restrictions on the structure of the residuals. Assuming that the residuals are pure noise, or that they are only temporally correlated, is crude and unrealistic in practice. Instead of imposing a strict structure on the residuals, the proposed approach approximates the ESD of their covariance matrix through a multiplicative covariance structure model with a controllable parameter, which makes the approach more flexible and practical.
- The proposed approach uses free probability techniques from random matrix theory (RMT) to derive the LSD of the multiplicative covariance model, which greatly simplifies the calculation and keeps the approach efficient.
- It relates the estimation of the number of factors to the ESD of the covariance matrix of the residuals, which allows both the number of factors and the spectral shape of the residuals to be controlled.
- Monte Carlo studies on synthetic data show that the proposed approach is robust against noise and sensitive to weak factors, and that the multiplicative covariance structure fits the ESD of the covariance matrices of residuals with auto-cross(weak)-correlation better than the AR(1) model in Yeo and Papanicolaou’s approach.
- Using power data generated from the IEEE 118-bus test system, the estimators of the proposed approach are shown to be sensitive indicators of the number and scale of anomaly events in the power system.
- Using real-world online monitoring data from a power grid, the estimators of the proposed approach are found to successfully indicate the system states.
The rest of this paper is organized as follows. In Section 2, we apply the Marchenko-Pastur law to the residuals of both synthetic data and real-world power data. Section 3 presents our approach to the estimation of high-dimensional factor models. In Section 4, using synthetic data generated from Monte Carlo experiments, we evaluate the performance of our approach and compare it with the approach of Yeo and Papanicolaou in terms of weak-factor detection and convergence rate. Section 5 presents applications of our approach to power data analysis, and Section 6 concludes the paper.
2 Motivation Example
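Marchenko-Pastur law (M-P law): Let $X$ be an $N \times T$ random matrix whose entries are independent and identically distributed (i.i.d.) variables with mean $\mu = 0$ and variance $\sigma^2$. The corresponding covariance matrix is defined as $S = \frac{1}{T}XX^{\top}$. As $N, T \to \infty$ with $c = N/T \in (0, 1]$ fixed, according to the M-P law [12], the ESD of $S$ converges to a limit with probability density function (PDF)
$$f_{\mathrm{MP}}(x) = \begin{cases} \dfrac{1}{2\pi c \sigma^{2} x}\sqrt{(b - x)(x - a)}, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$
where $a = \sigma^{2}(1 - \sqrt{c})^{2}$ and $b = \sigma^{2}(1 + \sqrt{c})^{2}$.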
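Eq. (3) is straightforward to evaluate numerically. The following Python sketch (the helper names `mp_density` and `sample_eigenvalues` are illustrative, not from the original work) evaluates the M-P density and compares it with a histogram of eigenvalues from a simulated Gaussian covariance matrix, which illustrates the kind of agreement that the M-P law predicts for pure-noise residuals.

```python
import numpy as np


def mp_density(x, c, sigma2=1.0):
    """Marchenko-Pastur density of Eq. (3) for ratio c = N/T in (0, 1] and variance sigma2."""
    a = sigma2 * (1.0 - np.sqrt(c)) ** 2  # lower edge of the bulk
    b = sigma2 * (1.0 + np.sqrt(c)) ** 2  # upper edge of the bulk
    x = np.asarray(x, dtype=float)
    inside = (x > a) & (x < b)
    safe_x = np.where(inside, x, 1.0)     # avoid dividing by zero outside the support
    root = np.sqrt(np.clip((b - x) * (x - a), 0.0, None))
    return np.where(inside, root / (2.0 * np.pi * c * sigma2 * safe_x), 0.0)


def sample_eigenvalues(N, T, sigma2=1.0, seed=0):
    """Eigenvalues of S = X X^T / T for an N x T matrix with i.i.d. N(0, sigma2) entries."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, np.sqrt(sigma2), size=(N, T))
    return np.linalg.eigvalsh(X @ X.T / T)


if __name__ == "__main__":
    N, T = 400, 800
    eig = sample_eigenvalues(N, T)
    hist, edges = np.histogram(eig, bins=30, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    for xc, h in zip(centers[::6], hist[::6]):
        print(f"x={xc:.2f}  empirical={h:.3f}  M-P={float(mp_density(xc, N / T)):.3f}")
```

Evaluating on an array keeps the function vectorized; the `np.where` guards simply return zero outside $[a, b]$.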
In this section, we first apply the M-P law to the residuals of synthetic data generated by the following model,
[TABLE]
where the factor loadings, the factors, and the noise are independent. The true number of factors is set to 4. As shown in Fig. 1, as the factors are removed one by one, the ESD of the covariance matrices of the residuals converges to the M-P law.
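The behavior described for Fig. 1 can be reproduced qualitatively with a short simulation. This sketch assumes a simple Gaussian factor model (loadings, factors, and noise all i.i.d. standard normal), which may differ from the exact model of Eq. (4). It removes the leading principal components one at a time and counts how many residual eigenvalues remain clearly above the M-P upper edge; the helper `residual_eigenvalues` is hypothetical.

```python
import numpy as np


def residual_eigenvalues(X, k):
    """Eigenvalues of the covariance of X after projecting out its k leading principal directions."""
    N, T = X.shape
    _, vecs = np.linalg.eigh(X @ X.T / T)   # eigenvalues in ascending order
    Lk = vecs[:, N - k:]                    # k leading eigenvectors (N x 0 when k = 0)
    R = X - Lk @ (Lk.T @ X)                 # residual after removing k components
    return np.linalg.eigvalsh(R @ R.T / T)


rng = np.random.default_rng(1)
N, T, true_k = 200, 400, 4
X = rng.normal(size=(N, true_k)) @ rng.normal(size=(true_k, T)) + rng.normal(size=(N, T))
edge = (1 + np.sqrt(N / T)) ** 2            # M-P upper edge for unit-variance noise
counts = [int(np.sum(residual_eigenvalues(X, k) > 1.1 * edge)) for k in range(6)]
print(counts)  # outlying eigenvalues left after removing k = 0..5 components
```

Each removed component eliminates one spike, so the outlier count drops by one per step until all four factors are gone and the remaining spectrum sits inside the M-P bulk.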
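In contrast, we now apply the M-P law to the residuals of real-world online monitoring data from a power grid. Let $X$ be the matrix of sampled data, and let $X_r$ be the residual matrix obtained by subtracting principal components from $X$. We convert $X_r$ into the standard form $\hat{X}_r$ row by row through
$$\hat{x}_{i,j} = \bigl(x_{i,j} - \mu(\mathbf{x}_i)\bigr)\,\frac{\sigma(\hat{\mathbf{x}}_i)}{\sigma(\mathbf{x}_i)} + \mu(\hat{\mathbf{x}}_i), \tag{5}$$
where $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \dots, x_{i,T})$ is the $i$th row of $X_r$, $\hat{\mathbf{x}}_i$ is the corresponding row of $\hat{X}_r$, $\mu(\hat{\mathbf{x}}_i) = 0$, and $\sigma(\hat{\mathbf{x}}_i) = 1$. As shown in Fig. 2, no matter how many factors are removed, the ESD of the covariance matrices of the residuals from the real-world data does not fit the M-P law. Therefore, a new model is needed to fit the ESD of the real residuals when estimating factor models.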
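With target mean 0 and standard deviation 1, Eq. (5) reduces to row-wise z-scoring. A minimal sketch follows; the function name is illustrative, and the population (ddof = 0) standard deviation and the handling of constant rows are assumptions not specified in the text.

```python
import numpy as np


def standardize_rows(Xr):
    """Eq. (5) with target mean 0 and standard deviation 1 for every row x_i of Xr."""
    Xr = np.asarray(Xr, dtype=float)
    mu = Xr.mean(axis=1, keepdims=True)      # mu(x_i)
    sd = Xr.std(axis=1, keepdims=True)       # sigma(x_i), population (ddof=0) convention
    sd = np.where(sd > 0, sd, 1.0)           # assumption: constant rows are only centred
    # (x - mu(x)) * sigma(x_hat) / sigma(x) + mu(x_hat) with mu(x_hat) = 0, sigma(x_hat) = 1
    return (Xr - mu) / sd
```

For example, `standardize_rows(np.array([[1.0, 2.0, 3.0], [4.0, 4.0, 10.0]]))` returns rows with zero mean and unit standard deviation.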
3 FPT Based Factor Model Estimation
In this section, we propose an approach for the estimation of high-dimensional factor models. Section 3.1 provides the preliminaries used in the approach. Section 3.2 introduces the factor model estimation approach, which connects the estimation of the number of factors to the ESD of the covariance matrices of the residuals. Because the residuals of power data contain a lot of measurement noise and have a complex correlation structure, we approximate the LSD of the residual covariance matrices with a multiplicative covariance structure model. The specific steps are given in Section 3.3, where FPT is used to derive the spectral density of this model.
3.1 Preliminaries
Definition 1
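For an $N \times N$ random matrix $A$ with real eigenvalues, the empirical spectral density of $A$ is defined as
$$\rho_A(\lambda) = \frac{1}{N}\sum_{i=1}^{N}\delta(\lambda - \lambda_i), \tag{6}$$
where $\lambda_i$, $i = 1, \dots, N$, denote the eigenvalues of $A$, and $\delta(\lambda - \lambda_i)$ is the Dirac delta function centered at $\lambda_i$.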
Definition 2
The limiting spectral density of $A$ is defined as the limit of (6) as $N \to \infty$.
Definition 3
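The Stieltjes transform (Green’s function) of $\rho_A$ is defined as
$$G_A(z) = \int \frac{\rho_A(\lambda)}{z - \lambda}\,d\lambda, \qquad z \in \mathbb{C}\setminus\mathbb{R}, \tag{7}$$
and $\rho_A$ can be reconstructed through
$$\rho_A(\lambda) = -\frac{1}{\pi}\lim_{\varepsilon \to 0^{+}} \operatorname{Im} G_A(\lambda + i\varepsilon). \tag{8}$$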
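For an ESD, Eq. (7) is a finite average, and evaluating Eq. (8) at a small but finite $\varepsilon$ yields a Lorentzian-smoothed version of the spectrum. The sketch below illustrates both equations; the function names and the default `eps` are illustrative choices.

```python
import numpy as np


def stieltjes_from_eigs(eigs, z):
    """Eq. (7) for an ESD: G(z) = (1/N) * sum_i 1 / (z - lambda_i), evaluated at each z."""
    eigs = np.asarray(eigs, dtype=float)[:, None]
    z = np.atleast_1d(np.asarray(z, dtype=complex))[None, :]
    return np.mean(1.0 / (z - eigs), axis=0)


def density_from_stieltjes(eigs, x, eps=0.05):
    """Eq. (8) with a small fixed eps > 0: -Im G(x + i*eps) / pi, a Lorentzian-smoothed ESD."""
    x = np.asarray(x, dtype=float)
    return -np.imag(stieltjes_from_eigs(eigs, x + 1j * eps)) / np.pi
```

As `eps` shrinks, the smoothed curve approaches the sum of delta spikes of Eq. (6). In practice, `eps` trades resolution against noise.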
Definition 4
The $k$th moment of $\rho_A$ is defined as
$$m_k = \int \rho_A(\lambda)\,\lambda^{k}\,d\lambda. \tag{9}$$
Definition 5
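The moment generating function, a power series at zero, is defined as
$$M_A(z) = \sum_{k=1}^{\infty} m_k z^{k}, \tag{10}$$
and its relation to the Green’s function is
$$M_A(z) = \frac{1}{z}\,G_A\!\left(\frac{1}{z}\right) - 1. \tag{11}$$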
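For a finite set of eigenvalues, Eq. (11) can be checked directly against the truncated series of Eq. (10), because $\frac{1}{z}G(1/z) = \frac{1}{N}\sum_i \frac{1}{1 - z\lambda_i}$ expands as a geometric series whenever $|z\lambda_i| < 1$. The helper names below are illustrative.

```python
import numpy as np


def mgf_series(eigs, z, kmax=60):
    """Eq. (10) truncated at kmax terms, with m_k = (1/N) * sum_i lambda_i**k (Eq. (9) for an ESD)."""
    eigs = np.asarray(eigs, dtype=float)
    return sum(np.mean(eigs ** k) * z ** k for k in range(1, kmax + 1))


def mgf_from_green(eigs, z):
    """Eq. (11): M(z) = G(1/z) / z - 1, with G(w) = (1/N) * sum_i 1 / (w - lambda_i)."""
    eigs = np.asarray(eigs, dtype=float)
    G = np.mean(1.0 / (1.0 / z - eigs))
    return G / z - 1.0
```

For `eigs = [0.5, 1.0, 2.0, 3.0]` and `z = 0.1`, the two functions agree to machine precision.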
Definition 6
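Let $(\mathcal{A}, \phi)$ be a non-commutative probability space, i.e., $\mathcal{A}$ is a unital algebra and $\phi: \mathcal{A} \to \mathbb{C}$ is a unital linear functional. Suppose $\mathcal{A}_1, \dots, \mathcal{A}_m \subset \mathcal{A}$ are unital subalgebras. Then $\mathcal{A}_1, \dots, \mathcal{A}_m$ are freely independent (or just free) [13] with respect to $\phi$ if, for every $n \ge 1$ and all $a_1, \dots, a_n \in \mathcal{A}$ such that
- $\phi(a_j) = 0$ for $j = 1, \dots, n$, and
- $a_j \in \mathcal{A}_{i_j}$ with $i_j \ne i_{j+1}$ for $j = 1, \dots, n-1$,

we have
$$\phi(a_1 a_2 \cdots a_n) = 0.$$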
Definition 7
Given the functional inverse $M_A^{\langle -1 \rangle}$ of the moment generating function $M_A$, the S-transform [14, 15] is defined as
$$S_A(z) = \frac{1+z}{z}\,M_A^{\langle -1 \rangle}(z). \tag{12}$$
Theorem 1
Let $A$ and $B$ be two free (freely independent) random matrices. Then the S-transform of the product $AB$ is simply the product of their S-transforms:
$$S_{AB}(z) = S_A(z)\,S_B(z). \tag{13}$$
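As a worked illustration of Definitions 4, 5, and 7 (a standard result, not a step from the original derivation), the S-transform of the M-P law of Eq. (3) and its consistency with the first two moments, $m_1 = \sigma^2$ and $m_2 = \sigma^4(1+c)$, read:

```latex
S_{\mathrm{MP}}(z) = \frac{1}{\sigma^{2}(1+cz)}, \qquad
M_{\mathrm{MP}}^{\langle -1 \rangle}(z) = \frac{z}{1+z}\,S_{\mathrm{MP}}(z)
  = \frac{z}{\sigma^{2}(1+z)(1+cz)}.

\text{With } M(z) = m_1 z + m_2 z^{2} + O(z^{3}), \text{ the identity }
M^{\langle -1 \rangle}\bigl(M(z)\bigr) = z \text{ gives}

\frac{M}{\sigma^{2}}\bigl(1 - (1+c)M\bigr) + O(M^{3}) = z
\;\Longrightarrow\;
m_1 = \sigma^{2}, \qquad m_2 = \sigma^{4}(1+c).
```

By Theorem 1, the product of two free M-P matrices with parameters $(\sigma_1^2, c_1)$ and $(\sigma_2^2, c_2)$ therefore has $S(z) = 1/\bigl(\sigma_1^2\sigma_2^2(1+c_1z)(1+c_2z)\bigr)$.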
3.2 Factor Model Estimation
The proposed estimation approach matches the LSD of the modeled multiplicative covariance matrix to the ESD of the covariance matrix of the real residuals, which are obtained by subtracting principal components. The estimators are obtained by minimizing the distance between the two spectra.
The first step is to obtain the ESD of the covariance matrix of the real residuals. For high-dimensional data, principal components can approximately mimic all the true factors [6]. Here, we use the principal components to represent the factors and obtain the real residuals by subtracting them from the observed data:
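$$\hat{E} = X - \hat{L}\hat{F}, \tag{14}$$
where $p$ is the number of factors, $\hat{L}$ is an $N \times p$ matrix whose columns are the eigenvectors corresponding to the $p$ largest eigenvalues of $XX^{\top}$, and $\hat{F}$ is a $p \times T$ matrix estimated by $\hat{F} = \hat{L}^{\top}X$. After $\hat{E}$ is converted into the standard form $\tilde{E}$ through Eq. (5), the covariance matrix of the real residuals is calculated as
$$\Sigma_r = \frac{1}{T}\tilde{E}\tilde{E}^{\top}, \tag{15}$$
where the subscript $r$ indicates that it is constructed from the real residuals. The ESD of $\Sigma_r$ is denoted as $\rho_r$.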
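A minimal NumPy sketch of Eqs. (14) and (15) follows. It assumes that $X$ is stored as an $N \times T$ array and that the residual is standardized row-wise as in Eq. (5) before the covariance is formed, as Algorithm 1 prescribes. The function name `pca_residual` is illustrative.

```python
import numpy as np


def pca_residual(X, p):
    """Remove p principal components from X (N x T) and return (E_hat, Sigma_r)."""
    N, T = X.shape
    _, vecs = np.linalg.eigh(X @ X.T)        # ascending eigenvalues
    L_hat = vecs[:, ::-1][:, :p]             # N x p: eigenvectors of the p largest eigenvalues
    F_hat = L_hat.T @ X                      # p x T estimated factors
    E_hat = X - L_hat @ F_hat                # Eq. (14): real residual
    sd = E_hat.std(axis=1, keepdims=True)
    Z = (E_hat - E_hat.mean(axis=1, keepdims=True)) / np.where(sd > 0, sd, 1.0)  # Eq. (5)
    Sigma_r = Z @ Z.T / T                    # Eq. (15): covariance of standardized residual
    return E_hat, Sigma_r
```

Because $\hat{L}$ has orthonormal columns, $\hat{L}\hat{F}$ is the orthogonal projection of $X$ onto the leading $p$ eigen-directions, so $X - \hat{E}$ has rank $p$.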
The next step is to model the covariance matrix of the real residuals. Here, we factorize the covariance of the residual entries into cross-covariances and auto-covariances, namely,
[TABLE]
where the coefficients are collected, respectively, into an $N \times N$ cross-covariance matrix and a $T \times T$ auto-covariance matrix, both symmetric and positive-definite. The cross-covariance matrix models the weak spatial (cross-) correlation of the residuals, because the main spatial correlations are effectively eliminated by removing the factors (principal components). The auto-covariance matrix models the temporal (auto-) correlation of the residuals. One simple way to obtain the LSD of the residual covariance matrix is to take the cross-covariance matrix as the identity and model the auto-covariance matrix as an AR(1) covariance matrix. This rests on the crude assumptions that removing the factors completely eliminates the spatial correlations of the residuals and that the residuals follow an AR(1) process. However, the residuals of power data contain a lot of measurement noise (usually considered random), and their spatial-temporal correlations are uncertain. Therefore, instead of modeling the two matrices directly, we approximate the LSD of the residual covariance matrix through a multiplicative covariance structure with a controllable parameter, namely,
[TABLE]
where the modeled multiplicative covariance matrix is built from random Gaussian matrices, normalized so that its spectral distribution converges to a non-random limit as the dimensions grow. The LSD of this modeled matrix, derived using FPT in Section 3.3, is denoted as $\rho_m$.
The last step is to search for the optimal parameter set by minimizing the distance between the ESD of the real residuals and the modeled LSD:
[TABLE]
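where the objective is a spectral distance measure between the two densities. In [11], several distance metrics are tested, and the Jensen-Shannon divergence is found to be the most sensitive to the presence of spikes (i.e., the eigenvalues that deviate from the bulk of the spectrum) while still correctly reflecting the distribution of the bulk (i.e., the grouped eigenvalues). We therefore choose the Jensen-Shannon divergence, a symmetrized version of the Kullback-Leibler divergence, as the spectral distance measure:
$$D_{\mathrm{JS}}(\rho_r \,\|\, \rho_m) = \frac{1}{2} D_{\mathrm{KL}}(\rho_r \,\|\, \bar{\rho}) + \frac{1}{2} D_{\mathrm{KL}}(\rho_m \,\|\, \bar{\rho}), \tag{19}$$
where $\bar{\rho} = \frac{1}{2}(\rho_r + \rho_m)$, $D_{\mathrm{KL}}(p \,\|\, q) = \int p(\lambda)\log\frac{p(\lambda)}{q(\lambda)}\,d\lambda$, and $\rho_r$ and $\rho_m$ denote the ESD of the real residuals and the LSD of the modeled multiplicative covariance matrix, respectively. $D_{\mathrm{JS}}$ becomes smaller as $\rho_m$ approaches $\rho_r$, and vice versa. Therefore, we match $\rho_m$ to $\rho_r$ by minimizing $D_{\mathrm{JS}}$, which yields the optimal parameter set.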
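A discretized version of Eq. (19) can be sketched as follows. It assumes that both densities are sampled on the same uniform grid, for example histogram bin centers for the ESD. The function names are illustrative.

```python
import numpy as np


def kl_divergence(p, q, dx):
    """Riemann-sum approximation of D_KL(p || q); grid points where p = 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)


def js_divergence(rho_r, rho_m, dx):
    """Eq. (19) for two densities sampled on the same uniform grid with spacing dx."""
    rho_r = np.asarray(rho_r, dtype=float)
    rho_m = np.asarray(rho_m, dtype=float)
    rho_r = rho_r / (rho_r.sum() * dx)       # renormalize so both integrate to one on the grid
    rho_m = rho_m / (rho_m.sum() * dx)
    mix = 0.5 * (rho_r + rho_m)              # the mixture density rho_bar
    return 0.5 * kl_divergence(rho_r, mix, dx) + 0.5 * kl_divergence(rho_m, mix, dx)
```

Because the mixture is positive wherever either density is, the divergence stays finite even when the modeled LSD vanishes at a spike. In that case, the spike's mass contributes a bounded penalty of at most $\ln 2$ in total.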
3.3 FPT for the Calculation of the Modeled LSD
As discussed in Section 3.2, the ESD of the real residuals is easily obtained by removing principal components from the real data, but calculating the LSD of the multiplicative covariance structure directly from its Stieltjes transform is difficult. We therefore use FPT to derive the modeled LSD. The prescription is as follows:
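1. Obtain the LSDs $f_1$ and $f_2$ of the two matrices whose product forms the model in Eq. (17). When the Gaussian matrices involved in Eq. (17) have zero mean and variances $\sigma_1^2$ and $\sigma_2^2$, the M-P law gives
$$f_i(x) = \frac{1}{2\pi c_i \sigma_i^{2} x}\sqrt{(b_i - x)(x - a_i)}, \qquad a_i \le x \le b_i, \quad i = 1, 2, \tag{20}$$
where $a_i = \sigma_i^{2}(1 - \sqrt{c_i})^{2}$, $b_i = \sigma_i^{2}(1 + \sqrt{c_i})^{2}$, and $c_i$ is the dimension-to-sample-size ratio of the $i$th Gaussian matrix.
2. Calculate the Stieltjes transform of each $f_i$ according to Eq. (7).
3. From each Stieltjes transform, deduce the corresponding moment generating function according to Eq. (11).
4. From each moment generating function, deduce the corresponding S-transform $S_i$ according to Eq. (12).
5. Since the two matrices are free (freely independent), Theorem 1 gives the S-transform of the modeled covariance matrix as
$$S_m(z) = S_1(z)\,S_2(z). \tag{21}$$
6. Combining Eqs. (11), (12), and (21), obtain the polynomial equation for the Green’s function of the modeled covariance matrix (see the Appendix for the derivation):
[TABLE]
7. Obtain the limiting spectral density $\rho_m$ from the Green’s function through Eq. (8).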
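The exact form of Eq. (22) depends on how Eq. (17) normalizes its Gaussian matrices, which is not reproduced here. As a sketch under an explicit assumption, suppose the modeled covariance is the product of two free $N \times N$ Wishart-type matrices whose LSDs are the M-P laws of Eq. (20), and write $s = \sigma_1^2\sigma_2^2$. Using $S_{\mathrm{MP}}(z) = 1/(\sigma^2(1+cz))$ for each matrix, Steps 4 to 6 reduce to a cubic in $G = G_m(z)$:

```latex
S_m(z) = S_1(z)\,S_2(z) = \frac{1}{s\,(1+c_1 z)(1+c_2 z)}, \qquad
M_m^{\langle -1 \rangle}(w) = \frac{w}{1+w}\,S_m(w).

\text{Setting } w = M_m(u):\quad s\,u\,(1+M)(1+c_1M)(1+c_2M) = M, \qquad M = M_m(u).

\text{By Eq. (11) with } u = 1/z:\quad M = zG - 1,\; 1+M = zG, \text{ hence}

s\,G\,\bigl(1 + c_1(zG-1)\bigr)\bigl(1 + c_2(zG-1)\bigr) = zG - 1,

s c_1 c_2 z^{2} G^{3} + s z\bigl(c_1(1-c_2) + c_2(1-c_1)\bigr) G^{2}
  + \bigl(s(1-c_1)(1-c_2) - z\bigr) G + 1 = 0.
```

Setting $c_2 = 0$ (the second matrix degenerates to $\sigma_2^2 I$) recovers the M-P quadratic $c_1 s z G^2 - (z - s(1-c_1))G + 1 = 0$. This is a useful sanity check on any implementation of Step 6.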
To approximate the ESD of the real residuals as closely as possible, we allow one controllable parameter in the multiplicative covariance model: the ratio rate associated with one of the two matrices in the product. Fig. 3 illustrates the spectral density of the modeled covariance matrix for different ratio rates. For a small ratio rate, the spectral density resembles the M-P law. As the ratio rate increases, the spectrum becomes ‘thinner’ and more heavy-tailed, which mirrors, in reverse, the effect of removing factors one by one from the real-world online monitoring data in Section 2. By controlling the number of factors and the ratio rate simultaneously, our approach becomes more flexible and accurate in estimating high-dimensional factor models.
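To see numerically how the spectrum changes with the ratio rate, in the spirit of Fig. 3, the sketch below solves a cubic for the Green’s function at each real point and applies Eq. (8). It assumes the modeled covariance is a product of two free Wishart-type matrices with M-P parameters $(c_1, c_2)$ and variance product `s`; this is an assumption, and the paper’s exact Eq. (22) may use a different normalization. The function `product_density` is hypothetical. At each point, it keeps the root with negative imaginary part; outside the support, all roots are real and the density is zero.

```python
import numpy as np


def product_density(x, c1, c2, s=1.0):
    """Density of the free product of two M-P laws (ratios c1, c2; variance product s).

    Solves, for each real x, the assumed cubic
        s c1 c2 x^2 G^3 + s x (c1 (1 - c2) + c2 (1 - c1)) G^2 + (s (1 - c1)(1 - c2) - x) G + 1 = 0
    and returns -Im(G) / pi for the root with the most negative imaginary part.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    for i, xi in enumerate(x):
        coeffs = [s * c1 * c2 * xi ** 2,
                  s * xi * (c1 * (1 - c2) + c2 * (1 - c1)),
                  s * (1 - c1) * (1 - c2) - xi,
                  1.0]
        roots = np.roots(coeffs)          # leading zeros (e.g. c2 = 0) are trimmed by np.roots
        out[i] = max(0.0, -roots.imag.min()) / np.pi
    return out
```

For example, `product_density(np.linspace(0.01, 9, 500), 0.5, 0.3)` traces a skewed bulk whose upper edge moves right as `c2` grows. With `c2 = 0`, it coincides with the M-P density of Eq. (3).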
Combined with Section 3.2, the proposed factor model estimation approach is summarized in Algorithm 1.
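A compact Python sketch of the grid search in Algorithm 1 follows. It is illustrative only: `estimate_factor_model` and the `model_density(x, c)` callable are hypothetical names. The ESD is approximated by a histogram whose bin centers are used to evaluate the model density, and Eq. (19) is discretized on those bins. Any numerical solver of Step 6 can be plugged in as `model_density`.

```python
import numpy as np


def _js(p, q, dx):
    """Discretized Eq. (19) on a shared uniform grid; both inputs are renormalized first."""
    p = p / (p.sum() * dx)
    q = q / (q.sum() * dx)
    m = 0.5 * (p + q)

    def kl(a):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / m[mask])) * dx)

    return 0.5 * kl(p) + 0.5 * kl(q)


def estimate_factor_model(X, p_grid, c_grid, model_density, bins=60):
    """Grid search of Algorithm 1 over (number of factors p, ratio rate c).

    model_density(x, c) is a hypothetical user-supplied callable returning the modeled LSD
    at the points x for ratio rate c. Returns (p_hat, c_hat, best_js).
    """
    N, T = X.shape
    _, vecs = np.linalg.eigh(X @ X.T)
    vecs = vecs[:, ::-1]                                    # leading eigenvectors first
    best = (None, None, np.inf)
    for p in p_grid:                                        # step 1
        L_hat = vecs[:, :p]
        E_hat = X - L_hat @ (L_hat.T @ X)                   # step 2: Eq. (14)
        sd = E_hat.std(axis=1, keepdims=True)
        Z = (E_hat - E_hat.mean(axis=1, keepdims=True)) / np.where(sd > 0, sd, 1.0)  # step 3
        eig = np.linalg.eigvalsh(Z @ Z.T / T)               # step 4: spectrum of Eq. (15)
        hist, edges = np.histogram(eig, bins=bins, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        dx = edges[1] - edges[0]
        for c in c_grid:                                    # step 5
            rho_m = np.asarray(model_density(centers, c), dtype=float)  # step 6
            if not rho_m.sum() > 0:                         # no model mass on this grid
                continue
            score = _js(hist, rho_m, dx)                    # step 7
            if score < best[2]:
                best = (p, c, score)
    return best                                             # step 10: Eq. (18)
```

The bin count is a free design choice in this sketch. With an adaptive histogram range, any unremoved spike stretches the range and coarsens the bulk, which is one way the divergence reacts to leftover factors.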
4 Numerical Studies
In this section, we first evaluate the performance of the proposed approach using synthetic data generated from Monte Carlo experiments, in which different correlation structures are set for the synthetic residuals. We then compare its performance with that of the approach proposed by Yeo and Papanicolaou in terms of weak-factor detection and convergence rate.
4.1 Data Generation
The synthetic data are generated from the model used in Yeo and Papanicolaou’s work [11], which is also used in many other works, such as Bai and Ng [1], Onatski [9], and Ahn and Horenstein [16]. The model is written as
[TABLE]
where
[TABLE]
and
[TABLE]
with . The explanations for this model are as follows:
1. , which makes the residual level controlled only by .
2. , where SNR represents the signal-to-noise ratio, defined as .
3. The auto-correlation parameter controls the degree of auto-correlation in the residuals.
4. The cross-correlation parameter controls the magnitude of the cross-correlations in the residuals.
5. The range parameter controls the range of the cross-correlations in the residuals. Because local cross-correlations can become broader as the data dimension increases, this range is usually set proportional to the data dimension.
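The model equations above were lost in extraction. The sketch below therefore assumes the residual process commonly used by Bai and Ng [1] in this type of experiment, $e_{it} = \rho e_{i,t-1} + v_{it} + \beta\sum_{0<|j|\le J} v_{i-j,t}$ with $v_{it}$ i.i.d. standard normal. The symbols $\rho$, $\beta$, $J$ and the function `simulate_residuals` are assumptions, and the paper’s exact model may differ.

```python
import numpy as np


def simulate_residuals(N, T, rho, beta, J, seed=0, burn_in=100):
    """Residuals with AR(1) auto-correlation (rho) and local cross-correlation (beta, range J).

    Assumed form: e[i,t] = rho*e[i,t-1] + v[i,t] + beta * sum_{0<|j|<=J} v[i-j,t],
    with v i.i.d. N(0,1); neighbours outside 0..N-1 are ignored.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(N, T + burn_in))
    u = v.copy()
    for j in range(1, J + 1):          # add cross-sectional neighbours on both sides
        u[j:, :] += beta * v[:-j, :]   # contribution of v[i-j, t]
        u[:-j, :] += beta * v[j:, :]   # contribution of v[i+j, t]
    e = np.zeros_like(u)
    e[:, 0] = u[:, 0]
    for t in range(1, T + burn_in):    # AR(1) recursion along time
        e[:, t] = rho * e[:, t - 1] + u[:, t]
    return e[:, burn_in:]              # drop the burn-in so the process is near stationarity
```

For instance, `simulate_residuals(100, 300, 0.5, 0.05, 10)` yields residuals with both auto-correlation and weak local cross-correlation. This corresponds to the auto-cross(weak)-correlation structure under the stated assumption.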
Reflecting the characteristics of power system data, our simulation experiments were designed from several perspectives. First, since the SNR of power data is usually extremely high, the noise level 1/SNR was set to small values. Next, since the main cross-correlations in the residuals can be eliminated by removing factors, the cross-correlation magnitude was set much smaller than the auto-correlation coefficient, and the effects of different combinations of the two were tested. Lastly, different sample sizes were used to test the performance of the proposed approach. The parameter configurations of the Monte Carlo experiment are shown in Table I.
4.2 Performance of Our Approach
The performance of our approach was tested using the data generated in Section 4.1. Four residual correlation structures were considered, corresponding to the four parameter triples in Table I: no correlation, auto-correlation only, cross(weak)-correlation only, and auto-cross(weak)-correlation. The true number of factors was set to 3. The average estimated number of factors and ratio rate over the simulation runs are shown in Table II.
The average estimated number of factors is almost equal to the true number of factors over a broad range of sample sizes and noise levels for all structures except the cross(weak)-correlation-only case. In that case, the estimated number of factors is about 10, because several weak factors caused by the weak cross-correlation of the residuals are present, which indicates that the proposed approach can identify weak factors effectively. The estimators also become more accurate as the sample size increases. Examples of the fits of our approach to synthetic residuals with various correlation structures are shown in Fig. 4. Here, the auto-correlation parameter controls the magnitude of the auto-correlation of the residuals, and the cross-correlation parameter measures the cross-correlation within a given range of neighboring variables. From Table II, we conclude that the estimated ratio rate is affected by both the auto- and cross-correlations of the residuals, while the estimated number of factors is mainly affected by the cross-correlation.
4.3 Comparison with Other Approaches
In their work [11], Yeo and Papanicolaou compare their estimators in detail with the BIC3 estimator of Bai and Ng [1], the ED estimator of Onatski [9], and the ER estimator of Ahn and Horenstein [16]. The comparison shows that Yeo and Papanicolaou’s approach converges fastest when the noise level is high and identifies weak factors better than the other methods. In this section, we therefore mainly compare our free probability (FP) based approach with Yeo and Papanicolaou’s free random variable (FRV) method.
Fig. 5 shows the Jensen-Shannon (JS) divergences between the ESD of the real residuals and the modeled LSD as functions of the sample size and the SNR, calculated with the FRV and FP approaches, respectively. In the simulations, the true number of factors and the other model parameters were held fixed. To reflect the characteristics of real residuals from power data, an auto-cross(weak)-correlation structure was set for the synthetic residuals. As shown in the figure, the optimal JS divergences calculated with the FP approach are smaller than those from FRV, which indicates that our multiplicative covariance model fits the residuals better than the FRV-based model. Moreover, our estimation approach converges faster than FRV, especially for small sample sizes. When the sample size is large, both approaches converge well, regardless of the noise level.
5 Empirical Studies
In this section, we illustrate the proposed approach using real-world online monitoring data collected from a power grid and power flow data generated from the IEEE 118-bus test system. We first check how well our model fits the residuals of the real data. Then, the implications of the estimated number of factors and ratio rate are explored using the power flow data, where we track their evolution by moving a window over the data at continuous sampling times.
5.1 Fit of Our Model to Real Data
The real-world online monitoring data are three-phase voltages collected from monitoring devices installed on the low-voltage side of distribution transformers within one feeder. The data were sampled at regular intervals from 2017/3/1 00:00:00 to 2017/3/31 23:45:00, forming one data set. Instead of analyzing the entire matrix, we moved a window over the data set at continuous sampling times. Fig. 6 shows several sample fitting results of our multiplicative covariance model to the real residuals. Our model fits the residuals well, whereas the M-P law does not. Moreover, the estimated number of factors and ratio rate differ for data sampled at different times, which indicates that the estimators of the proposed approach can be used to indicate the system states.
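The moving-window procedure can be sketched as a small helper that applies any estimator to consecutive column blocks of the data. The helper `moving_window_estimates` is hypothetical. The window width and step are free parameters here because the original width is not reproduced, and `estimator` could wrap Algorithm 1, for example.

```python
import numpy as np


def moving_window_estimates(X, width, estimator, step=1):
    """Apply estimator to X[:, start:start + width] for each window; hypothetical helper.

    Returns a list of (t_end, estimate) pairs, where t_end is the index of the last
    sampling time in the window, so the estimates can be plotted against time.
    """
    N, T = X.shape
    out = []
    for start in range(0, T - width + 1, step):
        window = X[:, start:start + width]
        out.append((start + width - 1, estimator(window)))
    return out
```

Labeling each estimate by the last sampling time in its window means that an event becomes visible as soon as it enters the window. It then stays visible for `width` sampling times, which explains delays equal to the window width.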
5.2 Implication of the Estimated Number of Factors
The power flow data generated from the IEEE 118-bus test system [17] were used to explore the implication of the estimated number of factors. The IEEE 118-bus test system represents a portion of the U.S. Midwest Electric Power System; it was edited into the IEEE Common Data Format and the PECO PSAP Format by Richard Christie of the University of Washington [18]. In the early 2000s, researchers from the Illinois Institute of Technology (IIT) worked with the system and added some line characteristics [19][20]. The one-line diagram of the IEEE 118-bus test system is shown in Fig. 7. It consists of 118 buses together with branches, load sides, and generators, with a total installed capacity of 7220 MW.
In the data generation process, a sudden change of the active load at one bus was treated as an anomaly event, and a small amount of white Gaussian (WG) noise and autoregressive (AR(1)) noise was introduced to represent random fluctuations and measurement errors. The correlation coefficient of the AR(1) noise was set to a fixed value. Anomaly events change the cross-correlations of the data, and from Section 4.2 we know that the estimated number of factors is mainly affected by the cross-correlation of the data. Therefore, to explore the relation between the number of anomaly events and the estimated number of factors, different numbers of anomaly events were set, as shown in Table III. The generated voltage measurement data are shown in Fig. 8. In the experiment, we moved a window over the data set at continuous sampling times, which enables us to track the temporal evolution of the estimated number of factors.
The time series of the estimated number of factors generated with the continuously moving window is shown in Fig. 9. The relations between the number of anomaly events and the estimated number of factors are as follows:
I. From to , the estimated number of factors remains almost constant at . The fitting result of our model to the residuals during this period is shown in Fig. 10(a). No strong factors are observed during this period in the experiment. The most likely explanation is that the proposed approach is sensitive to the weak factors caused by small fluctuations and identifies them effectively.
II. From to , two strong factors are observed and the average estimated number of factors is between and , while one anomaly event is contained in the moving window. The fitting result of our model to the residuals during this period is shown in Fig. 10(b). From to , three strong factors are observed and the average estimated number of factors is between and , while two anomaly events are contained in the moving window; the corresponding fitting result is shown in Fig. 10(c). From , four strong factors are observed and the average estimated number of factors is about , while three anomaly events are contained in the moving window; the corresponding fitting result is shown in Fig. 10(d). We conclude that the estimated number of factors is driven by the number of anomaly events.
III. From to , the estimated number of factors decreases by every other sampling times, because the width of the moving window is and the number of anomaly events contained in the moving window decreases by every sampling times. This confirms that the estimated number of factors is driven by the number of anomaly events.
IV. From , no strong factors are observed and the estimated number of factors remains nearly , which confirms that the proposed approach is sensitive to the weak factors caused by small fluctuations.
5.3 Implication of the Estimated Ratio Rate
From Section 4.2, we know that is affected both by the cross- and auto-correlation of the data in our approach. The number of anomaly events can cause the variation of the data’s cross-correlations. In this section, we first explore how the number of anomaly events affects by using the generated data in Fig. 8. In the experiment, a window is moved on the data set at continuous sampling times and the generated curve is shown in Fig. 11(a). The relations between the number of anomaly events and are stated as follows:
I. From to , no anomaly events occur and remains almost constant.
II. From to , increases by every other sampling times, because the number of anomaly events contained in the moving window increases by every sampling times. From to , decreases by every other sampling times, because the number of anomaly events contained in the moving window decreases by every sampling times. This shows that is positively affected by the number of anomaly events contained in the moving window, because the cross-correlations of the residuals vary with the number of anomaly events. It also validates our assumption that the cross-correlation of the residuals cannot be completely eliminated by removing factors, i.e., the weak cross-correlation structure assumption for the residuals.
III. From , no anomaly events are contained in the moving window, and returns to a constant value and remains there afterwards.
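To make the role of the ratio rate more concrete, the toy simulation below draws residuals from a multiplicative model with identity cross-correlation and a temporal correlation matrix B whose entries are b^|i-j|. The geometric form of B, the identity cross-correlation, and all sizes are assumptions made purely for illustration; the model actually used is specified in Section 3.3. The point is qualitative: a larger b spreads the ESD of the residual covariance further from the M-P bulk while leaving its mean near 1, which is consistent with the estimated ratio rate growing when more correlation remains in the residuals.

```python
import numpy as np


def geometric_corr(n: int, b: float) -> np.ndarray:
    """Hypothetical correlation matrix with entries b**|i - j| (0 <= b < 1)."""
    idx = np.arange(n)
    return b ** np.abs(idx[:, None] - idx[None, :])


def residual_eigenvalues(n_vars: int, n_samples: int, b: float, seed: int = 0) -> np.ndarray:
    """Eigenvalues of Y Y^T / T with Y = X L^T, where L L^T = B (Cholesky factor).

    Each row of Y then has temporal covariance B; rows are mutually independent.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_vars, n_samples))
    chol = np.linalg.cholesky(geometric_corr(n_samples, b))
    y = x @ chol.T
    return np.linalg.eigvalsh(y @ y.T / n_samples)


for b in (0.0, 0.3, 0.6):
    ev = residual_eigenvalues(50, 200, b)
    print(f"b={b:.1f}  min={ev.min():.3f}  mean={ev.mean():.3f}  max={ev.max():.3f}")
```

Reusing the same seed for every b isolates the effect of the correlation parameter: the underlying Gaussian draws are identical and only B changes.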
Meanwhile, the scale of anomaly events affects the variation of the data’s auto-correlations. Here, we explore how the scale of anomaly events affects . Events of different scales were set for bus , as listed in Table IV. The generated data contain voltage measurements over sampling times. A window is moved across the data set over consecutive sampling times, and the resulting curve is shown in Fig. 11(b). The relations between the scale of anomaly events and are as follows:
I. From to , the estimated remains almost constant, indicating that no anomaly events occur and the system operates in a normal state.
II. From to , the curves are approximately inverted-U-shaped, because the anomaly events in Table IV were set and the delay lags of the anomaly events to equal the moving window’s width. Note that the estimated corresponding to the anomaly event of the active power (AP) from to has the largest value, and that of the AP from to has the smallest value, which indicates that is driven by the scale of anomaly events. This is because the scale of anomaly events is positively related to the variation of the auto-correlation of the residuals of the power data.
III. From , the estimated returns to a constant value and remains there afterwards, indicating that the system has returned to a normal state.
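The argument in item II, that a larger event raises the auto-correlation of the residuals, can be checked on a toy example. The sketch below is only a proxy: instead of estimating the ratio rate, it computes the lag-1 sample autocorrelation of a single hypothetical window containing unit white noise plus a step change of varying height, used here as a stand-in for an active-power event. It is not the estimator of this paper, and the window width and the step shape are assumptions.

```python
import numpy as np


def lag1_autocorr(series: np.ndarray) -> float:
    """Sample lag-1 autocorrelation of a 1-D series after removing its mean."""
    z = series - series.mean()
    return float(np.dot(z[:-1], z[1:]) / np.dot(z, z))


def window_with_event(scale: float, width: int = 200, seed: int = 0) -> np.ndarray:
    """Hypothetical window: unit white noise plus a step of height `scale` halfway through.

    The same seed is reused so that only the event scale differs between calls.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(width)
    step = np.where(np.arange(width) >= width // 2, scale, 0.0)
    return noise + step


for scale in (0.0, 2.0, 6.0):
    r = lag1_autocorr(window_with_event(scale))
    print(f"scale={scale:.1f}  lag-1 autocorrelation={r:.3f}")
```

For a centered step of height s in a window of unit-variance noise, the lag-1 autocorrelation is roughly (s^2/4) / (1 + s^2/4), so it grows monotonically with the event scale, mirroring the ordering of the curves in Fig. 11(b).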
6 Conclusions
The spectrum of real-world power data is complex and cannot be trivially dissected by the M-P law. In this paper, we propose a new approach to estimating factor models that connects the estimation of the number of factors to the ESD of covariance matrices of the residuals. Because power data contain considerable measurement noise and the correlation structure of the real residuals is uncertain, our approach approximates the ESD of covariance matrices of the residuals with a multiplicative covariance structure model, which avoids crude assumptions or simplifications about the complex correlation structure of the data. Free probability techniques from random matrix theory are used to derive the spectral density of the multiplicative covariance structure model.
Theoretical studies show that the proposed approach is robust against noise and highly capable of identifying weak factors. The multiplicative covariance structure model fits the ESD of covariance matrices of the real residuals better and converges faster than the traditional approaches. Empirical studies show that the estimators in the proposed approach effectively characterize the number and scale of anomaly events in a power system and can be used to indicate the system states.
Acknowledgments
This work was partly supported by National Key R & D Program of China under Grant 2018YFF0214705, NSF of China under Grant 61571296 and (US) NSF under Grant CNS-1619250.
Let be an random matrix whose entries are independent and identically distributed (i.i.d.) variables with mean and variance . The covariance matrix of is calculated as
[TABLE]
As but , according to the M-P law, the spectral density of is obtained as
[TABLE]
where , , and .
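To reproduce the bulk shape numerically, the sketch below evaluates the standard Marchenko-Pastur density for ratio c = N/T <= 1 and noise variance sigma2, namely sqrt((b - x)(x - a)) / (2 pi c sigma2 x) on [a, b] with a, b = sigma2 (1 -/+ sqrt(c))^2. It checks the first moments by midpoint quadrature and compares the density with the eigenvalues of a simulated sample covariance. It uses textbook notation (c, sigma2, a, b), which may differ from the symbols in this appendix, and the sizes N = 200, T = 800 and the seed are arbitrary.

```python
import numpy as np


def mp_edges(c: float, sigma2: float = 1.0) -> tuple[float, float]:
    """Support [a, b] of the Marchenko-Pastur law for ratio c = N/T <= 1."""
    return sigma2 * (1.0 - np.sqrt(c)) ** 2, sigma2 * (1.0 + np.sqrt(c)) ** 2


def mp_density(x: np.ndarray, c: float, sigma2: float = 1.0) -> np.ndarray:
    """M-P density sqrt((b - x)(x - a)) / (2 pi c sigma2 x) on [a, b], zero elsewhere."""
    a, b = mp_edges(c, sigma2)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2.0 * np.pi * c * sigma2 * xi)
    return dens


def mp_moment(k: int, c: float, sigma2: float = 1.0, n_grid: int = 200_000) -> float:
    """k-th moment by the midpoint rule (exact values: 1, sigma2, sigma2**2 * (1 + c))."""
    a, b = mp_edges(c, sigma2)
    h = (b - a) / n_grid
    mid = a + h * (np.arange(n_grid) + 0.5)
    return float(np.sum(mid ** k * mp_density(mid, c, sigma2)) * h)


# Compare with the ESD of a simulated sample covariance (unit-variance Gaussian entries).
rng = np.random.default_rng(0)
N, T = 200, 800
X = rng.standard_normal((N, T))
eig = np.linalg.eigvalsh(X @ X.T / T)
a, b = mp_edges(N / T)
hist, bin_edges = np.histogram(eig, bins=20, range=(a, b), density=True)
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
# Columns: bin center, empirical density, M-P density.
print(np.round(np.c_[centers, hist, mp_density(centers, N / T)], 3))
```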
According to Eq. (7), the Green’s function of is obtained as , which can be substituted into Eq. (10) to obtain the moment generating function . Solving Eq. (12) then gives the S-transform as
[TABLE]
Then the S-transform of is calculated as
[TABLE]
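The S-transform step relies on the multiplicative property S_AB(z) = S_A(z) S_B(z) for asymptotically free A and B. A quick numerical check of that property is sketched below in one common convention: psi(z) = sum_k m_k z^k, chi is its functional inverse, and S(z) = (1 + z) chi(z) / z. In this convention the unit-variance M-P law has S(z) = 1/(1 + c z), and any law satisfies S(z) = 1/m1 + (m1^2 - m2) z / m1^3 + O(z^2). Matching first-order terms for W = A^{1/2} S A^{1/2} gives m2(W) = m2(A) + c m1(A)^2, which the simulation compares with empirical eigenvalue moments. The two-level diagonal A and all sizes are hypothetical, and Eq. (12) may use a different but equivalent normalization.

```python
import numpy as np


def predicted_second_moment(m1_a: float, m2_a: float, c: float) -> float:
    """m2 of A^{1/2} S A^{1/2} implied by S_W = S_A * S_MP with S_MP(z) = 1 / (1 + c z).

    Uses the expansion S(z) = 1/m1 + (m1**2 - m2) z / m1**3 + O(z**2).
    """
    return m2_a + c * m1_a ** 2


rng = np.random.default_rng(1)
N, T = 400, 1600
a_diag = np.repeat([1.0, 3.0], N // 2)      # hypothetical deterministic A (diagonal)
X = rng.standard_normal((N, T))
S = X @ X.T / T                              # Wishart part, M-P with c = N / T
root_a = np.sqrt(a_diag)
W = root_a[:, None] * S * root_a[None, :]    # A^{1/2} S A^{1/2}
eig = np.linalg.eigvalsh(W)

m1_emp, m2_emp = eig.mean(), np.mean(eig ** 2)
m2_pred = predicted_second_moment(a_diag.mean(), np.mean(a_diag ** 2), N / T)
print(f"m1: empirical {m1_emp:.3f}, predicted {a_diag.mean():.3f}")
print(f"m2: empirical {m2_emp:.3f}, predicted {m2_pred:.3f}")
```

Because a Wishart matrix is unitarily invariant, it is asymptotically free from any deterministic A, which is what licenses multiplying the two S-transforms.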
According to Eq. (12), the inverse function of the moment generating function is calculated as,
[TABLE]
and the moment generating function fulfills the equation
[TABLE]
Substituting Eq. (11) into Eq. (31), we obtain
[TABLE]
which can be simplified as
[TABLE]
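The same chain (Green's function, moment generating function, its inverse, S-transform, polynomial equation) can be traced end to end in the simplest case, the unit-variance M-P law with ratio c, where it must reproduce the closed-form density. The derivation below is a standard one in the convention M(z) = z G(z) - 1 = sum_k m_k z^{-k}, with chi the inverse of psi(u) = sum_k m_k u^k (so that chi(M(z)) = 1/z) and S(w) = (1 + w) chi(w) / w. It is offered as a consistency check, not as a transcription of the equations above.

```latex
\begin{aligned}
&S_{\mathrm{MP}}(w) = \frac{1}{1+cw}
  \;\Longrightarrow\;
  \chi(w) = \frac{w}{1+w}\,S_{\mathrm{MP}}(w) = \frac{w}{(1+w)(1+cw)}, \\[4pt]
&\chi\big(M(z)\big) = \frac{1}{z}
  \;\Longrightarrow\;
  \big(1+M(z)\big)\big(1+cM(z)\big) = z\,M(z), \\[4pt]
&M(z) = zG(z) - 1
  \;\Longrightarrow\;
  c\,z\,G(z)^2 - (z + c - 1)\,G(z) + 1 = 0, \\[4pt]
&\rho(x) = -\frac{1}{\pi}\,\operatorname{Im} G(x + i0^{+})
  = \frac{\sqrt{(b-x)(x-a)}}{2\pi c x},
  \qquad a,\, b = \big(1 \mp \sqrt{c}\,\big)^2 .
\end{aligned}
```

Choosing the root of the quadratic with G(z) ~ 1/z as |z| grows gives the density in the last line, since 4cx - (x + c - 1)^2 = (b - x)(x - a).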
- [1] J. Bai and S. Ng, “Determining the number of factors in approximate factor models,” Econometrica, vol. 70, no. 1, pp. 191–221, 2002.
- [2] A. Lewbel, “The rank of demand systems: theory and nonparametric estimation,” Econometrica: Journal of the Econometric Society, pp. 711–730, 1991.
- [3] G. Connor and R. A. Korajczyk, “A test for the number of factors in an approximate factor model,” The Journal of Finance, vol. 48, no. 4, pp. 1263–1291, 1993.
- [4] J. G. Cragg and S. G. Donald, “Inferring the rank of a matrix,” Journal of Econometrics, vol. 76, no. 1–2, pp. 223–250, 1997.
- [5] M. Forni and L. Reichlin, “Let’s get real: a factor analytical approach to disaggregated business cycle dynamics,” The Review of Economic Studies, vol. 65, no. 3, pp. 453–473, 1998.
- [6] J. H. Stock and M. W. Watson, “Forecasting using principal components from a large number of predictors,” Journal of the American Statistical Association, vol. 97, no. 460, pp. 1167–1179, 2002.
- [7] G. Kapetanios, “A new method for determining the number of factors in factor models with large datasets,” Working Paper, Department of Economics, Queen Mary, University of London, Tech. Rep., 2004.
- [8] G. Kapetanios, “A testing procedure for determining the number of factors in approximate factor models with large datasets,” Journal of Business & Economic Statistics, vol. 28, no. 3, pp. 397–409, 2010.
