Estimation of high-dimensional factor models and its application in power data analysis

Xin Shi; Robert Qiu

arXiv:1905.02061·stat.AP·October 22, 2019


TL;DR

This paper introduces a novel spectral density-based method for estimating high-dimensional factor models in power data, effectively handling noise and complex residual structures using free probability theory.

Contribution

It proposes a new approach that estimates the number of factors and residual correlation structure without crude assumptions, leveraging spectral density and free probability theory.

Findings

1. The method is robust against noise.

2. It is sensitive to weak factors.

3. It is validated with IEEE 118-bus power system data.


Tables

Algorithm 1. Procedure of factor model estimation
Input: The observed data matrix $\bm{R}\in\mathbb{R}^{N\times T}$.
Output: The estimated number of factors $\hat{p}$ and the ratio rate $\hat{\phi}$.
1: For the number of removed factors $p=0,1,2,\dots$
2:   Obtain the real residual ${\hat{\bm{U}}}^{(p)}$ through Eq. (14).
3:   Normalize ${\hat{\bm{U}}}^{(p)}$ into the standard form through Eq. (5).
4:   Calculate the covariance matrix of the standardized residual through Eq. (15), i.e., $\bm{\Sigma}_{real}^{(p)}$.
5:   For the ratio rate $\phi\sim U(0,1]$
6:     Calculate $\rho_{model}(\phi)$ according to the prescriptions in Section 3.3.
7:     Calculate the spectral distance $\mathcal{D}(\rho_{real}(p)\,||\,\rho_{model}(\phi))$ through Eq. (19) and save the result in each iteration.
8:   End for
9: End for
10: Obtain the optimal parameter set $\{\hat{p},\hat{\phi}\}$ through Eq. (18).

TABLE I: Parameter Configurations in the Monte Carlo Experiment.

Parameter	Symbol	Values
Sample sizes	$N, T$	$\{50,100,200,300,500\}$
Number of factors	$p$	$\{2,3,4\}$
1/SNR	$\gamma$	$\{1/10000,1/1000,1/100,1/10,1\}\times p$
Correlations in residuals	$(\alpha,\beta,J)$	$\{(0,0,0),(0.5,0,0),(0,0.05,N/10),(0.5,0.05,N/10)\}$

TABLE II: Average $\hat{p}$ and $\hat{\phi}$ Over 1000 Simulations.

$N, T$	$SNR$	$(\alpha,\beta)=(0.0,0.0)$		$(\alpha,\beta)=(0.5,0.0)$		$(\alpha,\beta)=(0,0.05)$		$(\alpha,\beta)=(0.5,0.05)$
$N, T$	$SNR$	$\hat{p}$	$\hat{\phi}$	$\hat{p}$	$\hat{\phi}$	$\hat{p}$	$\hat{\phi}$	$\hat{p}$	$\hat{\phi}$
$100$	$1$	3.000	0.5851	3.000	0.7405	10.010	0.6395	2.948	0.7564
$100$	$10$	3.000	0.5910	2.998	0.7435	10.000	0.6366	3.000	0.7534
$100$	$100$	3.000	0.6019	3.010	0.7366	10.045	0.6494	3.061	0.7682
$100$	$1000$	3.006	0.5930	3.007	0.7415	10.047	0.6831	2.924	0.7484
$100$	$10000$	3.011	0.5999	3.033	0.7435	10.045	0.6702	3.199	0.7257
$200$	$1$	3.000	0.5772	3.099	0.7524	10.030	0.6399	3.017	0.7445
$200$	$10$	3.000	0.5801	3.031	0.7524	10.005	0.6380	3.274	0.7484
$200$	$100$	3.000	0.5811	2.900	0.7583	10.031	0.6330	3.101	0.7544
$200$	$1000$	3.000	0.5801	3.000	0.7564	10.010	0.6399	3.382	0.7494
$200$	$10000$	3.002	0.5891	3.045	0.7425	10.023	0.6380	3.300	0.7405
$300$	$1$	3.000	0.6366	3.000	0.7187	10.003	0.6633	3.000	0.7405
$300$	$10$	3.000	0.6247	2.998	0.7088	10.000	0.6534	2.996	0.7474
$300$	$100$	3.000	0.6286	3.002	0.7316	10.000	0.6435	3.132	0.7465
$300$	$1000$	3.000	0.6207	2.999	0.7227	10.003	0.6593	2.946	0.7395
$300$	$10000$	3.000	0.6336	3.000	0.7118	10.005	0.6583	3.161	0.7286
$500$	$1$	3.000	0.5841	3.000	0.7653	10.000	0.6310	3.000	0.7702
$500$	$10$	3.000	0.5712	3.000	0.7613	10.005	0.6310	3.099	0.7663
$500$	$100$	3.000	0.5782	2.998	0.7603	10.000	0.6390	3.099	0.7732
$500$	$1000$	3.000	0.5792	3.010	0.7712	10.000	0.6320	3.000	0.7603
$500$	$10000$	3.000	0.5722	3.004	0.7672	10.001	0.6300	3.099	0.7752

TABLE III: Assumed Signals for Active Load of Bus 20, 30 and 60.

Bus	Sampling Time	Active Load (MW)
20	$t_{s}=1\sim 500$	$20+\frac{1}{SNR}(WG+AR(1))$
20	$t_{s}=501\sim 1000$	$80+\frac{1}{SNR}(WG+AR(1))$
30	$t_{s}=1\sim 550$	$20+\frac{1}{SNR}(WG+AR(1))$
30	$t_{s}=551\sim 1000$	$80+\frac{1}{SNR}(WG+AR(1))$
60	$t_{s}=1\sim 600$	$20+\frac{1}{SNR}(WG+AR(1))$
60	$t_{s}=601\sim 1000$	$80+\frac{1}{SNR}(WG+AR(1))$
Others	$t_{s}=1\sim 1000$	Unchanged

TABLE IV: Assumed Anomaly Events with Different Scales Set for Bus 20.

Bus	Sampling Time	Active Power (MW)
20	$t_{s}=1\sim 500$	$20+\frac{1}{SNR}(WG+AR(1))$
20	$t_{s}=501\sim 1000$	$100/200/300+\frac{1}{SNR}(WG+AR(1))$
Others	$t_{s}=1\sim 1000$	Unchanged

Equations

$$\Delta\bm{R}={\bm{J}}^{-1}\Delta\bm{S}+{\bm{J}}^{-1}\bm{G}$$

$$\bm{R}=\bm{\Lambda}\bm{F}+\bm{U}$$

$$f_{MP}(x)=\begin{cases}\frac{1}{2\pi c\sigma^{2}x}\sqrt{(b-x)(x-a)}, & a\leq x\leq b\\ 0, & \text{otherwise}\end{cases}$$

$$R_{it}=\sum_{j=1}^{p}\Lambda_{ij}F_{jt}+U_{it}$$

$$\hat{u}_{ij}=(u_{ij}-\mu(\bm{u}_{i}))\times\frac{\sigma(\hat{\bm{u}}_{i})}{\sigma(\bm{u}_{i})}+\mu(\hat{\bm{u}}_{i})$$

$$\rho_{\bm{X}}(\lambda)=\frac{1}{n}\sum_{i=1}^{n}\delta(\lambda-\lambda_{i}(\bm{X}))$$

$$G_{\bm{X}}(z)=\int_{\mathbb{R}}\frac{\rho_{\bm{X}}(\lambda)}{z-\lambda}d\lambda$$

$$\rho_{\bm{X}}(\lambda)=-\frac{1}{\pi}\lim_{\varepsilon\to 0^{+}}\Im\,G_{\bm{X}}(\lambda+i\varepsilon)$$

$$m_{\bm{X},k}=\frac{1}{n}\langle\mathrm{Tr}\,\bm{X}^{k}\rangle=\int\rho_{\bm{X}}(\lambda)\lambda^{k}d\lambda$$

$$M_{\bm{X}}(z)=\sum_{k=1}^{\infty}m_{\bm{X},k}z^{k}$$

$$M_{\bm{X}}(z)=\frac{1}{z}G_{\bm{X}}\left(\frac{1}{z}\right)-1$$

$$S_{\bm{X}}(z)=\frac{1+z}{z}M_{\bm{X}}^{-1}(z)$$

$$S_{\bm{AB}}(z)=S_{\bm{A}}(z)S_{\bm{B}}(z)$$

$$\hat{\bm{U}}^{(p)}=\bm{R}-\hat{\bm{L}}^{(p)}\hat{\bm{F}}^{(p)}$$

$$\bm{\Sigma}_{real}^{(p)}=\frac{1}{T}\hat{\bm{U}}^{(p)}\hat{\bm{U}}^{(p)^{T}}$$

$$\bm{\Sigma}_{real,(ia,jb)}^{(p)}=C_{ij}A_{ab}$$

$$\bm{\Sigma}_{model}=\bm{\Sigma}_{0}\bm{\Sigma}_{1}$$

$$\{\hat{p},\hat{\phi}\}=\arg\min_{p,\phi}\mathcal{D}(\rho_{real}(p),\rho_{model}(\phi))$$

$$\mathcal{D}(\rho_{real}||\rho_{model})=\frac{1}{2}\sum_{i}\rho_{real}^{(i)}\log\frac{\rho_{real}^{(i)}}{\rho^{(i)}}+\frac{1}{2}\sum_{i}\rho_{model}^{(i)}\log\frac{\rho_{model}^{(i)}}{\rho^{(i)}}$$

$$\rho_{\bm{\Sigma}_{i}}(\lambda)=\frac{1}{2\pi\phi\lambda}\sqrt{(b-\lambda)^{+}(\lambda-a)^{+}}$$

$$S_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}(z)=S_{\bm{\Sigma}_{0}}(z)S_{\bm{\Sigma}_{1}}(z)=\frac{1}{(1+\phi z)^{2}}$$

$$\phi^{2}z^{2}G^{3}+2(1-\phi)\phi zG^{2}+(\phi^{2}-2\phi+1-z)G+1=0$$

$$R_{it}=\sum_{j=1}^{p}\Lambda_{ij}F_{jt}+\sqrt{\gamma}\,U_{it}$$

$$U_{it}=\sqrt{\frac{1-\alpha^{2}}{1+2J\beta^{2}}}\,e_{it}$$

$$e_{it}=\alpha e_{i,t-1}+v_{it}+\sum_{h=\max(i-J,1)}^{i-1}\beta v_{ht}+\sum_{h=i+1}^{\min(i+J,N)}\beta v_{ht}$$

$$\bm{\Sigma}_{i}=\frac{1}{n}\bm{G}_{i}\bm{G}_{i}^{T}$$

$$S_{\bm{\Sigma}_{i}}(z)=\frac{1}{1+\phi z}$$
Taxonomy

Topics: Random Matrices and Applications · Statistical Methods and Inference · Bayesian Methods and Mixture Models

Full text

Estimation of high-dimensional factor models

and its application in power data analysis

Xin Shi, Robert Qiu

X. Shi is with the Center for Big Data and Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200240, China. E-mail: [email protected]

R. Qiu is with the Center for Big Data and Artificial Intelligence, Shanghai Jiao Tong University, Shanghai 200240, China. E-mail: [email protected]

Abstract

In dealing with high-dimensional data, factor models are often used to reduce dimensionality and extract relevant information. The spectrum of covariance matrices from power data exhibits two aspects: 1) the bulk, which arises from random noise or fluctuations, and 2) spikes, which represent factors caused by anomaly events. In this paper, we propose a new approach to the estimation of high-dimensional factor models that minimizes the distance between the empirical spectral density (ESD) of covariance matrices of the residuals of power data, obtained by subtracting principal components, and the limiting spectral density (LSD) of a multiplicative covariance structure model. Free probability theory (FPT) is used to derive the spectral density of the multiplicative covariance model, which efficiently resolves the computational difficulties. The proposed approach connects the estimation of the number of factors to the LSD of covariance matrices of the residuals, which provides estimators of both the number of factors and the correlation structure of the residuals. Because power data contain a lot of measurement noise and the correlation structure of the residuals is complex, the approach approximates the ESD of covariance matrices of the residuals with a multiplicative covariance model, which avoids crude assumptions or simplifications about the complex structure of the data. Theoretical studies show that the proposed approach is robust against noise and sensitive to the presence of weak factors. Synthetic data from the IEEE 118-bus power system are used to validate the effectiveness of the approach. Furthermore, an application to real-world online monitoring data from a power grid shows that the estimators can be used to indicate the system behavior.

Index Terms:

high-dimensional data, factor model estimation, principal components, multiplicative covariance structure, free probability theory, power data

1 Introduction

Factor models are important tools for reducing the dimensionality of the observed data and extracting the relevant information. They are used for modeling a large number of variables through a small number of unobserved variables to be estimated in many applications. With the emergence of big data in many fields, especially the increasing data dimensionality, extensive studies on the estimation of high-dimensional factor models have been conducted.

Bai and Ng [1] propose using information criteria to estimate the number of factors. Their method is developed under the framework of high data dimensions ($n$), in marked contrast to the earlier methods [2, 3, 4, 5], which assume that the data dimension is fixed or small. A critical assumption made in that work is that the factors’ cumulative effect on $n$ grows proportionally to $n$. Stock and Watson [6] suggest using principal components to estimate factors in high-dimensional datasets. Kapetanios [7, 8] first proposes exploiting the structure of the residual terms in approximate factor models. Building on Kapetanios’s work, Onatski [9] relaxes the restrictions on the covariance structure of the residual terms and develops a new consistent estimator of the number of factors. Harding [10] imposes restrictions on the spatial-temporal correlation patterns of the residual terms and proposes an estimation method for the number of factors that relates the moments of the empirical spectral density (ESD) of covariance matrices of the observed data to the parameters of the spatial-temporal correlations. Yeo and Papanicolaou [11] present a new approach that estimates the number of factors by connecting the factor model estimation problem to the limiting spectral density (LSD) of covariance matrices of the residuals. Two strict assumptions are made there: first, that the spatial correlation of the real residuals can be completely eliminated by removing the estimated number of factors; second, that the residuals follow an AR(1) process.

1.1 Contributions and Paper Organization

Building on the previous work, in this paper, instead of modeling the structure of the residuals directly, we propose approximating the LSD of covariance matrices of the residuals through a multiplicative covariance structure model with a controllable parameter. This avoids crude assumptions on the structure of the data residuals and makes the proposed approach more flexible and practical for analyzing real-world data. Taking power flow data as an example, the classical physical model in matrix form is as follows,

$$\Delta\bm{R}={\bm{J}}^{-1}\Delta\bm{S}+{\bm{J}}^{-1}\bm{G},\tag{1}$$

where $\Delta$ denotes the variation of the corresponding variable and ${\bm{J}}^{-1}$ is the inverse of the Jacobian matrix. $\bm{R}$ is the observed data (e.g., voltage amplitude and phase angle), $\bm{S}$ is regarded as the signal (e.g., active and reactive power), and $\bm{G}$ represents small random fluctuations or measurement errors. Since the residual term ${\bm{J}}^{-1}\bm{G}$ contains a lot of measurement noise and the spatial-temporal correlations among its entries are complex, the residuals from power data cannot be modeled directly without assumptions or simplifications.

Inspired by the idea of decomposing the observed data into systemic components (factors) and idiosyncratic components (residuals), we consider an approximate factor model for $N$ variables and $T$ observations as follows,

$$\bm{R}=\bm{\Lambda}\bm{F}+\bm{U},\tag{2}$$

where $\bm{R}$ is an $N\times T$ observed data matrix, $\bm{\Lambda}$ is an $N\times p$ factor loading matrix ($p$ is the number of factors), $\bm{F}$ is a $p\times T$ matrix of factors, and $\bm{U}$ is an $N\times T$ residual matrix.

One simple way to estimate $\bm{\Lambda}\bm{F}$ is to use the principal components and treat $\bm{U}$ as pure noise. Our approach, however, focuses mainly on $\bm{U}$: we estimate the number of factors and the ESD of the covariance matrix of $\bm{U}$ simultaneously. The main advantages of the proposed approach can be summarized as follows:

• It relaxes restrictions on the structure of the residuals $\bm{U}$. Assuming that the residuals $\bm{U}$ are pure noise, or only temporally correlated, is crude and unreasonable in practice. Instead of modeling $\bm{U}$ with a strict structure, the proposed approach approximates the ESD of the covariance matrix of $\bm{U}$ through a multiplicative covariance structure model with a controllable parameter, which makes the approach more flexible and practical.

• The proposed approach uses free probability techniques from random matrix theory (RMT) to derive the LSD of the multiplicative covariance model, which greatly simplifies the calculation and ensures the efficiency of the approach.

• It relates the estimation of the number of factors to the ESD of the covariance matrix of $\bm{U}$, which allows controlling both the number of factors and the spectral shape of the residuals.

• Studies on synthetic data generated from Monte Carlo experiments show that the proposed approach is robust against noise and sensitive to weak factors, and that the multiplicative covariance structure fits the ESD of covariance matrices of residuals with auto-cross(weak)-correlation structure better than the AR(1) model in Yeo and Papanicolaou’s approach.

• Using power data generated from the IEEE 118-bus test system, the estimators in the proposed approach are shown to be sensitive indicators of the number and scale of anomaly events occurring in the power system.

• With real-world online monitoring data from a power grid, the estimators in the proposed approach are found to indicate the system states successfully.

The rest of this paper is organized as follows. In Section 2, we apply the Marchenko-Pastur law to the residuals from both synthetic data and real-world power data. In Section 3, we present our approach for the estimation of high-dimensional factor models. In Section 4, using synthetic data generated from Monte Carlo experiments, we evaluate the performance of our approach and compare it with that of Yeo and Papanicolaou in terms of detecting weak factors and convergence rate. Section 5 shows the applications of our approach to power data analysis. In Section 6, conclusions are presented.

2 Motivation Example

Marchenko-Pastur law (M-P law): Let ${\bm{X}}=\{{x}_{i,j}\}$ be an $N\times T$ random matrix whose entries are independent and identically distributed (i.i.d.) variables with mean $\mu(x)=0$ and variance $\sigma^{2}(x)<\infty$. The corresponding covariance matrix is defined as ${\bm{\Sigma}}=\frac{1}{T}{\bm{X}}{\bm{X}}^{H}$. As $N,T\to\infty$ with $c=\frac{N}{T}\in(0,1]$ held fixed, according to the M-P law [12], the ESD of ${\bm{\Sigma}}$ converges to a limit with probability density function (PDF)

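$$f_{MP}(x)=\begin{cases}\frac{1}{2\pi c\sigma^{2}x}\sqrt{(b-x)(x-a)}, & a\leq x\leq b\\ 0, & \text{otherwise}\end{cases}\tag{3}$$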

where $a={\sigma}^{2}{(1-\sqrt{c})}^{2}$ , $b={\sigma}^{2}{(1+\sqrt{c})}^{2}$ .
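As a quick numerical check of the M-P law, the density can be evaluated on a grid and compared with the eigenvalues of a simulated noise covariance matrix. The sketch below is illustrative only: the function name `mp_density`, the matrix sizes, and the seed are assumptions, not taken from the paper.

```python
import numpy as np

def mp_density(x, c, sigma2=1.0):
    """Marchenko-Pastur density for ratio c = N/T in (0, 1] and variance sigma2."""
    a = sigma2 * (1.0 - np.sqrt(c)) ** 2
    b = sigma2 * (1.0 + np.sqrt(c)) ** 2
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)          # the density vanishes outside [a, b]
    dens[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (
        2.0 * np.pi * c * sigma2 * x[inside]
    )
    return dens

# Eigenvalues of the sample covariance matrix of pure i.i.d. noise.
rng = np.random.default_rng(0)
N, T = 200, 800                          # illustrative sizes, c = 0.25
X = rng.standard_normal((N, T))
eigvals = np.linalg.eigvalsh(X @ X.T / T)

# Riemann sum of the density over a grid covering [a, b]; it should be close to 1.
grid = np.linspace(0.0, 3.0, 3001)
mass = np.sum(mp_density(grid, N / T)) * (grid[1] - grid[0])
```

A histogram of `eigvals` plotted against `mp_density(grid, N / T)` gives the kind of comparison described for Fig. 1 and Fig. 2.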

In this section, we first apply the M-P law to the residuals from synthetic data generated by the following model,

$$R_{it}=\sum_{j=1}^{p}\Lambda_{ij}F_{jt}+U_{it},\tag{4}$$

where $\bm{\Lambda}_{ij}\sim N(0,1)$, $\bm{F}_{jt}\sim N(0,0.01)$, and $\bm{U}_{it}\sim N(0,1)$ are independent. The true number of factors $p$ is set to 4. As shown in Fig. 1, as factors are removed one after another, the ESD of covariance matrices of the residuals converges to the M-P law.

For comparison, we apply the M-P law to the residuals from real-world online monitoring data in a power grid. Let the matrix $\bm{R}$ be the sampled data with $N=189,T=672$, and let $\bm{U}$ be the residual matrix obtained by subtracting principal components from $\bm{R}$. We convert $\bm{U}$ into the standard form $\hat{\bm{U}}$ through

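$$\hat{u}_{ij}=(u_{ij}-\mu(\bm{u}_{i}))\times\frac{\sigma(\hat{\bm{u}}_{i})}{\sigma(\bm{u}_{i})}+\mu(\hat{\bm{u}}_{i}),\tag{5}$$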

where ${\bm{u}}_{i}=(u_{i1},u_{i2},...)$, $\mu({\hat{\bm{u}}}_{i})=0$, and $\sigma({\hat{\bm{u}}}_{i})=1$. As shown in Fig. 2, no matter how many factors are removed, the ESD of covariance matrices of the residuals from the real-world data does not fit the M-P law. Therefore, a new model is needed to fit the ESD of real residuals when estimating factor models.
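The row-wise standardization of Eq. (5) can be written in a few lines. This is a minimal sketch, assuming each row of the residual matrix is one variable observed over time; the function name `standardize_rows` is hypothetical.

```python
import numpy as np

def standardize_rows(U, target_mean=0.0, target_std=1.0):
    """Rescale every row of U to the target mean and standard deviation (Eq. (5))."""
    U = np.asarray(U, dtype=float)
    mu = U.mean(axis=1, keepdims=True)      # mu(u_i), one value per row
    sd = U.std(axis=1, keepdims=True)       # sigma(u_i), one value per row
    return (U - mu) * (target_std / sd) + target_mean
```

With the default targets, every row of the output has zero mean and unit standard deviation, which is the standard form used before the covariance matrix is formed.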

3 FPT Based Factor Model Estimation

In this section, we propose an approach for the estimation of high-dimensional factor models. In Section 3.1, we provide preliminaries that will be used in the proposed approach. In Section 3.2, we introduce a new factor model estimation approach, which connects the estimation of the number of factors to the ESD of covariance matrices of the residuals. Because the residuals from power data contain a lot of measurement noise and have a complex correlation structure, an approximation scheme is proposed for calculating the LSD of covariance matrices of the residuals. Specific steps of the proposed approach are given in Section 3.3, in which FPT is used to derive the spectral density of the multiplicative covariance structure model.

3.1 Preliminaries

Definition 1

For a random matrix ${\bm{X}}\in\mathbb{R}^{n\times n}$ , the empirical spectral density of $\bm{X}$ is defined as,

$$\rho_{\bm{X}}(\lambda)=\frac{1}{n}\sum_{i=1}^{n}\delta(\lambda-\lambda_{i}(\bm{X})),\tag{6}$$

where $\lambda_{i}(\bm{X})$ for $i=1,2,\cdots,n$ denote the eigenvalues of $\bm{X}$ , and $\delta(x)$ is the Dirac delta function centered at $x$ .

Definition 2

The limiting spectral density of $\bm{X}$ is defined as the limit of (6) as $n\rightarrow\infty$ .

Definition 3

The Stieltjes Transform (Green’s Function) of $\rho_{\bm{X}}(\lambda)$ is defined as,

$$G_{\bm{X}}(z)=\int_{\mathbb{R}}\frac{\rho_{\bm{X}}(\lambda)}{z-\lambda}d\lambda,\tag{7}$$

and $\rho_{\bm{X}}(\lambda)$ can be reconstructed through

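$$\rho_{\bm{X}}(\lambda)=-\frac{1}{\pi}\lim_{\varepsilon\to 0^{+}}\Im\,G_{\bm{X}}(\lambda+i\varepsilon).\tag{8}$$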
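For a finite matrix, the Stieltjes transform of the ESD is an average over the eigenvalues, and evaluating it slightly above the real axis gives a smoothed density. The sketch below illustrates this; the function names, the toy spectrum, and the smoothing width `eps` are assumptions made for the example.

```python
import numpy as np

def stieltjes(eigvals, z):
    """G_X(z) = (1/n) * sum_i 1 / (z - lambda_i) for the ESD of Eq. (6)."""
    eigvals = np.asarray(eigvals, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=complex))
    return np.mean(1.0 / (z[:, None] - eigvals[None, :]), axis=1)

def density_from_stieltjes(eigvals, lam, eps=0.05):
    """Smoothed density -Im G(lam + i*eps) / pi; eps > 0 sets the resolution."""
    return -stieltjes(eigvals, np.asarray(lam) + 1j * eps).imag / np.pi

eigvals = np.array([0.5, 1.0, 1.5, 2.0])        # toy spectrum
lam = np.linspace(-20.0, 20.0, 40001)
rho = density_from_stieltjes(eigvals, lam)
total = np.sum(rho) * (lam[1] - lam[0])         # close to 1 up to the kernel tails
```

Each eigenvalue contributes a Cauchy kernel of width `eps`, so the exact limit of Eq. (8) is recovered only as `eps` tends to zero.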

Definition 4

The $k$-th moment of $\bm{X}$ is defined as,

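$$m_{\bm{X},k}=\frac{1}{n}\langle\mathrm{Tr}\,\bm{X}^{k}\rangle=\int\rho_{\bm{X}}(\lambda)\lambda^{k}d\lambda.\tag{9}$$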

Definition 5

The moment generating function as a power series at zero is defined as,

$$M_{\bm{X}}(z)=\sum_{k=1}^{\infty}m_{\bm{X},k}z^{k},\tag{10}$$

and its relation to the Green’s function is

$$M_{\bm{X}}(z)=\frac{1}{z}G_{\bm{X}}\left(\frac{1}{z}\right)-1.\tag{11}$$

Definition 6

Let ($\mathcal{A},\phi$) be a unital algebra with a unital linear functional. Suppose $\mathcal{A}_{1},\cdots,\mathcal{A}_{s}$ are unital subalgebras. Then $\mathcal{A}_{1},\cdots,\mathcal{A}_{s}$ are freely independent (or just free) [13] with respect to $\phi$ if $\phi(a_{1}a_{2}\cdots a_{r})=0$ whenever $r\geq 2$ and $a_{1},\cdots,a_{r}\in\mathcal{A}$ are such that

• $\phi(a_{i})=0$ for $i=1,2,\cdots,r$

• $a_{i}\in\mathcal{A}_{j_{i}}$ with $1\leq{j_{i}}\leq s$ for $i=1,2,\cdots,r$

• $j_{1}\neq j_{2},j_{2}\neq j_{3},\cdots,j_{r-1}\neq j_{r}$

Definition 7

Given the functional inverse of the moment generating function $M_{\bm{X}}^{-1}(z)$ , the S-transform [14, 15] is defined as,

$$S_{\bm{X}}(z)=\frac{1+z}{z}M_{\bm{X}}^{-1}(z).\tag{12}$$

Theorem 1

Let $\bm{A}$ and $\bm{B}$ be two freely independent random matrices. Then the S-transform of the product $\bm{AB}$ is simply the product of their S-transforms,

$$S_{\bm{AB}}(z)=S_{\bm{A}}(z)S_{\bm{B}}(z).\tag{13}$$

3.2 Factor Model Estimation

The proposed estimation approach aims to match the LSD calculated from the modeled multiplicative covariance matrices to the ESD of covariance matrices of the real residuals, which are obtained by subtracting principal components. The estimators are obtained by minimizing the distance between the two spectra.

The first step is to obtain the ESD of covariance matrices of the real residuals. For high-dimensional data, the principal components are able to approximately mimic all true factors [6]. Here, we use the principal components to represent factors and the real residuals are obtained by subtracting the factors from the observed data, which is defined as

$$\hat{\bm{U}}^{(p)}=\bm{R}-\hat{\bm{L}}^{(p)}\hat{\bm{F}}^{(p)},\tag{14}$$

where $p$ is the number of factors, ${\hat{\bm{F}}}^{(p)}$ is a $p\times T$ matrix whose rows are the eigenvectors corresponding to the $p$ largest eigenvalues of ${\bm{R}}^{T}{\bm{R}}$, and ${\hat{\bm{L}}}^{(p)}$ is an $N\times p$ matrix estimated by ${\bm{R}}{{\hat{\bm{F}}}^{{(p)}^{-1}}}$. The covariance matrix of the real residuals can be calculated as,

$$\bm{\Sigma}_{real}^{(p)}=\frac{1}{T}\hat{\bm{U}}^{(p)}\hat{\bm{U}}^{(p)^{T}},\tag{15}$$

where the subscript $real$ indicates it is constructed from the real residuals. Thus we can obtain the ESD of ${\bm{\Sigma}}_{real}^{(p)}$ , which is denoted as $\rho_{real}(p)$ .
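The first step can be sketched as follows. The code uses the singular value decomposition of $\bm{R}$, whose right singular vectors are the eigenvectors of ${\bm{R}}^{T}{\bm{R}}$, and it takes the transpose of ${\hat{\bm{F}}}^{(p)}$ as its pseudo-inverse because the rows are orthonormal. Both choices, and the function names, are assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np

def pca_residual(R, p):
    """Remove the top-p principal components from an N x T data matrix R."""
    if p == 0:
        return R.copy()
    # Right singular vectors of R are eigenvectors of R^T R.
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    F_hat = Vt[:p, :]              # p x T, orthonormal rows
    L_hat = R @ F_hat.T            # N x p, F_hat^T acts as pseudo-inverse of F_hat
    return R - L_hat @ F_hat

def residual_spectrum(R, p):
    """Eigenvalues of (1/T) U U^T for the standardized residual U."""
    U = pca_residual(R, p)
    U = (U - U.mean(axis=1, keepdims=True)) / U.std(axis=1, keepdims=True)
    return np.linalg.eigvalsh(U @ U.T / U.shape[1])
```

A histogram of `residual_spectrum(R, p)` is then a binned estimate of $\rho_{real}(p)$.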

The next step is to model the covariance matrix of the real residuals. Here, we factorize ${\bm{\Sigma}}_{real}^{(p)}$ into cross-covariances and auto-covariances, namely,

$$\bm{\Sigma}_{real,(ia,jb)}^{(p)}=C_{ij}A_{ab},\tag{16}$$

where the coefficients $C_{ij}\ (i,j=1,\cdots,N)$ and $A_{ab}\ (a,b=1,\cdots,T)$ are collected into an $N\times N$ cross-covariance matrix $\bm{C}$ and a $T\times T$ auto-covariance matrix $\bm{A}$, respectively, both of which are symmetric and positive-definite. The cross-covariance matrix $\bm{C}$ models the weak spatial (cross-) correlation of the residuals, since the main spatial correlations can be effectively eliminated by removing $p$ factors (principal components). The auto-covariance matrix $\bm{A}$ models the temporal (auto-) correlation of the residuals. To obtain the LSD of ${\bm{\Sigma}}_{real}^{(p)}$, one simple way is to take $\bm{C}$ as the identity matrix ${\bm{I}}_{N}$ and model $\bm{A}$ as the covariance matrix of an AR(1) process, based on the crude assumptions that the spatial correlations of the residuals are completely removed together with the $p$ factors and that the residuals follow an AR(1) process. However, for power data, the residuals contain a lot of measurement noise (usually considered random), and their spatial-temporal correlations are uncertain. Here, instead of modeling $\bm{C}$ and $\bm{A}$ directly, we approximate the LSD of ${\bm{\Sigma}}_{real}^{(p)}$ through a multiplicative covariance structure with a controllable parameter $\phi$, namely,

$$\bm{\Sigma}_{model}=\bm{\Sigma}_{0}\bm{\Sigma}_{1},\tag{17}$$

where the subscript $model$ denotes it is constructed from the modeled multiplicative covariance matrix, $\bm{\Sigma}_{i}={\bm{G}_{i}}{\bm{G}_{i}}^{T}/n\;(i=0,1)$ , $\bm{G}_{i}$ is an $m\times n$ random Gaussian matrix, and $\phi=\frac{m}{n}\in(0,1]$ which ensures the spectral distribution of ${\bm{\Sigma}}_{model}$ converges to a non-random limit as $m,n\rightarrow\infty$ . The LSD of ${\bm{\Sigma}}_{model}$ can be derived by using FPT in Section 3.3, which is denoted as $\rho_{model}(\phi)$ .

The last step is to search for the optimal parameter set $(p,\phi)$ by minimizing the distance between $\rho_{real}(p)$ and $\rho_{model}(\phi)$ , which is denoted as,

$$\{\hat{p},\hat{\phi}\}=\arg\min_{p,\phi}\mathcal{D}(\rho_{real}(p),\rho_{model}(\phi)),\tag{18}$$

where $\mathcal{D}$ is a spectral distance measure. In [11], several distance metrics are tested, and the Jensen-Shannon divergence is found to be the most sensitive to the presence of spikes (i.e., the deviating eigenvalues in the spectrum) while correctly reflecting the distribution of the bulk (i.e., the grouped eigenvalues in the spectrum). Here, we choose the Jensen-Shannon divergence as the spectral distance measure. It is a symmetrized version of the Kullback-Leibler divergence, defined as,

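$$\mathcal{D}(\rho_{real}||\rho_{model})=\frac{1}{2}\sum_{i}\rho_{real}^{(i)}\log\frac{\rho_{real}^{(i)}}{\rho^{(i)}}+\frac{1}{2}\sum_{i}\rho_{model}^{(i)}\log\frac{\rho_{model}^{(i)}}{\rho^{(i)}},\tag{19}$$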

where ${\rho}=\frac{{\rho_{real}}+{\rho_{model}}}{2}$ . It can be seen that $\mathcal{D}({\rho_{real}}||{\rho_{model}})$ becomes smaller as $\rho_{real}$ approaches $\rho_{model}$ , and vice versa. Therefore, we can match $\rho_{model}(\phi)$ to $\rho_{real}(p)$ by minimizing $\mathcal{D}$ , through which the optimal parameter set $({\hat{p}},{\hat{\phi}})$ is obtained.
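The Jensen-Shannon divergence between two binned spectra can be computed as in the following sketch. The normalization of the bin masses and the small constant `tiny`, which avoids taking the logarithm of zero in empty bins, are assumptions of this example.

```python
import numpy as np

def js_divergence(rho_real, rho_model, tiny=1e-12):
    """Jensen-Shannon divergence between two binned densities on the same bins."""
    p = np.asarray(rho_real, dtype=float)
    q = np.asarray(rho_model, dtype=float)
    p = p / p.sum()                # normalize bin masses so that each sums to one
    q = q / q.sum()
    m = 0.5 * (p + q)              # the mixture rho = (rho_real + rho_model) / 2
    kl_pm = np.sum(p * np.log((p + tiny) / (m + tiny)))
    kl_qm = np.sum(q * np.log((q + tiny) / (m + tiny)))
    return 0.5 * kl_pm + 0.5 * kl_qm
```

The value is zero for identical histograms, symmetric in its arguments, and bounded above by $\log 2$, which is reached when the two histograms have disjoint supports.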

3.3 FPT for the Calculation of $\bf\rho_{model}(\phi)$

As discussed in Section 3.2, $\rho_{real}(p)$ is easily obtained by removing $p$ principal components from the real data, but calculating $\rho_{model}(\phi)$ directly from the Stieltjes transform of the multiplicative covariance structure ${\bm{\Sigma}}_{0}{\bm{\Sigma}}_{1}$ is difficult. Here, FPT is used to derive the LSD of ${\bm{\Sigma}}_{0}{\bm{\Sigma}}_{1}$. The procedure is as follows:

1. Obtain the LSDs of ${\bm{\Sigma}_{i}}\;(i=0,1)$, denoted as $\rho_{\bm{\Sigma}_{i}}(\lambda)$. Consider the case in which the entries $\{g_{jk}\}_{m\times n}$ of $\bm{G}_{i}$ in Eq. (17) are zero-mean with variance $1$ and $\phi\in(0,1]$. Then $\rho_{\bm{\Sigma}_{i}}(\lambda)$ follows from the M-P law, namely,

$$\rho_{\bm{\Sigma}_{i}}(\lambda)=\frac{1}{2\pi\phi\lambda}\sqrt{(b-\lambda)^{+}(\lambda-a)^{+}},\tag{20}$$

where $(\lambda)^{+}=max(0,\lambda)$, $a={(1-\sqrt{\phi})}^{2}$, and $b={(1+\sqrt{\phi})}^{2}$.

2. Calculate the Stieltjes transform of $\rho_{\bm{\Sigma}_{i}}(\lambda)$ according to Eq. (7), denoted as $G_{\bm{\Sigma}_{i}}(z)$.

3. From $G_{\bm{\Sigma}_{i}}(z)$, deduce the corresponding moment generating function $M_{\bm{\Sigma}_{i}}(z)$ according to Eq. (11).

4. From $M_{\bm{\Sigma}_{i}}(z)$, deduce the corresponding S-transform $S_{\bm{\Sigma}_{i}}(z)$ according to Eq. (12).

5. Since ${\bm{\Sigma}}_{0}$ and ${\bm{\Sigma}}_{1}$ are two freely independent random matrices, according to Theorem 1, the S-transform of ${\bm{\Sigma}}_{0}{\bm{\Sigma}}_{1}$ is calculated as,

$$S_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}(z)=S_{\bm{\Sigma}_{0}}(z)S_{\bm{\Sigma}_{1}}(z)=\frac{1}{(1+\phi z)^{2}}.\tag{21}$$

6. Combining Eqs. (11), (12) and (21), the polynomial equation for $G\equiv G_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}(z)$ is obtained as (see APPENDIX for derivation details),

$$\phi^{2}z^{2}G^{3}+2(1-\phi)\phi zG^{2}+(\phi^{2}-2\phi+1-z)G+1=0.\tag{22}$$

7. Obtain the limiting spectral density $\rho_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}(\lambda)$ from $G_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}(z)$ through Eq. (8).
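The step from the S-transform of the product to the cubic equation for $G$ can be sketched as follows. This is a short reconstruction from the definitions in Section 3.1, not a reproduction of the appendix; the auxiliary variable $w$ is introduced here only for the derivation.

```latex
% Invert Eq. (12) and insert the product S-transform 1/(1+\phi w)^2:
M^{-1}(w) = \frac{w}{1+w}\,S(w) = \frac{w}{(1+w)(1+\phi w)^{2}} .
% Eq. (11) evaluated at 1/z gives M(1/z) = z\,G(z) - 1, so setting w = zG - 1
% means M^{-1}(w) = 1/z, and therefore
(1+w)(1+\phi w)^{2} = z\,w .
% Substitute w = zG - 1 and divide by z:
G\,\bigl(1-\phi+\phi z G\bigr)^{2} = zG - 1 .
% Expand the square and collect powers of G:
\phi^{2}z^{2}G^{3} + 2(1-\phi)\phi z\,G^{2} + \bigl((1-\phi)^{2}-z\bigr)G + 1 = 0 ,
% which is the cubic above, since (1-\phi)^{2} = \phi^{2}-2\phi+1.
```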

To approximate $\rho_{real}(p)$ as closely as possible, we allow a controllable parameter in the multiplicative covariance model: the ratio rate $\phi=m/n\in(0,1]$ of $\bm{G}_{i}$. Fig. 3 illustrates the spectral distribution of ${\bm{\Sigma}_{0}\bm{\Sigma}_{1}}$ for different $\phi$. For small $\phi$, the spectral density resembles the M-P law. As $\phi$ increases, the spectrum becomes ‘thinner’ and more heavily tailed, which resembles the inverse of the process of successively removing factors from the real-world online monitoring data in Section 2. By controlling $p$ and $\phi$ simultaneously, our approach is more flexible and accurate in estimating high-dimensional factor models.
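Numerically, the model density can be obtained by solving the cubic equation for $G$ at points just above the real axis and applying the inversion formula of Section 3.1. The sketch below is one possible way to do this; the root-selection rule (take the root with the most negative imaginary part), the offset `eps`, and the function name are assumptions of this example.

```python
import numpy as np

def model_density(lam, phi, eps=1e-6):
    """Density of Sigma_0 Sigma_1 at the points lam, from the cubic equation for G."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    rho = np.zeros_like(lam)
    for k, x in enumerate(lam):
        z = x + 1j * eps                      # evaluate slightly above the real axis
        coeffs = [phi**2 * z**2,
                  2.0 * (1.0 - phi) * phi * z,
                  (1.0 - phi) ** 2 - z,
                  1.0]
        roots = np.roots(coeffs)
        # rho = -Im G / pi, so the relevant root has the most negative imaginary part.
        rho[k] = max(0.0, -roots.imag.min() / np.pi)
    return rho

lam = np.linspace(0.005, 6.0, 1200)           # grid avoiding lam = 0
rho = model_density(lam, phi=0.5)
dx = lam[1] - lam[0]
mass = np.sum(rho) * dx                       # close to 1
mean = np.sum(lam * rho) * dx                 # close to 1, the mean of the product
```

The grid starts above zero because the leading coefficient of the cubic vanishes at $z=0$. Evaluating `model_density` for several values of `phi` gives curves of the kind described for Fig. 3.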

Combined with Section 3.2, the proposed factor model estimation approach is summarized in Algorithm 1.
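The loop structure of Algorithm 1 can be sketched independently of how the spectra are computed. In the sketch below, the three ingredients are passed in as callables; `spectrum_of_residual`, `spectrum_of_model`, and `distance` are hypothetical names, and searching over a fixed list of $\phi$ values instead of random draws from $U(0,1]$ is an assumption of this example.

```python
import numpy as np

def grid_search(spectrum_of_residual, spectrum_of_model, distance, p_values, phi_values):
    """Double loop of Algorithm 1: return (p_hat, phi_hat, d_min)."""
    # The model spectrum depends only on phi, so compute it once per value.
    models = [(phi, spectrum_of_model(phi)) for phi in phi_values]
    best_p, best_phi, best_d = None, None, np.inf
    for p in p_values:                       # outer loop over removed factors
        rho_real = spectrum_of_residual(p)
        for phi, rho_model in models:        # inner loop over the ratio rate
            d = distance(rho_real, rho_model)
            if d < best_d:                   # keep the running arg min
                best_p, best_phi, best_d = p, phi, d
    return best_p, best_phi, best_d
```

In practice the callables would return histograms on a common set of bins, so that the distance values are comparable across all pairs $(p,\phi)$.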

4 Numerical Studies

In this section, we first evaluate the performance of the proposed approach using synthetic data generated from Monte Carlo experiments, in which different correlation structures are set for the synthetic residuals. Then we compare the performance of our approach with that proposed by Yeo and Papanicolaou in terms of detecting weak factors and convergence rate.

4.1 Data Generation

The synthetic data are generated from the model used in Yeo and Papanicolaou’s work [11]. This model is also used in many other works, such as Bai and Ng [1], Onatski [9], and Ahn and Horenstein [16]. The model is written as,

$$R_{it}=\sum_{j=1}^{p}\Lambda_{ij}F_{jt}+\sqrt{\gamma}\,U_{it},\tag{23}$$

where

$$U_{it}=\sqrt{\frac{1-\alpha^{2}}{1+2J\beta^{2}}}\,e_{it},\tag{24}$$

and

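$$e_{it}=\alpha e_{i,t-1}+v_{it}+\sum_{h=\max(i-J,1)}^{i-1}\beta v_{ht}+\sum_{h=i+1}^{\min(i+J,N)}\beta v_{ht},\tag{25}$$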

with $\bm{v}_{it},\bm{\Lambda}_{ij},\bm{F}_{jt}\sim N(0,1)(i=1,2,\cdots,N;t=1,2,\cdots,T)$ . The explanations for this model are as follows:

1. $var(\bm{U}_{it})\equiv 1$, which makes the residual level controlled only by $\gamma$.

2. $\gamma=\frac{1}{SNR}p$, where $SNR$ represents the signal-to-noise ratio, defined as $SNR=\frac{var(Factors)}{var(Residuals)}=\frac{p}{\gamma}$.

3. $\alpha\ (\alpha<1)$ controls the degree of auto-correlation in the residuals.

4. $\beta\ (\beta<1)$ controls the magnitude of the cross-correlations in the residuals.

5. $J$ controls the range of the cross-correlations in the residuals. Since local cross-correlations can become broader as the data dimension increases, $J$ is usually set proportional to $N$.
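The data-generating model can be simulated as in the sketch below. The square roots in the two scalings are the choice that makes $var(\bm{U}_{it})=1$ for rows with $2J$ neighbours and $SNR=p/\gamma$ hold, as stated in the list above. The burn-in length, the zero-based indexing, and the function name are assumptions of this example.

```python
import numpy as np

def generate_data(N, T, p, gamma, alpha, beta, J, burn_in=200, seed=None):
    """Simulate R = Lambda F + sqrt(gamma) U with auto- and cross-correlated residuals."""
    rng = np.random.default_rng(seed)
    Lam = rng.standard_normal((N, p))
    F = rng.standard_normal((p, T))
    v = rng.standard_normal((N, T + burn_in))

    # Cross-sectional mixing: v_it plus beta times the neighbours within distance J.
    mixed = v.copy()
    for i in range(N):
        lo, hi = max(i - J, 0), min(i + J, N - 1)
        mixed[i] += beta * (v[lo:hi + 1].sum(axis=0) - v[i])

    # AR(1) recursion in time; the burn-in removes the effect of the start value.
    e = np.zeros_like(mixed)
    e[:, 0] = mixed[:, 0]
    for t in range(1, T + burn_in):
        e[:, t] = alpha * e[:, t - 1] + mixed[:, t]
    e = e[:, burn_in:]

    # Rescale so that interior rows of U have unit variance.
    U = np.sqrt((1.0 - alpha**2) / (1.0 + 2.0 * J * beta**2)) * e
    return Lam @ F + np.sqrt(gamma) * U, U
```

For example, `generate_data(100, 2000, p=3, gamma=0.03, alpha=0.5, beta=0.05, J=10)` corresponds to the auto-cross(weak)-correlation case with $SNR=100$. Rows closer than $J$ to the boundary have fewer neighbours, so their variance is slightly below one.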

Our simulation experiments reflect several characteristics of power system data. First, since the signal-to-noise ratio of power data is usually extremely high, $\gamma$ was set to small values in the experiments. Next, since the main cross-correlations in the residuals can be eliminated by removing factors, $\beta$ was set to be much smaller than $\alpha$, and the effects of different combinations of the two were tested. Lastly, different sample sizes were used to test the performance of the proposed approach, and $J$ was set to $N/10$. The parameter configurations of the Monte Carlo experiment are shown in Table I.

4.2 Performance of Our Approach

The performance of our approach was tested on the data generated in Section 4.1. Four different residual correlation structures were set, i.e., no correlation ($(\alpha,\beta)=(0.0,0.0)$), auto-correlation-only ($(\alpha,\beta)=(0.5,0.0)$), cross(weak)-correlation-only ($(\alpha,\beta)=(0,0.05)$), and auto-cross(weak)-correlation ($(\alpha,\beta)=(0.5,0.05)$). The true number of factors was set to $3$. Average values of the estimated $\hat{p}$ and $\hat{\phi}$ over $1000$ simulations are shown in Table II.

It can be observed that the average estimate $\hat{p}$ is almost equal to the true number of factors over a broad range of $N$ and $SNR$ for the cases $(\alpha,\beta)=(0.0,0.0),(0.5,0.0),(0.5,0.05)$. For the case $(\alpha,\beta)=(0.0,0.05)$, the estimated number of factors is about $10$, because several weak factors caused by the weak cross-correlation of the residuals are present. This indicates that the proposed approach is sensitive to weak factors. It can also be observed that the estimators become more accurate as the sample size increases. Meanwhile, varied correlation structures of the residuals were tested in the experiments, and corresponding examples of the fit of our approach to the synthetic residuals are shown in Fig. 4. Here $\alpha$ controls the auto-correlation magnitude of the residuals and $\beta$ measures the cross-correlation within the range $J$. As shown in Table II, the estimator $\hat{\phi}$ is affected by both the auto- and cross-correlations of the residuals, while the estimator $\hat{p}$ is mainly affected by the cross-correlation of the residuals.

4.3 Comparison with Other Approaches

In Yeo and Papanicolaou’s work [11], the estimators from their approach are compared in detail with the BIC3 estimator of Bai and Ng [1], the ED estimator of Onatski [9], and the ER estimator of Ahn and Horenstein [16]. Their results show that Yeo and Papanicolaou’s approach converges the fastest when the noise level is high and identifies weak factors better than the other methods. In this section, we therefore mainly compare the performance of our free probability (FP) based approach with that of Yeo and Papanicolaou’s free random variable (FRV) method.

Fig. 5 shows the Jensen-Shannon (JS) divergences between $\rho_{syn}(\hat{p})$ and $\rho_{model}(\hat{\phi})$ as functions of the sample size $N$ and the signal-to-noise ratio $SNR$, calculated through the FRV and FP approaches, respectively. In the simulations, the true number of factors was set to $3$, and $T=N$. To reflect the characteristics of the real residuals from power data, an auto-correlation with weak cross-correlation structure was set for the synthetic residuals, i.e., $(\alpha,\beta)=(0.5,0.05)$ and $J=N/10$. As shown in the figure, the optimal JS divergences calculated through the FP approach are smaller than those from FRV, which indicates that our multiplicative covariance model fits the residuals better than the FRV-based model. Moreover, our estimation approach converges faster than FRV, especially for small sample sizes. When the sample size is large, both the FRV and FP approaches converge very well, regardless of the noise level.
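As an illustration of the distance used in this comparison, the following minimal sketch computes the JS divergence between two spectral densities that have been discretized on a common grid of eigenvalue bins. It uses the textbook definition with natural logarithms; the function name `js_divergence` and the binning are assumptions, and the exact form of the distance used in the paper may differ.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence between two histograms on the same bins.

    p and q are sequences of non-negative weights; each is normalized to
    sum to one before the divergence is computed.
    """
    if len(p) != len(q):
        raise ValueError("both densities must use the same bins")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    mixture = [0.5 * (x + y) for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0
        # because b is the mixture of the two densities.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Example: two hypothetical binned densities on four eigenvalue bins.
empirical = [0.10, 0.40, 0.35, 0.15]
model = [0.12, 0.38, 0.33, 0.17]
distance = js_divergence(empirical, model)
```

The divergence is symmetric, equals zero for identical densities, and is bounded above by $\ln 2$, which makes values comparable across sample sizes and noise levels.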

5 Empirical Studies

In this section, we illustrate the proposed approach by using the real-world online monitoring data collected from a power grid and the power flow data generated from IEEE 118-bus test system. We first check how well our built model can fit the residuals from the real data. Then, implications of $\hat{p}$ and $\hat{\phi}$ are explored by using the power flow data, in which we track the evolutions of $\hat{p}$ and $\hat{\phi}$ by moving a window on the data at continuous sampling times.

5.1 Fit of Our Model to Real Data

The real-world online monitoring data are three-phase voltages collected from $63$ monitoring devices installed on the low-voltage side of distribution transformers within one feeder. The data were sampled every $15$ minutes from 2017/3/1 00:00:00 to 2017/3/31 23:45:00, which yields a $189\times 2976$ data set. Instead of analyzing the entire matrix, we moved a $189\times 672$ window over the data set at consecutive sampling times. Fig. 6 shows several sample fits of our multiplicative covariance model to the real residuals. It can be observed that our multiplicative covariance model fits the residuals well, while the M-P law does not. Moreover, the estimated $\hat{p}$ and $\hat{\phi}$ differ for data sampled at different moments, which suggests that the estimators in the proposed approach can be used to indicate the system state.
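The dimensions quoted above follow from simple arithmetic, sketched below together with a moving-window index generator. A step of one sample between consecutive windows is an assumption, and `window_bounds` is a hypothetical helper name.

```python
DEVICES = 63
PHASES = 3
SAMPLES_PER_DAY = 24 * 60 // 15   # one sample every 15 minutes
DAYS_IN_MARCH = 31

rows = DEVICES * PHASES                    # voltage channels
cols = DAYS_IN_MARCH * SAMPLES_PER_DAY     # sampling times in the month
window_width = 672
window_days = window_width / SAMPLES_PER_DAY


def window_bounds(total, width, step=1):
    """Yield (start, end) column indices of each window, end exclusive."""
    for start in range(0, total - width + 1, step):
        yield start, start + width


positions = list(window_bounds(cols, window_width))
```

Under these assumptions the data matrix is $189\times 2976$, each window spans exactly one week of samples, and there are $2976-672+1=2305$ window positions.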

5.2 Implication of $\hat{p}$

The power flow data generated from the IEEE 118-bus test system [17] were used to explore the implication of $\hat{p}$. The IEEE 118-bus test system represents a portion of the U.S. Midwest Electric Power System, and it was edited into IEEE Common Data Format and PECO PSAP Format by Richard Christie from the University of Washington [18]. In the early 2000s, researchers from the Illinois Institute of Technology (IIT) worked with the system and added some line characteristics [19][20]. The one-line diagram of the IEEE 118-bus test system is shown in Fig. 7. It consists of $118$ buses, $186$ branches, $91$ load sides and $54$ generators with a total installed capacity of 7220 MW.

In the data generation process, a sudden change of the active load at one bus was considered an anomaly event, and a small amount of white Gaussian (WG) and autoregressive (AR(1)) noise was introduced to represent random fluctuations and measurement errors. The correlation coefficient was set to $0.5$. Anomaly events can change the cross-correlations of the data, and from Section 4.2 we know that $\hat{p}$ is mainly affected by the cross-correlation of the data. To explore the relation between the number of anomaly events and $\hat{p}$, different numbers of anomaly events were set, as shown in Table III. The generated data contained $118$ voltage measurement variables, each sampled $1000$ times, as shown in Fig. 8, which yields a $118\times 1000$ data set. In the experiment, we moved a $118\times 200$ window over the data set at consecutive sampling times, which enables us to track the temporal evolution of $\hat{p}$.

The time-series of $\hat{p}$ generated with continuously moving windows is shown in Fig. 9. The relations between the number of anomaly events and the parameter $\hat{p}$ are stated as follows:

I. From $t_{s}=200$ to $t_{s}=500$ , the estimated $\hat{p}$ remains almost constant at $1$ . The fitting result of our built model to the residuals during this period of time (such as $t_{s}=500$ ) is shown in Fig. 10(a). In the experiment, no strong factors are observed during this period of time. The most likely explanation is that the proposed approach is sensitive to the weak factors caused by small fluctuations and is able to identify them effectively.

II. From $t_{s}=501$ to $t_{s}=550$, two strong factors are observed in the experiment and the average estimated $\hat{p}$ is between $2$ and $3$, during which one anomaly event is contained in the moving window. The fit of our model to the residuals during this period (for example at $t_{s}=501$) is shown in Fig. 10(b). From $t_{s}=551$ to $t_{s}=600$, three strong factors are observed and the average estimated $\hat{p}$ is between $3$ and $4$, during which two anomaly events are contained in the moving window. The fit of our model to the residuals during this period (for example at $t_{s}=551$) is shown in Fig. 10(c). From $t_{s}=601$ to $t_{s}=650$, four strong factors are observed and the average estimated $\hat{p}$ is about $4$, during which three anomaly events are contained in the moving window. The fit of our model to the residuals during this period (for example at $t_{s}=601$) is shown in Fig. 10(d). It can be concluded that $\hat{p}$ is driven by the number of anomaly events.

III. From $t_{s}=651$ to $t_{s}=800$, $\hat{p}$ decreases by $1$ every $50$ sampling times, because the width of the moving window is $200$ and the number of anomaly events contained in the moving window decreases by $1$ every $50$ sampling times. This supports the conclusion that $\hat{p}$ is driven by the number of anomaly events.

IV. From $t_{s}=801$ , no strong factors are observed and $\hat{p}$ remains nearly $1$ , which validates that the proposed approach is sensitive to the weak factors caused by small fluctuations.

5.3 Implication of $\hat{\phi}$

From Section 4.2, we know that $\hat{\phi}$ is affected both by the cross- and auto-correlation of the data in our approach. The number of anomaly events can cause the variation of the data’s cross-correlations. In this section, we first explore how the number of anomaly events affects $\hat{\phi}$ by using the generated data in Fig. 8. In the experiment, a $118\times 200$ window is moved on the data set at continuous sampling times and the generated ${\hat{\phi}}-t$ curve is shown in Fig. 11(a). The relations between the number of anomaly events and $\hat{\phi}$ are stated as follows:

I. From $t_{s}=200$ to $t_{s}=500$ , no anomaly events occur and $\hat{\phi}$ remains almost constant.

II. From $t_{s}=501$ to $t_{s}=650$, $\hat{\phi}$ increases by $0.005$ every $50$ sampling times, because the number of anomaly events contained in the moving window increases by $1$ every $50$ sampling times. From $t_{s}=651$ to $t_{s}=800$, $\hat{\phi}$ decreases by $0.005$ every $50$ sampling times, because the number of anomaly events contained in the moving window decreases by $1$ every $50$ sampling times. This shows that $\hat{\phi}$ increases with the number of anomaly events contained in the moving window, because the cross-correlations of the residuals vary with the number of anomaly events. It supports our assumption that the cross-correlation of the residuals cannot be completely eliminated by removing factors, i.e., the weak cross-correlation structure assumption for the residuals.

III. From $t_{s}=801$ , no anomaly events are contained in the moving window and $\hat{\phi}$ returns to a constant and remains afterwards.

Meanwhile, the scale of anomaly events can affect the auto-correlations of the data. Here, we explore how the scale of anomaly events affects $\hat{\phi}$. Assumed events with different scales were set for bus $20$, as shown in Table IV. The generated data contained $118$ voltage measurements, each sampled $1000$ times. A $118\times 200$ window was moved over the data set at consecutive sampling times, and the generated $\hat{\phi}-t$ curves are shown in Fig. 11(b). The relations between the scale of anomaly events and $\hat{\phi}$ are stated as follows:

I. From $t_{s}=200$ to $t_{s}=500$, the estimated $\hat{\phi}$ remains almost constant, which indicates that no anomaly events occur and the system operates in its normal state.

II. From $t_{s}=501$ to $t_{s}=700$, the $\hat{\phi}-t$ curves are almost inverted U-shaped, because the anomaly events in Table IV were applied and the delay lag between each anomaly event and $\hat{\phi}$ equals the moving window’s width. The estimated $\hat{\phi}$ corresponding to the anomaly event in which the active power (AP) changes from $20$ to $300$ has the largest value, and that for the AP change from $20$ to $100$ has the smallest value. This indicates that $\hat{\phi}$ is driven by the scale of anomaly events, because the scale of anomaly events is positively related to the variation of the auto-correlation of the residuals from the power data.

III. From $t_{s}=701$, the estimated $\hat{\phi}$ returns to a constant and remains there afterwards, which indicates that the system has returned to its normal state.

6 Conclusions

The spectrum of real-world power data is complex and cannot be trivially dissected by the M-P law. In this paper, we propose a new approach to estimating factor models that connects the estimation of the number of factors to the ESD of covariance matrices of the residuals. Because power data contain considerable measurement noise and the correlation structure of the real residuals is uncertain, our approach approximates the ESD of covariance matrices of the residuals with a multiplicative covariance structure model, which avoids crude assumptions or simplifications regarding the complex correlation structure of the data. Free probability techniques from random matrix theory are used to derive the spectral density of the multiplicative covariance structure model.

Numerical studies show that the proposed approach is robust against noise and is sensitive to weak factors. The multiplicative covariance structure model fits the ESD of covariance matrices of the real residuals better and converges faster than the traditional approaches. Empirical studies show that the estimators in the proposed approach effectively characterize the number and scale of anomaly events in a power system, and that they can be used to indicate the system state.

Acknowledgments

This work was partly supported by National Key R & D Program of China under Grant 2018YFF0214705, NSF of China under Grant 61571296 and (US) NSF under Grant CNS-1619250.

Let ${\bm{G}}_{i}=\{g_{jk}\}\;(i=0,1)$ be an $m\times n$ random matrix whose entries are independent and identically distributed (i.i.d.) variables with mean $\mu(g)=0$ and variance $\sigma^{2}(g)=1$. The covariance matrix of ${\bm{G}}_{i}$ is calculated as

[TABLE]

As $m,n\rightarrow\infty$ with $\phi=\frac{m}{n}\in(0,1]$, according to the M-P law, the spectral density of $\bm{\Sigma}_{i}$ is obtained as

[TABLE]

where $(\lambda)^{+}=max(0,\lambda)$ , $a={(1-\sqrt{\phi})}^{2}$ , and $b={(1+\sqrt{\phi})}^{2}$ .
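For a quick numerical check of the edges $a$ and $b$ defined above, the sketch below evaluates the standard M-P density and integrates it with a midpoint rule. It assumes the usual normalization $\bm{\Sigma}_{i}=\frac{1}{n}\bm{G}_{i}\bm{G}_{i}^{T}$ with unit-variance entries; the function names `mp_density` and `midpoint_integral` are hypothetical.

```python
import math


def mp_density(lam, phi):
    """Standard Marchenko-Pastur density for ratio phi = m/n in (0, 1]."""
    a = (1.0 - math.sqrt(phi)) ** 2
    b = (1.0 + math.sqrt(phi)) ** 2
    if lam <= a or lam >= b:
        return 0.0  # outside the bulk the density vanishes
    return math.sqrt((lam - a) * (b - lam)) / (2.0 * math.pi * phi * lam)


def midpoint_integral(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (k + 0.5) * h) for k in range(steps))


phi = 0.5
a = (1.0 - math.sqrt(phi)) ** 2
b = (1.0 + math.sqrt(phi)) ** 2
total_mass = midpoint_integral(lambda x: mp_density(x, phi), a, b)
mean_eigenvalue = midpoint_integral(lambda x: x * mp_density(x, phi), a, b)
```

Under this normalization both the total mass and the mean eigenvalue should be close to one, which is a convenient sanity check before comparing a model density with an empirical spectrum.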

According to Eq. (7), the Green’s function of $\rho_{\bm{\Sigma}_{i}}(\lambda)$ is obtained as $G_{\bm{\Sigma}_{i}}(z)$, which can be substituted into Eq. (10) to obtain the moment generating function $m_{\bm{\Sigma}_{i}}(z)$. Solving Eq. (12) then gives the S-transforms

[TABLE]

Then the S-transform of ${\bm{\Sigma}_{0}}{\bm{\Sigma}_{1}}$ is calculated as

[TABLE]

According to Eq. (12), the inverse function of the moment generating function $m_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}^{-1}(z)$ is calculated as,

[TABLE]

and the moment generating function $m_{\bm{\Sigma}_{0}\bm{\Sigma}_{1}}(z)$ fulfills the equation

[TABLE]

Substituting Eq. (11) into Eq. (31), we obtain

[TABLE]

which can be simplified as

[TABLE]

Bibliography


[1] J. Bai and S. Ng, “Determining the number of factors in approximate factor models,” Econometrica, vol. 70, no. 1, pp. 191–221, 2002.
[2] A. Lewbel, “The rank of demand systems: theory and nonparametric estimation,” Econometrica: Journal of the Econometric Society, pp. 711–730, 1991.
[3] G. Connor and R. A. Korajczyk, “A test for the number of factors in an approximate factor model,” The Journal of Finance, vol. 48, no. 4, pp. 1263–1291, 1993.
[4] J. G. Cragg and S. G. Donald, “Inferring the rank of a matrix,” Journal of Econometrics, vol. 76, no. 1-2, pp. 223–250, 1997.
[5] M. Forni and L. Reichlin, “Let’s get real: a factor analytical approach to disaggregated business cycle dynamics,” The Review of Economic Studies, vol. 65, no. 3, pp. 453–473, 1998.
[6] J. H. Stock and M. W. Watson, “Forecasting using principal components from a large number of predictors,” Journal of the American Statistical Association, vol. 97, no. 460, pp. 1167–1179, 2002.
[7] G. Kapetanios, “A new method for determining the number of factors in factor models with large datasets,” Working Paper, Department of Economics, Queen Mary, University of London, Tech. Rep., 2004.
[8] G. Kapetanios, “A testing procedure for determining the number of factors in approximate factor models with large datasets,” Journal of Business & Economic Statistics, vol. 28, no. 3, pp. 397–409, 2010.
