Asymptotic Theory of Principal Component Analysis for Time Series Data with Cautionary Comments
Xinyu Zhang, Howell Tong

TL;DR
This paper examines the theoretical underpinnings of PCA when applied to time series data, highlighting potential pitfalls, providing new asymptotic results, and emphasizing careful interpretation of loadings in practical applications.
Contribution
It establishes a central limit theorem for PCA eigenvalues and eigenvectors in time series, and offers methods for accurate inference and interpretation of PCA results under dependence.
Findings
Proportion of variation is robust to dependence assumptions.
Inference of PC loadings requires careful attention.
Empirical example demonstrates correct PCA usage in portfolio management.
Abstract
Principal component analysis (PCA) is a most frequently used statistical tool in almost all branches of data science. However, like many other statistical tools, there is sometimes the risk of misuse or even abuse. In this paper, we highlight possible pitfalls in using the theoretical results of PCA based on the assumption of independent data when the data are time series. For the latter, we state with proof a central limit theorem of the eigenvalues and eigenvectors (loadings), give direct and bootstrap estimation of their asymptotic covariances, and assess their efficacy via simulation. Specifically, we pay attention to the proportion of variation, which decides the number of principal components (PCs), and the loadings, which help interpret the meaning of PCs. Our findings are that while the proportion of variation is quite robust to different dependence assumptions, the inference of…
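The bootstrap idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a moving block bootstrap (a standard resampling scheme for dependent data) and a simulated AR(1)-type series. The helper names (`proportion_of_variation`, `moving_block_bootstrap`, `bootstrap_se`, `simulate_ar1_panel`) and the choices of block length, sample size, and number of replicates are hypothetical, chosen only for illustration.

```python
import numpy as np


def proportion_of_variation(x, k):
    """Share of total variance captured by the first k PCs (rows of x are time points)."""
    cov = np.cov(x, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; reverse it
    return eigvals[:k].sum() / eigvals.sum()


def moving_block_bootstrap(x, block_len, rng):
    """Resample overlapping blocks of consecutive rows to preserve short-range dependence."""
    n = x.shape[0]
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:n]
    return x[idx]


def bootstrap_se(x, k, block_len, n_boot, seed):
    """Bootstrap standard error of the proportion of variation of the first k PCs."""
    rng = np.random.default_rng(seed)
    stats = np.array([
        proportion_of_variation(moving_block_bootstrap(x, block_len, rng), k)
        for _ in range(n_boot)
    ])
    return stats.std(ddof=1)


def simulate_ar1_panel(n, p, phi, seed):
    """Assumed toy data: p independent AR(1) series with different innovation scales."""
    rng = np.random.default_rng(seed)
    scales = np.linspace(2.0, 0.5, p)
    x = np.zeros((n, p))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + scales * rng.standard_normal(p)
    return x


x = simulate_ar1_panel(n=500, p=4, phi=0.6, seed=0)
pov = proportion_of_variation(x, k=2)
se = bootstrap_se(x, k=2, block_len=25, n_boot=200, seed=1)
print(f"proportion of variation (k=2): {pov:.3f}, block-bootstrap SE: {se:.3f}")
```

Resampling whole blocks rather than individual rows is what distinguishes this from the bootstrap for independent data; resampling single rows would discard the serial dependence that the paper's cautionary comments concern.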
Taxonomy
Topics: Advanced Statistical Methods and Models · Geochemistry and Geologic Mapping · Complex Systems and Time Series Analysis
