Principal component analysis in econometrics: a selective inference perspective
Yasuyuki Matsumura, Chisato Tachibana

TL;DR
This paper introduces a data-driven method for determining the number of principal components in econometric applications. It uses selective inference to build sequential tests with asymptotically exact type I error control, without relying on Gaussian assumptions.
Contribution
It develops a sequential testing procedure for estimating the true rank of the covariance matrix, treating the design as random and extending prior fixed-design methods.
Findings
Asymptotically exact type I error control under a locally defined null hypothesis
Empirical validation shows the method's effectiveness
Applicable to high-dimensional econometric data
Abstract
We study the long-standing problem of determining the number of principal components in econometric applications from a selective inference perspective. We consider i.i.d. observations from a -dimensional random vector with and define the "true" dimensionality as the rank of the population covariance matrix. Building on the sequential testing viewpoint, we propose a data-driven procedure that estimates this rank using a statistic that depends on the eigenvalues of the sample covariance matrix. While the test statistic shares the functional form of its fixed-design counterpart (Choi et al., 2017), our analysis departs from the non-stochastic setting by treating the design as random and by avoiding parametric Gaussian assumptions. Under a locally defined null hypothesis, we establish asymptotically exact type I error control in the sequential testing procedure, with…
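The general shape of a sequential rank-testing procedure driven by sample covariance eigenvalues can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the function names `sample_eigenvalues`, `estimate_rank`, and `toy_pvalue` are hypothetical, the step-k null is taken to be "the rank is at most k", and `toy_pvalue` is a deliberately crude placeholder standing in for the paper's test statistic, which this summary does not specify.

```python
from typing import Callable

import numpy as np


def sample_eigenvalues(X: np.ndarray) -> np.ndarray:
    """Eigenvalues of the sample covariance matrix of X (rows = observations),
    sorted in descending order."""
    S = np.cov(X, rowvar=False)
    return np.linalg.eigvalsh(S)[::-1]


def estimate_rank(
    eigvals: np.ndarray,
    pvalue_fn: Callable[[np.ndarray, int], float],
    alpha: float = 0.05,
) -> int:
    """Generic sequential scheme (an assumption about the setup):
    test the step-k null for k = 0, 1, ... and return the first k
    that is not rejected at level alpha."""
    for k in range(len(eigvals)):
        if pvalue_fn(eigvals, k) > alpha:
            return k
    return len(eigvals)  # every null rejected: full rank


def toy_pvalue(eigvals: np.ndarray, k: int, tol: float = 1e-8) -> float:
    """PLACEHOLDER, not the paper's statistic: returns 1.0 when the
    (k+1)-th eigenvalue is numerically zero relative to the largest
    one, and 0.0 otherwise."""
    return 1.0 if eigvals[k] <= tol * eigvals[0] else 0.0


# Synthetic data whose covariance matrix has rank 3 by construction.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 3))   # 3 latent factors
A = rng.standard_normal((3, 6))     # loadings into 6 observed variables
X = Z @ A

rank_hat = estimate_rank(sample_eigenvalues(X), toy_pvalue)
print(rank_hat)
```

In practice `pvalue_fn` would be replaced by a p-value computed from a properly calibrated statistic. The stopping rule is kept separate from the statistic so that either can be swapped independently.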
Taxonomy
Topics: Statistical Methods and Inference · Random Matrices and Applications · Statistical Methods and Bayesian Inference
