Principal component analysis in econometrics: a selective inference perspective

Yasuyuki Matsumura; Chisato Tachibana

arXiv:2511.10419·econ.EM·December 12, 2025

Open Access

TL;DR

This paper proposes a data-driven method for determining the number of principal components in econometric applications. It uses selective inference to obtain sequential tests with asymptotically exact type I error control, without relying on parametric Gaussian assumptions.

Contribution

It develops a sequential testing procedure for estimating the true rank of the covariance matrix, treating the design as random and extending prior fixed-design methods.

Findings

01

Asymptotically exact type I error control under the null hypothesis

02

Empirical validation shows the method's effectiveness

03

Applicable to multivariate econometric data in the $p < n$ regime

Abstract

We study the long-standing problem of determining the number of principal components in econometric applications from a selective inference perspective. We consider i.i.d. observations from a $p$-dimensional random vector with $p < n$ and define the "true" dimensionality as the rank of the population covariance matrix. Building on the sequential testing viewpoint, we propose a data-driven procedure that estimates $\operatorname{rank}(\Sigma_{X})$ using a statistic that depends on the eigenvalues of the sample covariance matrix. While the test statistic shares the functional form of its fixed-design counterpart (Choi et al., 2017), our analysis departs from the non-stochastic setting by treating the design as random and by avoiding parametric Gaussian assumptions. Under a locally defined null hypothesis, we establish asymptotically exact type I error controls in the sequential testing procedure, with…
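
The sequential testing idea described in the abstract can be sketched as a generic stopping rule. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not give the test statistic, so the p-value computation is left to a caller-supplied function `pvalue_fn` (a hypothetical name), and the convention "test k = 0, 1, ... and stop at the first non-rejection" is likewise an assumption. The names `sample_cov_eigenvalues` and `sequential_rank_estimate` are also illustrative.

```python
from typing import Callable

import numpy as np


def sample_cov_eigenvalues(X: np.ndarray) -> np.ndarray:
    """Eigenvalues of the sample covariance matrix, in descending order.

    X has shape (n, p): n observations of a p-dimensional vector.
    """
    S = np.atleast_2d(np.cov(X, rowvar=False))  # p x p sample covariance
    eigs = np.linalg.eigvalsh(S)                # ascending for symmetric input
    return eigs[::-1]


def sequential_rank_estimate(
    X: np.ndarray,
    pvalue_fn: Callable[[np.ndarray, int, int], float],
    alpha: float = 0.05,
) -> int:
    """Estimate the rank by testing k = 0, 1, ..., p - 1 in order.

    pvalue_fn(eigs, k, n) is a hypothetical, caller-supplied function that
    returns a p-value for the null "the rank equals k". The paper's actual
    statistic is NOT reproduced here. The estimate is the first k whose
    null is not rejected at level alpha, or p if every null is rejected.
    """
    n, p = X.shape
    eigs = sample_cov_eigenvalues(X)
    for k in range(p):
        if pvalue_fn(eigs, k, n) > alpha:
            return k
    return p
```

Passing the p-value computation in as a function keeps the stopping rule separate from the statistic, so any eigenvalue-based test can be plugged in without changing the loop.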

Taxonomy

Topics: Statistical Methods and Inference · Random Matrices and Applications · Statistical Methods and Bayesian Inference