Confidence Intervals for the Number of Components in Factor Analysis and Principal Components Analysis via Subsampling
Chetkar Jha, Ian Barnett

TL;DR
This paper introduces a subsampling-based method to construct confidence intervals for the number of components in factor analysis and PCA, addressing uncertainty in estimating the component count.
Contribution
It proposes a novel data-driven approach with theoretical guarantees for confidence intervals of component numbers in FA and PCA, including the first Edgeworth expansion for spiked eigenvalues.
Findings
The method achieves accurate coverage probabilities in simulations.
It provides reliable confidence intervals for real genotyping data.
Theoretical analysis supports the method's validity.
Abstract
Factor analysis (FA) and principal component analysis (PCA) are popular statistical methods for summarizing and explaining the variability in multivariate datasets. By default, FA and PCA assume that the number of components or factors is known a priori. In practice, however, users first estimate the number of factors or components and then perform FA and PCA using that point estimate, ignoring any uncertainty in it. For datasets where this uncertainty is not ignorable, it is prudent to perform FA and PCA for the range of positive integer values in the confidence interval for the number of factors or components. We address this problem by proposing a subsampling-based, data-intensive approach for estimating confidence intervals for the number of components…
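To make the general idea of a subsampling interval concrete, here is a minimal Python sketch. It is not the authors' procedure, which the truncated abstract does not spell out. It assumes a simple eigenvalue-ratio rule as the point estimator and takes percentile bounds over estimates from subsamples drawn without replacement. The names `estimate_num_components` and `subsample_interval`, the subsample size, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_num_components(X, max_k=10):
    """Hypothetical point estimator (an assumption, not the paper's):
    pick k maximizing the ratio of consecutive covariance eigenvalues."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data give the eigenvalues
    # of the sample covariance matrix, sorted in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    eig = s**2 / (X.shape[0] - 1)
    max_k = min(max_k, len(eig) - 1)
    ratios = eig[:max_k] / eig[1:max_k + 1]
    return int(np.argmax(ratios)) + 1


def subsample_interval(X, subsample_size, n_subsamples=200, alpha=0.05, seed=0):
    """Percentile interval for the number of components, built from
    estimates on random subsamples of rows drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = np.empty(n_subsamples, dtype=int)
    for b in range(n_subsamples):
        idx = rng.choice(n, size=subsample_size, replace=False)
        estimates[b] = estimate_num_components(X[idx])
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    # Round outward so the interval endpoints are integers.
    return int(np.floor(lo)), int(np.ceil(hi)), estimates


# Simulated data (illustrative only): 3 strong factors plus unit noise.
rng = np.random.default_rng(1)
n, p, k_true = 400, 20, 3
loadings = rng.normal(size=(p, k_true))
factors = 3.0 * rng.normal(size=(n, k_true))
X = factors @ loadings.T + rng.normal(size=(n, p))

lower, upper, estimates = subsample_interval(X, subsample_size=200)
print(f"point estimate: {estimate_num_components(X)}, interval: [{lower}, {upper}]")
```

Swapping in a different point estimator only requires replacing `estimate_num_components`. The choice of subsample size matters for the validity of any subsampling interval, and the value used here is arbitrary.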
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Fractal and DNA sequence analysis
