TL;DR
This paper provides a theoretical justification for the use of permutation methods like Parallel Analysis in PCA and factor analysis, showing they reliably identify large components in high-dimensional models but have limitations with smaller ones.
Contribution
The paper offers the first theoretical analysis of permutation methods for choosing the number of components, demonstrating their consistency for large components and revealing their limitations with smaller ones.
Findings
Permutation methods reliably select large components in high-dimensional models.
Permutation methods do not effectively identify smaller components.
These results give a theoretical justification for the use of permutation methods in PCA and factor analysis.
Abstract
Researchers often have datasets measuring features of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analysis. Consequently, many approaches have been developed to address it. Parallel Analysis is a popular permutation method. It works by randomly scrambling each feature of the data. It selects components if their singular values are larger than those of the permuted data. Despite widespread use in leading textbooks and scientific publications, as well as empirical evidence for its accuracy, it currently has no theoretical justification. In this paper, we show that the parallel analysis permutation method consistently selects the large…
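The procedure described in the abstract (scramble each feature independently, then keep components whose singular values exceed those of the permuted data) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The function name `parallel_analysis`, the parameters `n_perm` and `quantile`, the column centering, and the rule of stopping at the first component that fails the comparison are all assumptions made for this example.

```python
import numpy as np


def parallel_analysis(X, n_perm=20, quantile=1.0, seed=0):
    """Estimate the number of components by a permutation comparison.

    Assumptions of this sketch (not taken from the paper): columns are
    centered, the threshold is a quantile of the permuted singular values
    (quantile=1.0 means the maximum), and selection stops at the first
    component that does not exceed its threshold.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # assumed preprocessing: center each feature

    # Singular values of the observed data, in decreasing order.
    sv = np.linalg.svd(X, compute_uv=False)

    perm_sv = np.empty((n_perm, sv.size))
    for b in range(n_perm):
        # Scramble each feature (column) independently of the others.
        # This keeps each feature's marginal distribution but breaks
        # the dependence between features.
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        perm_sv[b] = np.linalg.svd(Xp, compute_uv=False)

    # k-th threshold: chosen quantile of the k-th permuted singular value.
    thresh = np.quantile(perm_sv, quantile, axis=0)

    # Count leading components whose singular value exceeds the threshold.
    k = 0
    while k < sv.size and sv[k] > thresh[k]:
        k += 1
    return k
```

Calling `parallel_analysis(X)` on an `n × p` array with samples in rows and features in columns returns the estimated number of components. Lowering `quantile` (for example to 0.95) makes the comparison less conservative.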
