Automatic dimensionality selection for principal component analysis models with the ignorance score
Stefania Russo, Guangyu Li, Kris Villez

TL;DR
This paper introduces an automatic method for selecting the number of principal components in PCA models using the cross-validated ignorance score, grounded in a probabilistic interpretation of PCA.
Contribution
It proposes a novel application of the ignorance score to PCA model selection, providing a practical tool for determining the number of components more reliably.
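As a hedged illustration, the score can be written as follows. This assumes the maximum-likelihood probabilistic PCA model of Tipping and Bishop and the usual base-2 definition of the ignorance score; the paper's exact formulation and normalization may differ. For a model with k components evaluated on N held-out samples:

$$
C_k = U_k \Lambda_k U_k^\top + \sigma_k^2 \left(I_d - U_k U_k^\top\right), \qquad \sigma_k^2 = \frac{1}{d-k}\sum_{j=k+1}^{d} \lambda_j
$$

$$
\mathrm{IGN}_k = -\frac{1}{N}\sum_{n=1}^{N} \log_2 \mathcal{N}\!\left(x_n \mid \mu, C_k\right), \qquad \hat{k} = \arg\min_{k} \mathrm{IGN}_k
$$

Here d is the data dimension, \lambda_1 \ge \dots \ge \lambda_d are the eigenvalues of the training covariance, U_k and \Lambda_k hold the k leading eigenvectors and eigenvalues, \mu is the training mean, and the x_n are samples not used for fitting.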
Findings
The ignorance score effectively identifies the optimal number of PCA components.
Method validated on simulated and real datasets.
Improves PCA model tuning without extensive manual intervention.
Abstract
Principal component analysis (PCA) is by far the most widespread tool for unsupervised learning with high-dimensional data sets. It is widely studied for exploratory data analysis and online process monitoring. Unfortunately, fine-tuning PCA models, and in particular selecting the number of components, remains a challenging task. Today, this selection is often based on a combination of guiding principles, experience, and process understanding. Unlike regression, where cross-validation of the prediction error is a widespread and trusted approach to model selection, PCA has no model selection tool that has reached this level of acceptance. In this work, we address this challenge and evaluate the utility of the cross-validated ignorance score on both simulated and experimental data sets. Application of this method is based on the interpretation of PCA…
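The selection procedure described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes the maximum-likelihood probabilistic PCA model (noise variance equal to the mean of the discarded eigenvalues), a base-2 ignorance score, and plain K-fold cross-validation over rows; the paper's cross-validation scheme may differ. The function names `ppca_log2_density`, `cv_ignorance`, and `select_components` are hypothetical.

```python
import numpy as np


def ppca_log2_density(X_train, X_test, k):
    """Log2-density of each row of X_test under a k-component probabilistic
    PCA model fitted to X_train (maximum-likelihood solution, needs k < d)."""
    _, d = X_train.shape
    if not 0 <= k < d:
        raise ValueError("k must satisfy 0 <= k < d")
    mu = X_train.mean(axis=0)
    S = np.cov(X_train, rowvar=False, bias=True)        # ML covariance (divides by n)
    eigvals, eigvecs = np.linalg.eigh(S)                # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    sigma2 = eigvals[k:].mean()                         # noise variance: mean of discarded eigenvalues
    U = eigvecs[:, :k]
    # Model covariance: C_k = U_k Lambda_k U_k^T + sigma^2 (I - U_k U_k^T)
    C = U @ np.diag(eigvals[:k]) @ U.T + sigma2 * (np.eye(d) - U @ U.T)
    _, logdet = np.linalg.slogdet(C)
    R = X_test - mu
    maha = np.sum(R.T * np.linalg.solve(C, R.T), axis=0)  # squared Mahalanobis distances
    log_e = -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
    return log_e / np.log(2.0)                          # convert nats to bits


def cv_ignorance(X, k, n_folds=5, seed=0):
    """Mean ignorance score (bits per held-out sample) from K-fold CV over rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    scores = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        # Ignorance of each held-out sample is its negative log2-density.
        scores.append(-ppca_log2_density(X[train_idx], X[test_idx], k))
    return float(np.concatenate(scores).mean())


def select_components(X, n_folds=5, seed=0):
    """Return the k in 0..d-1 with the lowest cross-validated ignorance score."""
    ign = {k: cv_ignorance(X, k, n_folds, seed) for k in range(X.shape[1])}
    return min(ign, key=ign.get), ign
```

Usage: `best_k, scores = select_components(X)` for an n-by-d array `X` with clearly more rows than columns. Reusing the same seed for every k keeps the folds identical, so differences between scores reflect the models rather than the split. scikit-learn's `PCA.score` returns a closely related probabilistic PCA average log-likelihood (in nats) that could stand in for the hand-written density.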
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Machine Learning in Materials Science · Computational Drug Discovery Methods
