Determining the Number of Components in PLS Regression on Incomplete Data
Titin Agustin Nengsih, Frédéric Bertrand, Myriam Maumy-Bertrand, and Nicolas Meyer

TL;DR
This paper investigates how to choose the number of components in PLS regression when the data contain missing values, comparing selection criteria and imputation methods to make model selection more reliable.
Contribution
It provides a comparative analysis of component selection criteria for PLS regression on incomplete data, highlighting the effectiveness of Q2-based methods over AIC and BIC.
Findings
The Q2 leave-one-out criterion is more reliable for component selection than AIC and BIC.
The proportion of missing data affects the performance of the selection criteria.
Imputation methods influence the accuracy of component number estimation.
Abstract
Partial least squares (PLS) regression is a multivariate method in which models are estimated using either the SIMPLS or NIPALS algorithm. PLS regression has been extensively used in applied research because of its effectiveness in analysing relationships between an outcome and one or several components. Note that the NIPALS algorithm is able to provide estimates on incomplete data. Selection of the number of components used to build a representative model in PLS regression is an important problem. However, how to deal with missing data when using PLS regression remains a matter of debate. Several approaches have been proposed in the literature, including the Q2 criterion and the AIC and BIC criteria. Here we study the behavior of the NIPALS algorithm when used to fit a PLS regression for various proportions of missing data and for different types of missingness. We compare…
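The abstract notes that NIPALS can provide estimates on incomplete data. A minimal Python sketch of how that is commonly done for a single response (PLS1) follows: every inner product is computed over the observed entries only. This is an illustration under stated assumptions, not the authors' implementation. The function name `nipals_pls1_missing` and its return values are hypothetical, missing values are assumed to be coded as NaN in X only, and no row or column of X is assumed to be entirely missing.

```python
import numpy as np

def nipals_pls1_missing(X, y, n_components):
    """PLS1 by NIPALS, skipping NaN entries of X in every inner product.

    Hypothetical helper for illustration; returns scores T, weights W,
    X-loadings P and y-loadings c.
    """
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    M = (~np.isnan(X)).astype(float)          # 1 where observed, 0 where missing
    X = X - np.nanmean(X, axis=0)             # centre columns on observed values
    y = y - y.mean()
    X0 = np.where(M == 1.0, X, 0.0)           # zero-fill: missing cells add nothing to sums
    n, p = X0.shape
    T = np.zeros((n, n_components))
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    c = np.zeros(n_components)
    for h in range(n_components):
        # Weights: slope of each X column on y, using observed rows only
        w = (X0.T @ y) / (M.T @ y**2)
        w /= np.linalg.norm(w)
        # Scores: slope of each X row on w, using observed columns only
        t = (X0 @ w) / (M @ w**2)
        # X-loadings: slope of each X column on t, using observed rows only
        p_h = (X0.T @ t) / (M.T @ t**2)
        # y-loading and deflation of both blocks
        c_h = (y @ t) / (t @ t)
        X0 = np.where(M == 1.0, X0 - np.outer(t, p_h), 0.0)
        y = y - c_h * t
        T[:, h], W[:, h], P[:, h], c[h] = t, w, p_h, c_h
    return T, W, P, c
```

On complete data this reduces to ordinary NIPALS PLS1, so the score vectors are mutually orthogonal. With missing entries that orthogonality holds only approximately, which is one reason component-selection criteria can behave differently on incomplete data.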
Taxonomy
Topics: Spectroscopy and Chemometric Analyses
