Validation of nonlinear PCA

Matthias Scholz

arXiv:1204.0684·cs.LG·April 4, 2012

TL;DR

This paper validates nonlinear PCA models by their error in estimating missing data, a criterion that selects the optimal model complexity where cross-validation and test-set validation fail.

Contribution

It proposes a new validation approach based on missing data prediction error for nonlinear PCA, overcoming limitations of standard validation techniques.

Findings

01

The new validation method accurately identifies the optimal model complexity.

02

Standard test-set validation usually favours over-fitted nonlinear PCA models rather than penalizing them.

03

The approach makes model selection possible for nonlinear PCA despite its unsupervised nature, which defeats cross-validation and independent test sets.

Abstract

Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. But the benefit of curved components requires a careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity.
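The selection criterion described in the abstract can be sketched as follows: hide a fraction of known entries, let each candidate model estimate them, and compare the estimation error across complexities. This is only a minimal illustration under stated assumptions, not the paper's implementation. In particular, the stand-in model is an iterative low-rank (linear PCA) imputation rather than a neural-network nonlinear PCA, the complexity parameter is simply the number of components, and all function names (`mask_entries`, `impute_low_rank`, `missing_data_error`, `demo`) are hypothetical.

```python
import numpy as np


def mask_entries(X, frac, rng):
    """Hide a random fraction of entries; return the masked copy and the mask."""
    mask = rng.random(X.shape) < frac
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_low_rank(X_missing, n_components, n_iter=100):
    """Stand-in model (assumption): iterative linear PCA imputation.

    Any model that can estimate missing values, such as a nonlinear PCA
    network, could be plugged in with the same signature.
    """
    missing = np.isnan(X_missing)
    col_means = np.nanmean(X_missing, axis=0)
    X_filled = np.where(missing, col_means, X_missing)
    for _ in range(n_iter):
        mean = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mean, full_matrices=False)
        k = n_components
        X_hat = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Keep observed values, update only the hidden entries.
        X_filled = np.where(missing, X_hat, X_missing)
    return X_filled


def missing_data_error(X, complexity, impute, frac=0.1, seed=0):
    """Mean squared error of the model's estimates on artificially hidden entries."""
    rng = np.random.default_rng(seed)
    X_missing, mask = mask_entries(X, frac, rng)
    X_est = impute(X_missing, complexity)
    return float(np.mean((X_est[mask] - X[mask]) ** 2))


def demo():
    """Toy data: two latent factors in six dimensions plus small noise."""
    rng = np.random.default_rng(42)
    Z = rng.standard_normal((200, 2))
    W = np.array([[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                  [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]])
    X = Z @ W + 0.1 * rng.standard_normal((200, 6))
    return {k: missing_data_error(X, k, impute_low_rank) for k in (1, 2, 3, 4)}


if __name__ == "__main__":
    for k, err in demo().items():
        print(f"components={k}  missing-data MSE={err:.4f}")
```

The model with the lowest missing-data error would be the one selected. Because every candidate is scored on entries it never saw, a model that is too simple cannot reconstruct them and a model that is too flexible is expected to estimate them poorly, which is the motivation the abstract gives for the criterion.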
