MML Probabilistic Principal Component Analysis

Enes Makalic; Daniel F. Schmidt

arXiv:2209.14559 · stat.ME · February 10, 2026

TL;DR

This paper introduces a Bayesian minimum message length (MML) method for automatically selecting the number of principal components in PCA. It derives a new estimate of the isotropic residual variance that improves on maximum likelihood, and discusses an extension to finite mixture models.

Contribution

It presents a Bayesian MML criterion for choosing the number of principal components, together with a new estimate of the isotropic residual variance, and discusses extending the approach to finite mixtures of principal component analyzers.

Findings

01 New estimate of the isotropic residual variance that improves on maximum likelihood

02 Automatic selection of the number of principal components based on Bayesian MML

03 Discussion of an extension to finite mixture models of principal component analyzers

Abstract

Principal component analysis (PCA) is perhaps the most widely used method for data dimensionality reduction. A key question in PCA is deciding how many factors to retain. This manuscript describes a new approach to automatically selecting the number of principal components based on the Bayesian minimum message length method of inductive inference. We derive a new estimate of the isotropic residual variance and demonstrate that it improves on the usual maximum likelihood approach. We also discuss extending this approach to finite mixture models of principal component analyzers.
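
For orientation, the "usual maximum likelihood approach" mentioned in the abstract can be sketched as follows. In standard probabilistic PCA with k retained components, the maximum likelihood estimate of the isotropic residual variance is the average of the d - k smallest eigenvalues of the sample covariance matrix. The sketch below shows only this baseline; it does not implement the paper's MML estimate, which is not given on this page. The function name `ppca_ml_residual_variance` and the simulated data are illustrative assumptions.

```python
import numpy as np


def ppca_ml_residual_variance(X, k):
    """Baseline ML estimate of the isotropic residual variance in PPCA.

    Hypothetical helper for illustration: returns the mean of the d - k
    smallest eigenvalues of the (divide-by-n) sample covariance of X.
    This is the maximum likelihood baseline, not the MML estimate.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 0 <= k < d:
        raise ValueError("k must satisfy 0 <= k < d")
    Xc = X - X.mean(axis=0)                # centre the data
    S = Xc.T @ Xc / n                      # ML sample covariance
    eigvals = np.linalg.eigvalsh(S)[::-1]  # eigenvalues, largest first
    return eigvals[k:].mean()              # average of the discarded ones


if __name__ == "__main__":
    # Simulated data (an assumption for the demo): 3 latent factors in
    # 10 dimensions plus isotropic noise with variance 0.25.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(10, 3))
    Z = rng.normal(size=(500, 3))
    X = Z @ W.T + 0.5 * rng.normal(size=(500, 10))
    for k in range(6):
        print(k, ppca_ml_residual_variance(X, k))
```

Because the estimate averages only the discarded eigenvalues, it can never increase as k grows, so it cannot by itself indicate how many components to retain; a separate model selection criterion is needed for that choice.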

Taxonomy

Topics: Spectroscopy and Chemometric Analyses · Face and Expression Recognition · Blind Source Separation Techniques