# Infinite Mixtures of Infinite Factor Analysers

**Authors:** Keefe Murphy, Cinzia Viroli, and Isobel Claire Gormley

arXiv: 1701.07010 · 2021-07-15

## TL;DR

The paper introduces IMIFA, a flexible Bayesian model for clustering high-dimensional data that automatically infers the number of clusters and factors, eliminating the need for pre-specification and extensive model selection.

## Contribution

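IMIFA combines an infinite mixture model (via a Pitman-Yor process prior) with infinite factor analysis (via multiplicative gamma process shrinkage priors), so that the number of clusters and the cluster-specific numbers of factors are inferred automatically rather than fixed in advance.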

## Key findings

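- IMIFA automatically infers the number of clusters and the number of factors within each cluster, obviating the need for model selection criteria.
- Allowing cluster-specific numbers of factors improves clustering performance in the reported applications (a benchmark data set, metabolomic spectral data, and handwritten digits).
- IMIFA reduces the computational burden of searching the model space and quantifies uncertainty in the numbers of clusters and cluster-specific factors.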

## Abstract

Factor-analytic Gaussian mixture models are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be specified in advance of model fitting, and remain fixed. The pair which optimises some model selection criterion is then chosen. For computational reasons, models in which the number of latent factors differs across clusters are rarely considered. Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Pitman-Yor process prior to facilitate automatic inference of the number of clusters using the stick-breaking construction and a slice sampler. Furthermore, IMIFA employs multiplicative gamma process shrinkage priors to allow cluster-specific numbers of factors, automatically inferred via an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixture models, providing flexible approaches to clustering high-dimensional data. Applications to a benchmark data set, metabolomic spectral data, and a manifold learning handwritten digit example illustrate the IMIFA model and its advantageous features. These include obviating the need for model selection criteria, reducing the computational burden associated with the search of the model space, improving clustering performance by allowing cluster-specific numbers of factors, and quantifying uncertainty in the numbers of clusters and cluster-specific factors.
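The two priors named in the abstract can be illustrated with a short simulation. The sketch below is not the authors' implementation: it draws truncated stick-breaking weights from a Pitman-Yor process in its standard form, and factor loadings under the multiplicative gamma process in the form given by Bhattacharya and Dunson, which may differ from the exact parameterisation used in the paper. The function names, the truncation level, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def pitman_yor_sticks(alpha, d, truncation, rng):
    """Truncated stick-breaking weights for a Pitman-Yor process.

    Standard construction: v_k ~ Beta(1 - d, alpha + k * d) and
    pi_k = v_k * prod_{l < k} (1 - v_l), with 0 <= d < 1 and alpha > -d.
    """
    k = np.arange(1, truncation + 1)
    v = rng.beta(1.0 - d, alpha + k * d)
    # Length of stick left before each break: 1, (1 - v_1), (1 - v_1)(1 - v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining


def mgp_loadings(p, q, a1, a2, nu, rng):
    """Draw a p x q loadings matrix under a multiplicative gamma process prior.

    Assumed form: delta_1 ~ Gamma(a1, 1), delta_h ~ Gamma(a2, 1) for h >= 2,
    tau_h = prod_{l <= h} delta_l, phi_jh ~ Gamma(nu / 2, rate = nu / 2),
    lambda_jh ~ N(0, 1 / (phi_jh * tau_h)).
    """
    delta = np.concatenate((rng.gamma(a1, 1.0, size=1),
                            rng.gamma(a2, 1.0, size=q - 1)))
    tau = np.cumprod(delta)  # column-wise precisions, tending to grow when a2 > 1
    # NumPy's gamma takes (shape, scale); rate nu / 2 corresponds to scale 2 / nu.
    phi = rng.gamma(nu / 2.0, 2.0 / nu, size=(p, q))
    loadings = rng.normal(0.0, 1.0 / np.sqrt(phi * tau), size=(p, q))
    return loadings, tau


# Illustrative (hypothetical) settings.
rng = np.random.default_rng(0)
weights = pitman_yor_sticks(alpha=1.0, d=0.25, truncation=50, rng=rng)
loadings, tau = mgp_loadings(p=20, q=8, a1=2.0, a2=3.0, nu=3.0, rng=rng)
```

In this sketch only a few of the mixing weights are typically non-negligible, which is the mechanism that lets the number of occupied clusters be inferred. Likewise, later loadings columns are shrunk towards zero as the precisions `tau` accumulate, which is what permits an adaptive sampler to truncate the number of factors in each cluster.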

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.07010/full.md

## Figures

35 figures with captions in the complete paper: https://tomesphere.com/paper/1701.07010/full.md

## References

87 references — full list in the complete paper: https://tomesphere.com/paper/1701.07010/full.md

---
Source: https://tomesphere.com/paper/1701.07010