# Revisiting clustering as matrix factorisation on the Stiefel manifold

**Authors:** Stéphane Chrétien, Benjamin Guedj

arXiv: 1903.04479 · 2021-12-16

## TL;DR

This paper introduces a generalized (PAC-)Bayesian clustering method based on low-rank matrix factorization on the Stiefel manifold. It comes with prediction bounds and a Langevin sampler for possibly high-dimensional data.

## Contribution

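It reformulates clustering as low-rank matrix estimation in the PAC-Bayesian framework, using a Burer-Monteiro factorization whose factors lie on the Stiefel manifold. It proves new prediction bounds for clustering and devises a componentwise Langevin sampler to compute the resulting estimator.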
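To make the link between clustering and the Stiefel manifold concrete, here is a minimal sketch. It assumes the common normalised-membership encoding of a hard clustering, where a point in cluster k gets weight 1/sqrt(n_k) in column k. The paper's exact parametrisation may differ, and the helper `membership_matrix` and the example labels are hypothetical.

```python
import numpy as np

def membership_matrix(labels, n_clusters):
    """Hypothetical helper: normalised cluster-membership matrix Z (n x K).

    Z[i, k] = 1 / sqrt(n_k) if point i is in cluster k, else 0,
    where n_k is the size of cluster k. Every cluster must be non-empty.
    """
    labels = np.asarray(labels)
    Z = np.zeros((labels.shape[0], n_clusters))
    for k in range(n_clusters):
        members = labels == k
        size = members.sum()
        if size == 0:
            raise ValueError(f"cluster {k} is empty")
        Z[members, k] = 1.0 / np.sqrt(size)
    return Z

labels = [0, 0, 1, 2, 1, 0]
Z = membership_matrix(labels, n_clusters=3)

# Columns are orthonormal (Z^T Z = I_K), so Z is a point on the Stiefel manifold St(n, K).
print(np.allclose(Z.T @ Z, np.eye(3)))  # True

# Z Z^T is a rank-K orthogonal projection: entry (i, j) is 1/n_k when i and j share cluster k, else 0.
P = Z @ Z.T
print(np.allclose(P @ P, P), np.linalg.matrix_rank(P))  # True 3
```

Because distinct columns of Z have disjoint supports, orthonormality holds for any partition with non-empty clusters. The low-rank matrix Z Z^T is then already written in factored form.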

## Key findings

- Proposes a new generalized Bayesian estimator and proves novel prediction bounds for clustering.
- Devises a componentwise Langevin sampler on the Stiefel manifold to compute the estimator.
- Demonstrates effectiveness on high-dimensional data.
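The sketch below shows what a Langevin-type update constrained to the Stiefel manifold can look like. It takes a gradient step plus Gaussian noise, both projected onto the tangent space, and maps the result back with a QR retraction. It is a generic retraction-based scheme, not the paper's componentwise sampler. The loss `||A - U U^T||_F^2`, the block similarity matrix `A`, and the values of `lam` and `h` are all hypothetical. It applies no Metropolis correction, so it targets exp(-lam * loss) only approximately.

```python
import numpy as np

def tangent_projection(U, Z):
    """Project Z onto the tangent space of St(n, K) at U (embedded metric)."""
    S = U.T @ Z
    return Z - U @ ((S + S.T) / 2.0)

def qr_retraction(V):
    """Map a full-rank n x K matrix back onto St(n, K) via a sign-fixed thin QR."""
    Q, R = np.linalg.qr(V)
    signs = np.where(np.diag(R) >= 0, 1.0, -1.0)
    return Q * signs  # flip columns so the implied R has a non-negative diagonal

def loss(U, A):
    """Hypothetical squared Frobenius loss ||A - U U^T||_F^2."""
    return np.linalg.norm(A - U @ U.T) ** 2

def langevin_step(U, A, lam, h, rng):
    """One retraction-based Langevin-type step aimed at the density exp(-lam * loss(U, A))."""
    euclid_grad = 4.0 * lam * (U @ U.T - A) @ U  # gradient of lam * loss for symmetric A
    riem_grad = tangent_projection(U, euclid_grad)
    noise = tangent_projection(U, rng.standard_normal(U.shape))
    V = U - h * riem_grad + np.sqrt(2.0 * h) * noise
    return qr_retraction(V)

rng = np.random.default_rng(0)
n, K = 8, 2
# Hypothetical similarity matrix: two clusters of four points (a rank-2 projection).
A = np.zeros((n, n))
A[:4, :4] = 1.0 / 4
A[4:, 4:] = 1.0 / 4

U = qr_retraction(rng.standard_normal((n, K)))
for _ in range(500):
    U = langevin_step(U, A, lam=50.0, h=1e-3, rng=rng)

final_loss = loss(U, A)
print(np.allclose(U.T @ U, np.eye(K)))  # True: iterates stay on St(n, K)
print(f"final loss: {final_loss:.3f}")
```

The QR retraction is one of several standard choices; a polar retraction would work equally well here. A componentwise variant would instead update parts of `U` in turn, which this sketch does not attempt.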

## Abstract

This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings), and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation, in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.
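The Burer-Monteiro strategy mentioned in the abstract replaces an n x n positive semidefinite matrix X by a factor U of size n x K, with X = U U^T. Positive semidefiniteness and rank at most K then hold by construction. The minimal sketch below illustrates only this generic factorisation, not the paper's estimator. The sizes n = 100 and K = 3 are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 100, 3

# Full formulation: an n x n matrix X (n^2 entries) under a PSD constraint.
# Burer-Monteiro: X = U U^T with U of size n x K (n*K entries); PSD and rank <= K are automatic.
U = rng.standard_normal((n, K))
X = U @ U.T

eigvals = np.linalg.eigvalsh(X)
print(eigvals.min() >= -1e-8)    # True: PSD up to round-off
print(np.linalg.matrix_rank(X))  # 3
print(n * n, n * K)              # 10000 300

# If U also has orthonormal columns (a Stiefel point), X is a projection with trace K.
Q, _ = np.linalg.qr(U)
print(np.isclose(np.trace(Q @ Q.T), K))  # True
```

The drop from n^2 to n*K unknowns is what makes the factorised form attractive at large scale. Constraining the factor to the Stiefel manifold additionally fixes the scale of U U^T.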

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.04479/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1903.04479/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1903.04479/full.md

---
Source: https://tomesphere.com/paper/1903.04479