Sparse Bayesian Unsupervised Learning

Stephane Gaiffas; Bertrand Michel

arXiv:1401.8017 · stat.ML · February 3, 2014 · 5 cites

TL;DR

This paper introduces a Bayesian method for simultaneous variable selection and clustering in high-dimensional unsupervised learning. It fits constrained Gaussian mixture models under a sparsity-inducing prior, proves a sparsity oracle inequality for the procedure, and implements it with a Metropolis-Hastings algorithm.

Contribution

It develops a Bayesian framework that performs variable selection and clustering simultaneously, supported by a sparsity oracle inequality and by a Metropolis-Hastings algorithm whose clustering-oriented greedy proposal speeds up convergence to the posterior.

Findings

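01 Proves a sparsity oracle inequality for the procedure.

02 Uses a clustering-oriented greedy proposal in the Metropolis-Hastings sampler, which makes convergence to the posterior very fast.

03 Learns the number of clusters $K$ and the set of relevant variables $S$ from the data.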

Abstract

This paper is about variable selection, clustering and estimation in an unsupervised high-dimensional setting. Our approach is based on fitting constrained Gaussian mixture models, where we learn the number of clusters $K$ and the set of relevant variables $S$ using a generalized Bayesian posterior with a sparsity-inducing prior. We prove a sparsity oracle inequality which shows that this procedure selects the optimal parameters $K$ and $S$. This procedure is implemented using a Metropolis-Hastings algorithm, based on a clustering-oriented greedy proposal, which makes the convergence to the posterior very fast.
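
To make the mechanics of the abstract concrete, here is a minimal sketch of a Metropolis-Hastings chain over pairs $(K, S)$. It is not the authors' algorithm. It rests on several simplifying assumptions: a plain symmetric random proposal (change $K$ by one, or flip one variable in or out of $S$) replaces the paper's clustering-oriented greedy proposal; a hard-assignment k-means loss with unit variance stands in for the constrained Gaussian mixture fit; and a BIC-like parameter penalty stands in for the sparsity-inducing prior. All function names (`kmeans_wss`, `log_score`, `metropolis_hastings`, `make_data`) and the parameter `lam` are hypothetical.

```python
import math
import random

def kmeans_wss(X, cols, K, iters=10):
    """Within-cluster sum of squares on columns `cols` after a few Lloyd steps."""
    pts = [[row[j] for j in cols] for row in X]
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    # Deterministic farthest-point initialisation of the K centers.
    centers = [pts[0]]
    while len(centers) < K:
        centers.append(max(pts, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in pts:
            k = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            groups[k].append(p)
        # Move each center to the mean of its group (empty groups stay put).
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return sum(min(dist(p, c) for c in centers) for p in pts)

def log_score(X, K, S, lam=1.0):
    """Assumed log of an unnormalised generalized posterior for (K, S)."""
    n, d = len(X), len(X[0])
    rest = [j for j in range(d) if j not in S]
    # Variables in S get K cluster means; the others share one global mean.
    loss = 0.5 * (kmeans_wss(X, sorted(S), K) + kmeans_wss(X, rest, 1))
    n_params = K * len(S) + len(rest)
    return -lam * loss - 0.5 * math.log(n) * n_params  # BIC-like penalty

def metropolis_hastings(X, k_max=4, steps=200, seed=0):
    """Run the chain and return the best (score, K, S) it visited."""
    rng = random.Random(seed)
    d = len(X[0])
    K, S = 1, frozenset()
    cur = log_score(X, K, S)
    best = (cur, K, S)
    for _ in range(steps):
        if rng.random() < 0.5:
            K_new, S_new = K + rng.choice((-1, 1)), S   # change K by one
        else:
            K_new, S_new = K, S ^ {rng.randrange(d)}    # flip one variable
        if not 1 <= K_new <= k_max:
            continue  # out-of-range proposal: rejected, the chain stays put
        new = log_score(X, K_new, S_new)
        # Symmetric proposal, so the acceptance ratio is the score ratio.
        if rng.random() < math.exp(min(0.0, new - cur)):
            K, S, cur = K_new, S_new, new
            if cur > best[0]:
                best = (cur, K, S)
    return best

def make_data(n=40, d=5, seed=1):
    """Toy data: two clusters that differ only on variables 0 and 1."""
    rng = random.Random(seed)
    X = []
    for i in range(n):
        shift = 3.0 if i % 2 else -3.0
        X.append([rng.gauss(shift if j < 2 else 0.0, 1.0) for j in range(d)])
    return X

X = make_data()
best_score, best_K, best_S = metropolis_hastings(X)
print(best_K, sorted(best_S))
```

The sketch only illustrates how a chain can move jointly over the number of clusters and the variable subset. The hard-assignment loss used here tends to reward extra clusters, so the selected $K$ should not be read as a reproduction of the paper's guarantees.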

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Face and Expression Recognition · Speech Recognition and Synthesis