Mixture of von Mises-Fisher distribution with sparse prototypes
Fabrice Rossi (CEREMADE), Florian Barbaro (SAMM)

TL;DR
This paper introduces a sparse, l1-penalized mixture of von Mises-Fisher distributions with an EM algorithm, enhancing clustering interpretability for high-dimensional directional data like texts and financial reports.
Contribution
It proposes a novel sparse estimation method for von Mises-Fisher mixtures based on an l1 penalty, and develops an EM algorithm combined with a path-following approach to explore the penalty strength.
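As a rough illustration of what such a penalized criterion can look like, the block below writes the standard von Mises-Fisher density on the unit sphere in dimension d and one plausible l1-penalized mixture log-likelihood. The density and its normalising constant are standard; the exact form of the penalty (here applied to each mean direction, with a single weight lambda) is an assumption and may differ from the paper's formulation.

```latex
% Standard vMF density for x on the unit sphere S^{d-1},
% with mean direction \mu (\|\mu\|_2 = 1) and concentration \kappa > 0.
f(x \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}

% Assumed form of the penalized criterion for a K-component mixture
% with proportions \pi_k; \lambda \ge 0 sets the sparsity level.
\max_{\pi,\, \mu,\, \kappa}\;
\sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k\, f(x_i \mid \mu_k, \kappa_k)
\;-\; \lambda \sum_{k=1}^{K} \|\mu_k\|_1
\quad \text{subject to } \|\mu_k\|_2 = 1
```

Here I_{d/2-1} denotes the modified Bessel function of the first kind. With lambda = 0 the criterion reduces to the usual mixture log-likelihood; larger values push coordinates of each prototype to exactly zero.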
Findings
Improved clustering interpretability with sparse prototypes
Demonstrated advantages on simulated and real benchmark data
Effective exploratory analysis on financial reports
Abstract
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises-Fisher mixture using an l1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real benchmark data. We also introduce a new data set of financial reports and show the benefits of our method for exploratory analysis.
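The EM scheme described in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: it assumes a single concentration `kappa` that is fixed and shared by all components (so the vMF normalising constants cancel in the E-step), and it assumes the l1 penalty is applied to each mean direction with weight `lam`. Under those assumptions the M-step for a prototype is a soft-thresholding of the weighted sum of the data followed by renormalisation. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each coordinate of v towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_direction(r, thresh):
    """Maximise r.mu - thresh * ||mu||_1 over unit vectors mu."""
    s = soft_threshold(r, thresh)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        # Everything was thresholded away: the best unit vector is the
        # signed coordinate axis with the largest |r_j|.
        j = int(np.argmax(np.abs(r)))
        e = np.zeros_like(r)
        e[j] = 1.0 if r[j] >= 0 else -1.0
        return e
    return s / norm

def e_step(X, mus, pis, kappa):
    """Responsibilities; with a shared kappa the constants cancel."""
    logits = kappa * (X @ mus.T) + np.log(np.maximum(pis, 1e-12))
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

def m_step(X, resp, kappa, lam):
    """Update proportions and sparse unit-norm prototypes."""
    pis = resp.mean(axis=0)
    R = resp.T @ X  # (K, d) responsibility-weighted sums
    mus = np.vstack([sparse_direction(r, lam / kappa) for r in R])
    return mus, pis

def fit(X, K, kappa=10.0, lam=0.0, n_iter=50, seed=0):
    """Run a fixed number of EM iterations on unit-norm rows of X."""
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), K, replace=False)].copy()
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        resp = e_step(X, mus, pis, kappa)
        mus, pis = m_step(X, resp, kappa, lam)
    return mus, pis, resp
```

A path-following run in this simplified setting would call `fit` repeatedly for an increasing grid of `lam` values, ideally warm-starting each fit from the previous solution, and inspect how the number of non-zero coordinates in `mus` changes along the path.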
