Mixture of von Mises-Fisher distribution with sparse prototypes

Fabrice Rossi (CEREMADE); Florian Barbaro (SAMM)

arXiv:2212.14591 · cs.LG · January 2, 2023

TL;DR

This paper introduces a sparse, l1-penalized mixture of von Mises-Fisher distributions with an EM algorithm, enhancing clustering interpretability for high-dimensional directional data like texts and financial reports.

Contribution

It proposes a novel sparse estimation method for von Mises-Fisher mixtures using an l1 penalty and develops an EM algorithm with a path-following approach.

Findings

01. Improved clustering interpretability with sparse prototypes

02. Demonstrated advantages on simulated and real benchmark data

03. Effective exploratory analysis on financial reports

Abstract

Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. In this article, we propose to estimate a von Mises-Fisher mixture using an l1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real benchmark data. We also introduce a new data set of financial reports and exhibit the benefits of our method for exploratory analysis.
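To make the idea of sparse prototypes concrete, here is a minimal Python sketch of an EM loop for an l1-penalized von Mises-Fisher mixture. It is an illustration under simplifying assumptions, not the authors' algorithm: the concentration kappa is fixed and shared by all components (so the normalising constant cancels in the E-step), the penalty is assumed to be lam times the sum of the l1 norms of the mean directions, and the names `soft_threshold`, `sparse_direction` and `em_sparse_vmf` are hypothetical. Under these assumptions the M-step for each direction reduces to soft-thresholding the weighted sum of the observations and renormalising it to unit length.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink every coordinate of v towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_direction(r, beta):
    """Maximise mu @ r - beta * ||mu||_1 subject to ||mu||_2 = 1."""
    s = soft_threshold(r, beta)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        # Every coordinate was thresholded away: the best unit vector
        # then keeps only the coordinate with the largest |r_j|.
        mu = np.zeros_like(r, dtype=float)
        j = int(np.argmax(np.abs(r)))
        mu[j] = 1.0 if r[j] >= 0 else -1.0
        return mu
    return s / norm


def em_sparse_vmf(X, n_components, lam, kappa=20.0, n_iter=50, seed=0):
    """EM for a vMF mixture with fixed shared kappa (an assumption)
    and an l1 penalty of weight lam on each mean direction."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise the prototypes with randomly chosen observations.
    mus = X[rng.choice(n, n_components, replace=False)].copy()
    alphas = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        logits = np.log(np.maximum(alphas, 1e-12)) + kappa * (X @ mus.T)
        logits -= logits.max(axis=1, keepdims=True)
        tau = np.exp(logits)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, then sparse mean directions.
        alphas = tau.mean(axis=0)
        R = tau.T @ X  # weighted sums of observations, shape (K, d)
        mus = np.array([sparse_direction(r, lam / kappa) for r in R])
    return alphas, mus, tau
```

Rows of `X` are expected to be unit vectors, for example l2-normalised term-frequency vectors. Increasing `lam` zeroes out more coordinates of each prototype; rerunning the loop over an increasing grid of `lam` values gives a crude stand-in for the path-following exploration mentioned in the abstract.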
