Efficient mixture model for clustering of sparse high dimensional binary data

Marek Śmieja; Krzysztof Hajto; Jacek Tabor

arXiv:1707.03157·cs.LG·July 12, 2017

TL;DR

SparseMix is a clustering model for sparse high-dimensional binary data that combines model-based and centroid-based approaches and can be fitted efficiently with an on-line Hartigan optimization algorithm.

Contribution

The paper introduces SparseMix, a mixture model that efficiently clusters sparse binary data and automatically removes unnecessary clusters. In the authors' experiments, its partitions agree with reference groupings more closely than those of related methods.

Findings

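01. SparseMix builds partitions that agree with reference groupings more closely than those of related methods.

02. The constructed cluster representatives often better reveal the internal structure of the data.

03. The model can be fitted efficiently with an on-line Hartigan optimization algorithm.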

Abstract

In this paper we propose a mixture model, SparseMix, for clustering of sparse high dimensional binary data, which connects model-based with centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on the EM algorithm, SparseMix (1) is designed specifically for processing sparse data, (2) can be efficiently realized by an on-line Hartigan optimization algorithm, and (3) is able to automatically reduce unnecessary clusters. We perform extensive experimental studies on various types of data, which confirm that SparseMix builds partitions with higher compatibility with reference grouping than related methods. Moreover, constructed representatives often better reveal the internal structure of data.
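To make the on-line Hartigan idea mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the SparseMix implementation: the cost function is an assumption (a plain Bernoulli entropy term plus a mixing-weight term, as in a classification-likelihood mixture), and it does not model cluster representatives. The names `xlogx`, `cluster_cost`, `total_cost`, and `hartigan_binary` are hypothetical, and dense NumPy arrays are used for brevity where a practical version would presumably use a sparse format. The sketch only illustrates the general scheme: visit points one at a time, move a point to another cluster when that strictly lowers the total cost, and let a cluster disappear once it becomes empty.

```python
import numpy as np


def xlogx(p):
    """Elementwise p * log(p), using the convention 0 * log(0) = 0."""
    out = np.zeros_like(p, dtype=float)
    mask = p > 0
    out[mask] = p[mask] * np.log(p[mask])
    return out


def cluster_cost(ones, n, total):
    """Assumed cost of one cluster (NOT the SparseMix cost function).

    ones  : per-feature count of 1s among the cluster's points
    n     : number of points in the cluster
    total : number of points in the whole data set
    """
    if n == 0:
        return 0.0  # an empty cluster contributes nothing
    p = ones / n
    entropy = -(xlogx(p) + xlogx(1.0 - p)).sum()
    return n * entropy - n * np.log(n / total)


def total_cost(X, labels, k):
    """Sum of cluster costs, recomputed from scratch."""
    N = X.shape[0]
    return sum(
        cluster_cost(X[labels == c].sum(axis=0), np.sum(labels == c), N)
        for c in range(k)
    )


def hartigan_binary(X, init_labels, k, n_sweeps=20):
    """Hartigan-style on-line reassignment for binary data (illustrative)."""
    N = X.shape[0]
    labels = np.array(init_labels, dtype=int)  # work on a copy
    counts = np.bincount(labels, minlength=k).astype(float)
    ones = np.array([X[labels == c].sum(axis=0) for c in range(k)], dtype=float)

    for _ in range(n_sweeps):
        moved = False
        for i in range(N):
            a, x = labels[i], X[i]
            # Cost change in the source cluster if x leaves it.
            leave = (cluster_cost(ones[a] - x, counts[a] - 1, N)
                     - cluster_cost(ones[a], counts[a], N))
            best, best_delta = a, 0.0
            for b in range(k):
                if b == a or counts[b] == 0:
                    continue  # emptied clusters stay removed
                join = (cluster_cost(ones[b] + x, counts[b] + 1, N)
                        - cluster_cost(ones[b], counts[b], N))
                if leave + join < best_delta - 1e-12:
                    best, best_delta = b, leave + join
            if best != a:
                # Update the sufficient statistics immediately (on-line step).
                ones[a] -= x
                counts[a] -= 1
                ones[best] += x
                counts[best] += 1
                labels[i] = best
                moved = True
        if not moved:
            break  # no single-point move lowers the cost
    return labels
```

Calling `hartigan_binary(X, init_labels, k)` with a 0/1 integer matrix `X` and integer labels in `range(k)` returns the updated labels. Because each accepted move strictly lowers the assumed cost, `total_cost` never increases, but the result is only a local optimum and depends on the initial labels.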

Code & Models

Repositories

hajtos/SparseMIX (Official)
