Data clustering using stochastic block models

Nina Mrzelj; Pavlin Gregor Poličar

arXiv:1707.07494 · cs.SI · July 25, 2017 · 1 citation

TL;DR

This paper explores the application of generalized stochastic block models to data clustering, demonstrating their potential advantages over traditional methods like k-means and highlighting their performance on weighted graphs.

Contribution

It introduces a generalized stochastic block model for clustering weighted graphs and compares its effectiveness to existing clustering techniques.

Findings

1. SBM-based methods outperform k-means in community detection tasks

2. Generalized SBM can handle weighted graphs effectively

3. SBM approaches do not require pre-specifying the number of clusters

Abstract

It has been shown that community detection algorithms work better for clustering tasks than other, more popular methods, such as k-means. In fact, network-analysis-based methods often outperform more widely used methods and avoid some of their drawbacks, for example the need to know the number of clusters k in advance. However, stochastic block models, which are known to perform well for community detection, have not yet been tested for this task. We discuss why these models cannot be directly applied to this problem, test the performance of a generalization of stochastic block models that works on weighted graphs, and compare it to other clustering techniques.
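
To make the setting concrete, here is a minimal sketch of how a set of data points might be turned into a weighted graph before a weighted-graph community detection method is applied. The construction (Gaussian similarity with k-nearest-neighbour sparsification), the function name `knn_similarity_graph`, and the parameters `k` and `sigma` are illustrative assumptions; the abstract does not say which graph construction the authors use.

```python
import math


def knn_similarity_graph(points, k=2, sigma=1.0):
    """Build a symmetric weighted adjacency matrix from data points.

    Assumed construction (not taken from the paper): Gaussian similarity
    between points, keeping only each point's k most similar neighbours.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    # Gaussian similarity between every pair of distinct points.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sim[i][j] = sim[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))

    # Keep an edge if either endpoint ranks the other among its k nearest.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            weights[i][j] = weights[j][i] = sim[i][j]
    return weights


# Two well-separated groups of three points each (toy data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = knn_similarity_graph(points, k=2)
```

On this toy input every retained edge connects points within the same group, so the edge weights carry the cluster structure that a weighted-graph model would then have to recover.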

Taxonomy

Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Bioinformatics and Genomic Networks