Semi-supervised clustering methods

Eric Bair

arXiv:1307.0252·stat.ME·July 11, 2014

TL;DR

This paper reviews semi-supervised clustering methods, most of them modifications of k-means, that incorporate information beyond the feature data to improve clustering when some label or outcome information is available.

Contribution

It provides a comprehensive overview of semi-supervised clustering algorithms, detailing their modifications of k-means and other approaches for leveraging extra information.

Findings

01 Most of the reviewed methods are modifications of k-means.

02 Semi-supervised methods can improve clustering accuracy when label information is available.

03 The review covers several algorithms and the situations in which each applies.

Abstract

Cluster analysis methods seek to partition a data set into homogeneous subgroups. Cluster analysis is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are…
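To make the "some cluster labels are known" setting concrete, here is a minimal sketch of one k-means modification of this kind, often called seeded or constrained k-means. It is an illustration under stated assumptions, not the paper's own implementation: the function name `seeded_kmeans`, its parameters, and the toy data are hypothetical, and the sketch assumes every cluster has at least one labeled observation.

```python
import numpy as np


def seeded_kmeans(X, seed_labels, n_clusters, n_iter=100):
    """Hypothetical helper: k-means that uses partially known labels.

    X           : (n, d) array of features.
    seed_labels : length-n integer array; -1 marks an unlabeled observation.
    Assumes each cluster 0..n_clusters-1 has at least one labeled observation.
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    labeled = seed_labels >= 0

    # Initialize each centroid as the mean of the observations known to be in it.
    centroids = np.stack(
        [X[seed_labels == k].mean(axis=0) for k in range(n_clusters)]
    )
    assign = seed_labels.copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every observation to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        # Known labels are never overridden (the "constrained" part).
        new_assign[labeled] = seed_labels[labeled]
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        centroids = np.stack(
            [X[assign == k].mean(axis=0) for k in range(n_clusters)]
        )
    return assign, centroids


# Toy data: two well-separated groups, one labeled observation in each.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
seeds_demo = np.array([0, -1, -1, 1, -1, -1])
labels_demo, centroids_demo = seeded_kmeans(X_demo, seeds_demo, n_clusters=2)
print(labels_demo)  # [0 0 0 1 1 1]
```

The only changes from ordinary k-means are the initialization from labeled observations and the step that holds known labels fixed. Dropping that step gives a variant in which the labels guide only the starting centroids.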

Taxonomy

Methods: k-Means Clustering