
TL;DR
This paper reviews semi-supervised clustering methods that incorporate additional information beyond feature data, mainly modifications of k-means, to improve clustering when some label or outcome information is available.
Contribution
It provides a comprehensive overview of semi-supervised clustering algorithms, describing how most of them modify k-means and how other approaches make use of the extra information.
Findings
Most methods are based on modifications of k-means
Semi-supervised methods improve clustering accuracy when label information is available
The review covers various algorithms and their applications
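To make "a modification of k-means" concrete, here is a minimal sketch of one well-known variant, often called seeded or constrained k-means. Observations with known labels ("seeds") set the initial centroids, and in this constrained form they never change cluster. This is an illustration under stated assumptions, not necessarily an algorithm as presented in the review. The function names and the `seeds` dictionary format are hypothetical, and the sketch assumes every cluster has at least one seed.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(
    data: List[Point],
    seeds: Dict[int, int],  # hypothetical input: row index -> known label in 0..k-1
    k: int,
    max_iter: int = 100,
) -> List[int]:
    """Semi-supervised k-means sketch: centroids start at the means of the
    labeled ("seed") points, and seed points never change cluster."""
    # 1. Initialize each centroid from the seeds carrying that label.
    centroids = []
    for c in range(k):
        members = [data[i] for i, lab in seeds.items() if lab == c]
        if not members:
            raise ValueError(f"cluster {c} has no seed points")
        centroids.append(_mean(members))

    labels: List[int] = []
    for _ in range(max_iter):
        # 2. Assignment: seeds keep their known label; every other point
        #    joins the cluster with the nearest centroid.
        new_labels = [
            seeds[i] if i in seeds
            else min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            for i, p in enumerate(data)
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # 3. Update: recompute each centroid from its current members.
        for c in range(k):
            members = [data[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return labels


# Usage: two well-separated groups, one labeled observation in each.
data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
print(constrained_kmeans(data, seeds={0: 0, 3: 1}, k=2))  # -> [0, 0, 0, 1, 1, 1]
```

Dropping the "seeds keep their label" rule in step 2 turns this into the looser seeded variant, where known labels only choose the starting centroids and can be overridden by the data.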
Abstract
Cluster analysis methods seek to partition a data set into homogeneous subgroups. Cluster analysis is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationships among the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are…
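For the last situation in the abstract, finding clusters associated with an outcome variable, one simple strategy is to screen features by their association with the outcome and then cluster on the retained features only. The sketch below assumes a continuous outcome, uses absolute Pearson correlation as the screening score, keeps a hypothetical top `m` features, and runs plain k-means with the first `k` rows as initial centroids. All of these choices are illustrative assumptions, not the specific methods described in the review.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature or outcome: treat as unassociated
    return sxy / math.sqrt(sxx * syy)


def select_features(data: List[List[float]], outcome: List[float], m: int) -> List[int]:
    """Indices of the m features most correlated (in absolute value) with the outcome."""
    n_features = len(data[0])
    scores = [abs(pearson([row[j] for row in data], outcome)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:m]


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means; the first k points are the initial centroids
    (a simplistic, deterministic choice made only for this sketch)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def outcome_guided_clustering(data, outcome, m: int, k: int) -> List[int]:
    """Screen features by association with the outcome, then cluster on them."""
    keep = select_features(data, outcome, m)
    reduced = [[row[j] for j in keep] for row in data]
    return kmeans(reduced, k)


# Usage: feature 0 tracks the outcome; feature 1 is large-scale noise.
data = [[0.0, 10.0], [5.0, 10.0], [0.1, -10.0], [5.1, -10.0], [0.2, 10.0], [4.9, -10.0]]
outcome = [1.0, 9.0, 1.2, 9.1, 0.9, 8.8]
print(outcome_guided_clustering(data, outcome, m=1, k=2))  # -> [0, 1, 0, 1, 0, 1]
```

The screening step is what makes the procedure semi-supervised: the outcome never enters the k-means objective directly, but it decides which features the clustering is allowed to see.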
Taxonomy
Methods: k-Means Clustering
