A Short Survey on Data Clustering Algorithms

Ka-Chun Wong

arXiv:1511.09123 · cs.DS · December 1, 2015

TL;DR

This paper reviews various data clustering algorithms, discussing their design, methodologies, and evaluation metrics, providing insights into current advancements and future directions in clustering research.

Contribution

It offers a comprehensive survey of state-of-the-art clustering algorithms, including paradigms, advanced methods, and evaluation metrics, highlighting recent developments and future research opportunities.

Findings

01. Summarizes different clustering paradigms and methodologies.

02. Reviews existing clustering evaluation metrics.

03. Provides insights into future research directions.

Abstract

With data volumes growing rapidly, clustering algorithms have become important tools for data analytics in modern research. They have been successfully applied in a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize intra-subset similarity and inter-subset dissimilarity, where the similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed, followed by advanced clustering algorithms. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.
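
The formal objective in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance as the predefined dissimilarity measure (the abstract does not fix one), and the function names `mean_intra_distance` and `mean_inter_distance` are hypothetical helpers rather than anything taken from the paper. Under these assumptions, a good partition has a small average distance within subsets and a large average distance between subsets.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_intra_distance(points, labels):
    """Average distance over pairs of points that share a cluster label."""
    pairs = [
        dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] == labels[j]
    ]
    return sum(pairs) / len(pairs) if pairs else 0.0


def mean_inter_distance(points, labels):
    """Average distance over pairs of points in different clusters."""
    pairs = [
        dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] != labels[j]
    ]
    return sum(pairs) / len(pairs) if pairs else 0.0


# Toy data (made up for illustration): two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
bad_labels = [0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    intra = mean_intra_distance(points, labels)
    inter = mean_inter_distance(points, labels)
    print(f"{name}: intra={intra:.3f} inter={inter:.3f}")
```

On this toy data the partition that groups nearby points scores a lower intra-cluster distance and a higher inter-cluster distance than the mixed partition, which is the sense in which it better satisfies the stated objective. Other similarity measures would change the numbers but not the structure of the comparison.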
