Clustering -- Basic concepts and methods

Jan-Oliver Felix Kapp-Joswig; Bettina G. Keller

arXiv:2212.01248 · cs.LG · December 5, 2022 · 1 citation

Open Access

TL;DR

This paper provides an introductory review of clustering techniques, discussing fundamental concepts, data preparation, various algorithms, and validation methods for clustering analysis.

Contribution

It offers a comprehensive overview of clustering methods, including connectivity-based, prototype-based, and density-based approaches, with insights into their implementation and validation.

Findings

01 Comparison of different clustering algorithms

02 Discussion on data representation and preparation

03 Overview of validation techniques for clustering results

Abstract

We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task? And how can clustering results be validated? Connectivity-based versus prototype-based approaches are reflected in the context of several popular methods: single-linkage, spectral embedding, k-means, and Gaussian mixtures are discussed as well as the density-based protocols (H)DBSCAN, Jarvis-Patrick, CommonNN, and density-peaks.
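As an illustration of the connectivity-based view named in the abstract, the following is a minimal sketch of single-linkage clustering at a fixed distance cutoff. It is not taken from the paper: the function name `single_linkage`, the `cutoff` parameter, and the toy points are assumptions made for this example. The sketch relies on the fact that cutting a single-linkage hierarchy at a distance threshold yields the connected components of the graph that links every pair of points no farther apart than that threshold.

```python
from itertools import combinations
from math import dist


def single_linkage(points, cutoff):
    """Hypothetical helper: label points so that two points share a cluster
    whenever a chain of neighbours, each step at most `cutoff` long,
    connects them."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of points within the cutoff.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Toy data (assumed for illustration): a chain of three points,
# a pair, and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0), (10.0, 0.0)]
labels = single_linkage(points, cutoff=0.6)
print(labels)  # [0, 0, 0, 1, 1, 2]
```

The first and third points are 1.0 apart, which exceeds the cutoff, yet they land in the same cluster because the second point links them. This chaining behaviour is what separates connectivity-based methods from prototype-based ones such as k-means, which assign each point to its nearest cluster centre.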

Taxonomy

Topics: Advanced Clustering Algorithms Research