An Information-Theoretic External Cluster-Validity Measure

Byron E Dom

arXiv:1301.0565·cs.LG·January 7, 2013·28 cites

TL;DR

This paper introduces an information-theoretic measure for evaluating clustering quality by comparing cluster assignments with known class labels, capable of handling different numbers of clusters and quantifying predictive usefulness.

Contribution

It proposes a novel external cluster-validity measure based on information theory that can compare clusterings with varying numbers of clusters in a principled way.

Findings

01

The measure is equivalent to the mutual information between cluster labels and class labels when all clusterings being compared have the same number of clusters.

02

Quantifies the predictive power of cluster labels for class labels.

03

Provides a model-based encoding approach for assessing clustering quality.

Abstract

In this paper we propose a measure of clustering quality or accuracy that is appropriate in situations where it is desirable to evaluate a clustering algorithm by comparing the clusters it produces with "ground truth" consisting of classes assigned to the patterns by manual means or by some other means in whose veracity there is confidence. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. In cases where all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters…
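
The equal-cluster-count case described in the abstract can be illustrated with a minimal sketch that estimates the mutual information between class labels and cluster labels from their empirical joint counts. This is not the paper's full measure: the truncated abstract does not show how clusterings with different numbers of clusters are handled, and nothing here attempts to reproduce that part. The function names (`empirical_entropy`, `mutual_information`, `conditional_entropy`) and the toy labels are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def empirical_entropy(labels):
    """Entropy in bits of the empirical distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(class_labels, cluster_labels):
    """Empirical mutual information I(C; K) in bits.

    Assumption: plain plug-in (frequency-count) estimate with no
    correction for the number of clusters.
    """
    if len(class_labels) != len(cluster_labels) or not class_labels:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(class_labels)
    joint = Counter(zip(class_labels, cluster_labels))
    class_counts = Counter(class_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (c, k), n_ck in joint.items():
        # p(c,k) * log2( p(c,k) / (p(c) * p(k)) ), written with raw counts
        mi += (n_ck / n) * math.log2(n_ck * n / (class_counts[c] * cluster_counts[k]))
    return mi


def conditional_entropy(class_labels, cluster_labels):
    """H(C | K) = H(C) - I(C; K): class uncertainty left after seeing the cluster."""
    return empirical_entropy(class_labels) - mutual_information(class_labels, cluster_labels)


# Hypothetical toy data: four patterns, two ground-truth classes.
classes = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]   # clusters coincide with classes
useless = [0, 1, 0, 1]   # clusters carry no information about classes

mi_perfect = mutual_information(classes, perfect)
mi_useless = mutual_information(classes, useless)
```

Because the class entropy H(C) is fixed by the ground truth, ranking clusterings by higher mutual information is the same as ranking them by lower conditional entropy H(C|K), which matches the abstract's framing of cluster labels as predictors of class labels.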

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Management and Algorithms