Power to the Points: Validating Data Memberships in Clusterings
Parasaran Raman, Suresh Venkatasubramanian

TL;DR
This paper introduces a general, efficient method to validate individual data point labels in clustering results by assigning affinity scores, enhancing trust and interpretability in exploratory data analysis.
Contribution
The authors propose a novel, versatile approach to compute affinity scores for cluster labels, applicable across various data types and incorporating importance functions.
Findings
Affinity scores effectively measure confidence in point assignments.
The method is computationally efficient: its cost is polynomial in the number of clusters.
Experimental results demonstrate practical utility and visualization benefits.
Abstract
A clustering is an implicit assignment of labels to points, based on proximity to other points. It is these labels that are then used for downstream analysis (either focusing on individual clusters, or identifying representatives of clusters and so on). Thus, in order to trust a clustering as a first step in exploratory data analysis, we must trust the labels assigned to individual data points. Without supervision, how can we validate this assignment? In this paper, we present a method to attach affinity scores to the implicit labels of individual points in a clustering. The affinity scores capture the confidence level of the cluster that claims to "own" the point. This method is very general: it can be used with clusterings derived from Euclidean data, kernelized data, or even data derived from information spaces. It smoothly incorporates importance functions on clusters, allowing us to weight…
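To make the idea of a per-point affinity score concrete, here is a minimal Python sketch. It is not the authors' method, which this summary does not spell out. It swaps in a simple stand-in heuristic: a softmax over negative squared Euclidean distances to cluster centers, with an optional positive importance weight per cluster. The function name `affinity_scores`, the `centers` and `weights` parameters, and the scoring rule itself are all assumptions made for illustration. Like the method described above, the cost of scoring one point depends on the number of clusters rather than on the number of data points.

```python
import math


def affinity_scores(point, centers, weights=None):
    """Illustrative stand-in, NOT the paper's algorithm.

    Returns one score per cluster; scores are non-negative and sum to 1.
    `weights` (hypothetical) are positive importance values, one per cluster.
    """
    k = len(centers)
    if weights is None:
        weights = [1.0] * k
    # Squared Euclidean distance from the point to each cluster center.
    sq_dists = [
        sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers
    ]
    # Log-score grows with cluster importance and shrinks with distance.
    logits = [math.log(w) - d for w, d in zip(weights, sq_dists)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]
    # A point close to the first center gets a high score for that cluster.
    print(affinity_scores((1.0, 0.0), centers))
    # A point midway between the centers is split evenly.
    print(affinity_scores((2.0, 0.0), centers))
    # Giving the second cluster more importance shifts the split toward it.
    print(affinity_scores((2.0, 0.0), centers, weights=[1.0, 3.0]))
```

Under this stand-in rule, a score near 1 for the owning cluster suggests a confident label, while scores that are nearly equal across clusters flag a point whose membership is ambiguous.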
Taxonomy
Topics: Data Management and Algorithms · Advanced Clustering Algorithms Research · Data Visualization and Analytics
