Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Zeya Wang; Chenglong Ye

arXiv:2403.14830 · stat.ML · March 25, 2024 · 1 citation

Open Access

TL;DR

This paper critically examines the challenges of evaluating deep clustering methods, proposing a theoretical framework and systematic approach to improve the reliability of internal validation measures in high-dimensional deep learning contexts.

Contribution

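It introduces a theoretical framework and a systematic methodology for applying internal clustering validation measures in deep clustering. The approach addresses problems that arise when the measures are computed on high-dimensional raw data and when results are compared across embedding spaces produced by different models and training settings.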
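Internal validation measures score a partition using only the data and the cluster labels, with no ground truth. The silhouette coefficient is one widely used example, chosen here purely for illustration; this summary does not say which indices the paper studies. The following is a minimal pure-Python sketch, and the function names are hypothetical helpers rather than any library's API.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (illustrative sketch).

    For point i:
      a = mean distance to the other members of its own cluster,
      b = smallest mean distance to the members of any other cluster,
      s = (b - a) / max(a, b).
    Points in singleton clusters get s = 0 (the usual convention).
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        # Distances to the other points in the same cluster.
        same = [euclidean(p, q) for j, q in enumerate(points) if labels[j] == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # Mean distance to each other cluster; keep the closest one.
        b = min(
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters
            if c != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

For example, on four points forming two tight, well-separated pairs, `silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])` is about 0.90, while the crossed labeling `[0, 1, 0, 1]` scores below zero. Because the score is built entirely from pairwise Euclidean distances, its value depends on the space in which it is computed (raw inputs or a particular learned embedding), and that dependence is what the paper's framework targets.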

Findings

1. Evaluations under the proposed framework align more closely with external validation measures.

2. The framework reduces the misleading conclusions that arise from improper use of validation indices.

3. Experiments confirm more consistent evaluation of deep clustering results.
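The first finding concerns agreement between internal and external validation measures. One common way to quantify such agreement is Spearman's rank correlation between the two scores across a set of candidate clusterings. The sketch below assumes this choice; the summary does not state which statistic the paper uses, and `spearman` and `average_ranks` are hypothetical helpers.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)
```

In a model-selection setting, `x` would hold one internal index per candidate clustering (for example, one per model or hyperparameter setting), and `y` would hold an external index such as the adjusted Rand index against ground-truth labels. A rho near 1 means the internal index ranks the candidates the same way the external one does. Ranks are used instead of raw scores because only the ordering matters for choosing a model.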

Abstract

Deep clustering, a method for partitioning complex, high-dimensional data using deep neural networks, presents unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering, which projects data into lower-dimensional embeddings before partitioning. Two key issues are identified: 1) the curse of dimensionality when these measures are applied to raw data, and 2) the unreliable comparison of clustering results across different embedding spaces, which stems from variations in training procedures and parameter settings across clustering models. This paper addresses these challenges in evaluating clustering quality in deep learning. We present a theoretical framework to highlight the ineffectiveness of using internal validation measures on raw and embedded data and propose a systematic…
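The first issue, the curse of dimensionality, can be seen in a small simulation of distance concentration. As the dimension grows, the gap between a query's nearest and farthest neighbors shrinks relative to the nearest distance. Distance-based internal indices therefore lose contrast on raw high-dimensional data. This is a minimal sketch that assumes points drawn uniformly from the unit hypercube, a toy setting chosen here and not one of the paper's experiments. `relative_contrast` is a hypothetical helper.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of the distances from one random query point
    to n_points random points, all uniform in [0, 1]^dim (toy setting)."""
    rng = random.Random(seed)  # seeded for reproducibility
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Contrast shrinks as dimension grows: nearest and farthest points
# become almost equally far away.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

In two dimensions the nearest point is typically many times closer than the farthest one. In a thousand dimensions all 200 distances fall within a narrow band around their mean. This is why indices built on such distances become less informative on raw inputs.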

Taxonomy

Topics: Advanced Clustering Algorithms Research