Towards Automatic Clustering Analysis using Traces of Information Gain: The InfoGuide Method
Paulo Rocha, Diego Pinheiro, Martin Cadeiras, Carmelo Bastos-Filho

TL;DR
This paper introduces InfoGuide, a method that uses traces of information gain to make clustering analysis more automatic and effective, particularly on complex real-world datasets.
Contribution
The paper proposes InfoGuide, a novel approach that leverages traces of information gain for automatic clustering analysis, improving cluster retrieval on complex datasets.
Findings
InfoGuide retrieves better clusters than commonly used internal metrics on real-world datasets.
It captures traces of information gain using the Kolmogorov-Smirnov statistic.
Results show improved clustering retrieval on benchmark and real-world datasets.
Abstract
Clustering analysis has become a ubiquitous information retrieval tool in a wide range of domains, but a more automatic framework is still lacking. Though internal metrics are the key players towards a successful retrieval of clusters, their effectiveness on real-world datasets remains not fully understood, mainly because of their unrealistic assumptions about the underlying datasets. We hypothesized that capturing traces of information gain between increasingly complex clustering retrievals (InfoGuide) enables an automatic clustering analysis with improved clustering retrievals. We validated the InfoGuide hypothesis by capturing the traces of information gain using the Kolmogorov-Smirnov statistic and comparing the clusters retrieved by InfoGuide against those retrieved by other commonly used internal metrics in artificially generated, benchmark, and real-world…
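To make the idea of tracing information gain across increasingly complex clusterings more concrete, here is a minimal sketch in Python. It is not the authors' InfoGuide procedure, whose details are not given in this summary. It assumes, purely for illustration, that each clustering is a 1-D k-means fit, that the quantity compared is the distribution of point-to-nearest-centroid distances, and that the "trace" is the two-sample Kolmogorov-Smirnov statistic between consecutive values of k. The names `ks_statistic`, `kmeans_1d`, `residuals`, and `gain_trace` are hypothetical helpers.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def kmeans_1d(points, k, iterations=50):
    """Plain Lloyd's algorithm on 1-D data (assumption: quantile-based initial centroids)."""
    ordered = sorted(points)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def residuals(points, centroids):
    """Distance from each point to its nearest centroid."""
    return [min(abs(p - c) for c in centroids) for p in points]


def gain_trace(points, max_k):
    """KS statistic between residual distributions of consecutive clusterings (k -> k+1)."""
    res = [residuals(points, kmeans_1d(points, k)) for k in range(1, max_k + 1)]
    return [ks_statistic(res[i], res[i + 1]) for i in range(len(res) - 1)]


# Illustrative data only: three well-separated 1-D Gaussian clusters.
random.seed(0)
data = [random.gauss(mu, 0.3) for mu in (0.0, 5.0, 10.0) for _ in range(100)]
trace = gain_trace(data, max_k=6)
for k, d in enumerate(trace, start=1):
    print(f"k={k} -> k={k + 1}: KS = {d:.3f}")
```

In this toy setup, the KS statistic is large while adding a cluster still changes the residual distribution substantially, and it shrinks once extra clusters only subdivide groups that are already well fitted. A rule for picking the number of clusters from such a trace would be a further assumption and is left out here.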
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Bayesian Methods and Mixture Models
