Explainable cluster analysis: a bagging approach
Federico Maria Quetti, Elena Ballante, Silvia Figini, Paolo Giudici

TL;DR
This paper introduces an ensemble clustering method that reports feature importance scores alongside the cluster assignments. Bagging improves the stability of the clusters, especially on small or noisy datasets, and the output gives a single view of cluster structure and variable relevance.
Contribution
The paper presents a novel bagging-based clustering framework with built-in feature importance assessment, addressing the lack of explainability in traditional clustering methods.
Findings
Improves stability and robustness of clusters in small-sample or noisy data.
Provides interpretable feature importance scores alongside clustering results.
Demonstrates effectiveness on simulated and real-world datasets.
Abstract
A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of similar observations. To address this limitation, we propose an ensemble-based clustering framework that integrates bagging and feature dropout to generate feature importance scores, in analogy with feature importance mechanisms in supervised random forests. By leveraging multiple bootstrap resampling schemes and aggregating the resulting partitions, the method improves stability and robustness of the cluster definition, particularly in small-sample or noisy settings. Feature importance is assessed through an information-theoretic approach: at each step, the mutual information between each feature and the estimated cluster labels is computed and weighted by a measure of clustering validity to emphasize well-formed partitions, before being…
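The loop described in the abstract (bootstrap resampling, feature dropout, mutual information weighted by clustering validity) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base clusterer (Lloyd's k-means), the validity measure (between-cluster share of total variance), the equal-width binning used to estimate mutual information, the choice to score every feature including dropped ones, and all names and parameters (`bagged_importance`, `keep`, `rounds`, `bins`) are hypothetical. The abstract is cut off before it describes the final aggregation step, so the validity-weighted average used here is also an assumption.

```python
import math
import random
from collections import Counter

def kmeans(rows, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of float lists; returns one label per row."""
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
                  for r in rows]
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the previous centre if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def validity(rows, labels):
    """Between-cluster share of total variance, in [0, 1] (assumed validity index)."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    total = sum(sum((a - m) ** 2 for a, m in zip(r, mean)) for r in rows)
    within = 0.0
    for c in set(labels):
        members = [r for r, l in zip(rows, labels) if l == c]
        centre = [sum(col) / len(members) for col in zip(*members)]
        within += sum(sum((a - m) ** 2 for a, m in zip(r, centre)) for r in members)
    return 1.0 - within / total if total > 0 else 0.0

def mutual_info(values, labels, bins=5):
    """Mutual information (nats) between an equal-width-binned feature and labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint, pb, pl = Counter(zip(binned, labels)), Counter(binned), Counter(labels)
    return sum(c / n * math.log(c * n / (pb[b] * pl[l])) for (b, l), c in joint.items())

def bagged_importance(X, k=2, rounds=30, keep=0.7, seed=0):
    """Validity-weighted average of feature/label mutual information over bagging rounds."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    scores, weight_sum = [0.0] * d, 0.0
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]              # bootstrap resample
        feats = rng.sample(range(d), max(1, round(keep * d)))   # feature dropout
        sample = [[X[i][j] for j in feats] for i in idx]
        labels = kmeans(sample, k, rng)
        w = validity(sample, labels)                            # weight for this round
        for j in range(d):  # assumption: score all features, kept or dropped
            scores[j] += w * mutual_info([X[i][j] for i in idx], labels)
        weight_sum += w
    return [s / weight_sum for s in scores] if weight_sum > 0 else scores

# Toy data: features 0 and 1 separate two groups, feature 2 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(10 * (i % 2), 1), data_rng.gauss(10 * (i % 2), 1), data_rng.random()]
     for i in range(60)]
importance = bagged_importance(X, k=2)
```

On the toy data, the two informative features should receive clearly higher scores than the noise feature. Weighting each round by its validity means that rounds producing poorly separated partitions contribute little to the final score, which is the role the abstract assigns to the validity measure.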
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Anomaly Detection Techniques and Applications
