PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering
Loup-Noe Levy, Guillaume Guerard, Sonia Djebali, Soufian Ben Amor

TL;DR
PretopoMD introduces a pretopology-based hierarchical clustering algorithm for mixed data that avoids dimensionality reduction, using logical rules to improve interpretability and robustness in cluster formation.
Contribution
It presents a novel pretopological approach that employs logical rules for hierarchical clustering of mixed data, bypassing traditional dimensionality reduction techniques.
Findings
Demonstrates superior clustering performance on heterogeneous datasets
Provides interpretable clusters directly from raw data
Shows robustness and improved explainability in clustering results
Abstract
This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Leveraging Disjunctive Normal Form, our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction and facilitate tailored solutions for heterogeneous datasets. Through hierarchical dendrogram analysis and comparative clustering metrics, our method demonstrates superior performance by accurately and interpretably delineating clusters directly from raw data, thus preserving data integrity. Empirical findings highlight the algorithm's robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to clustered data explainability. The novelty of this work lies in its departure from traditional dimensionality…
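The abstract describes logical rules in Disjunctive Normal Form (DNF) that govern how clusters grow directly from raw mixed attributes. The Python sketch below illustrates that general idea under our own assumptions. It is not the authors' PretopoMD implementation. In the sketch, each attribute defines a neighborhood relation: a distance radius for numeric values and equality for categorical values. A DNF rule over those relations defines a pseudoclosure a(A), and repeating a(·) until nothing changes gives the closure. The elementary closed sets F(x) = closure({x}), ordered by inclusion, then form a nested, dendrogram-like structure. Everything here is hypothetical and chosen only for illustration: the function names (`numeric_neighbors`, `categorical_neighbors`, `pseudoclosure`, `closure`, `elementary_closed_sets`, `inclusion_edges`), the encoding of a rule as a list of clauses of `(relation, min_count)` literals, the radius, the thresholds, and the toy data.

```python
# Hypothetical sketch of a DNF-driven pretopological pseudoclosure on mixed data.
# Not the PretopoMD reference implementation; names, rule encoding and data are illustrative.

# Toy mixed dataset: one numeric attribute and one categorical attribute.
DATA = [
    {"age": 25, "city": "Paris"},
    {"age": 27, "city": "Paris"},
    {"age": 29, "city": "Lyon"},
    {"age": 60, "city": "Lyon"},
    {"age": 62, "city": "Lyon"},
]


def numeric_neighbors(data, key, radius):
    """V(x): indices whose numeric attribute lies within `radius` of x's (x included)."""
    return [{j for j, y in enumerate(data) if abs(y[key] - x[key]) <= radius} for x in data]


def categorical_neighbors(data, key):
    """V(x): indices sharing x's category (x included)."""
    return [{j for j, y in enumerate(data) if y[key] == x[key]} for x in data]


def pseudoclosure(A, neighborhoods, dnf):
    """a(A) = A plus every x outside A that satisfies the DNF rule.

    `dnf` is an OR-list of clauses; each clause is an AND-list of
    (relation_name, min_count) literals. A literal holds for x when
    |V_relation(x) & A| >= min_count. With every min_count >= 1, a(empty) = empty.
    """
    A = frozenset(A)
    n = len(next(iter(neighborhoods.values())))
    added = set()
    for x in range(n):
        if x in A:
            continue
        if any(all(len(neighborhoods[rel][x] & A) >= k for rel, k in clause) for clause in dnf):
            added.add(x)
    return A | added


def closure(A, neighborhoods, dnf):
    """Apply the pseudoclosure repeatedly until a fixed point is reached."""
    current = frozenset(A)
    while True:
        nxt = pseudoclosure(current, neighborhoods, dnf)
        if nxt == current:
            return current
        current = nxt


def elementary_closed_sets(n, neighborhoods, dnf):
    """Distinct F(x) = closure({x}), sorted by size and then by members."""
    closed = {closure({x}, neighborhoods, dnf) for x in range(n)}
    return sorted(closed, key=lambda s: (len(s), sorted(s)))


def inclusion_edges(closed_sets):
    """Immediate (Hasse) inclusion edges child < parent among the closed sets."""
    edges = []
    for child in closed_sets:
        for parent in closed_sets:
            if child < parent and not any(child < mid < parent for mid in closed_sets):
                edges.append((sorted(child), sorted(parent)))
    return edges


neighborhoods = {
    "age": numeric_neighbors(DATA, "age", radius=3),
    "city": categorical_neighbors(DATA, "city"),
}
# Rule: (close in age to a member AND same city as a member) OR (same city as >= 2 members).
rule = [[("age", 1), ("city", 1)], [("city", 2)]]

closed = elementary_closed_sets(len(DATA), neighborhoods, rule)
print([sorted(s) for s in closed])  # [[2], [0, 1], [2, 3, 4]]
print(inclusion_edges(closed))      # [([2], [2, 3, 4])]
```

In this toy run, {0, 1} and {2} are minimal closed sets, and {2} is nested inside {2, 3, 4}. This nesting is the kind of inclusion structure that a hierarchy can be read from. Each literal only becomes easier to satisfy as A grows, so this pseudoclosure is isotone. As a result, the closure iteration always terminates on finite data. Editing the rule or the thresholds changes which points merge, which mirrors the customizable rules and hyperparameters the abstract mentions.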
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Machine Learning and Data Classification
