Interpretable Deep Clustering for Tabular Data
Jonathan Svirsky, Ofir Lindenbaum

TL;DR
This paper introduces a deep learning framework for interpretable clustering of tabular data, identifying key features for each sample and cluster, and demonstrating reliable, interpretable results across various domains.
Contribution
It presents a novel self-supervised feature selection method and a model that predicts interpretable cluster assignments with feature importance at both sample and cluster levels.
Findings
The method clusters reliably on biological, text, image, and physics datasets.
The model provides interpretable feature importance for both samples and clusters.
Code is available for reproducibility.
Abstract
Clustering is a fundamental learning task widely used as a first step in data analysis. For example, biologists use cluster assignments to analyze genome sequences, medical records, or images. Since downstream analysis is typically performed at the cluster level, practitioners seek reliable and interpretable clustering models. We propose a new deep-learning framework for general domain tabular data that predicts interpretable cluster assignments at the instance and cluster levels. First, we present a self-supervised procedure to identify the subset of the most informative features from each data point. Then, we design a model that predicts cluster assignments and a gate matrix that provides cluster-level feature selection. Overall, our model provides cluster assignments with an indication of the driving feature for each sample and each cluster. We show that the proposed method can…
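The two-level selection described in the abstract can be pictured with a small forward-pass sketch. This is not the authors' implementation: the names `forward`, `W_gate`, `W_cluster`, and `cluster_gates` are hypothetical, the weights are random rather than trained, and a plain sigmoid stands in for whatever gating mechanism the paper actually uses. It only shows how sample-level gates and a cluster-level gate matrix could combine so that each sample and each cluster carries its own feature mask.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def forward(X, W_gate, W_cluster, cluster_gates):
    """Illustrative forward pass (assumed structure, not the paper's model).

    X             : (n, d) tabular data
    W_gate        : (d, d) hypothetical weights producing per-sample gates
    W_cluster     : (d, k) hypothetical weights producing cluster logits
    cluster_gates : (k, d) gate matrix, one feature mask per cluster
    """
    # Sample-level gates in [0, 1]: one value per feature per sample.
    local_gates = 1.0 / (1.0 + np.exp(-(X @ W_gate)))
    X_gated = X * local_gates

    # Cluster assignment predicted from the gated input.
    probs = softmax(X_gated @ W_cluster)
    assignments = probs.argmax(axis=1)

    # Cluster-level selection: apply the mask of each sample's cluster.
    X_cluster_view = X_gated * cluster_gates[assignments]
    return local_gates, assignments, X_cluster_view


rng = np.random.default_rng(0)
n, d, k = 6, 4, 2
X = rng.normal(size=(n, d))
W_gate = rng.normal(size=(d, d))
W_cluster = rng.normal(size=(d, k))
# Hand-written masks for illustration: cluster 0 keeps features 0-1,
# cluster 1 keeps features 2-3.
cluster_gates = np.array([[1.0, 1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 1.0]])

local_gates, assignments, X_view = forward(X, W_gate, W_cluster, cluster_gates)

# Feature with the largest gate value for each sample.
top_feature_per_sample = local_gates.argmax(axis=1)
```

Reading `local_gates` row by row gives a per-sample importance profile, while each row of `cluster_gates` gives the features retained for a whole cluster. In a trained model both would be learned rather than fixed by hand.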
Taxonomy
Topics: Advanced Clustering Algorithms Research · Biomedical Text Mining and Ontologies · Explainable Artificial Intelligence (XAI)
