Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data

Lehao Li; Qiang Huang; Yihao Ang; Bryan Kian Hsiang Low; Anthony K. H. Tung; Xiaokui Xiao

arXiv:2604.05857 · cs.LG · April 8, 2026

TL;DR

WISE is an unsupervised framework for clustering mixed-type tabular data. It unifies representation, feature weighting, clustering, and interpretation in a single pipeline, so that explanations come from the clustering process itself rather than from a post-hoc step.

Contribution

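The paper introduces WISE, an unsupervised clustering method for mixed-type data built from four components: Binary Encoding with Padding (BEP) for representation, a Leave-One-Feature-Out (LOFO) strategy for feature weighting, a two-stage weight-aware clustering procedure, and Discriminative FreqItems (DFI) for explanation.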

Findings

01. WISE outperforms classical and neural baselines in clustering quality.

02. WISE provides faithful, human-interpretable explanations.

03. The method is efficient on real-world datasets.

Abstract

Clustering mixed-type tabular data is fundamental for exploratory analysis, yet remains challenging due to misaligned numerical-categorical representations, uneven and context-dependent feature relevance, and explanations that are post-hoc and disconnected from the clustering process. We propose WISE, a Weight-Informed Self-Explaining framework that unifies representation, feature weighting, clustering, and interpretation in a fully unsupervised and transparent pipeline. WISE introduces Binary Encoding with Padding (BEP) to align heterogeneous features in a unified sparse space, a Leave-One-Feature-Out (LOFO) strategy to sense multiple high-quality and diverse feature-weighting views, and a two-stage weight-aware clustering procedure to aggregate alternative semantic partitions. To ensure intrinsic interpretability, we further develop Discriminative FreqItems (DFI), which yields feature-level…
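The abstract does not spell out how LOFO computes its weights, so the following is only a generic illustration of the leave-one-feature-out idea, not the WISE procedure. It assumes a plain one-hot plus min-max encoding as a stand-in for BEP, a small deterministic k-means, and the Rand index as the measure of how much the partition changes when a feature is dropped. All function names (`encode`, `kmeans`, `rand_index`, `lofo_weights`) and the toy data are hypothetical.

```python
from itertools import combinations


def encode(rows, kinds):
    """One-hot / min-max encoding (an assumed stand-in, NOT the paper's BEP).

    Returns (matrix, groups); groups[j] lists the encoded columns of feature j.
    """
    cols, groups = [], []
    for j, kind in enumerate(kinds):
        values = [r[j] for r in rows]
        start = len(cols)
        if kind == "num":
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1.0  # avoid division by zero on constant columns
            cols.append([(v - lo) / span for v in values])
        else:
            for level in sorted(set(values)):
                cols.append([1.0 if v == level else 0.0 for v in values])
        groups.append(list(range(start, len(cols))))
    return [list(point) for point in zip(*cols)], groups


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means with farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def lofo_weights(rows, kinds, k):
    """Weight each feature by how much the partition shifts when it is left out."""
    matrix, groups = encode(rows, kinds)
    base = kmeans(matrix, k)
    shifts = []
    for group in groups:
        drop = set(group)
        reduced = [[v for c, v in enumerate(p) if c not in drop] for p in matrix]
        shifts.append(1.0 - rand_index(base, kmeans(reduced, k)))
    total = sum(shifts)
    if total == 0:  # no feature matters more than another: fall back to uniform
        return [1.0 / len(shifts)] * len(shifts)
    return [s / total for s in shifts]


# Toy mixed-type table: one numeric feature and two categorical features.
rows = [
    (1.0, "a", "x"), (1.2, "a", "y"), (0.9, "a", "x"),
    (5.0, "b", "y"), (5.3, "b", "x"), (4.8, "b", "y"),
]
kinds = ["num", "cat", "cat"]
weights = lofo_weights(rows, kinds, k=2)
print(weights)
```

On this toy table the third column is unrelated to the two visible groups, so removing it leaves the partition unchanged and it receives the smallest weight. The abstract describes LOFO as producing multiple diverse weighting views; this sketch produces only a single weight vector.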
