Density-based interpretable hypercube region partitioning for mixed numeric and categorical data
Samuel Ackerman, Eitan Farchi, Orna Raz, Marcel Zalmanovici, Maya Zohar

TL;DR
This paper introduces a density-based method for partitioning mixed-type feature spaces into interpretable hyper-rectangular regions, aiding understanding of data concentration, sparsity, and model reliability.
Contribution
It presents a novel approach that handles mixed numeric and categorical features and identifies empty regions, enhancing interpretability and applicability in various data analysis tasks.
Findings
Partitions align with human spatial groupings
Method effectively identifies sparse and empty regions
Applications include error analysis and causal inference
Abstract
Consider a structured dataset of features, such as . A user may want to know where in the feature space observations are concentrated, and where the space is sparse or empty. The existence of large sparse or empty regions can provide domain knowledge of soft or hard feature constraints (e.g., what the typical income range is, or that a high income is unlikely with few years of work experience). These regions can also suggest to the user that machine learning (ML) model predictions for data inputs in sparse or empty regions may be unreliable. An interpretable region is a hyper-rectangle, such as , containing all observations satisfying the constraints; typically, such regions are defined by a small number of…
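To make the idea of a hyper-rectangular region concrete, the following is a minimal sketch, not the authors' partitioning method. It only shows how a region defined by interval constraints on numeric features and allowed-value sets on categorical features can be checked against observations, and how an empty region shows up as a zero count. The `Region` class, the `count_in_region` helper, the `sector` feature, and all data values are hypothetical and chosen for illustration; `income` and `years_experience` echo the abstract's example.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical hyper-rectangle over mixed features (illustration only)."""
    # Numeric constraints: feature name -> (low, high), both bounds inclusive.
    numeric: dict = field(default_factory=dict)
    # Categorical constraints: feature name -> set of allowed values.
    categorical: dict = field(default_factory=dict)

    def contains(self, row):
        """Return True if the row satisfies every constraint of the region."""
        for name, (low, high) in self.numeric.items():
            if not (low <= row[name] <= high):
                return False
        for name, allowed in self.categorical.items():
            if row[name] not in allowed:
                return False
        return True


def count_in_region(rows, region):
    """Count the observations that fall inside the region."""
    return sum(1 for row in rows if region.contains(row))


# Toy data, invented for this sketch.
rows = [
    {"income": 30_000, "years_experience": 2, "sector": "retail"},
    {"income": 45_000, "years_experience": 5, "sector": "retail"},
    {"income": 90_000, "years_experience": 15, "sector": "tech"},
    {"income": 120_000, "years_experience": 20, "sector": "tech"},
]

# A region the abstract suggests should be sparse or empty:
# high income combined with few years of experience.
high_income_low_experience = Region(
    numeric={"income": (80_000, 200_000), "years_experience": (0, 3)},
)

# A region mixing a numeric and a categorical constraint.
typical_retail = Region(
    numeric={"income": (20_000, 50_000)},
    categorical={"sector": {"retail"}},
)

empty_count = count_in_region(rows, high_income_low_experience)
retail_count = count_in_region(rows, typical_retail)
print(empty_count, retail_count)
```

Features not named in a region are left unconstrained, which is what keeps a region readable: it is described by only the few constraints that matter. How such regions are actually chosen from density is the subject of the paper and is not attempted here.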
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification
