Density-based interpretable hypercube region partitioning for mixed numeric and categorical data

Samuel Ackerman; Eitan Farchi; Orna Raz; Marcel Zalmanovici; Maya Zohar

arXiv:2110.05430 · cs.LG · November 9, 2021

Open Access

TL;DR

This paper introduces a density-based method for partitioning mixed-type feature spaces into interpretable hyper-rectangular regions, aiding understanding of data concentration, sparsity, and model reliability.

Contribution

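It presents an approach that works directly on mixed numeric and categorical features in their original domain and can separate out empty regions as well as dense ones. Because the resulting regions are hyper-rectangles, they remain interpretable and can be reused in other data analysis tasks.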

Findings

1. Partitions align with the spatial groupings a human would identify
2. The method effectively identifies sparse and empty regions
3. Applications include ML model error analysis and causal inference
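For contrast with a density-based partition, a naive way to look for empty regions is to cut each numeric feature into fixed bins, cross those bins with every categorical level, and count the observations in each cell. The sketch below is such a fixed-grid baseline, not the authors' method; `bin_index`, `grid_cells`, the bin edges, and the toy rows are all hypothetical. Because the number of cells is the product of the per-feature bin and level counts, a fixed grid grows multiplicatively with the number of features and quickly becomes unwieldy in higher dimensions.

```python
from collections import Counter
from itertools import product


def bin_index(edges, x):
    """Return i such that edges[i] <= x < edges[i+1]; the last bin includes its right edge."""
    for i in range(len(edges) - 1):
        is_last = i == len(edges) - 2
        if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
            return i
    raise ValueError(f"{x} is outside bin edges {edges}")


def grid_cells(rows, numeric_bins, categorical_levels):
    """Count observations in every cell of a fixed grid (naive baseline, hypothetical).

    numeric_bins:       feature -> sorted bin edges
    categorical_levels: feature -> list of levels
    Cell keys are (numeric bin indices..., categorical levels...), features sorted by name.
    """
    num_feats = sorted(numeric_bins)
    cat_feats = sorted(categorical_levels)

    # Enumerate every cell first so that empty cells get an explicit count of 0.
    counts = Counter()
    for key in product(*[range(len(numeric_bins[f]) - 1) for f in num_feats],
                       *[categorical_levels[f] for f in cat_feats]):
        counts[key] = 0

    # Assign each observation to exactly one cell.
    for r in rows:
        key = tuple(bin_index(numeric_bins[f], r[f]) for f in num_feats) + \
              tuple(r[f] for f in cat_feats)
        counts[key] += 1
    return counts


# Toy data (invented for illustration only)
rows = [
    {"SEX": "F", "INCOME": 52_000, "EXPERIENCE": 11},
    {"SEX": "M", "INCOME": 48_000, "EXPERIENCE": 12},
    {"SEX": "M", "INCOME": 90_000, "EXPERIENCE": 10},
    {"SEX": "F", "INCOME": 35_000, "EXPERIENCE": 3},
]

counts = grid_cells(
    rows,
    numeric_bins={"EXPERIENCE": [0, 5, 15], "INCOME": [0, 50_000, 100_000]},
    categorical_levels={"SEX": ["F", "M"]},
)
# Keys are (EXPERIENCE bin, INCOME bin, SEX)
empty_cells = [key for key, n in counts.items() if n == 0]
print(empty_cells)  # [(0, 0, 'M'), (0, 1, 'F'), (0, 1, 'M'), (1, 0, 'F')]
```

In this toy data, both low-experience/high-income cells (`(0, 1, ...)`) are empty, echoing the abstract's example of an unlikely feature combination.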

Abstract

Consider a structured dataset of features, such as $\{\mathrm{SEX}, \mathrm{INCOME}, \mathrm{RACE}, \mathrm{EXPERIENCE}\}$. A user may want to know where in the feature space observations are concentrated, and where it is sparse or empty. The existence of large sparse or empty regions can provide domain knowledge of soft or hard feature constraints (e.g., what is the typical income range, or that it may be unlikely to have a high income with few years of work experience). Also, these can suggest to the user that machine learning (ML) model predictions for data inputs in sparse or empty regions may be unreliable. An interpretable region is a hyper-rectangle, such as $\{\mathrm{RACE} \in \{\mathrm{Black}, \mathrm{White}\}\} \;\&\; \{10 \leq \mathrm{EXPERIENCE} \leq 13\}$, containing all observations satisfying the constraints; typically, such regions are defined by a small number of…
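To make the region definition above concrete, here is a minimal Python sketch of a mixed-type hyper-rectangle: inclusive interval constraints on numeric features, allowed-level sets on categorical features, and no constraint on any feature not mentioned. The `Region` class, its methods, and the toy rows are hypothetical illustrations, not code from the paper. The sketch shows only how region membership and observation counts (including an empty region) can be computed, not how the paper's density-based partition is constructed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class Region:
    """A hyper-rectangle over mixed-type features (hypothetical helper).

    numeric:     feature -> (low, high), inclusive bounds
    categorical: feature -> set of allowed levels
    Features not mentioned are unconstrained.
    """
    numeric: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    categorical: Dict[str, Set[Any]] = field(default_factory=dict)

    def contains(self, row: Dict[str, Any]) -> bool:
        # Every numeric constraint must hold: low <= value <= high.
        for feat, (low, high) in self.numeric.items():
            if not (low <= row[feat] <= high):
                return False
        # Every categorical constraint must hold: value is an allowed level.
        for feat, levels in self.categorical.items():
            if row[feat] not in levels:
                return False
        return True

    def members(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # All observations satisfying every constraint of the region.
        return [r for r in rows if self.contains(r)]


# Toy data (invented for illustration only)
rows = [
    {"SEX": "F", "INCOME": 52_000, "RACE": "Black", "EXPERIENCE": 11},
    {"SEX": "M", "INCOME": 48_000, "RACE": "White", "EXPERIENCE": 12},
    {"SEX": "M", "INCOME": 90_000, "RACE": "Asian", "EXPERIENCE": 10},
    {"SEX": "F", "INCOME": 35_000, "RACE": "White", "EXPERIENCE": 3},
]

# The abstract's example region: RACE in {Black, White} & 10 <= EXPERIENCE <= 13
region = Region(numeric={"EXPERIENCE": (10, 13)},
                categorical={"RACE": {"Black", "White"}})
inside = region.members(rows)
print(len(inside))  # 2

# High income with few years of experience: no observations, so the region is empty.
empty = Region(numeric={"INCOME": (80_000, 200_000), "EXPERIENCE": (0, 2)})
print(len(empty.members(rows)))  # 0
```

Representing categorical constraints as sets of levels, rather than as numeric codes, keeps the region in the features' original domain, so it reads directly as a conjunction of human-readable conditions.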

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification