Differential Subgroup Discovery: Characterizing Where Two Populations Differ, and Why
Sascha Xu, Jilles Vreeken

TL;DR
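This paper introduces a method for finding interpretable subgroups of the feature space in which two populations, despite sharing similar characteristics, differ exceptionally in a target outcome. These subgroups show where population-level gaps are most pronounced and which covariate combinations drive them.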
Contribution
The paper formalizes the concept of a differential subgroup, proposes a general optimization objective for discovering such subgroups, and presents DiffSub, a gradient-based method that discovers interpretable differential subgroups.
Findings
DiffSub effectively identifies informative subgroups in various datasets.
The method reveals where and why population differences occur.
Under conditions established in the paper, the discovered subgroups admit a causal interpretation of population differences.
Abstract
We study the problem of understanding where two populations differ within a feature space, which we formalize in the concept of a differential subgroup: a subset of individuals from both populations who, despite sharing similar characteristics, exhibit exceptional differences in a target outcome. Differential subgroups reveal the regions of the feature space where population-level gaps are most pronounced and can help practitioners identify the covariate combinations that are structurally responsible for these differences, e.g., in clinical analysis, model diagnostics, or treatment-effect studies. We introduce a general optimization objective for discovering differential subgroups and establish conditions under which the resulting subgroups admit a causal interpretation of population differences. We propose DiffSub, a gradient-based approach that discovers interpretable differential…
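To make the definition of a differential subgroup concrete, the following is a minimal sketch of how the outcome gap between two populations could be measured inside one candidate subgroup. It is not DiffSub and not the paper's objective: the function name `subgroup_gap`, the axis-aligned box rule, the mean-difference statistic, and the synthetic data are all assumptions made for illustration. The sketch only evaluates a fixed, hand-chosen region; it does no gradient-based search.

```python
import numpy as np


def subgroup_gap(X, y, pop, lower, upper):
    """Mean outcome difference (population 1 minus population 0)
    among individuals whose features fall inside an axis-aligned box.

    Hypothetical helper for illustration; not part of DiffSub.
    """
    # Membership rule: every feature lies within [lower, upper].
    inside = np.all((X >= lower) & (X <= upper), axis=1)
    y0 = y[inside & (pop == 0)]
    y1 = y[inside & (pop == 1)]
    if len(y0) == 0 or len(y1) == 0:
        return float("nan")  # gap undefined if one population is absent
    return float(y1.mean() - y0.mean())


# Synthetic data (assumed): the two populations differ only in one corner
# of a two-dimensional feature space.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, size=(n, 2))
pop = rng.integers(0, 2, size=n)  # population label, 0 or 1
in_region = (X[:, 0] > 0.5) & (X[:, 1] > 0.5)
y = rng.normal(0.0, 0.1, size=n) + 1.0 * (in_region & (pop == 1))

# Gap over the whole feature space versus gap inside the candidate box.
overall_gap = subgroup_gap(X, y, pop, np.array([0.0, 0.0]), np.array([1.0, 1.0]))
local_gap = subgroup_gap(X, y, pop, np.array([0.5, 0.5]), np.array([1.0, 1.0]))
print(f"overall gap: {overall_gap:.3f}, gap inside box: {local_gap:.3f}")
```

In this toy setup the gap inside the box is much larger than the gap over the whole feature space, which is the kind of region a differential subgroup is meant to capture. A discovery method would have to search over membership rules rather than take the box as given.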
