Robust subgroup discovery

Hugo Manuel Proença; Peter Grünwald; Thomas Bäck; Matthijs van Leeuwen

arXiv:2103.13686·cs.LG·October 11, 2022

TL;DR

This paper introduces a novel approach to robust subgroup discovery that combines interpretability, statistical robustness, and non-redundancy, using a global model class and a greedy heuristic to find high-quality subgroup lists.

Contribution

It formalizes the problem of robust subgroup discovery with a new model class and MDL-based optimality, and proposes SSD++, a greedy algorithm that effectively balances significance and complexity.
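The greedy idea can be illustrated with a minimal sketch for a binary target. This is not the SSD++ implementation: the paper's encoding is replaced here by a smoothed plug-in Bernoulli code length plus a flat per-subgroup penalty, and the names `bernoulli_bits`, `gain`, `greedy_subgroup_list`, and `model_cost` are hypothetical. It only shows the overall shape: in each iteration, add the candidate whose coverage of the still-uncovered rows compresses the target best relative to the dataset marginal, and stop when no candidate pays for its own cost.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Candidate = Tuple[str, Callable[[Row], bool]]  # (name, description predicate)


def bernoulli_bits(pos: int, n: int, p: float) -> float:
    """Bits needed to encode n binary outcomes (pos of them 1) under Bernoulli(p)."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
    return -(pos * math.log2(p) + (n - pos) * math.log2(1 - p))


def gain(covered: Sequence[int], p_default: float, model_cost: float) -> float:
    """Toy compression gain: bits saved versus the marginal, minus a flat penalty.

    Assumption: a stand-in for an MDL criterion, not the paper's encoding.
    """
    n = len(covered)
    if n == 0:
        return float("-inf")
    pos = sum(covered)
    p_hat = (pos + 0.5) / (n + 1)  # smoothed local estimate
    saved = bernoulli_bits(pos, n, p_default) - bernoulli_bits(pos, n, p_hat)
    return saved - model_cost


def greedy_subgroup_list(
    rows: List[Row],
    target: str,
    candidates: List[Candidate],
    model_cost: float = 2.0,
) -> List[str]:
    """Greedily build an ordered list of subgroup names."""
    p_default = sum(int(r[target]) for r in rows) / len(rows)
    remaining = list(rows)  # rows not yet covered by a chosen subgroup
    chosen: List[str] = []
    while True:
        best, best_gain = None, 0.0  # only accept strictly positive gain
        for name, desc in candidates:
            if name in chosen:
                continue
            covered = [int(r[target]) for r in remaining if desc(r)]
            g = gain(covered, p_default, model_cost)
            if g > best_gain:
                best, best_gain = (name, desc), g
        if best is None:
            break
        chosen.append(best[0])
        remaining = [r for r in remaining if not best[1](r)]
    return chosen


# Hypothetical toy data: y is 1 exactly when x >= 10.
rows = [{"x": x, "y": int(x >= 10)} for x in range(20)]
candidates = [
    ("x >= 10", lambda r: r["x"] >= 10),
    ("x even", lambda r: r["x"] % 2 == 0),
]
print(greedy_subgroup_list(rows, "y", candidates))
```

Raising `model_cost` makes the list shorter, which is the complexity side of the trade-off; lowering it admits subgroups with weaker evidence.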

Findings

01 SSD++ outperforms previous subgroup discovery methods in experiments on 54 datasets.

02 In each iteration, SSD++ adds the most significant subgroup found according to the MDL criterion.

03 Empirical results show improved subgroup quality and better generalization to unseen data.

Abstract

We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, including traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and…
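To make the "ordered sets of subgroups" idea concrete, here is a minimal sketch of a subgroup list for a single numeric target. It assumes, as an illustration, that each row is assigned to the first subgroup whose description it satisfies and that rows matching no subgroup fall back to a default summarised by the overall dataset mean. The names `Subgroup`, `first_match`, and `fit_means`, and the toy attributes, are hypothetical and not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Subgroup:
    name: str
    description: Callable[[Row], bool]  # e.g. a conjunction of attribute conditions


def first_match(subgroups: List[Subgroup], row: Row) -> Optional[str]:
    """Return the name of the first subgroup covering the row, or None for the default."""
    for sg in subgroups:
        if sg.description(row):
            return sg.name
    return None


def fit_means(subgroups: List[Subgroup], rows: List[Row], target: str) -> Dict[str, float]:
    """Mean target per subgroup, using only the rows each subgroup covers first.

    Assumption: the default is summarised by the mean over the whole dataset.
    """
    buckets: Dict[str, List[float]] = {sg.name: [] for sg in subgroups}
    for row in rows:
        name = first_match(subgroups, row)
        if name is not None:
            buckets[name].append(float(row[target]))
    means = {name: sum(v) / len(v) for name, v in buckets.items() if v}
    means["default"] = sum(float(r[target]) for r in rows) / len(rows)
    return means


# Hypothetical toy data.
rows = [
    {"age": 25, "smoker": True, "cost": 30.0},
    {"age": 62, "smoker": True, "cost": 90.0},
    {"age": 70, "smoker": False, "cost": 50.0},
    {"age": 30, "smoker": False, "cost": 10.0},
]
subgroup_list = [
    Subgroup("older smokers", lambda r: r["age"] >= 60 and r["smoker"]),
    Subgroup("older", lambda r: r["age"] >= 60),
]
print([first_match(subgroup_list, r) for r in rows])
print(fit_means(subgroup_list, rows, "cost"))
```

Because assignment follows list order, the second subgroup only describes older people who are not already covered by the first, which is how an ordered list avoids reporting overlapping, redundant subgroups. A list with a single subgroup corresponds to the traditional top-1 setting mentioned in the abstract.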

Taxonomy

Methods: Minimum Description Length