Discovering outstanding subgroup lists for numeric targets using MDL

Hugo M. Proença, Peter Grünwald, Thomas Bäck, Matthijs van Leeuwen

arXiv:2006.09186 · cs.LG · March 16, 2021

TL;DR

This paper introduces an MDL-based formulation of subgroup set discovery for numeric targets. It enables the extraction of non-redundant, interpretable subgroup lists whose subgroups deviate strongly from the overall target distribution and have low spread.
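The following minimal Python sketch makes "strong deviations and low spread" concrete. It rests on three assumptions. Each subgroup's numeric target is encoded with a normal distribution fitted by maximum likelihood. The normal fitted to the whole dataset serves as the reference. The cost of encoding the description and the parameters is ignored. This is a simplification, not the paper's encoding, and `gaussian_nll_bits` and `subgroup_gain_bits` are hypothetical names.

```python
import math
from statistics import fmean, pstdev


def gaussian_nll_bits(values, mean, std):
    """Code length in bits of `values` under a normal density N(mean, std**2).

    The values are continuous, so this is a negative log-density: only the
    difference between two such codes for the same data is meaningful.
    """
    squared_error = sum((v - mean) ** 2 for v in values)
    nats = (0.5 * len(values) * math.log(2 * math.pi * std ** 2)
            + squared_error / (2 * std ** 2))
    return nats / math.log(2)


def subgroup_gain_bits(subgroup_values, dataset_values):
    """Bits saved by encoding a subgroup's targets with its own fitted normal
    instead of the dataset-wide normal (hypothetical; model cost ignored)."""
    mu_d, sd_d = fmean(dataset_values), pstdev(dataset_values)
    mu_s, sd_s = fmean(subgroup_values), pstdev(subgroup_values)
    if sd_s == 0:
        raise ValueError("subgroup needs non-constant target values")
    return (gaussian_nll_bits(subgroup_values, mu_d, sd_d)
            - gaussian_nll_bits(subgroup_values, mu_s, sd_s))


# Two subgroups with the same mean target (10.0) but different spread.
tight = [10.0, 10.1, 9.9, 10.0]
loose = [7.0, 13.0, 8.0, 12.0]
data = [float(x) for x in range(11)] + tight + loose
print(subgroup_gain_bits(tight, data) > subgroup_gain_bits(loose, data))  # True
```

Both subgroups have the same mean, yet the tighter one saves more bits. Under this toy code, that is the dispersion-aware behavior the summary describes.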

Contribution

It formalizes subgroup set discovery as a dispersion-aware MDL problem over subgroup lists. It also proposes SSD++, a heuristic algorithm for finding high-quality, non-redundant subgroup lists.
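SSD++ is described here only as a heuristic. The sketch below shows one plausible shape for such a heuristic, assuming a plain greedy loop over a fixed pool of candidate subgroups. At each step, it adds the candidate whose toy Gaussian compression gain on the still-uncovered rows is largest. It stops once no candidate saves more bits than an assumed flat per-subgroup description cost.

Several pieces are hypothetical: the candidate pool, `greedy_subgroup_list`, `description_cost_bits`, and the score (a maximum-likelihood normal fit with no parameter cost). SSD++'s actual candidate generation and MDL score are not reproduced here.

```python
import math
from statistics import fmean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
# A candidate subgroup: a readable description plus a predicate on rows.
Candidate = Tuple[str, Callable[[Row], bool]]


def _nll_bits(values: Sequence[float], mean: float, std: float) -> float:
    # Negative log2-density of `values` under N(mean, std**2).
    sq = sum((v - mean) ** 2 for v in values)
    nats = 0.5 * len(values) * math.log(2 * math.pi * std ** 2) + sq / (2 * std ** 2)
    return nats / math.log(2)


def _gain_bits(sub: List[float], mu_d: float, sd_d: float) -> float:
    # Bits saved by a subgroup-specific normal vs. the dataset-wide normal.
    if len(sub) < 2 or pstdev(sub) == 0:
        return float("-inf")  # this toy code cannot score such subgroups
    return _nll_bits(sub, mu_d, sd_d) - _nll_bits(sub, fmean(sub), pstdev(sub))


def greedy_subgroup_list(rows: Sequence[Row], target: str,
                         candidates: List[Candidate],
                         description_cost_bits: float = 10.0) -> List[str]:
    """Greedily build an ordered subgroup list (hypothetical sketch).

    Each row is described by the first subgroup it matches; rows matching none
    fall through to the default rule, i.e. the dataset-wide normal.
    """
    y = [r[target] for r in rows]
    mu_d, sd_d = fmean(y), pstdev(y)
    uncovered = list(rows)
    remaining = list(candidates)
    chosen: List[str] = []
    while remaining:
        best, best_score = None, 0.0
        for desc, pred in remaining:
            # Score only rows not already claimed by an earlier subgroup.
            sub = [r[target] for r in uncovered if pred(r)]
            score = _gain_bits(sub, mu_d, sd_d) - description_cost_bits
            if score > best_score:
                best, best_score = (desc, pred), score
        if best is None:  # no candidate compresses any further: stop
            break
        chosen.append(best[0])
        remaining.remove(best)
        uncovered = [r for r in uncovered if not best[1](r)]
    return chosen


# Toy data: rows with x < 5 have a high, tightly spread target.
rows = [{"x": x, "y": [50.0, 50.5, 49.5, 50.2, 49.8][x] if x < 5 else 10.0 + x % 3}
        for x in range(20)]
candidates = [("x < 5", lambda r: r["x"] < 5),
              ("x >= 15", lambda r: r["x"] >= 15),
              ("x < 10", lambda r: r["x"] < 10)]
print(greedy_subgroup_list(rows, "y", candidates))
```

The loop removes covered rows before scoring the next candidate. This makes the list ordered and stops later subgroups from re-describing rows an earlier subgroup already explains, which is how a subgroup list avoids redundancy.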

Findings

01. SSD++ produces compact, non-redundant subgroup lists.
02. The method effectively balances subgroup quality and complexity.
03. Empirical results show superior performance over existing methods.

Abstract

The task of subgroup discovery (SD) is to find interpretable descriptions of subsets of a dataset that stand out with respect to a target attribute. To address the problem of mining large numbers of redundant subgroups, subgroup set discovery (SSD) has been proposed. State-of-the-art SSD methods have their limitations, though, as they typically rely heavily on heuristics and/or user-chosen hyperparameters. We propose a dispersion-aware problem formulation for subgroup set discovery that is based on the minimum description length (MDL) principle and subgroup lists. We argue that the best subgroup list is the one that best summarizes the data given the overall distribution of the target. We restrict our focus to a single numeric target variable and show that our formalization coincides with an existing quality measure when finding a single subgroup, but that, in addition, it allows to…
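The single-subgroup case can be checked by hand under simplifying assumptions. Assume a Gaussian code with maximum-likelihood parameters and no model cost; this is an illustration, not the paper's exact encoding. Under it, the code-length saving (in nats) from describing a subgroup's targets with its own normal, rather than the dataset-wide one, reduces to a coverage-weighted Kullback-Leibler divergence. This illustrates the kind of correspondence the abstract claims, without asserting that it is the exact measure the authors mean.

```latex
% Subgroup s covers n_s rows with targets y_1, ..., y_{n_s}.
% Maximum-likelihood fits: (\hat\mu_s, \hat\sigma_s^2) on the subgroup,
% (\hat\mu_d, \hat\sigma_d^2) on the whole dataset.
\begin{align*}
\sum_{i=1}^{n_s} (y_i - \hat\mu_s)^2 &= n_s \hat\sigma_s^2, \qquad
\sum_{i=1}^{n_s} (y_i - \hat\mu_d)^2 = n_s \left( \hat\sigma_s^2 + (\hat\mu_s - \hat\mu_d)^2 \right) \\
\Delta L(s) &= \sum_{i=1}^{n_s} \left[ -\ln \mathcal{N}(y_i \mid \hat\mu_d, \hat\sigma_d^2)
               + \ln \mathcal{N}(y_i \mid \hat\mu_s, \hat\sigma_s^2) \right] \\
            &= n_s \left[ \ln \frac{\hat\sigma_d}{\hat\sigma_s}
               + \frac{\hat\sigma_s^2 + (\hat\mu_s - \hat\mu_d)^2}{2 \hat\sigma_d^2}
               - \frac{1}{2} \right] \\
            &= n_s \, \mathrm{KL}\!\left( \mathcal{N}(\hat\mu_s, \hat\sigma_s^2)
               \,\middle\|\, \mathcal{N}(\hat\mu_d, \hat\sigma_d^2) \right)
\end{align*}
```

For $\hat\sigma_s < \hat\sigma_d$, the bracket grows as the subgroup's spread shrinks and as its mean moves away from $\hat\mu_d$. The factor $n_s$ rewards coverage. Divide by $\ln 2$ to express the saving in bits.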

Taxonomy

Methods: Minimum Description Length · Symbolic rule learning · SSD