Subjectively Interesting Subgroup Discovery on Real-valued Targets

Jefrey Lijffijt; Bo Kang; Wouter Duivesteijn; Kai Puolamäki; Emilia Oikarinen; Tijl De Bie

arXiv:1710.04521·stat.ML·November 8, 2021

TL;DR

This paper presents a novel method for discovering subjectively interesting subgroups in high-dimensional data with real-valued targets, leveraging information theory and prior knowledge to efficiently identify informative, non-redundant patterns.

Contribution

It introduces a new approach that combines subjective interestingness with information theory for subgroup discovery involving real-valued attributes, supporting iterative data mining.

Findings

01 Effective identification of informative subgroups in real-valued data

02 Supports incorporation of prior knowledge for more relevant pattern discovery

03 Enables iterative exploration of high-dimensional datasets

Abstract

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and infinitely many if we consider weighted combinations, even if we restrict ourselves to linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, the user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single or set of real-valued…
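
To make the idea of an "informative" subgroup concrete, here is a minimal sketch rather than the authors' method. It assumes a deliberately simple background model in which the user believes every target value is an independent draw from one Gaussian with the overall mean and variance of the data; a subgroup is then scored by the negative log-density of its observed mean under that belief. The function names `subgroup_information` and `best_single_condition`, the single-condition search, and the Gaussian assumption are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, pvariance


def subgroup_information(targets, mask):
    """Negative log-density (in nats) of the subgroup mean.

    Assumed background model (illustrative only): each target is an
    independent draw from N(mu, var), with mu and var taken from the
    full data, so the mean of n members is distributed N(mu, var / n).
    """
    mu = mean(targets)
    var = pvariance(targets, mu)
    members = [y for y, selected in zip(targets, mask) if selected]
    n = len(members)
    if n == 0 or var == 0:
        return 0.0  # empty subgroup or constant target: nothing to score
    var_of_mean = var / n
    diff = mean(members) - mu
    return 0.5 * math.log(2 * math.pi * var_of_mean) + diff * diff / (2 * var_of_mean)


def best_single_condition(rows, targets):
    """Exhaustively score every `attribute == value` subgroup description.

    `rows` is a list of dicts sharing the same keys. Returns a tuple
    (score, attribute, value) for the highest-scoring description.
    """
    best = None
    for attr in rows[0]:
        for value in sorted({row[attr] for row in rows}):
            mask = [row[attr] == value for row in rows]
            score = subgroup_information(targets, mask)
            if best is None or score > best[0]:
                best = (score, attr, value)
    return best
```

In the crime-rate example from the abstract, `rows` would hold the descriptive variables of each area and `targets` the crime rates. An iterative version would update the background model after each reported subgroup so that the same pattern is no longer surprising; that step is omitted here.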
