# Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

**Authors:** Mario Boley, Bryan R. Goldsmith, Luca M. Ghiringhelli, Jilles Vreeken

arXiv: 1701.07696 · 2017-07-06

## TL;DR

This paper introduces an efficient method for subgroup discovery with numerical targets that accounts for the dispersion of the target variable within each subgroup. The resulting statements about the data are more reliable and consistent, which matters especially in scientific applications.

## Contribution

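The paper extends the optimistic estimator framework for optimal subgroup discovery to objective functions determined by subgroup size, the subgroup median, and a dispersion measure around the median. Tight estimators can be computed efficiently for this whole class, and in linear time when dispersion is measured by the average absolute deviation from the median.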

## Key findings

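- Tight optimistic estimators can be computed efficiently for dispersion-aware objective functions.
- Used within branch-and-bound search, the approach discovers subgroups with much smaller errors.
- When dispersion is the average absolute deviation from the median, the approach yields a linear-time algorithm.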
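As a rough illustration of why the average absolute deviation from the median lends itself to fast computation, the sketch below evaluates this measure for every "top-k" set of target values in linear time after sorting, using prefix sums. This is not the paper's algorithm; the function name `top_k_abs_dev` and the choice of top-k sets as candidates are assumptions made for illustration only.

```python
from itertools import accumulate


def top_k_abs_dev(values):
    """Average absolute deviation from the median of the k largest values,
    for every k = 1..n. Hypothetical helper, not taken from the paper.

    After sorting, each k costs O(1): the sum of absolute deviations from
    the median equals (sum of the upper half) - (sum of the lower half).
    """
    y = sorted(values)
    n = len(y)
    # prefix[i] holds the sum of y[0:i]
    prefix = [0.0] + list(accumulate(y))
    result = []
    for k in range(1, n + 1):
        lo, hi = n - k, n          # the k largest values are y[lo:hi]
        h = k // 2                 # size of each half (middle element excluded if k is odd)
        upper = prefix[hi] - prefix[hi - h]
        lower = prefix[lo + h] - prefix[lo]
        result.append((upper - lower) / k)
    return result


if __name__ == "__main__":
    # One entry per k; the outlier 10 dominates the small top-k sets.
    print(top_k_abs_dev([1, 2, 3, 4, 10]))
```

The median itself never has to be computed here, because its contributions cancel between the two halves. That cancellation is what makes each step constant time once the prefix sums exist.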

## Abstract

Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
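To make the class of objective functions concrete, here is a minimal sketch of one function with the stated shape: non-decreasing in subgroup size, dependent on the subgroup median, and non-increasing in dispersion around the median. The specific product form and the names `avg_abs_dev_from_median` and `dispersion_corrected_score` are assumptions for illustration and are not necessarily the exact objective defined in the paper.

```python
from statistics import median


def avg_abs_dev_from_median(values):
    """Average absolute deviation of the values from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def dispersion_corrected_score(subgroup, population):
    """Illustrative objective (assumed form, not the paper's definition):
    coverage * positive median shift * dispersion reduction."""
    pop_dev = avg_abs_dev_from_median(population)
    if pop_dev == 0:
        return 0.0  # population has no dispersion to reduce
    coverage = len(subgroup) / len(population)
    shift = max(median(subgroup) - median(population), 0.0)
    # 1 when the subgroup has no dispersion, 0 when it is as dispersed
    # as (or more dispersed than) the whole population
    tightness = max(1.0 - avg_abs_dev_from_median(subgroup) / pop_dev, 0.0)
    return coverage * shift * tightness


if __name__ == "__main__":
    population = list(range(1, 11))
    tight = [8, 9, 10]   # median 9, values close together
    wide = [1, 9, 10]    # same size and median, but widely spread
    print(dispersion_corrected_score(tight, population))
    print(dispersion_corrected_score(wide, population))
```

Both example subgroups have the same size and the same median, so an objective that ignores dispersion would rank them equally. With the dispersion term, the widely spread subgroup scores zero while the tight one keeps a positive score.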

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.07696/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1701.07696/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1701.07696/full.md

---
Source: https://tomesphere.com/paper/1701.07696