# Itemsets for Real-valued Datasets

**Authors:** Nikolaj Tatti

arXiv: 1902.00804 · 2019-02-05

## TL;DR

This paper scores itemsets in real-valued data by binarizing the data with thresholds, treating those thresholds as random variables, and averaging the resulting support. The averaged support can be computed efficiently and, once normalized, is used to find statistically significant patterns.

## Contribution

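The paper proposes a family of quality scores for real-valued itemsets. Thresholds used to binarize the data are treated as random variables, the support is averaged over them, and the result is normalized against the independence assumption or, more generally, the partition assumption.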
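The idea of averaging support over random thresholds can be illustrated with a minimal sketch. It assumes the data is scaled to [0, 1] and that each item's threshold is drawn independently and uniformly from [0, 1]; this distribution is an assumption made for illustration and is not necessarily the one used in the paper. Under that assumption a row is covered with probability equal to the product of its values on the itemset, which gives a closed form that a Monte Carlo estimate should approach. All function names here are hypothetical.

```python
import random
from math import prod


def binarized_support(rows, itemset, thresholds):
    """Fraction of rows whose value meets the threshold for every item."""
    covered = sum(
        all(row[i] >= thresholds[i] for i in itemset) for row in rows
    )
    return covered / len(rows)


def average_support_exact(rows, itemset):
    """Expected support under independent Uniform[0, 1] thresholds (assumed).

    A row is covered with probability prod(row[i]) when values lie in [0, 1].
    """
    return sum(prod(row[i] for i in itemset) for row in rows) / len(rows)


def average_support_mc(rows, itemset, n_samples=20000, seed=0):
    """Monte Carlo estimate: sample thresholds, binarize, average the support."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        thresholds = {i: rng.random() for i in itemset}
        total += binarized_support(rows, itemset, thresholds)
    return total / n_samples


# Toy data: columns 0 and 1 rise and fall together, column 2 opposes them.
rows = [
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]

exact = average_support_exact(rows, [0, 1])
approx = average_support_mc(rows, [0, 1])
print(f"exact={exact:.4f} monte_carlo={approx:.4f}")
```

The closed form avoids choosing any single set of thresholds, which is the point of the averaging step; the sampling version is only there to show what is being averaged.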

## Key findings

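- The average support of a real-valued itemset can be computed efficiently.
- Two normalizations are introduced: comparing the support against the independence assumption and, more generally, against the partition assumption.
- Experiments show that statistically significant patterns can be discovered efficiently.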
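Comparing a support against the independence assumption can be sketched as a lift-style ratio. This is a hedged illustration, not the paper's exact normalization: it assumes data in [0, 1], assumes independent uniform thresholds (so a row's coverage probability is the product of its values), and divides the observed average support by the product of the single-item average supports. The names `average_support` and `independence_ratio` are hypothetical.

```python
from math import prod


def average_support(rows, itemset):
    """Average support, assuming independent Uniform[0, 1] thresholds."""
    return sum(prod(row[i] for i in itemset) for row in rows) / len(rows)


def independence_ratio(rows, itemset):
    """Observed average support divided by the independence baseline.

    The baseline multiplies the single-item average supports. A ratio above 1
    suggests the items are jointly high more often than independence predicts.
    This ratio is an illustrative choice, not the paper's definition.
    """
    observed = average_support(rows, itemset)
    expected = prod(average_support(rows, [i]) for i in itemset)
    if expected == 0:
        raise ValueError("independence baseline is zero; ratio undefined")
    return observed / expected


# Toy data: columns 0 and 1 move together, column 2 moves against column 0.
rows = [
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]

ratio_01 = independence_ratio(rows, [0, 1])
ratio_02 = independence_ratio(rows, [0, 2])
print(f"items 0,1: {ratio_01:.3f}  items 0,2: {ratio_02:.3f}")
```

On this toy data the co-varying pair scores above 1 and the opposing pair below 1, which is the behavior one would want from any normalization against independence.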

## Abstract

Pattern mining is one of the most well-studied subfields in exploratory data analysis. While there is a significant amount of literature on how to discover and rank itemsets efficiently from binary data, there is surprisingly little research on mining patterns from real-valued data. In this paper we propose a family of quality scores for real-valued itemsets. We approach the problem by casting the dataset into binary data and computing the support from this data. This naive approach requires us to select thresholds. To remedy this, instead of selecting one set of thresholds, we treat thresholds as random variables and compute the average support. We show that we can compute this support efficiently, and we also introduce two normalisations, namely comparing the support against the independence assumption and, more generally, against the partition assumption. Our experimental evaluation demonstrates that we can discover statistically significant patterns efficiently.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.00804/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1902.00804/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1902.00804/full.md

---
Source: https://tomesphere.com/paper/1902.00804