# Maximum Entropy Based Significance of Itemsets

**Authors:** Nikolaj Tatti

arXiv: 1904.10632 · 2019-04-30

## TL;DR

This paper introduces a maximum entropy-based method to assess the significance of itemsets by comparing observed frequencies with estimated frequencies from sub-itemsets, allowing for richer models than previous independence-based approaches.

## Contribution

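The paper proposes a significance measure for itemsets: the frequency of an itemset is estimated from its sub-itemsets with a maximum entropy model, and the deviation of the observed value from that estimate is measured with Kullback-Leibler divergence. Unlike earlier approaches, which only measure deviation from the independence model, this allows richer background models.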
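As a rough orientation, the two ingredients can be written in their standard textbook form. The notation here is an assumption made for illustration and is not taken from the paper: $q$ is the empirical distribution over the items of an itemset $G$, $\mathcal{F}$ is a chosen family of sub-itemsets of $G$, $\theta_B$ is the observed frequency of $B \in \mathcal{F}$, and $H$ is Shannon entropy.

$$
p^{*} = \arg\max_{p} H(p)
\quad \text{subject to} \quad
p(x_i = 1 \text{ for all } i \in B) = \theta_B \quad \text{for every } B \in \mathcal{F}
$$

$$
\mathrm{KL}(q \,\|\, p^{*}) = \sum_{x \in \{0,1\}^{|G|}} q(x) \log \frac{q(x)}{p^{*}(x)}
$$

Under this reading, choosing $\mathcal{F}$ as the singletons gives the independence model, and adding larger sub-itemsets to $\mathcal{F}$ gives the richer models the summary refers to.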

## Key findings

- The significance measure goes to zero for derivable itemsets.
- On the real datasets tested, the independence assumption is too strong, while more flexible models lead to good results.
- The method enables statistical testing of itemset significance.

## Abstract

We consider the problem of defining the significance of an itemset. We say that the itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets. In other words, we estimate the frequency of the itemset from the frequencies of its sub-itemsets and compute the deviation between the real value and the estimate. For the estimation we use Maximum Entropy and for measuring the deviation we use Kullback-Leibler divergence.

A major advantage compared to the previous methods is that we are able to use richer models whereas the previous approaches only measure the deviation from the independence model.

We show that our measure of significance goes to zero for derivable itemsets and that we can use the rank as a statistical test. Our empirical results demonstrate that for our real datasets the independence assumption is too strong but applying more flexible models leads to good results.
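The estimate-then-compare idea in the abstract can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`empirical_distribution`, `maxent_estimate`, `kl_divergence`), the toy counts, and the use of plain iterative scaling to fit the maximum entropy model are all choices made here. The sketch fits a maximum entropy distribution that matches the observed frequencies of a chosen family of sub-itemsets, then measures the KL divergence from the empirical distribution to that fit.

```python
from itertools import combinations, product

import numpy as np


def empirical_distribution(data):
    """Empirical distribution over all 2^K binary rows of an (N, K) 0/1 matrix."""
    n, k = data.shape
    # Encode each row as an integer; the first column is the most significant bit,
    # which matches the ordering produced by itertools.product([0, 1], repeat=k).
    weights = 1 << np.arange(k - 1, -1, -1)
    return np.bincount(data @ weights, minlength=2 ** k) / n


def maxent_estimate(q, k, family, sweeps=500):
    """Iterative scaling: max-entropy p that matches q's frequency on each itemset."""
    states = np.array(list(product([0, 1], repeat=k)))
    p = np.full(2 ** k, 1.0 / 2 ** k)  # start from the uniform distribution
    for _ in range(sweeps):
        for itemset in family:
            covered = states[:, list(itemset)].all(axis=1)  # rows containing the itemset
            target, current = q[covered].sum(), p[covered].sum()
            if 0.0 < current < 1.0:  # skip degenerate cases to avoid dividing by zero
                p[covered] *= target / current
                p[~covered] *= (1.0 - target) / (1.0 - current)
    return p


def kl_divergence(q, p):
    """KL(q || p) in nats, with the convention 0 * log 0 = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


# Hypothetical toy data over three items (a, b, c): b tends to copy a, c tends to copy b.
# Counts are listed in the order 000, 001, 010, 011, 100, 101, 110, 111.
states = np.array(list(product([0, 1], repeat=3)))
counts = np.array([32, 8, 2, 8, 8, 2, 8, 32])
data = np.repeat(states, counts, axis=0)

q = empirical_distribution(data)
singletons = [(0,), (1,), (2,)]                      # independence model
pairs = singletons + list(combinations(range(3), 2))  # richer model: all proper sub-itemsets

p_indep = maxent_estimate(q, 3, singletons)
p_pairs = maxent_estimate(q, 3, pairs)
kl_indep = kl_divergence(q, p_indep)
kl_pairs = kl_divergence(q, p_pairs)
print(f"KL vs independence model: {kl_indep:.4f}")
print(f"KL vs pair model:         {kl_pairs:.2e}")
```

On this toy data the independence model leaves a clearly positive divergence, so the three-item set looks surprising, while the model that also knows the pair frequencies drives the divergence to nearly zero, because the pairs already explain the joint behavior. The table over all 2^K states grows exponentially with the itemset size, so this sketch is only practical for small itemsets.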

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.10632/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/1904.10632/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.10632/full.md

---
Source: https://tomesphere.com/paper/1904.10632