# Finding Robust Itemsets Under Subsampling

**Authors:** Nikolaj Tatti, Fabian Moerchen, Toon Calders

arXiv: 1902.06743 · 2019-04-25

## TL;DR

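This paper introduces a theoretical framework for pattern reduction based on robustness: the probability that an itemset property, such as closedness or non-derivability, still holds on a random subset of the data. Robustness is computed analytically, so the approach needs neither a noise model nor a null hypothesis, and it never has to draw actual samples.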

## Contribution

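The paper proposes an analytical method for computing the robustness of four itemset properties (closed, free, non-derivable, and totally shattered). The resulting measure supports both pattern reduction and a parameter-free ranking of itemsets, without sampling the data or assuming a null hypothesis.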

## Key findings

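- In the authors' experiments, filtering by robustness reduces the number of reported patterns, mitigating pattern explosion.
- Robustness can be computed analytically, without actually sampling the data.
- Ranking itemsets by robustness yields interesting itemsets and can be used for top-$k$ approaches.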

## Abstract

Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely whether an itemset is closed, free, non-derivable or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model, and in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-$k$ approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets.
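
To make the idea of computing robustness "analytically without actually sampling" concrete, here is a minimal sketch for the closedness property. It rests on assumptions the abstract does not spell out: a random subset is formed by keeping each transaction independently with probability `alpha`, and an itemset $X$ counts as closed in a subset when, for every item $y \notin X$, the subset keeps at least one transaction that contains $X$ but not $y$. Writing $A_y$ for the transactions of the full data that contain $X$ but not $y$, inclusion-exclusion over sets $Y$ of outside items gives the probability $\sum_{Y} (-1)^{|Y|} (1-\alpha)^{|\bigcup_{y \in Y} A_y|}$. The function name `closedness_robustness` is hypothetical, and this brute-force enumeration is an illustration, not the authors' algorithm; it is exponential in the number of outside items and only suitable for toy data.

```python
from itertools import combinations


def closedness_robustness(transactions, itemset, alpha):
    """Probability that `itemset` is closed in a random subsample.

    Assumption (not from the source): every transaction is kept
    independently with probability `alpha`.
    """
    x = frozenset(itemset)
    outside_items = set().union(*transactions) - x

    # For each outside item y, the indices of transactions that contain
    # the itemset but not y. If all of them are dropped, y appears in every
    # remaining transaction that contains the itemset, so it is not closed.
    witnesses = {
        y: frozenset(
            i for i, t in enumerate(transactions) if x <= t and y not in t
        )
        for y in outside_items
    }

    # Inclusion-exclusion over subsets Y of outside items: the event
    # "all witnesses of every y in Y are dropped" has probability
    # (1 - alpha) ** |union of the witness sets|.
    others = list(witnesses)
    total = 0.0
    for k in range(len(others) + 1):
        for ys in combinations(others, k):
            dropped = frozenset().union(*(witnesses[y] for y in ys))
            total += (-1) ** k * (1 - alpha) ** len(dropped)
    return total


# Toy data, purely illustrative.
toy_data = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
print(closedness_robustness(toy_data, {"a"}, 0.5))
```

On this toy data, `{"a"}` stays closed only if both the first and the third transaction are kept, so the call returns 0.25 for `alpha = 0.5`. The itemset `{"c"}` is not closed in the full data (every transaction with `c` also has `a`), and the function returns 0 for it, in line with the abstract's statement that robust patterns are always a subset of the patterns with the property.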

---
Source: https://tomesphere.com/paper/1902.06743