Finding Robust Itemsets Under Subsampling

Nikolaj Tatti; Fabian Moerchen; Toon Calders

arXiv:1902.06743 · cs.DB · April 25, 2019

TL;DR

This paper introduces a theoretical framework for measuring the robustness of itemset properties under subsampling, enabling pattern reduction without assuming a noise model and without actually sampling the data.

Contribution

It proposes an analytical method to compute the robustness of itemset properties, facilitating pattern reduction and ranking without sampling the data or assuming a null hypothesis.
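As an illustration of what computing robustness "analytically" can look like, the sketch below gives the exact probability that an itemset X stays closed in a subsample. It assumes (our assumption, not necessarily the paper's exact model) that each transaction is kept independently with probability alpha. X is closed in a subsample exactly when, for every item a outside X, at least one kept transaction contains X but not a. Writing B_a for those transactions and q = 1 - alpha, inclusion-exclusion gives r = Σ_{A ⊆ I∖X} (-1)^{|A|} q^{|∪_{a∈A} B_a|}. This is exponential in the number of items and is not claimed to be the authors' algorithm; the name `closed_robustness` is hypothetical.

```python
from itertools import combinations


def closed_robustness(transactions, itemset, alpha):
    """Exact probability that `itemset` is closed in a random subsample.

    Assumed (hypothetical) model: each transaction is kept independently
    with probability `alpha`. No sampling is performed.
    """
    X = frozenset(itemset)
    items = set().union(*transactions) - X
    # Indices of transactions that contain X.
    cover = [i for i, t in enumerate(transactions) if X <= t]
    # B[a]: transactions containing X but not a. Keeping any one of them
    # shows that X | {a} has strictly smaller support than X.
    B = {a: frozenset(i for i in cover if a not in transactions[i]) for a in items}
    q = 1.0 - alpha
    total = 0.0
    keys = sorted(B)
    # Inclusion-exclusion over the events "every transaction in B[a] is dropped".
    for k in range(len(keys) + 1):
        for A in combinations(keys, k):
            union = frozenset().union(*(B[a] for a in A))
            total += (-1) ** k * q ** len(union)
    return total
```

For the four transactions {a,b}, {a,b,c}, {a}, {b,c} and X = {a}, only the transaction {a} rules out both extensions {a,b} and {a,c}, so the robustness of closedness works out to exactly alpha.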

Findings

01

The robustness measure effectively reduces pattern explosion.

02

Analytical computation of robustness avoids data sampling.

03

Ranking by robustness identifies interesting itemsets.
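Ranking by robustness can be sketched for the free (generator) property. X is free when every immediate subset X∖{x} has strictly larger support, so in a subsample at least one transaction containing X∖{x} but not x must survive for each x in X. Under an assumed model where each transaction is kept independently with probability alpha, inclusion-exclusion over these witness sets gives the robustness exactly, and candidates are then sorted by it. The function names are hypothetical, and this illustrates the idea rather than reproducing the authors' implementation.

```python
from itertools import combinations


def free_robustness(transactions, itemset, alpha):
    """Exact probability that `itemset` is free in a random subsample.

    Assumed (hypothetical) model: each transaction is kept independently
    with probability `alpha`.
    """
    X = sorted(itemset)
    q = 1.0 - alpha
    # C[x]: transactions containing X minus {x} but not x; keeping one of
    # them shows supp(X minus {x}) > supp(X). Immediate subsets suffice
    # because support is anti-monotone.
    C = {}
    for x in X:
        rest = frozenset(X) - {x}
        C[x] = frozenset(i for i, t in enumerate(transactions)
                         if rest <= t and x not in t)
    total = 0.0
    for k in range(len(X) + 1):
        for A in combinations(X, k):
            total += (-1) ** k * q ** len(frozenset().union(*(C[x] for x in A)))
    return total


def rank_by_robustness(transactions, candidates, alpha):
    """Return (robustness, sorted itemset) pairs, most robust first."""
    scored = [(free_robustness(transactions, c, alpha), sorted(c)) for c in candidates]
    return sorted(scored, key=lambda s: -s[0])
```

On the toy data {a,b}, {a,b,c}, {a}, {b,c} with alpha = 0.5, {b,c} scores 0 because every transaction containing c also contains b, so {b,c} always has the same support as {c} and is never free. A threshold on the score could then prune such itemsets.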

Abstract

Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties: whether an itemset is closed, free, non-derivable or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model, and in contrast to noise tolerant or approximate patterns, the robust patterns for a given property are always a…
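The definition of robustness can be checked directly on toy data by brute force: enumerate every subsample, weight it by its probability, and add up the weights of the subsamples where the property holds. This sketch assumes each transaction is retained independently with probability alpha and uses closedness as the property; the names `support`, `is_closed` and `robustness_by_enumeration` are hypothetical. It costs 2^n property checks for n transactions, which is exactly the kind of explicit enumeration or sampling that an analytical computation avoids.

```python
from itertools import product


def support(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_closed(transactions, itemset, universe):
    """True if no single-item extension of `itemset` has the same support.

    Checking single items suffices: if a larger superset had equal support,
    so would each one-item extension inside it (support is anti-monotone).
    """
    X = frozenset(itemset)
    s = support(transactions, X)
    return all(support(transactions, X | {a}) < s for a in universe - X)


def robustness_by_enumeration(transactions, itemset, alpha, prop=is_closed):
    """Probability that `prop` holds on a random subsample.

    Assumed (hypothetical) model: each transaction is kept independently
    with probability `alpha`. The item universe is taken from the full data.
    """
    universe = frozenset().union(*transactions)
    n = len(transactions)
    total = 0.0
    for mask in product((False, True), repeat=n):
        kept = [t for t, keep in zip(transactions, mask) if keep]
        k = sum(mask)
        weight = alpha ** k * (1.0 - alpha) ** (n - k)
        if prop(kept, itemset, universe):
            total += weight
    return total
```

With alpha = 1 the only subsample with nonzero weight is the full data, so the result is 1 for itemsets that are closed in the original data and 0 otherwise.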
