Efficient Discovery of Significant Patterns with Few-Shot Resampling
Leonardo Pellegrina, Fabio Vandin

TL;DR
This paper introduces FSR, an efficient algorithm for discovering statistically significant patterns in data that uses only a few resampled datasets to provide rigorous guarantees on false discoveries across various pattern types.
Contribution
FSR provides a novel, efficient framework for significant pattern mining that requires only a small number of resamples, applies to itemsets, sequences, and subgroups, and comes with theoretical guarantees.
Findings
FSR effectively discovers significant subgroups in real datasets.
Requires fewer resampled datasets than existing methods.
Provides rigorous false discovery control.
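The findings contrast FSR with methods that need many resampled datasets. For orientation only, here is a minimal sketch of a classical resampling approach from the significant pattern mining literature: the Westfall-Young min-p permutation procedure, which estimates a corrected significance threshold that controls the family-wise error rate (FWER). This is not FSR, whose procedure is not described on this page. The one-sided Fisher's exact test, the explicit list of candidate patterns (a real miner would enumerate them), and all function names are illustrative assumptions.

```python
import random
from math import comb


# Illustrative sketch (assumption): the classical Westfall-Young min-p
# permutation procedure, NOT the FSR algorithm from the paper.
def pattern_pvalue(transactions, labels, pattern):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail)."""
    n, n1 = len(labels), sum(labels)
    covered = [pattern <= t for t in transactions]  # pattern occurs in t
    x = sum(covered)  # support of the pattern
    a = sum(y for c, y in zip(covered, labels) if c)  # positives containing it
    tail = sum(comb(n1, k) * comb(n - n1, x - k) for k in range(a, min(x, n1) + 1))
    return tail / comb(n, x)


def westfall_young_threshold(transactions, labels, patterns,
                             alpha=0.05, num_perm=1000, seed=0):
    """Return delta such that the fraction of permuted datasets whose minimum
    p-value is strictly below delta is at most alpha (estimated FWER <= alpha)."""
    rng = random.Random(seed)
    perm = list(labels)  # copy so the caller's labels are not modified
    min_pvalues = []
    for _ in range(num_perm):
        rng.shuffle(perm)  # permuting labels simulates the null of independence
        min_pvalues.append(min(pattern_pvalue(transactions, perm, p) for p in patterns))
    min_pvalues.sort()
    # At most k = floor(alpha * num_perm) minima lie strictly below min_pvalues[k].
    return min_pvalues[int(alpha * num_perm)]


def significant_patterns(transactions, labels, patterns,
                         alpha=0.05, num_perm=1000, seed=0):
    """Report patterns whose p-value is strictly below the corrected threshold."""
    delta = westfall_young_threshold(transactions, labels, patterns,
                                     alpha, num_perm, seed)
    return delta, [p for p in patterns
                   if pattern_pvalue(transactions, labels, p) < delta]
```

Every permutation re-scores all candidate patterns, and the quantile estimate needs many permutations to be stable, so the cost grows quickly with the number of resamples; this is the kind of cost that a method requiring only a few resamples aims to reduce.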
Abstract
Significant pattern mining is a fundamental task in mining transactional data, which requires identifying patterns significantly associated with the value of a given feature, the target. In several applications, such as biomedicine, market basket analysis, and social networks, the goal is to discover patterns whose association with the target is defined with respect to an underlying population, or process, of which the dataset represents only a collection of observations, or samples. A natural way to capture the association of a pattern with the target is to consider its statistical significance, assessing its deviation from the (null) hypothesis of independence between the pattern and the target. While several algorithms have been proposed to find statistically significant patterns, it remains a computationally demanding task, and for complex patterns such as subgroups, no efficient…
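To make the notion of statistical significance above concrete, here is a minimal sketch of one common choice in significant pattern mining: a one-sided Fisher's exact test on the 2x2 contingency table of pattern occurrence versus a binary target. The choice of test, the representation of transactions as Python sets, and the helper names `contingency_counts` and `fisher_one_sided_pvalue` are illustrative assumptions, not taken from the paper.

```python
from math import comb


# Illustrative sketch (assumption): one-sided Fisher's exact test, a common
# significance test in pattern mining; not necessarily the one used by FSR.
def contingency_counts(transactions, labels, pattern):
    """Return (n, n1, x, a) for a pattern over labeled transactions.

    n:  number of transactions
    n1: number of transactions with target = 1
    x:  support of the pattern (transactions that contain it)
    a:  transactions that contain the pattern and have target = 1
    """
    covered = [pattern <= t for t in transactions]  # subset test: pattern occurs in t
    x = sum(covered)
    a = sum(y for c, y in zip(covered, labels) if c)
    return len(transactions), sum(labels), x, a


def fisher_one_sided_pvalue(n, n1, x, a):
    """P[A >= a] where A ~ Hypergeometric(n, n1, x).

    Under the null hypothesis of independence between pattern and target
    (labels assigned at random with the margins n1 and x fixed), the number
    of positive transactions containing the pattern follows this distribution.
    """
    # math.comb returns 0 when k > n, so infeasible terms vanish automatically.
    tail = sum(comb(n1, k) * comb(n - n1, x - k) for k in range(a, min(x, n1) + 1))
    return tail / comb(n, x)


transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"},
                {"c", "d"}, {"d"}, {"b"}, {"a"}, {"c"}]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
counts = contingency_counts(transactions, labels, frozenset({"a", "b"}))
print(counts, fisher_one_sided_pvalue(*counts))
```

In this toy dataset the pattern {a, b} occurs in 4 of the 10 transactions, all of them positive, so its p-value is C(5,4)·C(5,0)/C(10,4) = 5/210 ≈ 0.024. Testing many patterns at once requires a multiple-testing correction on top of this per-pattern p-value.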
Taxonomy
Topics: Data Mining Algorithms and Applications · Advanced Database Systems and Queries
