Efficient Discovery of Significant Patterns with Few-Shot Resampling

Leonardo Pellegrina; Fabio Vandin

arXiv:2406.11803 · cs.LG · June 18, 2024

Open Access · 1 Repo

TL;DR

This paper introduces FSR, an efficient algorithm for discovering statistically significant patterns in data. It uses only a few resampled datasets while providing rigorous guarantees on false discoveries across several pattern types.

Contribution

FSR provides a novel, efficient framework for significant pattern mining that requires only a small number of resamples, applicable to itemsets, sequences, and subgroups, with theoretical guarantees.

Findings

01 FSR effectively discovers significant subgroups in real datasets.

02 FSR requires fewer resampled datasets than existing methods.

03 FSR provides rigorous false discovery control.
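
To make the idea of resampling-based false discovery control concrete, the following is a minimal sketch of a generic max-statistic permutation procedure over a fixed list of candidate patterns. It is not the FSR algorithm and is not taken from the paper or its repository; the function names (`deviation`, `max_stat_threshold`, `significant_patterns`), the deviation statistic, and the defaults `alpha=0.05` and `n_resamples=200` are all assumptions chosen for illustration.

```python
import math
import random


def deviation(transactions, labels, pattern):
    """Absolute gap between observed and expected positives among covered rows.

    Illustrative statistic (an assumption, not the one used by FSR).
    """
    covered = [i for i, t in enumerate(transactions) if pattern <= t]
    expected = len(covered) * sum(labels) / len(labels)
    return abs(sum(labels[i] for i in covered) - expected)


def max_stat_threshold(transactions, labels, candidates,
                       alpha=0.05, n_resamples=200, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under the null.

    Each resample shuffles the target labels, which breaks any association
    between patterns and target while keeping the transactions fixed.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    maxima = []
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        maxima.append(max(deviation(transactions, shuffled, p)
                          for p in candidates))
    maxima.sort()
    index = min(n_resamples - 1, math.ceil((1 - alpha) * n_resamples) - 1)
    return maxima[index]


def significant_patterns(transactions, labels, candidates,
                         alpha=0.05, n_resamples=200, seed=0):
    """Report candidates whose observed statistic exceeds the null threshold."""
    threshold = max_stat_threshold(transactions, labels, candidates,
                                   alpha, n_resamples, seed)
    return [p for p in candidates
            if deviation(transactions, labels, p) > threshold]


# Toy data (hypothetical): {a, b} always co-occurs with target 1,
# {c} with target 0, and {x} is evenly split between the two classes.
transactions = ([frozenset("abx")] * 5 + [frozenset("ab")] * 5
                + [frozenset("cx")] * 5 + [frozenset("c")] * 5)
labels = [1] * 10 + [0] * 10
candidates = [frozenset("ab"), frozenset("x"), frozenset("c")]
found = significant_patterns(transactions, labels, candidates)
```

Because the threshold is taken from the distribution of the maximum over all candidates, a single cutoff accounts for testing many patterns at once. The cost is that every resample re-evaluates every candidate, which is the expense that a method needing only a few resampled datasets would reduce.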

Abstract

Significant pattern mining is a fundamental task in mining transactional data: it requires identifying patterns that are significantly associated with the value of a given feature, the target. In several applications, such as biomedicine, market basket analysis, and social networks, the goal is to discover patterns whose association with the target is defined with respect to an underlying population, or process, of which the dataset represents only a collection of observations, or samples. A natural way to capture the association of a pattern with the target is to consider its statistical significance, assessing its deviation from the (null) hypothesis of independence between the pattern and the target. While several algorithms have been proposed to find statistically significant patterns, it remains a computationally demanding task, and for complex patterns such as subgroups, no efficient…
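
The notion of testing one pattern against the null hypothesis of independence from the target can be illustrated with a small permutation test. This is a hedged sketch under stated assumptions, not the procedure from the paper: the names `pattern_support` and `permutation_p_value`, the deviation statistic, and the default `n_resamples=1000` are hypothetical choices, and the target is assumed to be binary (0 or 1).

```python
import random


def pattern_support(transactions, pattern):
    """Indices of the transactions that contain every item of the pattern."""
    return [i for i, t in enumerate(transactions) if pattern <= t]


def permutation_p_value(transactions, labels, pattern,
                        n_resamples=1000, seed=0):
    """Estimate a p-value for independence between a pattern and a 0/1 target.

    The statistic is the absolute gap between the number of positive labels
    among covered transactions and the number expected under independence.
    """
    rng = random.Random(seed)
    covered = pattern_support(transactions, pattern)
    expected = len(covered) * sum(labels) / len(labels)
    observed = abs(sum(labels[i] for i in covered) - expected)

    shuffled = list(labels)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # simulate the null: labels unrelated to patterns
        stat = abs(sum(shuffled[i] for i in covered) - expected)
        if stat >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never zero.
    return (hits + 1) / (n_resamples + 1)


# Toy data (hypothetical): {a, b} appears only with target 1,
# while {x} is split evenly between target 1 and target 0.
transactions = ([{"a", "b", "x"}] * 5 + [{"a", "b"}] * 5
                + [{"c", "x"}] * 5 + [{"c"}] * 5)
labels = [1] * 10 + [0] * 10
p_associated = permutation_p_value(transactions, labels, {"a", "b"})
p_independent = permutation_p_value(transactions, labels, {"x"})
```

A small p-value for `{"a", "b"}` indicates a deviation from independence that shuffled labels rarely reproduce, whereas `{"x"}` has an observed deviation of zero and therefore a p-value of 1. Testing a single pattern this way says nothing about the many patterns a mining algorithm examines, which is why multiple-hypothesis corrections are needed in practice.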

Code & Models

Repositories

vandinlab/fsr (Official)

Taxonomy

Topics: Data Mining Algorithms and Applications · Advanced Database Systems and Queries