Scalable Sampling for High Utility Patterns

Lamine Diop; Marc Plantevit

arXiv:2410.22964·cs.DB·October 31, 2024

TL;DR

This paper introduces a scalable high utility pattern sampling algorithm for large quantitative databases, enabling fast, representative pattern discovery with strong statistical guarantees, outperforming existing methods.

Contribution

The paper proposes a novel sampling algorithm based on two original theorems, improving scalability and efficiency in discovering high utility patterns in large datasets.

Findings

01. Outperforms state-of-the-art methods in experiments
02. Enables instant discovery of relevant patterns
03. Ensures statistical guarantees through sampling

Abstract

Discovering valuable insights from data through meaningful associations is a crucial task. However, it becomes challenging when trying to identify representative patterns in quantitative databases, especially with large datasets, as enumeration-based strategies struggle due to the vast search space involved. To tackle this challenge, output space sampling methods have emerged as a promising solution thanks to their ability to discover valuable patterns with reduced computational overhead. However, existing sampling methods often encounter limitations when dealing with large quantitative databases, resulting in scalability-related challenges. In this work, we propose a novel high utility pattern sampling algorithm and its on-disk version, both designed for large quantitative databases and based on two original theorems. Our approach ensures both the interactivity required for user-centered…

Code & Models

Repositories

ScalableSamplingInLargeDatabases/QPlus
Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications