Scalable Sampling for High Utility Patterns
Lamine Diop, Marc Plantevit

TL;DR
This paper introduces a scalable high utility pattern sampling algorithm for large quantitative databases, enabling fast, representative pattern discovery with strong statistical guarantees, outperforming existing methods.
Contribution
The paper proposes a novel sampling algorithm based on two original theorems, improving scalability and efficiency in discovering high utility patterns in large datasets.
Findings
Outperforms state-of-the-art methods in experiments
Enables instant discovery of relevant patterns
Ensures statistical guarantees through sampling
Abstract
Discovering valuable insights from data through meaningful associations is a crucial task. However, it becomes challenging when trying to identify representative patterns in quantitative databases, especially with large datasets, as enumeration-based strategies struggle due to the vast search space involved. To tackle this challenge, output space sampling methods have emerged as a promising solution thanks to their ability to discover valuable patterns with reduced computational overhead. However, existing sampling methods often encounter limitations when dealing with large quantitative databases, resulting in scalability-related challenges. In this work, we propose a novel high utility pattern sampling algorithm and its on-disk version, both designed for large quantitative databases and based on two original theorems. Our approach ensures both the interactivity required for user-centered…
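To make the idea of output space sampling concrete, the sketch below draws one pattern with probability proportional to its total utility, without enumerating the pattern space. It is a generic two-step sampler (pick a transaction, then pick a subset of it) and is not claimed to be the algorithm or the theorems of the paper. The toy database `DB`, the function names, and the definition of a pattern's utility in a transaction as the sum of its item utilities are all assumptions made for illustration; item utilities are assumed to be strictly positive.

```python
import random
from itertools import combinations

# Hypothetical toy quantitative database: each transaction maps
# an item to its utility in that transaction (assumed > 0).
DB = [
    {"a": 4, "b": 1},
    {"a": 2, "c": 3},
    {"b": 5, "c": 1, "d": 2},
]

def transaction_weight(t):
    """Sum of u(X, t) over all non-empty subsets X of t.

    Each item belongs to 2^(|t|-1) subsets, so the sum has a closed form.
    """
    return (2 ** (len(t) - 1)) * sum(t.values())

def sample_pattern(db, rng):
    """Draw one pattern X with probability proportional to its total utility."""
    # Step 1: pick a transaction proportionally to its weight.
    weights = [transaction_weight(t) for t in db]
    t = rng.choices(db, weights=weights, k=1)[0]
    # Step 2: pick an anchor item proportionally to its utility in t,
    # then add every other item of t independently with probability 1/2.
    # This yields X with probability u(X, t) / transaction_weight(t).
    items = list(t)
    anchor = rng.choices(items, weights=[t[i] for i in items], k=1)[0]
    rest = {i for i in items if i != anchor and rng.random() < 0.5}
    return frozenset({anchor} | rest)

def exact_distribution(db):
    """Brute-force reference distribution (only feasible on tiny data)."""
    util = {}
    for t in db:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                key = frozenset(combo)
                util[key] = util.get(key, 0) + sum(t[i] for i in combo)
    total = sum(util.values())
    return {pattern: u / total for pattern, u in util.items()}

if __name__ == "__main__":
    rng = random.Random(0)
    print(sorted(sample_pattern(DB, rng)))
```

The brute-force `exact_distribution` exists only to check the sampler on small inputs; it is exactly the kind of enumeration that sampling is meant to avoid. Each draw costs one pass over the transaction weights plus work linear in the size of the chosen transaction, which is why this family of samplers suits interactive use.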
Taxonomy
Topics: Anomaly Detection Techniques and Applications
