Efficiently Sampling Interval Patterns from Numerical Databases
Djawad Bekkoucha, Lamine Diop, Abdelkader Ouali, Bruno Crémilleux, Patrice Boizumault

TL;DR
This paper introduces Fips and HFips, sampling methods for interval patterns in numerical databases. Fips draws patterns with probability proportional to their frequency, HFips draws them proportional to both frequency and hyper-volume, and both efficiently address the long-tail issue.
Contribution
It presents the first sampling approach for interval patterns in numerical data (Fips), extends it to account for hyper-volume (HFips), and provides formal proofs and experimental validation.
Findings
Fips samples patterns proportionally to frequency.
HFips samples patterns proportionally to frequency and hyper-volume.
Both methods address the long-tail phenomenon in pattern sampling.
Abstract
Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns are randomly drawn based on an interestingness measure, such as frequency or hyper-volume. This paper presents the first sampling approach designed to handle interval patterns in numerical databases. This approach, named Fips, samples interval patterns proportionally to their frequency. It uses a multi-step sampling procedure and addresses a key challenge in numerical data: accurately determining the number of interval patterns that cover each object. We extend this work with HFips, which samples interval patterns proportionally to both their frequency and hyper-volume. These methods efficiently tackle the well-known long-tail phenomenon in pattern sampling. We formally prove that Fips and HFips…
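To make the idea of frequency-proportional sampling concrete, here is a minimal sketch of a generic two-step scheme. It is not taken from the paper and should not be read as the Fips algorithm. It assumes that an interval pattern is one closed interval per attribute with endpoints drawn from that attribute's observed values, and that a pattern's frequency is the number of objects it covers. The names `covering_count` and `sample_interval_pattern` are hypothetical. Step one draws an object with probability proportional to the number of patterns covering it; step two draws one of those covering patterns uniformly. Under these assumptions each pattern is returned with probability proportional to its frequency.

```python
import bisect
import random


def covering_count(obj, domains):
    """Count the interval patterns covering obj (assumed pattern language:
    one closed interval per attribute, endpoints among observed values)."""
    count = 1
    for value, dom in zip(obj, domains):
        lowers = bisect.bisect_right(dom, value)            # endpoints <= value
        uppers = len(dom) - bisect.bisect_left(dom, value)  # endpoints >= value
        count *= lowers * uppers
    return count


def sample_interval_pattern(data, rng=random):
    """Draw one interval pattern with probability proportional to its frequency."""
    # Sorted distinct values of each attribute (column).
    domains = [sorted(set(col)) for col in zip(*data)]
    # Step 1: pick an object, weighted by how many patterns cover it.
    weights = [covering_count(obj, domains) for obj in data]
    obj = rng.choices(data, weights=weights, k=1)[0]
    # Step 2: pick uniformly among the patterns covering that object.
    pattern = []
    for value, dom in zip(obj, domains):
        lo = rng.choice(dom[:bisect.bisect_right(dom, value)])
        hi = rng.choice(dom[bisect.bisect_left(dom, value):])
        pattern.append((lo, hi))
    return tuple(pattern)


if __name__ == "__main__":
    toy = [(1, 10), (2, 20), (3, 30)]  # illustrative data, not from the paper
    print(sample_interval_pattern(toy, random.Random(0)))
```

The sketch recomputes the weights on every draw for brevity; a real implementation would precompute them once. Weighting by hyper-volume, as HFips does, is not covered here.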
Taxonomy
Topics: Data Mining Algorithms and Applications · Time Series Analysis and Forecasting · Advanced Database Systems and Queries
