TL;DR
This paper introduces a weighted reservoir sampling method for efficient, scalable pattern sampling from streaming data, enabling accurate online classifiers for complex sequential and weighted itemsets.
Contribution
It presents a generic, scalable reservoir sampling algorithm capable of handling various pattern types and temporal biases in streaming data.
Findings
Samples patterns effectively from complex data streams.
Enables online classifiers with accuracy comparable to offline models.
Advances incremental online classification of sequential itemsets.
Abstract
Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing complex data streams like sequential and weighted itemsets. While reservoir sampling serves as a fundamental method for randomly selecting fixed-size samples from data streams, its application to such complex patterns remains largely unexplored. In this study, we introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data, thus ensuring scalability and efficiency. We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets. Through comprehensive experiments conducted on real-world datasets,…
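To make the idea of a weighted reservoir concrete, here is a minimal sketch of generic weighted reservoir sampling over a stream of (item, weight) pairs. It uses the well-known key-based scheme of Efraimidis and Spirakis (each item gets the key u^(1/w) and the k largest keys are kept). It is not the paper's algorithm. The function name `weighted_reservoir_sample`, the `seed` parameter, and the convention of skipping non-positive weights are assumptions made for illustration only.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, seed=None):
    """Sample up to k items from an iterable of (item, weight) pairs.

    Illustrative sketch (hypothetical helper, not the paper's method).
    Items with larger weights are more likely to end up in the reservoir.
    """
    if k <= 0:
        return []
    rng = random.Random(seed)
    heap = []  # min-heap of (key, arrival_index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        # Key u^(1/w): a larger weight pushes the key toward 1.
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)  # index breaks ties without comparing items
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


# Usage: a toy stream of itemsets with made-up weights.
toy_stream = [(("a", "b"), 3.0), (("c",), 1.0), (("a", "d", "e"), 5.0)]
print(weighted_reservoir_sample(toy_stream, k=2, seed=42))
```

The reservoir holds at most k entries, so memory stays constant however long the stream runs. A temporal bias could in principle be expressed through the weights, for example by giving recent batches larger weights, though how the paper handles this is not specified in the text above.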
