StreamSampling.jl: Efficient Sampling from Data Streams in Julia
Adriano Meligrana

TL;DR
StreamSampling.jl is a Julia library for efficient, single-pass sampling from data streams of unknown size, with lower memory use and better performance than traditional methods.
Contribution
The paper introduces StreamSampling.jl, a Julia library of general, efficient stream-sampling methods, and reports empirical benchmarks demonstrating its advantages.
Findings
Online sampling methods perform better than standard sampling approaches in the paper's benchmarks.
Streaming sampling uses less memory, since the stream is never fully materialized.
Empirical benchmarks support both the performance and the memory claims.
Abstract
StreamSampling.jl is a Julia library designed to provide general and efficient methods for sampling from data streams in a single pass, even when the total number of items is unknown. In this paper, we describe the capabilities of the library and its advantages over traditional sampling procedures, such as maintaining a small, constant memory footprint and avoiding the need to fully materialize the stream in memory. Furthermore, we provide empirical benchmarks comparing online sampling methods against standard approaches, demonstrating performance and memory improvements.
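
To make the single-pass, constant-memory idea in the abstract concrete, here is a minimal sketch of classic reservoir sampling (Algorithm R) written with Julia's standard library only. It is not StreamSampling.jl's API, and the abstract does not say which algorithms the library implements; the function name `reservoir_sample` and the example stream are hypothetical, chosen purely for illustration.

```julia
using Random

# Illustrative only: Algorithm R, a textbook reservoir sampler.
# Draws up to k items uniformly without replacement from any iterable,
# in one pass, holding at most k items in memory at any time.
# `reservoir_sample` is a hypothetical name, not part of StreamSampling.jl.
function reservoir_sample(rng::AbstractRNG, iter, k::Integer)
    reservoir = Vector{eltype(iter)}()
    sizehint!(reservoir, k)
    seen = 0                       # number of items consumed so far
    for x in iter
        seen += 1
        if seen <= k
            # Fill the reservoir with the first k items.
            push!(reservoir, x)
        else
            # Keep the new item with probability k / seen by overwriting
            # a uniformly chosen slot.
            j = rand(rng, 1:seen)
            if j <= k
                reservoir[j] = x
            end
        end
    end
    return reservoir
end

# A lazy stream: it is iterated once and never collected into an array.
rng = MersenneTwister(42)
stream = Iterators.filter(isodd, 1:1_000_000)
picked = reservoir_sample(rng, stream, 5)
```

The sampler never asks for the length of `stream` and stores only `k` elements, which is the property the abstract contrasts with approaches that first materialize the whole stream. If the stream holds fewer than `k` items, the sketch simply returns all of them.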
