Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams
Matthew Andres Moreno, Luis Zaman, Emily Dolson

TL;DR
This paper introduces the DStream algorithms for fast, memory-efficient selection of fixed-size, rolling data samples from high-volume streams, with proven coverage guarantees and practical implementations suited to resource-constrained settings.
Contribution
It proposes a novel, low-overhead framework for streaming data curation that satisfies temporal coverage criteria with guaranteed bounds while using minimal memory and computational resources.
Findings
Proven worst-case bounds on coverage quality for each algorithm.
Achieves O(1) data ingestion using primitive bit-level operations.
Provides open-source implementations for practical deployment.
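The findings credit O(1) ingestion to primitive bit-level operations. As a hedged illustration of that kind of primitive (not the paper's actual procedure), the sketch below counts trailing zero bits with the two's-complement trick `x & -x`, which takes a constant number of machine-word operations on fixed-width integers. Python integers are arbitrary-precision, so the constant-time point applies to fixed-width words as in C or Rust. The names `trailing_zeros` and `ruler_level`, and the idea of giving each time point a "level", are our own assumptions: a time point's level is at least k exactly when its 1-based index is a multiple of 2**k, so filtering by level yields an evenly spaced subsample.

```python
def trailing_zeros(x: int) -> int:
    """Count trailing zero bits of a positive integer.

    x & -x isolates the lowest set bit (two's-complement trick);
    its bit_length() minus one is that bit's position.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return (x & -x).bit_length() - 1


def ruler_level(t: int) -> int:
    """Hypothetical 'level' of 0-indexed time point t: ctz(t + 1).

    Level >= k holds exactly when (t + 1) is a multiple of 2**k,
    so items at level k or higher recur every 2**k steps.
    """
    return trailing_zeros(t + 1)


if __name__ == "__main__":
    print([ruler_level(t) for t in range(8)])  # [0, 1, 0, 2, 0, 1, 0, 3]
    # Time points whose level is at least 2 form an evenly spaced subsample.
    print([t for t in range(16) if ruler_level(t) >= 2])  # [3, 7, 11, 15]
```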
Abstract
Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data streams, it is critical to manage state in a manner that is fast and memory efficient, particularly in resource-constrained or real-time contexts. Here, we address the problem of extracting a fixed-capacity, rolling subsample from a data stream. Specifically, we explore "data stream curation" strategies to fulfill requirements on the composition of sample time points retained. Our "DStream" suite of algorithms targets three temporal coverage criteria: (1) steady coverage, where retained samples should spread evenly across elapsed data stream history; (2) stretched coverage, where early data items should be proportionally favored; and (3) tilted coverage, where recent data items should be proportionally favored. For each algorithm, we prove…
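To make the steady coverage criterion concrete, here is a minimal sketch using stride-doubling downsampling, a simpler, well-known technique swapped in for illustration; it is not the DStream algorithm. The class `StrideDoublingBuffer` and its methods are hypothetical. It keeps only time points that are multiples of a current stride and, when the buffer is full, discards every other retained item and doubles the stride, so the retained samples stay evenly spread over all elapsed history.

```python
from typing import Any


class StrideDoublingBuffer:
    """Fixed-capacity sample with evenly spaced time points (steady coverage).

    Retains only items whose time index is a multiple of ``stride``.
    When the buffer is full and another such item arrives, every other
    retained item is discarded and the stride doubles.
    """

    def __init__(self, capacity: int) -> None:
        # An even capacity guarantees the arriving item survives each thinning.
        if capacity < 2 or capacity % 2 != 0:
            raise ValueError("capacity must be an even integer >= 2")
        self.capacity = capacity
        self.stride = 1
        self.items: list[tuple[int, Any]] = []  # (time index, value) pairs
        self.t = 0  # time index of the next item to arrive

    def ingest(self, value: Any) -> None:
        if self.t % self.stride == 0:
            if len(self.items) == self.capacity:
                # Thin: keep multiples of the doubled stride (half the items).
                self.stride *= 2
                self.items = [(s, v) for s, v in self.items if s % self.stride == 0]
            if self.t % self.stride == 0:
                self.items.append((self.t, value))
        self.t += 1

    def retained_times(self) -> list[int]:
        return [s for s, _ in self.items]


if __name__ == "__main__":
    buf = StrideDoublingBuffer(capacity=4)
    for t in range(17):
        buf.ingest(f"item{t}")
    print(buf.retained_times())  # [0, 8, 16]
```

Each thinning step rebuilds the list in O(capacity) time, so this sketch achieves only amortized constant-time ingestion, unlike the O(1) ingestion reported for DStream. Stretched and tilted coverage would need different retention rules that favor early or recent items, respectively.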
