Efficient Time-Evolving Stream Processing at Scale
Yu Huang

TL;DR
This paper introduces FISH, a novel approach for efficient, scalable, and low-memory stream processing of time-evolving datasets with hot keys, significantly improving latency and memory usage in distributed systems.
Contribution
FISH provides a new epoch-based hot key identification method with low memory overhead and cost-efficient worker assignment, enhancing load balancing in time-evolving stream processing.
Findings
FISH reduces latency by up to 87% compared to state-of-the-art methods.
FISH reduces memory overhead by over 99.96%.
FISH outperforms existing solutions on both real-world and synthetic datasets.
Abstract
Time-evolving stream datasets are ubiquitous in real-world applications, and their inherent hot keys often evolve over time. Nevertheless, few existing solutions can provide efficient load balancing on these time-evolving datasets while preserving low memory overhead. In this paper, we present a novel grouping approach (named FISH) that provides efficient time-evolving stream processing at scale. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within any bounded time interval. This makes it possible to accurately identify the recent hot keys for real-time load balancing within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous…
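The epoch-based idea in the abstract can be illustrated with a minimal Python sketch. This is not the FISH algorithm: the class name `EpochHotKeyTracker`, the parameters `epoch_size`, `decay`, and `hot_threshold`, and the multiplicative decay rule are all assumptions made for illustration. The sketch also counts exactly within an epoch for clarity, whereas the paper describes a specialized low-memory counting scheme that the abstract does not detail.

```python
from collections import Counter


class EpochHotKeyTracker:
    """Illustrative epoch-based hot key tracker (hypothetical, not FISH itself)."""

    def __init__(self, epoch_size, decay, hot_threshold):
        self.epoch_size = epoch_size        # tuples per epoch (assumed parameter)
        self.decay = decay                  # multiplier applied to old hotness per epoch (assumed)
        self.hot_threshold = hot_threshold  # minimum decayed hotness to stay tracked (assumed)
        self.epoch_counts = Counter()       # exact counts inside the current epoch
        self.hotness = {}                   # decayed hotness carried across epochs
        self.seen_in_epoch = 0

    def observe(self, key):
        """Count one tuple; close the epoch once epoch_size tuples have arrived."""
        self.epoch_counts[key] += 1
        self.seen_in_epoch += 1
        if self.seen_in_epoch >= self.epoch_size:
            self._close_epoch()

    def _close_epoch(self):
        # Inter-epoch decay: shrink the hotness inherited from earlier epochs.
        merged = {k: v * self.decay for k, v in self.hotness.items()}
        # Fold in the counts gathered during the epoch that just ended.
        for k, c in self.epoch_counts.items():
            merged[k] = merged.get(k, 0.0) + c
        # Drop keys that fell below the threshold so the carried state stays small.
        self.hotness = {k: v for k, v in merged.items() if v >= self.hot_threshold}
        self.epoch_counts.clear()
        self.seen_in_epoch = 0

    def hot_keys(self):
        """Keys considered hot as of the last completed epoch."""
        return set(self.hotness)


tracker = EpochHotKeyTracker(epoch_size=10, decay=0.5, hot_threshold=3)
for key in ["a"] * 6 + ["b"] * 4:   # epoch 1: "a" and "b" dominate
    tracker.observe(key)
print(sorted(tracker.hot_keys()))   # ['a', 'b']
for key in ["c"] * 20:              # epochs 2 and 3: the hot key shifts to "c"
    tracker.observe(key)
print(sorted(tracker.hot_keys()))   # ['c']
```

Because old hotness is halved at every epoch boundary in this sketch, keys that stop appearing fall below the threshold after a few epochs, which is how the tracked set follows a shifting distribution without keeping full history.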
Taxonomy
Topics: Data Stream Mining Techniques · Peer-to-Peer Network Technologies · Caching and Content Delivery
