Efficient Time-Evolving Stream Processing at Scale

Yu Huang

arXiv:1806.00760 · cs.DC · June 5, 2018

TL;DR

This paper introduces FISH, a novel approach for efficient, scalable, and low-memory stream processing of time-evolving datasets with hot keys, significantly improving latency and memory usage in distributed systems.

Contribution

FISH provides a new epoch-based hot key identification method with low memory overhead and cost-efficient worker assignment, enhancing load balancing in time-evolving stream processing.

Findings

01

FISH reduces latency by up to 87% compared to state-of-the-art methods.

02

FISH reduces memory overhead by over 99.96%.

03

FISH outperforms existing solutions on both real-world and synthetic datasets.

Abstract

Time-evolving stream datasets are ubiquitous in real-world applications, and their inherent hot keys often change over time. Nevertheless, few existing solutions can balance load efficiently on these time-evolving datasets while keeping memory overhead low. In this paper, we present a novel grouping approach, named FISH, that provides efficient time-evolving stream processing at scale. The key insight of this work is that the keys of time-evolving stream data can have a skewed distribution within any bounded time interval. This makes it possible to accurately identify the recent hot keys for real-time load balancing within a bounded scope. We therefore propose an epoch-based recent hot key identification with specialized intra-epoch frequency counting (for maintaining low memory overhead) and inter-epoch hotness decaying (for suppressing superfluous…
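The epoch-based idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `EpochHotKeyTracker`, the parameters `epoch_size`, `decay`, and `hot_threshold`, and the pruning rule are all hypothetical choices made for illustration, and an exact `Counter` stands in for the specialized low-memory intra-epoch counting that the abstract mentions but does not detail here.

```python
from collections import Counter


class EpochHotKeyTracker:
    """Illustrative epoch-based hot-key tracker (hypothetical, not FISH itself)."""

    def __init__(self, epoch_size=1000, decay=0.5, hot_threshold=0.05):
        self.epoch_size = epoch_size        # tuples per epoch (assumed)
        self.decay = decay                  # inter-epoch decay factor (assumed)
        self.hot_threshold = hot_threshold  # share of total score that makes a key hot (assumed)
        self.current = Counter()            # exact counts within the current epoch
        self.hotness = {}                   # decayed scores carried across epochs
        self.seen = 0                       # tuples observed in the current epoch

    def observe(self, key):
        """Count one tuple; close the epoch once it is full."""
        self.current[key] += 1
        self.seen += 1
        if self.seen == self.epoch_size:
            self._close_epoch()

    def _close_epoch(self):
        # Decay scores from earlier epochs, then add this epoch's counts.
        merged = {k: v * self.decay for k, v in self.hotness.items()}
        for key, count in self.current.items():
            merged[key] = merged.get(key, 0.0) + count
        # Drop keys whose score has decayed below 1 so stale keys do not linger.
        self.hotness = {k: v for k, v in merged.items() if v >= 1.0}
        self.current.clear()
        self.seen = 0

    def hot_keys(self):
        """Return keys whose share of the total decayed score meets the threshold."""
        total = sum(self.hotness.values())
        if total == 0:
            return set()
        return {k for k, v in self.hotness.items() if v / total >= self.hot_threshold}
```

In this sketch a key that dominated an earlier epoch stays hot for a while and then fades as its score is halved at each epoch boundary, which is one simple way to favor recent hot keys over historical ones.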

Taxonomy

Topics: Data Stream Mining Techniques · Peer-to-Peer Network Technologies · Caching and Content Delivery