Toward Reliable and Rapid Elasticity for Streaming Dataflows on Clouds
Anshu Shukla, Yogesh Simmhan

TL;DR
This paper introduces new checkpoint and migration techniques for streaming dataflows on cloud platforms, enabling rapid, lossless migration and improved stability in elastic streaming applications.
Contribution
It presents novel dataflow migration strategies that enable fast, lossless task migration in streaming platforms like Apache Storm, improving elasticity and recovery times.
Findings
Migration time reduced from over 100 sec to within 50 sec
No message loss or re-processing during migration
Applications stabilize much earlier after migration
Abstract
The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in the input rates and task behavior by scaling in and out on elastic Cloud resources. Platforms like Apache Storm do not provide robust capabilities for responding to such dynamism and for rapid task migration across VMs. We propose several dataflow checkpoint and migration approaches that allow a running streaming dataflow to migrate without any loss of in-flight messages or internal task states, while reducing the time to recover and stabilize. We implement and evaluate these migration strategies on Apache Storm using micro and application dataflows for scaling in and out on 2 to 21 Azure VMs. Our results show that we can migrate dataflows of large sizes within 50 sec, in comparison…
