Collaborative Reuse of Streaming Dataflows in IoT Applications
Shilpa Chaturvedi, Sahil Tyagi, Yogesh Simmhan

TL;DR
This paper introduces algorithms that reuse overlapping dataflow components in distributed stream processing systems to improve resource efficiency for IoT applications, and validates them through extensive experiments.
Contribution
It proposes novel dataflow reuse algorithms for Apache Storm that identify and merge overlapping tasks, improving resource utilization in IoT data processing.
Findings
Significant resource savings demonstrated in experiments
Effective identification of reusable dataflow components
Validated with both synthetic and real IoT dataflows
Abstract
Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable the composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure and to make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continues to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving resource efficiency. In this paper, we propose dataflow reuse algorithms that, given a submitted dataflow, identify the intersection of reusable tasks and streams from a collection of running dataflows to form a merged dataflow. Similar algorithms to unmerge…
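The core step in the abstract is: given a newly submitted dataflow, find the tasks and streams it shares with running dataflows and fold it into one merged dataflow. Below is a minimal Python sketch of that idea. It is not the authors' algorithm and is not tied to Apache Storm's API. It assumes (our assumption, not the paper's definition) that two tasks are interchangeable when they run the same operator, with the same configuration, on identical upstream streams. The classes `Task`, `Dataflow` and `MergedDataflow` and the operator names (`mqtt_source`, `senml_parse`, `window_avg`, `outlier_filter`) are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Task:
    # Hypothetical task description: operator name plus its configuration.
    op: str
    config: tuple = ()


@dataclass
class Dataflow:
    tasks: dict[str, Task]         # task id -> task
    inputs: dict[str, list[str]]   # task id -> ordered upstream task ids


def canonical_keys(df: Dataflow) -> dict[str, tuple]:
    """Key each task by the stream it produces (assumed equivalence rule).

    Tasks get equal keys only if they run the same operator with the same
    config on recursively identical input streams.
    """
    graph = {t: df.inputs.get(t, []) for t in df.tasks}
    keys = {}
    # static_order() yields every upstream task before its consumers.
    for t in TopologicalSorter(graph).static_order():
        task = df.tasks[t]
        upstream = tuple(keys[u] for u in df.inputs.get(t, []))
        keys[t] = (task.op, task.config, upstream)
    return keys


class MergedDataflow:
    """Hypothetical registry of running tasks, indexed by canonical key."""

    def __init__(self):
        self.running = {}  # canonical key -> id of the running task

    def submit(self, name: str, df: Dataflow):
        """Merge df into the running set; return (task mapping, reuse count)."""
        mapping, reused = {}, 0
        for t, key in canonical_keys(df).items():
            if key in self.running:
                reused += 1  # an identical stream already exists: reuse it
            else:
                self.running[key] = f"{name}/{t}"  # start a new task
            mapping[t] = self.running[key]
        return mapping, reused


# Two hypothetical apps share the same source and parser but differ downstream.
src = Task("mqtt_source", (("topic", "city/air"),))
parse = Task("senml_parse")
app1 = Dataflow({"s": src, "p": parse, "a": Task("window_avg", (("window_s", 60),))},
                {"p": ["s"], "a": ["p"]})
app2 = Dataflow({"src": src, "parse": parse, "flt": Task("outlier_filter", (("z", 3.0),))},
                {"parse": ["src"], "flt": ["parse"]})

merged = MergedDataflow()
print(merged.submit("app1", app1)[1])  # 0
print(merged.submit("app2", app2)[1])  # 2
print(len(merged.running))             # 4
```

Keying each task by its entire upstream lineage keeps reuse safe in this sketch: a filter is shared only when it reads exactly the same stream, so a task with identical logic but a different source is not merged.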
