STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing
Vincenzo Gulisano, Hannaneh Najdataei, Yiannis Nikolakopoulos,, Alessandro V. Papadopoulos, Marina Papatriantafilou, and Philippas Tsigas

TL;DR
STRETCH is a framework that enhances stream processing scalability by combining shared memory with shared-nothing principles, enabling fast elastic reconfigurations and improved performance over existing systems.
Contribution
It introduces the concept of Virtual Shared-Nothing (VSN) parallelism and elasticity, extending semantics of stream processing tasks with formal correctness proofs.
Findings
STRETCH outperforms Apache Flink in throughput.
It achieves reconfiguration in less than 40 ms.
Provides formal semantics and correctness proofs for VSN parallelism.
Abstract
Stream processing applications extract value from raw data through Directed Acyclic Graphs of data analysis tasks. Shared-nothing (SN) parallelism is the de-facto standard to scale stream processing applications. Given an application, SN parallelism instantiates several copies of each analysis task, making each instance responsible for a dedicated portion of the overall analysis, and relies on dedicated queues to exchange data among connected instances. On the one hand, SN parallelism can scale the execution of applications both up and out since threads can run task instances within and across processes/nodes. On the other hand, its lack of sharing can cause unnecessary overheads and hinder the scaling up when threads operate on data that could be jointly accessed in shared memory. This trade-off motivated us in studying a way for stream processing applications to leverage shared memory…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Database Systems and Queries · Cloud Computing and Resource Management · Distributed systems and fault tolerance
