Colocating Real-time Storage and Processing: An Analysis of Pull-based versus Push-based Streaming
Ovidiu-Cristian Marcu, Pascal Bouvry

TL;DR
This paper analyzes and compares push-based and pull-based streaming architectures, proposing a unified approach with shared memory to improve latency and throughput in real-time data processing systems.
Contribution
It introduces a novel push-based streaming source using shared memory, and provides an experimental analysis of push versus pull approaches in streaming architectures.
Findings
Push-based sources reduce latency compared with pull-based sources.
The shared-memory approach increases throughput and ingestion capacity.
Experimental results favor the push-based design in certain scenarios.
Abstract
Real-time Big Data architectures evolved into specialized layers for handling data streams' ingestion, storage, and processing over the past decade. Layered streaming architectures integrate pull-based read and push-based write RPC mechanisms implemented by stream ingestion/storage systems. In addition, stream processing engines expose source/sink interfaces, allowing them to decouple these systems easily. However, open-source streaming engines leverage workflow sources implemented through a pull-based approach, continuously issuing read RPCs towards the stream ingestion/storage, effectively competing with write RPCs. This paper proposes a unified streaming architecture that leverages push-based and/or pull-based source implementations for integrating ingestion/storage and processing engines that can reduce processing latency and increase system read and write throughput while making…
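To make the pull-versus-push distinction in the abstract concrete, the following is a minimal single-process sketch, not the paper's implementation. `ToyBroker`, `run_pull_source`, and `demo` are hypothetical names introduced only for illustration, and a `deque` stands in for the shared-memory buffer that a real push-based source would use. It only counts how many read requests each style issues against the storage side.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory stand-in for a stream ingestion/storage system."""

    def __init__(self):
        self.log = []          # records appended by writers
        self.read_rpcs = 0     # read requests served (pull-based sources)
        self.write_rpcs = 0    # write requests served
        self.subscribers = []  # delivery callbacks of push-based sources

    def write(self, record):
        # Push-based write: append, then forward to every subscribed source.
        self.write_rpcs += 1
        self.log.append(record)
        for deliver in self.subscribers:
            deliver(record)

    def read(self, offset, max_records):
        # Pull-based read: every call counts, even if nothing new is available.
        self.read_rpcs += 1
        return self.log[offset:offset + max_records]

    def subscribe(self, deliver):
        self.subscribers.append(deliver)


def run_pull_source(broker, polls, batch=2):
    """Pull-based source: issues one read request per poll."""
    offset, consumed = 0, []
    for _ in range(polls):
        records = broker.read(offset, batch)
        consumed.extend(records)
        offset += len(records)
    return consumed


def demo():
    # Pull: four records written, then the source polls five times.
    pull_broker = ToyBroker()
    for i in range(4):
        pull_broker.write(i)
    pulled = run_pull_source(pull_broker, polls=5)

    # Push: the source registers a buffer (assumed stand-in for shared memory)
    # and the broker fills it as part of each write; no read requests occur.
    push_broker = ToyBroker()
    buffer = deque()
    push_broker.subscribe(buffer.append)
    for i in range(4):
        push_broker.write(i)

    return pulled, pull_broker.read_rpcs, list(buffer), push_broker.read_rpcs
```

In this toy setup both sources receive the same records, but the pull-based source keeps issuing read requests after the log is drained, while the push-based source issues none. That is the contention with write RPCs the abstract describes, reduced to a counter; it says nothing about real latency or throughput.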
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
