Streaming Computations with Region-Based State on SIMD Architectures
Stephen Timcheck, Jeremy Buhler

TL;DR
This paper presents a method for efficiently executing stateful streaming computations on SIMD architectures like GPUs by dividing streams into regions with shared state, improving parallelism and performance.
Contribution
It introduces a region-based state abstraction and a low-level protocol for control signals, enabling better parallel execution of irregular streaming computations on GPUs.
Findings
Region boundary frequency affects SIMD occupancy.
The MERCATOR system effectively implements region-based streaming.
Parallelism is improved for stateful streaming computations.
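The first finding can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the paper or from MERCATOR. It assumes, as a simplification, that a SIMD batch of `width` lanes may never contain items from two different regions, so the last batch of each region can be partly empty. The function name `simd_occupancy` and its parameters are hypothetical.

```python
from typing import List


def simd_occupancy(region_sizes: List[int], width: int) -> float:
    """Fraction of SIMD lanes doing useful work.

    Assumption (not from the paper): a batch never spans a region
    boundary, so each region is padded up to a multiple of `width`.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    items = sum(region_sizes)
    # Each region needs ceil(size / width) batches; empty regions need none.
    batches = sum((size + width - 1) // width for size in region_sizes)
    return items / (batches * width) if batches else 0.0


# One long region fills every lane; many short regions leave lanes idle.
full = simd_occupancy([128], width=32)
sparse = simd_occupancy([8] * 16, width=32)
```

Under this model the same 128 items occupy all lanes when they form one region, but only a quarter of the lanes when they are split into 16 regions of 8 items each, which is one way to see why frequent region boundaries could hurt occupancy.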
Abstract
Streaming computations on massive data sets are an attractive candidate for parallelization, particularly when they exhibit independence (and hence data parallelism) between items in the stream. However, some streaming computations are stateful, which disrupts independence and can limit parallelism. In this work, we consider how to extract data parallelism from streaming computations with a common, limited form of statefulness. The stream is assumed to be divided into variably-sized regions, and items in the same region are processed in a common context of state. In general, the computation to be performed on a stream is also irregular, with each item potentially undergoing different, data-dependent processing. This work describes mechanisms to implement such computations efficiently on a SIMD-parallel architecture such as a GPU. We first develop a low-level protocol by which a data…
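As a rough illustration of the model the abstract describes (variably-sized regions, a shared context of state per region, and data-dependent processing per item), here is a minimal sequential sketch. It is not the paper's protocol or the MERCATOR API. The `Region` type, the `process_stream` function, the numeric `state`, and the `keep` predicate are all hypothetical names chosen for the example, and the inner list comprehension merely stands in for one SIMD batch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Region:
    state: float        # context shared by every item in the region (hypothetical)
    items: List[float]  # regions may have different lengths


def process_stream(
    regions: Iterable[Region],
    width: int,
    keep: Callable[[float], bool],
) -> Iterator[float]:
    """Process items in batches of at most `width`, one region at a time."""
    for region in regions:
        for start in range(0, len(region.items), width):
            batch = region.items[start:start + width]
            # Every lane in this batch sees the same region state.
            # The filter models irregular, data-dependent processing.
            yield from (x * region.state for x in batch if keep(x))


stream = [Region(2.0, [1.0, 2.0, 3.0]), Region(10.0, [5.0])]
results = list(process_stream(stream, width=2, keep=lambda x: x % 2 == 1))
```

Because all items in a batch share one region's state, the per-item work inside a batch is independent, which is the data parallelism the abstract aims to recover.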
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems
