Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism
Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, Kunle Olukotun

TL;DR
Streaming Tensor Programs (STeP) is a streaming abstraction for dynamic tensor workloads on spatial dataflow accelerators (SDAs). It enables optimizations such as dynamic tiling, dynamic parallelization, and configuration time-multiplexing, which improve efficiency and compute utilization.
Contribution
STeP provides a streaming abstraction with flexible routing operators, an explicit memory hierarchy, and symbolic-shape semantics, allowing dynamic tensor workloads to run efficiently on spatial dataflow accelerators.
Findings
Dynamic tiling surpasses prior Pareto-optimal bounds.
Dynamic parallelization reduces latency by approximately 2.72x.
Configuration time-multiplexing boosts compute utilization by about 2.64x.
Abstract
Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators (SDAs) forces these dynamic behaviors to be implemented statically and/or unoptimized. To address these challenges, we present Streaming Tensor Programs (STeP), a streaming abstraction that enables dynamic tensor workloads to run efficiently on SDAs. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic-shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations, like dynamic tiling, dynamic parallelization, and configuration time-multiplexing, that adapt SDA execution to dynamic behaviors while preserving dataflow efficiency.…
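To make the idea of dynamic tiling concrete, the following is a minimal sketch in plain Python. It is not STeP code and does not use STeP's operators. It only compares how many rows are processed when a data-dependent dimension is padded to a fixed tile size versus when tile sizes are chosen at runtime. The function names, the tile size of 64, and the row counts are all hypothetical values chosen for illustration.

```python
import math
from typing import List


def static_tiled_rows(num_rows: int, tile: int) -> int:
    """Rows processed when the dimension is padded up to a multiple of a fixed tile size."""
    return math.ceil(num_rows / tile) * tile


def dynamic_tile_sizes(num_rows: int, max_tile: int) -> List[int]:
    """Tile sizes chosen at runtime: full tiles followed by one remainder tile, if any."""
    full, rem = divmod(num_rows, max_tile)
    return [max_tile] * full + ([rem] if rem else [])


# Hypothetical data-dependent row counts for four independent tensors.
row_counts = [3, 70, 0, 129]
TILE = 64  # assumed tile size, for illustration only

# Static tiling pads every tensor to a multiple of TILE.
static_work = sum(static_tiled_rows(n, TILE) for n in row_counts)

# Dynamic tiling processes only the rows that actually exist.
dynamic_work = sum(sum(dynamic_tile_sizes(n, TILE)) for n in row_counts)

print("static rows processed: ", static_work)
print("dynamic rows processed:", dynamic_work)
```

In this toy setting the gap between the two totals is the padding that a statically tiled implementation would process without producing useful output. The sketch says nothing about how STeP itself expresses or schedules tiles.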
