Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
Arne Symons, Linyan Mei, Steven Colleman, Pouya Houshmand, Sebastian Karl, Marian Verhelst

TL;DR
This paper introduces Stream, a design space exploration framework for layer fusion on heterogeneous dataflow accelerators. By exploring architecture and mapping strategies together, it lowers the energy-delay product of inference by up to 2.2x.
Contribution
It presents a novel design space exploration framework that enables holistic optimization of layer fusion on heterogeneous dataflow accelerators, validated across multiple hardware platforms.
Findings
Up to 2.2x lower energy-delay product (EDP) for inference.
Effective exploration of architecture and mapping strategies.
Validated with three state-of-the-art hardware implementations.
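To make the headline metric concrete, the sketch below computes an energy-delay product as energy multiplied by latency and compares two mappings. The function names and all numeric values are hypothetical, chosen only so the ratio works out to about 2.2; they are not measurements from the paper.

```python
def energy_delay_product(energy_joules: float, latency_seconds: float) -> float:
    """Energy-delay product (EDP): energy multiplied by latency. Lower is better."""
    return energy_joules * latency_seconds


def edp_improvement(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """Return how many times lower the candidate's EDP is than the baseline's."""
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical (energy in joules, latency in seconds) pairs, NOT taken from the paper.
layer_by_layer = (4.0e-3, 11.0e-3)
layer_fused = (2.5e-3, 8.0e-3)

ratio = edp_improvement(layer_by_layer, layer_fused)
print(f"EDP improvement: {ratio:.2f}x")
```

Because EDP multiplies the two quantities, a mapping can improve it by cutting energy, latency, or both, which is why a single EDP figure does not say how the gain is split between them.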
Abstract
As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far, these systems exploit the increased parallelism by coarsely mapping a single layer at a time across cores, which incurs frequent costly off-chip memory accesses, or by pipelining batches of inputs, which falls short in meeting the demands of latency-critical applications. To alleviate these bottlenecks, this work explores a new fine-grain mapping paradigm, referred to as layer fusion, on heterogeneous dataflow accelerators through a novel design space exploration framework called Stream. Stream captures a wide variety of heterogeneous dataflow architectures and mapping granularities, and implements a memory and communication-aware latency and energy…
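The off-chip access cost of layer-by-layer mapping mentioned in the abstract can be illustrated with a toy traffic count. This is a simplified sketch, not Stream's cost model: it assumes every intermediate feature map is written off-chip once and read back once in the layer-by-layer case, that fused intermediates fit entirely on chip, and that weight traffic is ignored. All names and sizes are hypothetical.

```python
def layer_by_layer_traffic(input_bytes: int, intermediate_bytes: list[int], output_bytes: int) -> int:
    """Off-chip bytes moved when each layer runs to completion before the next starts.

    Assumption: each intermediate feature map is written off-chip once and read back once.
    """
    return input_bytes + 2 * sum(intermediate_bytes) + output_bytes


def fused_traffic(input_bytes: int, output_bytes: int) -> int:
    """Off-chip bytes moved when the layers are fused.

    Assumption: all intermediate data stays in on-chip memory.
    """
    return input_bytes + output_bytes


# Hypothetical feature-map sizes in bytes for a three-layer stack.
network_input = 150_000
intermediates = [800_000, 400_000]
network_output = 100_000

unfused = layer_by_layer_traffic(network_input, intermediates, network_output)
fused = fused_traffic(network_input, network_output)
print(f"layer-by-layer: {unfused} bytes, fused: {fused} bytes")
```

In practice the fused case only holds when the chosen tile sizes let intermediates fit on chip, which is one reason the mapping granularity becomes part of the design space.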
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing
