Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms
Steven Colleman, Arne Symons, Victor J.B. Jung, Marian Verhelst

TL;DR
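This paper extends the design space exploration (DSE) framework Stream to schedule transformer networks across diverse hardware architectures. It shows that adapting the execution schedule, including layer fusion, can drastically reduce the memory needed for active feature data and can improve latency and energy efficiency.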
Contribution
It introduces support for transformer scheduling in the DSE framework Stream, enabling hardware-agnostic exploration and analysis of layer fusion benefits.
Findings
Layer fusion reduces memory requirements for transformer attention heads.
Optimal scheduling varies with input size and hardware architecture.
Adapting execution schedules improves latency and energy efficiency.
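To make the first two findings concrete, here is a minimal sketch of how fusing the layers of one attention head can shrink the peak number of live feature elements. It is an illustrative back-of-the-envelope model, not the Stream framework's API or the paper's cost model. The function names, the in-place softmax, the exclusion of weights and the input, and the choice to tile over rows of Q are all assumptions made for this example.

```python
"""Simplified count of live feature elements for one attention head.

Assumptions (not from the paper): weights and the input X are excluded,
softmax runs in place, and fusion tiles the rows of Q.
"""


def layer_by_layer_peak(n: int, d_k: int) -> int:
    """Peak live elements when each layer finishes before the next starts."""
    projections = 3 * n * d_k        # Q, K, V all live
    scores = 3 * n * d_k + n * n     # Q, K, V plus the full N x N score matrix
    softmax = n * d_k + n * n        # V plus scores (normalised in place)
    output = 2 * n * d_k + n * n     # V, probabilities, and the output O
    return max(projections, scores, softmax, output)


def fused_peak(n: int, d_k: int, tile_rows: int) -> int:
    """Peak live elements when tile_rows rows of Q flow through all layers."""
    if not 1 <= tile_rows <= n:
        raise ValueError("tile_rows must be between 1 and n")
    resident = 2 * n * d_k                      # K and V stay live for every tile
    scores = tile_rows * d_k + tile_rows * n    # Q tile plus its score rows
    output = tile_rows * n + tile_rows * d_k    # score rows plus the O tile
    return resident + max(scores, output)


if __name__ == "__main__":
    d_k = 64  # hypothetical head dimension
    for n in (128, 512, 2048):
        base = layer_by_layer_peak(n, d_k)
        fused = fused_peak(n, d_k, tile_rows=16)
        print(f"N={n}: layer-by-layer={base}, fused={fused}, ratio={base / fused:.2f}")
```

In this simplified model the N x N score matrix is replaced by a tile_rows x N slice, so the gap between the two schedules widens as the sequence length N grows. That is one way to see why the preferred schedule could depend on input size. Real hardware adds constraints, such as memory hierarchy and bandwidth, that this sketch ignores.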
Abstract
The impact of transformer networks is booming, yet they come with significant computational complexity. It is therefore essential to understand how to optimally map and execute these networks on modern neural processor hardware. So far, the literature on transformer scheduling optimization has focused on deployment on GPUs and specific ASICs. This work enables extensive hardware/mapping exploration by extending the DSE framework Stream to support transformers across a wide variety of hardware architectures and execution schedules. After validation, we explore the optimal schedule for transformer layers/attention heads and investigate whether layer fusion improves latency, energy, or memory requirements. Our study shows that the memory requirements for active feature data can be drastically reduced by adapting the execution schedule based on the size…
Taxonomy
Topics: Multilevel Inverters and Converters · Interconnection Networks and Systems · Silicon Carbide Semiconductor Technologies
