KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow
Zhiyao Li (1), Mingyu Gao (1) ((1) Tsinghua University)

TL;DR
KAPLA introduces a pragmatic, tensor-centric dataflow representation and a fast solver for scalable neural network accelerators, enabling efficient exploration of optimized dataflow schemes with minimal energy overhead.
Contribution
This work presents a comprehensive dataflow representation and a novel fast solver, KAPLA, for scalable NN accelerators, significantly improving design exploration speed and solution quality.
Findings
KAPLA's dataflows incur only 2.2% and 7.7% energy overhead for training and inference, respectively, compared to exhaustively searched optimal schemes.
KAPLA outperforms random and machine-learning-based approaches in optimization quality.
KAPLA provides orders of magnitude faster search speed compared to exhaustive methods.
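As a rough illustration of how an energy-overhead figure like those above can be read, the sketch below computes the relative overhead of a solver's dataflow against a reference optimum. The function name `energy_overhead` and the numeric values are hypothetical and are not taken from the paper. Treating the exhaustively searched optimum as the baseline is an assumption of this sketch.

```python
def energy_overhead(solver_energy: float, optimal_energy: float) -> float:
    """Relative energy overhead of a solver's dataflow versus a reference optimum.

    Both arguments are energies in the same (arbitrary) unit. The reference
    optimum is assumed here to come from an exhaustive search.
    """
    if optimal_energy <= 0:
        raise ValueError("optimal_energy must be positive")
    return (solver_energy - optimal_energy) / optimal_energy


# Hypothetical energies, chosen only to illustrate the arithmetic.
overhead = energy_overhead(solver_energy=102.2, optimal_energy=100.0)
print(f"energy overhead: {overhead:.1%}")
```

An overhead of 0 would mean the solver matched the reference optimum exactly. Smaller values mean the fast solver gives up less energy efficiency in exchange for search speed.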
Abstract
Dataflow scheduling decisions are of vital importance to neural network (NN) accelerators. Recent scalable NN accelerators support a rich set of advanced dataflow techniques. The problems of comprehensively representing and quickly finding optimized dataflow schemes thus become significantly more complicated and challenging. In this work, we first propose comprehensive and pragmatic dataflow representations for temporal and spatial scheduling on scalable multi-node NN architectures. An informal hierarchical taxonomy highlights the tight coupling across different levels of the dataflow space as the major difficulty for fast design exploration. A set of formal tensor-centric directives accurately express various inter-layer and intra-layer schemes, and allow for quickly determining their validity and efficiency. We then build a generic, optimized, and fast dataflow solver, KAPLA, which…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Tensor decomposition and applications
