Kitsune: Enabling Dataflow Execution on GPUs
Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Stephen W. Keckler

TL;DR
Kitsune introduces GPU architectural adjustments and a compiler to enable efficient dataflow execution, improving performance and reducing off-chip traffic for deep learning workloads without requiring a full hardware redesign.
Contribution
The paper presents Kitsune, a novel set of primitives and a compiler that facilitate dataflow execution on GPUs, addressing limitations of traditional bulk-synchronous models.
Findings
Achieves 1.3×-2.3× performance improvement on challenge applications.
Reduces off-chip traffic by up to 98% during inference.
Provides 1.1×-2.4× performance gains during training.
Abstract
State-of-the-art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a bulk-synchronous execution model which has many drawbacks and is ill-suited for the graph structure of DL applications. Many industry and academic works attempt to overcome these by employing vertical fusion, but this approach still fails to realize three untapped opportunities: (1) the fact that many resources on the GPU are idle while only one operator executes due to temporal multiplexing of the SM; (2) lower energy from more intelligent on-chip data movement, which lends itself to higher performance in a power-provisioned environment; and (3) the inability to exploit hidden or reduction dimensions as a source of parallelism to ease pressure on batch size. This paper explores relatively uncharted…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
