Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU

Luke Anderson; Andrew Adams; Karima Ma; Tzu-Mao Li; Tian Jin; Jonathan Ragan-Kelley

arXiv:2012.07145 · cs.PL · August 29, 2023

TL;DR

This paper introduces an automatic GPU scheduling algorithm for imaging and vision pipelines that significantly speeds up compilation and produces high-performance code comparable to expert-tuned solutions.

Contribution

A novel scalable search-based algorithm that automates high-performance GPU code generation from high-level descriptions without manual tuning.

Findings

01. Average compile-time speedup of 49x

02. Schedules are 1.7x faster than existing automatic methods

03. Performance comparable to expert human tuning

Abstract

We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of memory and parallelism on GPU architectures in reasonable compile time. We achieve this using (1) a two-phase search algorithm that first 'freezes' decisions for the lowest cost sections of a program, allowing relatively more time to be spent on the important stages, (2) a hierarchical sampling strategy that groups schedules based on their structural similarity, then samples representatives to be evaluated, allowing us to explore a large space with few samples, and (3) memoization of repeated…
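The three ideas named in the abstract (freezing cheap stages, sampling representatives from structurally similar groups, and memoizing repeated evaluations) can be loosely illustrated with a toy Python sketch. This is not the paper's algorithm, cost model, or any Halide API. The schedule encoding `(tile_x, tile_y, unroll)`, the functions `toy_cost`, `structural_key`, `hierarchical_sample`, and `two_phase_search`, and the stage names are all hypothetical and chosen only to make the control flow concrete.

```python
import random
from collections import defaultdict
from functools import lru_cache

# Hypothetical toy schedule: (tile_x, tile_y, unroll) for a single stage.
# Nothing here comes from the paper or from Halide; it is an illustration only.

@lru_cache(maxsize=None)
def toy_cost(schedule):
    """Made-up stand-in for a cost model; memoized so repeats are scored once."""
    tile_x, tile_y, unroll = schedule
    return abs(tile_x - 32) + abs(tile_y - 8) + 2 * abs(unroll - 2)

def structural_key(schedule):
    """Assumed notion of similarity: same tiling shape, unroll factor ignored."""
    tile_x, tile_y, _ = schedule
    return (tile_x, tile_y)

def hierarchical_sample(candidates, per_group=1, seed=0):
    """Bucket candidates by structural key, then draw representatives per bucket."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for s in candidates:
        groups[structural_key(s)].append(s)
    picked = []
    for key in sorted(groups):
        members = groups[key]
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked

def two_phase_search(stage_candidates, freeze_fraction=0.5):
    """Cheap first pass over all stages, freeze the cheapest, refine the rest."""
    # Phase 1: one representative per group for every stage.
    chosen = {}
    for name, cands in stage_candidates.items():
        reps = hierarchical_sample(cands, per_group=1)
        chosen[name] = min(reps, key=toy_cost)
    # Freeze the stages whose current choice has the lowest estimated cost.
    by_cost = sorted(chosen, key=lambda n: toy_cost(chosen[n]))
    frozen = set(by_cost[: int(len(by_cost) * freeze_fraction)])
    # Phase 2: spend a larger sampling budget only on the unfrozen stages.
    for name, cands in stage_candidates.items():
        if name not in frozen:
            reps = hierarchical_sample(cands, per_group=4)
            chosen[name] = min(reps + [chosen[name]], key=toy_cost)
    return chosen, frozen

# Example: two stages sharing the same hypothetical candidate space.
candidates = [(tx, ty, u) for tx in (16, 32, 64) for ty in (4, 8) for u in (1, 2, 4)]
chosen, frozen = two_phase_search({"blur": candidates, "sharpen": candidates})
print(chosen, frozen)
```

In this sketch the memoization comes from `functools.lru_cache`, so a schedule that appears in several stages is scored only once. The fixed seed keeps the sampling reproducible, which is a convenience of the example and not something the source specifies.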

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Distributed and Parallel Computing Systems