Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU
Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan, Ragan-Kelley

TL;DR
This paper introduces an automatic GPU scheduling algorithm for imaging and vision pipelines that significantly speeds up compilation and produces high-performance code comparable to expert-tuned solutions.
Contribution
A novel scalable search-based algorithm that automates high-performance GPU code generation from high-level descriptions without manual tuning.
Findings
Average compile time speedup of 49x
Schedules are 1.7x faster than existing automatic methods
Performance comparable to expert human tuning
Abstract
We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of memory and parallelism on GPU architectures in reasonable compile time. We achieve this using (1) a two-phase search algorithm that first 'freezes' decisions for the lowest cost sections of a program, allowing relatively more time to be spent on the important stages, (2) a hierarchical sampling strategy that groups schedules based on their structural similarity, then samples representatives to be evaluated, allowing us to explore a large space with few samples, and (3) memoization of repeated…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsParallel Computing and Optimization Techniques · Advanced Neural Network Applications · Distributed and Parallel Computing Systems
