Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs

Abhinav Jangda; Arjun Guha

arXiv:1909.07190·cs.PL·September 9, 2020


TL;DR

This paper introduces a GPU execution approach for image processing pipelines that fuses loops into warp-sized overlapped tiles, applies hybrid tiling, and chooses which loops to fuse automatically. The generated code runs 1.65x faster than Halide on a GTX 1080Ti and 1.33x faster on a Tesla V100.

Contribution

It presents warp-sized overlapped tiling, hybrid tiling, and an automatic loop fusion algorithm, which together improve GPU performance for image processing pipelines.

Findings

01 Achieves 1.65x speedup over Halide on GTX 1080Ti

02 Achieves 1.33x speedup over Halide on Tesla V100

03 Reduces shared memory usage and synchronization overhead

Abstract

Domain-specific languages that execute image processing pipelines on GPUs, such as Halide and Forma, operate by 1) dividing the image into overlapped tiles, and 2) fusing loops to improve memory locality. However, current approaches have limitations: 1) they require intra-thread-block synchronization, which has a non-trivial cost, 2) they must choose between small tiles that require more overlapped computations and large tiles that increase shared memory access (and lower occupancy), and 3) their autoscheduling algorithms use simplified GPU models that can result in inefficient global memory accesses. We present a new approach for executing image processing pipelines on GPUs that addresses these limitations as follows. 1) We fuse loops to form overlapped tiles that fit in a single warp, which allows us to use lightweight warp synchronization. 2) We introduce hybrid tiling, which stores…
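The tile-size trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it is plain Python rather than GPU code: it fuses a hypothetical two-stage 1D blur pipeline into overlapped tiles, where each tile recomputes the first stage on a one-element halo so that tiles are independent of each other. The names `blur3`, `pipeline_reference`, and `pipeline_overlapped`, the 3-point stencil, and the choice to count a tile's halo against a 32-lane budget are all assumptions made for illustration.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def blur3(src, i):
    """3-point box blur at index i, clamping reads at the borders."""
    n = len(src)
    return (src[max(i - 1, 0)] + src[i] + src[min(i + 1, n - 1)]) / 3.0


def pipeline_reference(image):
    """Two blur stages, each run over the whole image (no fusion, no tiling)."""
    stage1 = [blur3(image, i) for i in range(len(image))]
    return [blur3(stage1, i) for i in range(len(stage1))]


def pipeline_overlapped(image, lanes=WARP_SIZE):
    """Fused two-stage blur over overlapped tiles.

    Each tile owns `lanes` stage-1 points (assumption: one per lane),
    of which the outer `halo` points on each side are overlap shared
    with neighbouring tiles. Returns the output and the total number
    of stage-1 evaluations, so redundant work can be measured.
    """
    n = len(image)
    halo = 1                          # stage 2 reads one neighbour per side
    out_per_tile = lanes - 2 * halo   # stage-2 outputs a tile can produce
    output = [0.0] * n
    stage1_evals = 0
    for start in range(0, n, out_per_tile):
        end = min(start + out_per_tile, n)
        lo = max(start - halo, 0)
        hi = min(end - 1 + halo, n - 1)
        # Tile-local stage-1 buffer (stands in for registers/shared memory).
        local = {j: blur3(image, j) for j in range(lo, hi + 1)}
        stage1_evals += len(local)
        for i in range(start, end):
            left = local[max(i - 1, 0)]
            right = local[min(i + 1, n - 1)]
            output[i] = (left + local[i] + right) / 3.0
    return output, stage1_evals


if __name__ == "__main__":
    image = [float((7 * i) % 13) for i in range(256)]
    for lanes in (8, WARP_SIZE):
        out, evals = pipeline_overlapped(image, lanes)
        assert out == pipeline_reference(image)
        print(f"lanes={lanes}: {evals - len(image)} redundant stage-1 evaluations")
```

Running the script with the two lane counts shows that the smaller tile performs more redundant first-stage evaluations, which is the cost of small tiles that the abstract names. On a GPU, the tile-local buffer would need synchronization between the two stages; keeping the whole tile within one warp is what lets the paper's approach use warp-level rather than thread-block-level synchronization.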
