High-Performance Code Generation through Fusion and Vectorization
Jason Sewall, Simon J. Pennycook

TL;DR
This paper introduces HFAV, a prototype that automatically fuses and vectorizes kernel-based computations spread across separate nested loops, reducing intermediate storage and improving performance on modern hardware. Its transformations are driven by a declarative, inference-based front-end.
Contribution
It presents a method for automatically transforming kernels through fusion and vectorization, together with a prototype implementation (HFAV) and results for some prominent HPC codes.
Findings
Reduced intermediate storage in transformed kernels
Improved performance on contemporary hardware
Effective automatic transformation for nested loops
Abstract
We present a technique for automatically transforming kernel-based computations in disparate, nested loops into a fused, vectorized form that can reduce intermediate storage needs and lead to improved performance on contemporary hardware. We introduce representations for the abstract relationships and data dependencies of kernels in loop nests and algorithms for manipulating them into a more efficient form; we similarly introduce techniques for determining data access patterns for stencil-like array accesses and show how this information can be used to elide storage and improve vectorization. We discuss our prototype implementation of these ideas, named HFAV, and its use of a declarative, inference-based front-end to drive transformations, and we present results for some prominent codes in HPC.
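To make the storage-elision idea concrete, here is a minimal hand-written sketch of the kind of transformation the abstract describes. It is not HFAV output and does not use HFAV's front-end; the function names `unfused` and `fused` and the two toy kernels (a forward difference followed by a stencil over its result) are assumptions chosen for illustration. The first version runs two separate loops and stores the whole intermediate array; the second fuses them and keeps only the one previous intermediate value that the stencil actually needs.

```python
def unfused(u):
    """Two separate loops with a full-length intermediate array."""
    n = len(u)
    # Kernel 1: forward difference, stored for every index.
    flux = [u[i + 1] - u[i] for i in range(n - 1)]
    # Kernel 2: stencil over the intermediate (reads flux[i] and flux[i - 1]).
    return [flux[i] - flux[i - 1] for i in range(1, n - 1)]


def fused(u):
    """One fused loop; the intermediate shrinks to a single rolling value."""
    n = len(u)
    out = []
    if n < 3:
        return out  # no index has both stencil neighbours
    prev_flux = u[1] - u[0]  # prologue: the value flux[0] would have held
    for i in range(1, n - 1):
        cur_flux = u[i + 1] - u[i]  # flux[i], computed on demand
        out.append(cur_flux - prev_flux)
        prev_flux = cur_flux  # rotate the one-element window
    return out
```

Both functions return the same result for any input sequence, but the fused form needs constant rather than linear intermediate storage. In a compiled language the same access-pattern analysis would also inform how the fused loop body is vectorized, which this sketch does not attempt to show.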
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Embedded Systems Design Techniques
