Employing polyhedral methods to optimize stencils on FPGAs with stencil-specific caches, data reuse, and wide data bursts
Florian Mayer, Julian Brandner, and Michael Philippsen

TL;DR
This paper uses polyhedral methods to generate custom stencil-specific caches on FPGAs. By exploiting data reuse and filling and flushing the caches with wide data bursts, the optimization improves the runtime of a set of 10 stencils by between 43x and 156x.
Contribution
The paper shows how to derive, from the stencil code itself, the directives and code restructurings that make the FPGA compiler generate stencil-specific cache structures of the right sizes, which are filled and flushed efficiently with wide bursts during stencil execution.
Findings
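Switching on the optimization improves runtime by between 43x and 156x across a set of 10 stencils.
Polyhedral methods can generate stencil-specific cache structures of the right sizes on the FPGA, turning the lack of built-in caches into an opportunity.
The directives and code restructurings needed for the FPGA compiler to generate fast stencil hardware can be derived from the stencil codes.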
Abstract
It is well known that, to accelerate stencil codes on CPUs or GPUs and to exploit hardware caches and their lines, optimizers must find spatial and temporal locality of array accesses to harvest data-reuse opportunities. On FPGAs, there is the burden that there are no built-in caches (or only pre-built hardware descriptions for cache blocks that are inefficient for stencil codes). But this paper demonstrates that this lack is also an opportunity, as polyhedral methods can be used to generate stencil-specific cache structures of the right sizes on the FPGA and to fill and flush them efficiently with wide bursts during stencil execution. The paper shows how to derive the appropriate directives and code restructurings from stencil codes so that the FPGA compiler generates fast stencil hardware. Switching on our optimization improves the runtime of a set of 10 stencils by between 43x and 156x.
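To make the data-reuse idea in the abstract concrete, here is a minimal software model, not the paper's generated hardware or its FPGA directives. It assumes a 5-point stencil on a row-major grid and compares a naive version, which fetches every operand from memory, with a version that streams the grid once in wide bursts into a sliding window of 2*cols + 1 elements. The function names, the window size, and the burst width are illustrative assumptions, not details taken from the paper.

```python
from collections import deque


def stencil_naive(grid, rows, cols):
    """5-point stencil; every operand is fetched from 'off-chip' memory."""
    reads = 0
    out = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            i = r * cols + c
            out[(r, c)] = (grid[i - cols] + grid[i - 1] + grid[i]
                           + grid[i + 1] + grid[i + cols])
            reads += 5  # five separate memory accesses per output
    return out, reads


def stencil_cached(grid, rows, cols, burst=8):
    """Same stencil, fed from a sliding window that is filled by bursts.

    The window (a hypothetical stand-in for an on-chip cache) keeps the
    last 2*cols + 1 elements, which is exactly the span between the north
    and south neighbours of one grid point in row-major order.
    """
    window = deque(maxlen=2 * cols + 1)
    reads = bursts = 0
    out = {}
    for start in range(0, rows * cols, burst):
        chunk = grid[start:start + burst]  # one wide burst from memory
        bursts += 1
        reads += len(chunk)
        for offset, value in enumerate(chunk):
            window.append(value)
            # The newest element is the south neighbour of `center`.
            center = start + offset - cols
            r, c = divmod(center, cols)
            if 1 <= r < rows - 1 and 1 <= c < cols - 1:
                out[(r, c)] = (window[0]            # north
                               + window[cols - 1]   # west
                               + window[cols]       # center
                               + window[cols + 1]   # east
                               + window[-1])        # south
    return out, reads, bursts
```

For a 6 x 8 grid with a burst width of 8, the naive version performs 120 element reads, while the windowed version reads each of the 48 elements once, in 6 bursts, and produces the same results. On an FPGA the window would live in on-chip memory or registers; sizing such a structure for a given stencil is the kind of information the paper obtains with polyhedral analysis.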
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Low-power high-performance VLSI design
