Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Hamid Reza Zohouri; Artur Podobas; Satoshi Matsuoka

arXiv:1802.00438 · cs.DC · October 16, 2019

TL;DR

This paper presents a novel FPGA-based stencil accelerator using combined spatial and temporal blocking with OpenCL, achieving GPU-competitive performance without input size restrictions and projecting high performance on future FPGA devices.

Contribution

It introduces a new FPGA stencil acceleration method that combines spatial and temporal blocking to overcome the input size limitations of previous work, with accelerator parameter tuning guided by a performance model.

Findings

01. Achieves up to 760 GFLOP/s on Arria 10 for 2D stencils.

02. Attains 375 GFLOP/s on Arria 10 for 3D stencils.

03. Projects up to 3.5 TFLOP/s on upcoming Stratix 10 devices.

Abstract

Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of performance for stencil computation, most previous work achieves this by avoiding spatial blocking and restricting input dimensions relative to FPGA on-chip memory. In this work we create a stencil accelerator using Intel FPGA SDK for OpenCL that achieves high performance without such restrictions. We combine spatial and temporal blocking to avoid input size restrictions, and employ multiple FPGA-specific optimizations to tackle issues arising from the added design complexity. Accelerator parameter tuning is guided by our performance model, which we also use to project performance for the upcoming Intel Stratix 10 devices. On an Arria 10 GX 1150 device,…
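To make the idea of combining spatial and temporal blocking concrete, here is a minimal Python sketch. It is not the paper's OpenCL design: it assumes a 1D 3-point stencil with fixed end cells, and the names `step`, `naive`, `blocked`, `block`, and `depth` are hypothetical. Each spatial block is loaded together with a halo of `depth` cells per side, advanced `depth` time steps locally, and only its halo-free center is written back, so the input can be any size regardless of the block size.

```python
def step(cells):
    """One time step of a 3-point stencil; the two end cells are held fixed."""
    out = list(cells)
    for i in range(1, len(cells) - 1):
        out[i] = 0.25 * cells[i - 1] + 0.5 * cells[i] + 0.25 * cells[i + 1]
    return out


def naive(grid, steps):
    """Reference: sweep the whole grid once per time step."""
    for _ in range(steps):
        grid = step(grid)
    return grid


def blocked(grid, steps, block=8, depth=4):
    """Overlapped spatial blocks, each advanced up to `depth` steps per pass."""
    n = len(grid)
    done = 0
    while done < steps:
        t = min(depth, steps - done)      # time steps fused in this pass
        nxt = list(grid)
        for start in range(0, n, block):
            end = min(start + block, n)
            lo = max(start - t, 0)        # halo of t cells, clamped to the grid
            hi = min(end + t, n)
            tile = grid[lo:hi]            # stands in for on-chip storage
            for _ in range(t):
                tile = step(tile)         # stale edges creep inward 1 cell/step
            # After t steps only the halo is stale; the center is exact.
            nxt[start:end] = tile[start - lo:end - lo]
        grid = nxt
        done += t
    return grid


if __name__ == "__main__":
    data = [float((i * 7) % 11) for i in range(37)]
    print(blocked(data, 10) == naive(data, 10))
```

The halo cells are computed redundantly by neighboring blocks, which is the price of removing the input size restriction; a larger `depth` fuses more time steps per pass over the grid but widens the halo, which is the kind of trade-off a performance model can help tune.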

Code & Models

Repositories

zohourih/Diffusion_FPGA
