Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL
Adel Ejjeh, Vikram Adve, Rob Rutenbar

TL;DR
This paper evaluates the effectiveness of automatic optimizations in the Intel FPGA SDK for OpenCL on a 5-stage camera ISP pipeline. Automatic optimizations reach up to 2.7X speedup over CPU, manual tuning reaches up to 36.5X, and the paper discusses the tradeoffs involved.
Contribution
The paper analyzes the tradeoffs in HLS optimizations and assesses how far the automatic optimization features of a modern HLS tool can take a real-world application (a 5-stage camera ISP pipeline on an Arria 10 FPGA Dev Kit).
Findings
Automatic optimizations yield up to 2.7X speedup over CPU.
Manual tuning can achieve up to 36.5X speedup over CPU.
Automatic optimizations are valuable but may require manual rewriting for peak performance.
Abstract
High Level Synthesis (HLS) tools, like the Intel FPGA SDK for OpenCL, improve design productivity and enable efficient design space exploration guided by simple program directives (pragmas), but may sometimes miss important optimizations necessary for high performance. In this paper, we present a study of the tradeoffs in HLS optimizations, and the potential of a modern HLS tool in automatically optimizing an application. We perform the study on a 5-stage camera ISP pipeline using the Intel FPGA SDK for OpenCL and an Arria 10 FPGA Dev Kit. We show that automatic optimizations in the HLS tool are valuable, achieving up to a 2.7X speedup over equivalent CPU execution. With further hand tuning, however, we can achieve up to 36.5X speedup over CPU. We draw several specific lessons about the effectiveness of automatic optimizations guided by simple directives, and the nature of manual…
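To make "optimizations guided by simple program directives (pragmas)" concrete, here is a minimal sketch in plain C. It is not taken from the paper: the function `apply_gain`, its parameters, and the unroll factor of 4 are hypothetical choices made for illustration, and the loop is not one of the paper's ISP stages. The only directive used is `#pragma unroll`, which the Intel FPGA SDK for OpenCL accepts on loops; an ordinary CPU compiler may ignore it or warn about it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel stage (an assumption for illustration, not a
 * stage from the paper): scale each 8-bit pixel by a gain given in
 * Q8 fixed point (256 == 1.0) and saturate the result to 255. */
void apply_gain(const uint8_t *in, uint8_t *out, size_t n, uint16_t gain_q8)
{
    /* Directive-guided optimization: request that the loop body be
     * replicated 4 times. In an HLS flow this trades extra FPGA area
     * for more pixels processed per cycle; the factor 4 is arbitrary. */
    #pragma unroll 4
    for (size_t i = 0; i < n; i++) {
        uint32_t v = ((uint32_t)in[i] * gain_q8) >> 8;
        out[i] = (uint8_t)(v > 255u ? 255u : v);
    }
}
```

Changing the unroll factor is a one-line edit, which is what makes this kind of design space exploration cheap. The hand tuning mentioned in the abstract would instead involve restructuring the code itself, which a single directive like this cannot express.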
