TL;DR
Stream-HLS automates the design of high-performance dataflow architectures for FPGAs from high-level code, addressing key limitations of existing HLS tools through global scheduling and multi-kernel optimization.
Contribution
Stream-HLS introduces a novel MLIR-based framework that automatically generates optimized FPGA dataflow architectures from high-level code, applying both multi-kernel and graph-level optimizations.
Findings
Achieves up to 79.43x speedup over prior frameworks.
Outperforms both manually optimized designs and existing automation frameworks in benchmarks.
Provides an open-source, extensible tool for FPGA dataflow design.
Abstract
High-level synthesis (HLS) has enabled the rapid development of custom hardware circuits for many software applications. However, developing high-performance hardware circuits using HLS is still a non-trivial task requiring expertise in hardware design. Further, the hardware design space, especially for multi-kernel applications, grows exponentially. Therefore, several HLS automation and abstraction frameworks have been proposed recently, but many issues remain unresolved. These issues include: 1) relying mainly on hardware directives (pragmas) to apply hardware optimizations without exploring loop scheduling opportunities; 2) targeting single-kernel applications only; 3) lacking automatic and/or global design space exploration; and 4) missing critical hardware optimizations, such as graph-level pipelining for multi-kernel applications. To address these challenges, we propose a novel…
