TL;DR
CODO is an automated FPGA compiler that optimizes dataflow architectures, substantially reducing latency for individual kernels and for models such as DNNs and GPT-2.
Contribution
It introduces a systematic method for detecting and eliminating dataflow violations, optimizes on- and off-chip data movement, and applies automatic scheduling to generate high-performance FPGA accelerators.
Findings
Achieves 1.45x to 4.52x latency speedups on kernels.
Attains 3.7x to 33.8x speedups on DNN models.
Realizes a 7.3x average speedup on CNNs and a 2.07x speedup on GPT-2 over the state of the art.
Abstract
FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient dataflow architecture for large-scale applications is still challenging, even for specialists who use high-level synthesis (HLS) to simplify FPGA programming. To address this, we introduce CODO, an automated compiler that generates feasible and efficient dataflow accelerators on FPGAs. CODO features a systematic method for detecting and eliminating both coarse-grained and fine-grained dataflow violations. Building on this, CODO performs both on- and off-chip data movement optimizations to maximize transfer efficiency. To guarantee a higher design quality, CODO performs automatic scheduling to generate high-performance dataflow accelerators, ensuring a…
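To make the idea of a coarse-grained dataflow violation concrete, here is a minimal sketch. It is not CODO's algorithm. It assumes one common restriction in HLS dataflow regions: each intermediate buffer should have a single producer task and a single consumer task. The task-graph encoding, the function name `find_spsc_violations`, and the example pipeline are all hypothetical.

```python
from collections import defaultdict


def find_spsc_violations(tasks):
    """Return buffers that have more than one producer or consumer.

    `tasks` is a list of (task_name, buffers_read, buffers_written) tuples.
    This encoding is an assumption made for illustration only.
    """
    producers = defaultdict(list)
    consumers = defaultdict(list)
    for name, reads, writes in tasks:
        for buf in writes:
            producers[buf].append(name)
        for buf in reads:
            consumers[buf].append(name)

    violations = {}
    for buf in sorted(set(producers) | set(consumers)):
        p, c = producers[buf], consumers[buf]
        # Buffers with no producer (external inputs) or no consumer
        # (external outputs) are not flagged here.
        if len(p) > 1 or len(c) > 1:
            violations[buf] = (p, c)
    return violations


# Hypothetical pipeline: buffer "a" is read by both "conv" and "add",
# as happens with a residual (skip) connection.
tasks = [
    ("load", [], ["a"]),
    ("conv", ["a"], ["b"]),
    ("add", ["a", "b"], ["c"]),
]
violations = find_spsc_violations(tasks)
print(violations)
```

In this example only buffer `a` is reported, because two tasks consume it. One conventional remedy, again an assumption rather than a description of CODO, is to insert a task that duplicates `a` into two separate streams so that each has exactly one consumer.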
