Automatic Compiler Based FPGA Accelerator for CNN Training
Shreyas Kolala Venkataramanaiah, Yufei Ma, Shihui Yin, Eriko Nurvitadhi, Aravind Dasu, Yu Cao, Jae-sun Seo

TL;DR
This paper introduces an automatic compiler-based FPGA accelerator for CNN training that supports complete training processes with high performance, addressing the complexity of on-device learning on embedded platforms.
Contribution
It presents a novel RTL library and compiler for FPGA-based CNN training, including a cyclic weight storage scheme and optimized hardware architecture.
Findings
Achieves up to 479 GOPS performance on CNN training.
Supports complete CNN training including forward, backward, and weight update phases.
Demonstrates effectiveness on Intel Stratix 10-GX FPGA with CIFAR-10.
Abstract
Training of convolutional neural networks (CNNs) on embedded platforms to support on-device learning is earning vital importance in recent days. Designing flexible training hardware is much more challenging than inference hardware, due to design complexity and large computation/memory requirement. In this work, we present an automatic compiler-based FPGA accelerator with 16-bit fixed-point precision for complete CNN training, including Forward Pass (FP), Backward Pass (BP) and Weight Update (WU). We implemented an optimized RTL library to perform training-specific tasks and developed an RTL compiler to automatically generate FPGA-synthesizable RTL based on user-defined constraints. We present a new cyclic weight storage/access scheme for on-chip BRAM and off-chip DRAM to efficiently implement non-transpose and transpose operations during FP and BP phases, respectively. Representative CNNs…
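The abstract does not spell out how the cyclic weight storage works, so the sketch below is only an illustration of the general idea, not the paper's scheme. It uses a textbook skewed (rotated) bank layout, in which element (r, c) of a square weight tile is placed in bank (r + c) mod n at address r. With that layout, both a row (non-transposed access, as in FP) and a column (transposed access, as in BP) touch every bank exactly once. The function names and the square-tile assumption are hypothetical.

```python
# Illustrative skewed storage of an n x n weight tile across n memory banks.
# Assumption: this is a generic textbook layout, not the paper's exact scheme.

def build_banks(weights):
    """Place weights[r][c] in bank (r + c) % n at address r."""
    n = len(weights)
    banks = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            banks[(r + c) % n][r] = weights[r][c]
    return banks

def read_row(banks, r):
    """Non-transposed access: all banks are read at the same address r."""
    n = len(banks)
    return [banks[(r + c) % n][r] for c in range(n)]

def read_col(banks, c):
    """Transposed access: each bank is read once, at a different address."""
    n = len(banks)
    return [banks[(r + c) % n][r] for r in range(n)]

def banks_touched_by_col(n, c):
    """Bank indices used when reading column c (should all be distinct)."""
    return sorted((r + c) % n for r in range(n))

weights = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
banks = build_banks(weights)
row1 = read_row(banks, 1)   # second row of the tile
col2 = read_col(banks, 2)   # third column of the tile
```

Because each row read and each column read uses every bank once, neither access pattern causes a bank conflict in this layout, which is the property a transposable weight buffer needs.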
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Brain Tumor Detection and Classification
