Optimization Techniques to Improve Inference Performance of a Forward Propagating Neural Network on an FPGA
Matthew Joseph Adiletta, Brian Flanagan

TL;DR
This paper presents an FPGA-based implementation of a trained neural network with optimization techniques like data scaling, simplified activation functions, and hardware-friendly design, achieving improved inference performance over software on a CPU.
Contribution
It introduces a novel Python-to-Verilog workflow for optimized FPGA neural network inference, emphasizing hardware-friendly design and performance enhancement.
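To make the Python-to-Verilog idea concrete, here is a minimal sketch of a script that turns one trained neuron into Verilog text. It is not the authors' generator. The function names (`signed_literal`, `neuron_to_verilog`), the 1-bit input bus `x`, the 16-bit signed accumulator, and the integer weights are all assumptions made for illustration.

```python
def signed_literal(value, width):
    """Format an integer as a sized, signed Verilog decimal literal."""
    sign = "-" if value < 0 else ""
    return f"{sign}{width}'sd{abs(value)}"


def neuron_to_verilog(name, weights, bias, width=16):
    """Emit a combinational module for one neuron (hypothetical layout).

    Each input is assumed to be a single bit, so every product
    weight * input collapses to "add the weight or add zero".
    Zero weights contribute nothing and are skipped entirely.
    """
    terms = [signed_literal(bias, width)]
    for i, w in enumerate(weights):
        if w != 0:
            terms.append(
                f"(x[{i}] ? {signed_literal(w, width)} : {signed_literal(0, width)})"
            )
    body = "\n        + ".join(terms)
    return (
        f"module {name} (\n"
        f"    input  wire [{len(weights) - 1}:0] x,\n"
        f"    output wire signed [{width - 1}:0] acc\n"
        f");\n"
        f"    assign acc = {body};\n"
        f"endmodule\n"
    )


if __name__ == "__main__":
    # Example weights are made up; a real flow would load trained values.
    print(neuron_to_verilog("neuron0", [3, 0, -2], bias=1))
```

Because the weights are baked into the generated text as constants, a synthesis tool sees only adders and multiplexers, which is one plausible reading of "selected addends instead of multiplication".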
Findings
FPGA implementation outperforms CPU in inference speed
Optimizations enable scalable and efficient neural network hardware deployment
Hardware-friendly activation functions simplify FPGA design
Abstract
This paper describes an optimized implementation of a previously trained forward-propagating classification neural network. The implementation highlights a novel means of using Python scripts to generate a Verilog hardware implementation. Its optimizations include scaling the input data, using selected addends in place of multiplication, hardware-friendly activation functions, and simplified output selection. The paper details an inference performance comparison of a 28x28-pixel handwriting-recognition NN between a software implementation on an Intel i7 and a hardware implementation on a Xilinx FPGA.
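The four optimizations named in the abstract can be sketched in software as a multiply-free forward pass. This is an illustrative model under stated assumptions, not the paper's design: pixels are assumed to be scaled to a single bit by thresholding, the activation is assumed to be a binary step, weights are assumed to be integers, and all function names are hypothetical.

```python
def binarize(pixels, threshold=128):
    """Input scaling (assumed scheme): reduce each 8-bit pixel to 1 bit."""
    return [1 if p >= threshold else 0 for p in pixels]


def select_add(bits, weights, bias):
    """Selected addends: add weights[i][j] only where input bit i is 1.

    With 1-bit inputs this equals the usual weighted sum, but it
    needs no multiplier, only conditional additions.
    """
    acc = list(bias)
    for bit, row in zip(bits, weights):
        if bit:
            for j, w in enumerate(row):
                acc[j] += w
    return acc


def step(values):
    """Hardware-friendly activation (assumed): a sign check per neuron."""
    return [1 if v > 0 else 0 for v in values]


def argmax(values):
    """Simplified output selection: index of the largest score, no softmax."""
    best = 0
    for j in range(1, len(values)):
        if values[j] > values[best]:
            best = j
    return best


def classify(pixels, w1, b1, w2, b2):
    """Two-layer forward pass built only from compares and adds."""
    hidden = step(select_add(binarize(pixels), w1, b1))
    return argmax(select_add(hidden, w2, b2))
```

Because the step activation produces bits, the second layer can reuse the same conditional-add structure as the first. Choosing the largest raw score gives the same class as applying softmax first, since softmax preserves ordering.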
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors
