Pre-RTL DNN Hardware Evaluator With Fused Layer Support
Chih-Chyau Yang, Tian-Sheuan Chang

TL;DR
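This paper introduces a pre-RTL DNN hardware evaluator that supports fused layer processing, allowing accelerator designs to be evaluated before RTL is written. In the reported experiments, layer fusion reduces memory bandwidth, latency, and energy consumption relative to layer-by-layer operation.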
Contribution
It presents a hardware evaluation tool that supports both conventional layer-by-layer processing and fused layer processing, covers two state-of-the-art accelerator architectures, and finds the best hardware and layer fusion group.
Findings
Compared with layer-by-layer operation, layer fusion reduces memory bandwidth by 55.6%.
Latency improves by 36.7%.
Energy consumption decreases by 49.2%.
Abstract
With the growing popularity of deep neural networks (DNNs), hardware accelerators are in demand for real-time execution. However, lengthy design processes and fast-evolving DNN models make it hard for hardware evaluation to meet time-to-market needs. This paper proposes a pre-RTL DNN hardware evaluator that supports conventional layer-by-layer processing as well as fused layer processing, which lowers the external bandwidth requirement. The evaluator supports two state-of-the-art accelerator architectures and finds the best hardware and layer fusion group. The experimental results show that the layer fusion scheme can achieve a 55.6% memory bandwidth reduction, a 36.7% latency improvement, and a 49.2% energy reduction compared with layer-by-layer operation.
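To illustrate why fused layer processing lowers external bandwidth, here is a minimal sketch of a first-order traffic model. It is not the authors' evaluator. The `ConvLayer` fields, both function names, and all byte counts are hypothetical. The model assumes each layer's weights are read once, that intermediate feature maps inside a fused group stay entirely on chip, and that tiling causes no overlap or recomputation overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """Hypothetical layer description; all sizes are in bytes."""
    name: str
    ifmap_bytes: int   # input feature map size
    ofmap_bytes: int   # output feature map size
    weight_bytes: int  # filter weights size


def layer_by_layer_traffic(layers: List[ConvLayer]) -> int:
    """External traffic when every layer reads its input and weights from
    off-chip memory and writes its output back."""
    return sum(l.ifmap_bytes + l.weight_bytes + l.ofmap_bytes for l in layers)


def fused_traffic(layers: List[ConvLayer], groups: List[List[int]]) -> int:
    """External traffic when consecutive layers are fused into groups.

    Assumption: within a group only the first input and the last output
    cross the chip boundary; intermediate feature maps stay on chip.
    """
    total = 0
    for group in groups:
        first, last = layers[group[0]], layers[group[-1]]
        total += first.ifmap_bytes + last.ofmap_bytes
        total += sum(layers[i].weight_bytes for i in group)
    return total


# Made-up example network (sizes are illustrative, not from the paper).
net = [
    ConvLayer("conv1", ifmap_bytes=150_000, ofmap_bytes=800_000, weight_bytes=2_000),
    ConvLayer("conv2", ifmap_bytes=800_000, ofmap_bytes=800_000, weight_bytes=37_000),
    ConvLayer("conv3", ifmap_bytes=800_000, ofmap_bytes=400_000, weight_bytes=74_000),
]

baseline = layer_by_layer_traffic(net)           # 3_863_000 bytes
partly_fused = fused_traffic(net, [[0, 1], [2]])  # 2_263_000 bytes
fully_fused = fused_traffic(net, [[0, 1, 2]])     # 663_000 bytes
```

In this toy model the saving comes entirely from intermediate feature maps that no longer travel off chip. A real evaluator would also need to account for on-chip buffer capacity, tiling overhead, latency, and energy, which is what makes the choice of fusion group a search problem.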
