A Highly Configurable Hardware/Software Stack for DNN Inference Acceleration
Suvadeep Banerjee, Steve Burns, Pasquale Cocchini, Abhijit Davare, Shweta Jain, Desmond Kirkpatrick, Anton Sorokin, Jin Yang, Zhenkun Yang

TL;DR
This paper presents a highly configurable hardware/software stack for DNN inference acceleration, enhancing the VTA architecture and TVM compiler to support more workloads with improved performance and flexibility.
Contribution
It introduces extensive micro-architectural enhancements and a flexible compilation stack, enabling a broad range of configurations for efficient DNN inference acceleration.
Findings
4.9x fewer cycles for ResNet-18 with minimal area increase
11.5x cycle reduction at 12x area cost
Supports all layers of MobileNet 1.0 and ResNets
Abstract
This work focuses on an efficient agile design methodology for domain-specific accelerators. We employ feature-by-feature enhancement of a vertical development stack and apply it to the TVM/VTA inference accelerator. We have enlarged the VTA design space and enabled end-to-end support for additional workloads. This was accomplished by augmenting the VTA micro-architecture and instruction set architecture (ISA), and by enhancing the TVM compilation stack to support a wide range of VTA configurations. The VTA tsim implementation (CHISEL-based) has been enhanced with fully pipelined versions of the ALU/GEMM execution units. In tsim, memory width can now range from 8 to 64 bytes. Field widths have been made more flexible to support larger scratchpads. New instructions have been added: element-wise 8-bit multiplication to support depthwise convolution, and load with a choice of pad…
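To make concrete why an element-wise 8-bit multiply is enough to support depthwise convolution, the following is a minimal Python sketch, not the VTA ISA or any TVM API. The function names `elementwise_mul_i8` and `depthwise_conv2d`, the channel-last layout, and the stride-1, no-padding setting are all assumptions chosen for illustration. In a depthwise convolution each channel is filtered independently, so every kernel tap reduces to one element-wise product across the channel vector followed by an accumulation.

```python
def elementwise_mul_i8(a, b):
    """Element-wise product of two equal-length vectors of int8 values.

    Hypothetical stand-in for an element-wise 8-bit multiply; the products
    are returned as ordinary (wider) Python ints.
    """
    assert len(a) == len(b)
    for v in (*a, *b):
        assert -128 <= v <= 127, "operands must fit in int8"
    return [x * y for x, y in zip(a, b)]


def depthwise_conv2d(inp, kernel):
    """Depthwise convolution, stride 1, no padding (illustrative assumptions).

    inp:    nested list of shape [H][W][C]
    kernel: nested list of shape [KH][KW][C], one filter per channel
    Returns a nested list of shape [H-KH+1][W-KW+1][C].
    """
    h, w, c = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for oy in range(h - kh + 1):
        row = []
        for ox in range(w - kw + 1):
            acc = [0] * c  # one accumulator per channel
            for ky in range(kh):
                for kx in range(kw):
                    # One kernel tap: multiply the channel vectors
                    # element-wise, then accumulate per channel.
                    prod = elementwise_mul_i8(inp[oy + ky][ox + kx],
                                              kernel[ky][kx])
                    acc = [s + p for s, p in zip(acc, prod)]
            row.append(acc)
        out.append(row)
    return out
```

Unlike a standard convolution, no reduction across channels takes place, which is why a matrix-multiply (GEMM) unit is a poor fit for this layer type and an element-wise multiply is the natural primitive.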
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Network Packet Processing and Optimization · Embedded Systems Design Techniques
