A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Yuyang Zhang; Dik Hin Leung; Min Guo; Yijia Xiao; Haoyue Liu; Yunfei Li; Jiyuan Zhang; Guan Wang; Zhen Chen

arXiv:2110.04861·cs.LG·October 12, 2021

TL;DR

This paper presents a low-power FPGA-based MLP accelerator that uses pipelined matrix multiplication and non-uniform quantization to improve performance and reduce power consumption for deep learning inference on edge computing devices.

Contribution

It introduces a novel FPGA implementation of a pipelined matrix multiplication scheme combined with non-uniform quantization for efficient deep learning inference.

Findings

01 Achieves better performance on handwritten digit classification and Q-learning tasks.

02 Reduces power consumption compared to existing methods.

03 Demonstrates effectiveness of non-uniform quantization in FPGA accelerators.

Abstract

Matrix multiplication is the bedrock of deep learning inference applications. In hardware acceleration on edge computing devices, matrix multiplication often takes up the great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a non-uniform quantization methodology. The implementation runs on Field-Programmable Gate Array (FPGA) devices, and we tested its performance on handwritten digit classification and Q-learning tasks. Results show that our method achieves better performance with lower power consumption.
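
The abstract does not specify which non-uniform quantization scheme the accelerator uses. As a rough illustration of the general idea only, the sketch below assumes a power-of-two codebook, a common non-uniform choice because multiplying by a power of two reduces to a bit shift in hardware. The function names (`build_log_codebook`, `quantize`, `dequantize`, `mlp_layer`), the codebook, and the sample weights are all hypothetical and are not taken from the paper.

```python
def build_log_codebook(num_exponents, max_exp=0):
    """Assumed scheme: levels are 0 and +/- 2**e, so spacing is non-uniform."""
    positives = [2.0 ** (max_exp - i) for i in range(num_exponents)]
    return sorted([-p for p in positives] + [0.0] + positives)


def quantize(weights, codebook):
    """Map each weight to the index of its nearest codebook level."""
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
        for w in weights
    ]


def dequantize(indices, codebook):
    """Look up the level stored for each index."""
    return [codebook[i] for i in indices]


def mlp_layer(x, weight_rows, bias):
    """One dense MLP layer, y = relu(W x + b), as a plain matrix-vector product."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weight_rows, bias)
    ]


# Hypothetical example data, chosen only to exercise the functions.
codebook = build_log_codebook(num_exponents=3)
weights = [[0.9, -0.3, 0.05], [-0.6, 0.2, 0.45]]
indices = [quantize(row, codebook) for row in weights]
w_hat = [dequantize(row, codebook) for row in indices]
y = mlp_layer([1.0, 2.0, 4.0], w_hat, [0.0, 0.0])

print(codebook)  # [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]
print(w_hat)     # [[1.0, -0.25, 0.0], [-0.5, 0.25, 0.5]]
print(y)         # [0.5, 2.0]
```

With seven levels, each weight can be stored as a 3-bit index rather than a full-precision value, which is the kind of storage and arithmetic saving that makes non-uniform quantization attractive on FPGAs. The pipelining of the matrix multiplication itself is a hardware scheduling concern and is not modeled here.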


Taxonomy

Topics: Neural Networks and Applications · Brain Tumor Detection and Classification · Advanced Memory and Neural Computing

Methods: Q-Learning