SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks

Panagiotis G. Mousouliotis; Loukas P. Petrou

arXiv:1805.08695·cs.CV·November 27, 2018

TL;DR

SqueezeJet is an FPGA accelerator for SqueezeNet inference. It runs 15.16 times faster than the software implementation on an embedded mobile processor, with less than 1% drop in top-5 accuracy, which makes real-time deep learning more practical in resource-constrained environments.

Contribution

This paper presents the architecture, high-level synthesis design, and implementation of SqueezeJet, an FPGA accelerator for the inference phase of SqueezeNet on embedded mobile hardware.

Findings

01. 15.16x speed-up over the software implementation of SqueezeNet running on an embedded mobile processor

02. Less than 1% drop in top-5 accuracy

03. Effective for real-time embedded applications

Abstract

Deep convolutional neural networks have dominated the pattern recognition scene by providing much more accurate solutions in computer vision problems such as object recognition and object detection. Most of these solutions come at a huge computational cost, requiring billions of multiply-accumulate operations and, thus, making their use quite challenging in real-time applications that run on embedded mobile (resource-power constrained) hardware. This work presents the architecture, the high-level synthesis design, and the implementation of SqueezeJet, an FPGA accelerator for the inference phase of the SqueezeNet DCNN architecture, which is designed specifically for use in embedded systems. Results show that SqueezeJet can achieve 15.16 times speed-up compared to the software implementation of SqueezeNet running on an embedded mobile processor with less than 1% drop in top-5 accuracy.
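
To make the abstract's cost argument concrete, here is a minimal sketch of how multiply-accumulate (MAC) counts and speed-up ratios are typically computed. It assumes a dense 2D convolution with no grouping. The function names `conv_macs` and `speedup` and the example layer dimensions are hypothetical choices for illustration and are not taken from the paper.

```python
def conv_macs(out_h, out_w, in_ch, out_ch, kernel):
    """MACs for one dense 2D convolution layer (assumes a square kernel, no groups).

    Each output element needs kernel * kernel * in_ch multiply-accumulates,
    and there are out_h * out_w * out_ch output elements.
    """
    return out_h * out_w * out_ch * kernel * kernel * in_ch


def speedup(t_software, t_accelerated):
    """Speed-up as the ratio of software latency to accelerated latency."""
    return t_software / t_accelerated


# Hypothetical layer, not a layer from SqueezeNet:
# 3x3 kernel, 64 input channels, 64 output channels, 56x56 output map.
layer_macs = conv_macs(out_h=56, out_w=56, in_ch=64, out_ch=64, kernel=3)
print(f"{layer_macs:,} MACs")  # prints "115,605,504 MACs"
```

A single mid-sized layer like this one already costs over a hundred million MACs, so a network with many such layers quickly reaches the billions the abstract mentions. That per-layer cost is what motivates offloading convolution to dedicated hardware.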

Taxonomy

Methods: Diffusion-Convolutional Neural Networks · Residual Connection · Convolution · Average Pooling · Fire Module · Global Average Pooling · 1x1 Convolution · Dropout · Xavier Initialization