Phantom: A High-Performance Computational Core for Sparse Convolutional Neural Networks

Mahmood Azhar Qureshi; Arslan Munir

arXiv:2111.05002 · cs.AR · November 10, 2021

Open Access

TL;DR

Phantom is a flexible, high-performance neural core architecture designed to efficiently accelerate sparse CNNs, supporting various layer types and improving hardware utilization through dynamic scheduling and load balancing.

Contribution

The paper introduces Phantom, a novel multi-threaded, dynamic neural core with a 2D mesh architecture that supports all CNN layers, including non-unit stride and fully-connected layers, outperforming existing accelerators.

Findings

01. Achieves up to 12x speedup over dense architectures

02. Outperforms SCNN, SparTen, and Eyeriss v2 in benchmarks

03. Supports all CNN layers with improved load balancing

Abstract

Sparse convolutional neural networks (CNNs) have gained significant traction in recent years because, when their sparsity is exploited properly, they can drastically reduce model size and computation compared with their dense counterparts. Sparse CNNs often introduce variations in layer shapes and sizes, which can prevent dense accelerators from performing well on sparse CNN models. Recently proposed sparse accelerators such as SCNN, Eyeriss v2, and SparTen actively exploit two-sided (full) sparsity, that is, sparsity in both weights and activations, for performance gains. These accelerators, however, either have an inefficient micro-architecture that limits their performance, lack support for non-unit stride convolutions and fully-connected (FC) layers, or suffer heavily from systematic load imbalance. To circumvent these issues and support both sparse and dense models, we…
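To make the idea of two-sided sparsity concrete, here is a minimal Python sketch. It is not Phantom's algorithm or any accelerator's actual dataflow; the function names (`count_macs`, `sparse_dot`, `work_per_pe`), the contiguous chunking scheme, and the toy vectors are all assumptions chosen for illustration. It shows that a multiply-accumulate (MAC) is only useful when both the weight and the activation are non-zero, and that naively splitting the work across processing elements (PEs) can leave some of them idle.

```python
def count_macs(weights, activations):
    """Return (dense_macs, effective_macs) for one dot product.

    A MAC is 'effective' only when both operands are non-zero,
    which is what two-sided sparsity exploits.
    """
    dense = len(weights)
    effective = sum(1 for w, a in zip(weights, activations) if w != 0 and a != 0)
    return dense, effective


def sparse_dot(weights, activations):
    """Dot product that skips every pair containing a zero operand."""
    return sum(w * a for w, a in zip(weights, activations) if w != 0 and a != 0)


def work_per_pe(weights, activations, num_pes):
    """Effective MACs per PE under a hypothetical contiguous split."""
    chunk = -(-len(weights) // num_pes)  # ceiling division
    return [
        count_macs(weights[i:i + chunk], activations[i:i + chunk])[1]
        for i in range(0, len(weights), chunk)
    ]


# Toy data (illustrative only, not taken from the paper).
weights = [1, 2, 3, 4, 0, 0, 0, 5]
activations = [1, 1, 0, 2, 3, 0, 4, 0]

dense, effective = count_macs(weights, activations)
print(dense, effective)                      # 8 3
print(sparse_dot(weights, activations))      # 11
print(work_per_pe(weights, activations, 2))  # [3, 0]
```

In this toy case only 3 of the 8 MACs contribute to the result, and a two-way contiguous split assigns all of the useful work to the first PE. This is a simplified picture of the load imbalance the abstract refers to.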

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning