RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge

Adithya Krishna; Srikanth Rohit Nudurupati; Chandana D G; Pritesh Dwivedi; André van Schaik; Mahesh Mehendale; Chetan Singh Thakur

arXiv:2306.06493·cs.NE·June 13, 2023·1 cites

Open Access

TL;DR

RAMAN is a reconfigurable, sparse accelerator for edge inference of DNNs. It reduces area, power, and latency by exploiting sparsity and a novel dataflow, and it can be configured for a range of DNN topologies and for accuracy versus power/latency tradeoffs.

Contribution

It introduces RAMAN, a novel reconfigurable sparse accelerator architecture that leverages a Gustavson-inspired dataflow and memory overlap techniques for efficient edge DNN inference.
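
For readers unfamiliar with the name, Gustavson's algorithm is the classic row-wise formulation of sparse matrix multiplication: each output row is built by scaling and accumulating the rows of the second operand selected by the nonzeros of one row of the first. The sketch below shows only that textbook form in plain Python. It is not RAMAN's dataflow, whose mapping onto convolution layers and hardware is not described on this page, and the function name and the dict-of-dicts storage format are assumptions made for illustration.

```python
def gustavson_spgemm(a_rows, b_rows):
    """Row-wise (Gustavson) sparse product C = A @ B.

    Each matrix is a dict mapping row index -> {column index: nonzero value}.
    This storage format is an illustrative assumption, not taken from the paper.
    """
    c_rows = {}
    for i, a_row in a_rows.items():
        acc = {}  # partial sums for output row i are kept local until the row is finished
        for k, a_val in a_row.items():                  # nonzeros of A[i, :]
            for j, b_val in b_rows.get(k, {}).items():  # nonzeros of B[k, :]
                acc[j] = acc.get(j, 0) + a_val * b_val
        if acc:
            c_rows[i] = acc
    return c_rows


# Small example: A is 2x3, B is 3x2, both stored sparsely.
a = {0: {0: 2, 2: 1}, 1: {1: 3}}
b = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
c = gustavson_spgemm(a, b)
print(c)  # {0: {1: 14}, 1: {0: 15}}
```

Zero entries are never visited in either operand, and each output row is completed before the next one starts, which is why this formulation is a common starting point for sparse dataflows.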

Findings

01. Processes MobileNetV1 at 98.47 GOp/s/W

02. Achieves 79.68 GOp/s/W on DS-CNN

03. Reduces storage by up to 50% through memory overlap
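
The "up to 50%" figure can be read as an upper bound from simple arithmetic. The sketch below assumes that memory overlap means a layer's input and output activations share one buffer instead of occupying two separate ones. That reading, the idealized model, and the names `peak_activation_bytes`, `overlap_saving`, and `overlap_fraction` are all assumptions for illustration, not details taken from this page.

```python
def peak_activation_bytes(ia_bytes, oa_bytes, overlap_fraction):
    """Idealized peak buffer size when part of the smaller tensor reuses the other's space.

    overlap_fraction is a hypothetical parameter: 0.0 means fully separate buffers,
    1.0 means the smaller tensor lives entirely inside the larger one's space.
    """
    shared = overlap_fraction * min(ia_bytes, oa_bytes)
    return ia_bytes + oa_bytes - shared


def overlap_saving(ia_bytes, oa_bytes, overlap_fraction):
    """Fractional storage saving relative to two separate buffers."""
    separate = ia_bytes + oa_bytes
    return 1 - peak_activation_bytes(ia_bytes, oa_bytes, overlap_fraction) / separate


# Equal-sized input and output with full overlap gives the 50% bound.
print(overlap_saving(1024, 1024, 1.0))  # 0.5
# Unequal sizes save less even with full overlap.
print(overlap_saving(1024, 256, 1.0))
```

Under this model the saving reaches one half only when the two tensors are the same size and overlap completely, which is consistent with the "up to" wording.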

Abstract

Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity, in both activations and weights inherent to DNNs, is a key knob to leverage. In this paper, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit the sparsity to reduce area (storage), power as well as latency. RAMAN can be configured to support a wide range of DNN topologies - consisting of different convolution layer types and a range of layer parameters (feature-map size and the number of channels). RAMAN can also be configured to support accuracy vs power/latency tradeoffs using techniques deployed at compile-time and run-time. We present the salient features of the architecture, provide…

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Brain Tumor Detection and Classification

Methods: Depthwise Convolution · Pointwise Convolution · Dense Connections · Depthwise Separable Convolution · Batch Normalization · Average Pooling · 1x1 Convolution · Softmax · Convolution