MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks

Syuan-Hao Sie; Jye-Luen Lee; Yi-Ren Chen; Chih-Cheng Lu; Chih-Cheng Hsieh; Meng-Fan Chang; Kea-Tiong Tang

arXiv:2010.12861·cs.AR·May 26, 2021

TL;DR

This paper introduces MARS, an SRAM CIM-based CNN accelerator that uses multiple macros and is co-designed with a compression algorithm that fuses batch normalization and exploits sparsity, with the aim of improving energy efficiency and throughput.

Contribution

It presents a hardware-software co-design approach for SRAM CIM-based CNN acceleration, comprising a BN-fusion quantization method, a CIM-aware sparsity algorithm, and a multi-macro architecture.
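
For readers unfamiliar with BN fusion, the sketch below shows the standard inference-time folding of a batch normalization layer into the preceding convolution's weights and bias. It is a generic illustration, not the paper's BN-fusion quantization method, which may differ in how it interacts with quantization. The function name `fold_batchnorm` and its parameter names are assumptions made for this example.

```python
import numpy as np

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-time BN layer into the preceding convolution.

    weight: conv kernel of shape (out_ch, in_ch, kh, kw)
    bias, gamma, beta, mean, var: per-output-channel vectors of shape (out_ch,)
    Returns (folded_weight, folded_bias) such that
    conv(x, folded_weight) + folded_bias == BN(conv(x, weight) + bias).
    """
    # Per-channel scale applied by BN at inference time.
    scale = gamma / np.sqrt(var + eps)
    # Scale every filter by its channel's BN scale.
    folded_weight = weight * scale[:, None, None, None]
    # Shift the bias by the running mean, rescale, then add the BN offset.
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias
```

After folding, the BN layer no longer needs a separate compute step at inference time, which is why the technique is commonly applied before quantizing weights for an accelerator.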

Findings

1. Enhanced energy efficiency and throughput in CNN acceleration.
2. Effective model compression considering CIM hardware constraints.
3. Support for sparsity and multiple SRAM CIM macros in the accelerator.

Abstract

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. However, the intensive multiply and accumulation (MAC) operations executed at the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computation costs, network pruning and quantization are two widely studied compression methods to shrink the model size. However, most of the model compression algorithms can only be implemented in digital-based CNN accelerators. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model…

Taxonomy

Methods: Pruning · Batch Normalization