MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks
Syuan-Hao Sie, Jye-Luen Lee, Yi-Ren Chen, Chih-Cheng Lu, Chih-Cheng Hsieh, Meng-Fan Chang, Kea-Tiong Tang

TL;DR
This paper introduces MARS, an SRAM CIM-based CNN accelerator co-designed with a novel compression algorithm that fuses batch normalization into quantization and exploits CIM-aware sparsity. The accelerator uses multiple CIM macros to improve energy efficiency and throughput.
Contribution
It presents a hardware-software co-design approach for SRAM CIM-based CNN acceleration, comprising batch normalization (BN) fusion quantization, a CIM-aware sparsity algorithm, and a multi-macro architecture.
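Folding batch normalization into the weights and bias of the preceding convolution is the standard way to remove BN at inference time, so that only quantized weights need to be stored in the CIM array. The sketch below illustrates that generic folding followed by symmetric per-output-channel uniform quantization. It is an assumption for illustration, not MARS's BN fusion quantization, whose details this summary does not give; the names `fold_bn` and `quantize_symmetric`, the flattened-kernel layout, and the 4-bit default are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sketch: generic BN folding + per-channel symmetric quantization.
# This is NOT the published MARS algorithm.


def fold_bn(weights: List[List[float]], bias: List[float],
            gamma: List[float], beta: List[float],
            mean: List[float], var: List[float],
            eps: float = 1e-5) -> Tuple[List[List[float]], List[float]]:
    """Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into y = w . x + b.

    weights[c] is the flattened kernel of output channel c (hypothetical layout).
    Returns per-channel folded weights and biases.
    """
    folded_w: List[List[float]] = []
    folded_b: List[float] = []
    for c, w_c in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append(beta[c] + (bias[c] - mean[c]) * scale)
    return folded_w, folded_b


def quantize_symmetric(w_c: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Quantize one channel to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1].

    Returns the integer codes and the step size, so that w is approximately q * step.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in w_c) or 1.0  # avoid a zero step for all-zero channels
    step = max_abs / qmax
    q = [max(-qmax, min(qmax, round(w / step))) for w in w_c]
    return q, step
```

Because the folded layer computes exactly `BN(w . x + b)`, the BN parameters disappear from inference. In a CIM design the per-channel `step` would presumably be applied digitally after the macro returns integer partial sums.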
Findings
Enhanced energy efficiency and throughput in CNN acceleration.
Effective model compression considering CIM hardware constraints.
Support for sparsity and multiple SRAM CIM macros in the accelerator.
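CIM-aware sparsity only pays off when zeros fall in patterns the macro can skip, because unstructured zero weights generally still occupy cells and take part in the accumulation. A minimal sketch, assuming a hypothetical granularity in which a macro activates `group_size` consecutive input rows (word lines) together: whole row groups with the smallest L2 norm are pruned, and the matrix-vector product skips them. The grouping, the norm criterion, and the helper names `group_prune` and `sparse_mvm` are illustrative assumptions, not the paper's algorithm.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of row-group pruning aligned to CIM word-line groups.


def group_prune(weight: List[List[float]], group_size: int,
                sparsity: float) -> Tuple[List[List[float]], List[bool]]:
    """Zero out whole row groups of a (rows x cols) weight matrix.

    Rows are the input dimension mapped onto word lines; a group is
    `group_size` consecutive rows activated together (assumed granularity).
    Groups with the smallest L2 norm are pruned first.
    Returns the pruned matrix and a keep-mask with one entry per group.
    """
    rows = len(weight)
    n_groups = math.ceil(rows / group_size)
    norms = []
    for g in range(n_groups):
        block = weight[g * group_size:(g + 1) * group_size]
        norms.append(math.sqrt(sum(w * w for row in block for w in row)))
    n_prune = int(sparsity * n_groups)
    pruned_ids = set(sorted(range(n_groups), key=lambda g: norms[g])[:n_prune])
    keep = [g not in pruned_ids for g in range(n_groups)]
    pruned = [list(row) if keep[r // group_size] else [0.0] * len(row)
              for r, row in enumerate(weight)]
    return pruned, keep


def sparse_mvm(x: List[float], weight: List[List[float]],
               keep: List[bool], group_size: int) -> List[float]:
    """Compute y = x @ W, skipping row groups marked as pruned."""
    cols = len(weight[0])
    y = [0.0] * cols
    for g, kept in enumerate(keep):
        if not kept:
            continue  # this word-line group is never activated
        for r in range(g * group_size, min((g + 1) * group_size, len(weight))):
            for c in range(cols):
                y[c] += x[r] * weight[r][c]
    return y
```

Because pruned groups are never read, the skipping loop does proportionally less work while matching a dense product on the pruned matrix.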
Abstract
Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architecture has demonstrated great potential to effectively compute large-scale matrix-vector multiplication. However, the intensive multiply-and-accumulate (MAC) operations executed in the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computation costs, network pruning and quantization are two widely studied compression methods for shrinking the model size. However, most model compression algorithms can only be implemented in digital CNN accelerators. For implementation in a static random access memory (SRAM) CIM-based accelerator, the model…
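The abstract names the limited capacity of each CIM macro as one bottleneck. A common way to handle a layer larger than one macro is to split its weight matrix into macro-sized tiles, compute each tile's partial sums in a macro, and accumulate partial sums across row tiles digitally; with several macros, several tiles can be processed per round. The sketch below assumes a hypothetical `macro_rows x macro_cols` capacity and in-order assignment of tiles to `n_macros` macros, simulated sequentially. The helpers `tile_matrix` and `multi_macro_mvm` are illustrative names, not MARS's actual mapping or scheduler.

```python
import math
from typing import List, Tuple

Tile = Tuple[int, int]  # (row-tile index, column-tile index)


def tile_matrix(rows: int, cols: int, macro_rows: int, macro_cols: int) -> List[Tile]:
    """List the tiles needed when one macro holds at most macro_rows x macro_cols weights."""
    return [(i, j)
            for i in range(math.ceil(rows / macro_rows))
            for j in range(math.ceil(cols / macro_cols))]


def multi_macro_mvm(x: List[float], weight: List[List[float]], macro_rows: int,
                    macro_cols: int, n_macros: int) -> Tuple[List[float], int]:
    """Compute y = x @ W tile by tile; return y and the number of rounds.

    Hypothetical model: in each round, up to n_macros tiles run in parallel
    (simulated here sequentially), one tile per macro.
    """
    rows, cols = len(weight), len(weight[0])
    tiles = tile_matrix(rows, cols, macro_rows, macro_cols)
    y = [0.0] * cols
    n_rounds = math.ceil(len(tiles) / n_macros)
    for rnd in range(n_rounds):
        for ti, tj in tiles[rnd * n_macros:(rnd + 1) * n_macros]:
            r0, c0 = ti * macro_rows, tj * macro_cols
            r1, c1 = min(r0 + macro_rows, rows), min(c0 + macro_cols, cols)
            for c in range(c0, c1):
                # Partial sum produced by one macro for column c of this tile.
                partial = sum(x[r] * weight[r][c] for r in range(r0, r1))
                # Digital accumulation of partial sums across row tiles.
                y[c] += partial
    return y, n_rounds
```

For a 4x4 layer on 2x2 macros, two macros need two rounds for the four tiles, while four macros finish in one round, which illustrates why adding macros can raise throughput when the layer exceeds a single macro's capacity.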
Taxonomy
Methods: Pruning · Batch Normalization
