SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware
Narendra Singh Dhakad, Santosh Kumar Vishvakarma

TL;DR
This paper introduces an SRAM-based compute engine that combines in-memory XNOR operations with optimized adders, reducing area and latency for AI hardware, especially binary neural networks.
Contribution
It proposes a 10T SRAM cell architecture for XNOR-based in-memory computing, with a full adder integrated between the in-memory multiplication cells and an N-bit ripple carry adder built from 14T full adders, improving area efficiency and latency over existing designs.
Findings
Achieves a 50% reduction in routing complexity by integrating a full adder between in-memory multiplication cells.
Improves area efficiency by 2.67x compared to the state of the art.
Reduces latency for MAC operations in BNNs.
Abstract
This paper presents a novel architecture utilizing a 10T SRAM cell for XNOR-based in-memory computing, aimed at mitigating the extensive routing challenges typically encountered in conventional in-memory computing systems. By integrating a full adder between in-memory multiplication cells, the proposed design achieves a 50% reduction in routing complexity. The architecture performs multiply-accumulate (MAC) operations using XNOR computation optimized for binary neural networks (BNNs). Additionally, a 14T-based full adder is employed to construct an N-bit ripple carry adder in the adder tree, significantly reducing the area compared to traditional 28T-based CMOS designs. The 10T SRAM XNOR computation further reduces the latency of MAC operations. The proposed approach reduces the latency and area overhead, improving the overall hardware's area efficiency by 2.67x compared to the…
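The XNOR-based MAC and the ripple-carry adder tree described in the abstract can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions, not the authors' circuit: it assumes the common BNN convention in which the values +1 and -1 are stored as bits 1 and 0, so that a multiplication becomes a 1-bit XNOR and the signed dot product equals 2 * popcount - N. The function names (xnor, full_adder, ripple_carry_add, bnn_mac) are hypothetical, and the model says nothing about transistor counts, routing, or timing.

```python
from typing import List, Tuple


def xnor(a: int, b: int) -> int:
    """1-bit XNOR: returns 1 when the two bits are equal."""
    return 1 - (a ^ b)


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Logical behavior of a 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: List[int], y: List[int]) -> List[int]:
    """Add two little-endian bit vectors of equal width N; returns N+1 bits."""
    if len(x) != len(y):
        raise ValueError("operands must have the same bit width")
    carry = 0
    out = []
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        out.append(s)
    out.append(carry)
    return out


def from_bits(bits: List[int]) -> int:
    """Convert a little-endian bit vector to an integer."""
    return sum(b << i for i, b in enumerate(bits))


def bnn_mac(weights: List[int], activations: List[int]) -> int:
    """Signed dot product of two {+1, -1} vectors stored as bits {1, 0}.

    Assumed encoding (not taken from the paper): bit 1 means +1, bit 0 means -1.
    """
    n = len(weights)
    if n == 0 or n != len(activations):
        raise ValueError("inputs must be non-empty and of equal length")
    # Bitwise multiplication: XNOR of each weight/activation pair.
    level = [[xnor(w, a)] for w, a in zip(weights, activations)]
    # Adder tree: pairwise ripple-carry additions until one sum remains.
    while len(level) > 1:
        nxt = [ripple_carry_add(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # Odd element out: zero-extend so all widths stay equal.
            nxt.append(level[-1] + [0])
        level = nxt
    popcount = from_bits(level[0])
    return 2 * popcount - n
```

For example, bnn_mac([1, 0, 1, 1], [1, 1, 0, 1]) corresponds to the vectors (+1, -1, +1, +1) and (+1, +1, -1, +1); two of the four positions match, so the popcount is 2 and the signed result is 0.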
