FERMI-ML: A Flexible and Resource-Efficient Memory-In-Situ SRAM Macro for TinyML acceleration
Mukul Lokhande, Akash Sankhe, S. V. Jaya Chand, and Santosh Kumar Vishvakarma

TL;DR
FERMI-ML is a 4 KB SRAM macro for TinyML inference on low-power AIoT devices that performs both MAC computation and CAM-based lookup inside the memory array, reducing data movement.
Contribution
The paper presents a 9T XNOR-based bit-cell (RX9T) that pairs a 5T storage cell with a 4T XNOR compute unit, supporting variable-precision MAC and CAM operations within the same array, together with a 22-transistor (C22T) compressor-tree accumulator.
Findings
Achieves 1.93 TOPS throughput at 350 MHz and 0.9 V in post-layout results at 65 nm.
Offers 364 TOPS/W energy efficiency for TinyML workloads.
Maintains over 97.5% QoR (quality of results) on models such as InceptionV4 and ResNet-18.
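To put the headline figures in context, the short calculation below derives the power and per-cycle operation count they imply. It is a back-of-the-envelope sketch, not a result reported by the authors: it assumes the throughput and energy-efficiency figures refer to the same operating point (350 MHz, 0.9 V), which the summary does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the findings above.
throughput_tops = 1.93          # tera-operations per second
efficiency_tops_per_w = 364.0   # tera-operations per second per watt
clock_hz = 350e6                # 350 MHz

# Assumption: both figures describe the same operating point.
implied_power_w = throughput_tops / efficiency_tops_per_w
ops_per_cycle = throughput_tops * 1e12 / clock_hz

print(f"implied power: {implied_power_w * 1e3:.2f} mW")
print(f"operations per cycle: {ops_per_cycle:.0f}")
```

Under that assumption the macro would draw roughly 5.3 mW and complete about 5,500 operations per clock cycle.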
Abstract
The growing demand for low-power and area-efficient TinyML inference on AIoT devices necessitates memory architectures that minimise data movement while sustaining high computational efficiency. This paper presents FERMI-ML, a Flexible and Resource-Efficient Memory-In-Situ (MIS) SRAM macro designed for TinyML acceleration. The proposed 9T XNOR-based RX9T bit-cell integrates a 5T storage cell with a 4T XNOR compute unit, enabling variable-precision MAC and CAM operations within the same array. A 22-transistor (C22T) compressor-tree-based accumulator facilitates logarithmic 1-64-bit MAC computation with reduced delay and power compared to conventional adder trees. The 4 KB macro achieves dual functionality for in-situ computation and CAM-based lookup operations, supporting Posit-4 or FP-4 precision. Post-layout results at 65 nm show operation at 350 MHz with 0.9 V, delivering a throughput…
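The dual MAC/CAM role of an XNOR cell described in the abstract can be illustrated in software. The sketch below is a simplified 1-bit behavioural model, not the authors' circuit: it assumes binary values encoded as bits (1 for +1, 0 for -1), and the function names are hypothetical. It does not model the variable-precision, Posit-4, or FP-4 modes, nor the compressor-tree accumulator.

```python
def xnor_bits(a, b):
    """Bitwise XNOR of two equal-length lists of 0/1 bits."""
    return [1 - (x ^ y) for x, y in zip(a, b)]

def xnor_mac(weights, activations):
    """Dot product of two {-1,+1} vectors encoded as bits (1 -> +1, 0 -> -1).

    Each matching bit pair contributes +1 and each mismatch -1,
    so the result is 2 * (number of matches) - (vector length).
    """
    matches = sum(xnor_bits(weights, activations))
    return 2 * matches - len(weights)

def cam_lookup(rows, key):
    """Return the indices of stored rows whose bits all match the key."""
    return [i for i, row in enumerate(rows) if all(xnor_bits(row, key))]

stored = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1]]
print(xnor_mac([1, 0, 1, 1], [1, 1, 0, 1]))
print(cam_lookup(stored, [1, 0, 1, 1]))
```

The same XNOR result feeds both operations: summing the per-bit matches gives a multiply-accumulate, while requiring every bit to match gives a content-addressable lookup.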
