Time-multiplexed In-memory computation scheme for mapping Quantized Neural Networks on hybrid CMOS-OxRAM building blocks
Sandeep Kaur Kingra, Vivek Parmar, Manoj Sharma, Manan Suri

TL;DR
This paper demonstrates CMOS- and OxRAM-based components for binary and ternary neural networks, introduces an optimized mapping scheme, and shows significant memory savings with minimal performance loss on Fashion MNIST.
Contribution
It presents a novel hardware implementation for QNNs using CMOS neurons and OxRAM synapses, with an optimized programming scheme and analysis of device variability effects.
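The QNNs targeted here restrict each weight to two values (BNN) or three values (TNN). The sketch below shows one common way of producing such weights from a trained full-precision layer. It is only an illustration. Binarization uses the sign function. Ternarization uses the threshold heuristic from Ternary Weight Networks (delta = 0.7 * mean |w|). That heuristic is a swapped-in assumption and not necessarily the quantizer the authors used.

```python
def binarize(weights):
    """Map real-valued weights to {-1, +1} with the sign function (0 maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights, delta_scale=0.7):
    """Map real-valued weights to {-1, 0, +1}.

    Assumption: the threshold delta = delta_scale * mean(|w|) follows the
    Ternary Weight Networks heuristic, not necessarily the paper's rule.
    """
    delta = delta_scale * sum(abs(w) for w in weights) / len(weights)
    return [1 if w > delta else -1 if w < -delta else 0 for w in weights]


w = [0.9, -0.05, 0.3, -0.8, 0.02]
print(binarize(w))   # [1, -1, 1, -1, 1]
print(ternarize(w))  # [1, 0, 1, -1, 0]  (delta is about 0.29 here)
```

Small weights fall inside the dead zone and become 0, so a TNN keeps one more state per synapse than a BNN. On an MLC OxRAM device, that extra state corresponds to one extra conductance level.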
Findings
Memory savings of 16-32x on the Fashion MNIST (FMNIST) dataset.
Performance change of less than 5% compared to ideal QNNs.
Analysis of how OxRAM device variability affects network performance.
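One way to arrive at the 16-32x range is to compare weight storage against a 32-bit floating-point baseline. Under that assumption, a binary weight needs 1 bit, which gives 32x. A ternary weight has three states and therefore needs 2 bits, which gives 16x. The layer sizes below describe a hypothetical FMNIST multilayer perceptron, not the network evaluated in the paper. The paper's exact accounting may differ.

```python
def weight_storage_bits(layer_shapes, bits_per_weight):
    """Total synaptic storage for fully connected layers given (fan_in, fan_out) pairs."""
    return sum(fan_in * fan_out for fan_in, fan_out in layer_shapes) * bits_per_weight


# Hypothetical FMNIST MLP (28x28 inputs, 10 classes); not the paper's topology.
layers = [(784, 128), (128, 10)]

fp32_bits = weight_storage_bits(layers, 32)  # assumed full-precision baseline
bnn_bits = weight_storage_bits(layers, 1)    # binary weights: 1 bit each
tnn_bits = weight_storage_bits(layers, 2)    # ternary weights: 2 bits each (3 states)

print(fp32_bits / bnn_bits)  # 32.0
print(fp32_bits / tnn_bits)  # 16.0
print(fp32_bits / 8 / 1024, "KiB vs", bnn_bits / 8 / 1024, "KiB")
```

The ratio depends only on bits per weight, so the network size does not matter. The absolute KiB figures just show the scale for this made-up topology.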
Abstract
In this work, we experimentally demonstrate two key building blocks for realizing Binary/Ternary Neural Networks (BNNs/TNNs): (i) 130 nm CMOS-based sigmoidal neurons and (ii) HfOx-based multi-level cell (MLC) OxRAM synaptic blocks. An optimized vector-matrix multiplication programming scheme that utilizes the two building blocks is also presented. Compared to prior approaches that use differential synaptic structures, the scheme uses a single device per synapse with two sets of READ operations. The proposed hardware mapping strategy shows a performance change of <5% (a decrease of 2-5% for TNN and an increase of 0.2% for BNN) compared to ideal quantized neural networks (QNNs), with significant memory savings on the order of 16-32x for a classification problem on the Fashion MNIST (FMNIST) dataset. The impact of OxRAM device variability on the performance of hardware QNNs (BNN/TNN) is also analyzed.
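The abstract does not detail how a single device per synapse and two READ operations produce a signed product. The sketch below is one plausible reading and is not confirmed to be the authors' scheme. It makes the following assumptions:

- Each ternary weight is stored in one MLC device as G = G_MID + DELTA_G * w.
- Inputs are +/-1.
- READ 1 drives the rows with x = +1 and READ 2 drives the rows with x = -1.
- The known offset G_MID * sum(x) is removed digitally.

All constants (V_READ, G_MID, DELTA_G) and function names are hypothetical. The optional `sigma` parameter adds relative Gaussian spread to mimic device variability.

```python
import random

V_READ = 0.1     # read voltage in volts (hypothetical)
G_MID = 30e-6    # conductance encoding w = 0, in siemens (hypothetical)
DELTA_G = 20e-6  # level spacing: w = -1/0/+1 -> 10/30/50 uS (hypothetical)


def program(weights, sigma=0.0, rng=random):
    """Map ternary weights to one conductance per device.

    sigma is a relative Gaussian spread mimicking device-to-device
    variability; sigma=0 gives ideal programming.
    """
    return [(G_MID + DELTA_G * w) * (1 + rng.gauss(0.0, sigma)) for w in weights]


def column_current(conductances, active_rows):
    """Ohm's law per device plus Kirchhoff current summation on one column."""
    return sum(V_READ * g for g, on in zip(conductances, active_rows) if on)


def two_read_dot(conductances, inputs):
    """Signed dot product from two READ phases on a single-device column.

    READ 1 drives rows whose input is +1, READ 2 rows whose input is -1.
    Their difference is V * sum(x_i * G_i); removing the known G_MID offset
    (sum(x) is available digitally) leaves V * DELTA_G * sum(x_i * w_i).
    """
    i_read1 = column_current(conductances, [x == 1 for x in inputs])
    i_read2 = column_current(conductances, [x == -1 for x in inputs])
    offset = V_READ * G_MID * sum(inputs)
    return (i_read1 - i_read2 - offset) / (V_READ * DELTA_G)


w = [1, 0, -1, 1, -1]
x = [1, -1, -1, 1, 1]
print(sum(a * b for a, b in zip(w, x)))        # ideal result: 2
print(round(two_read_dot(program(w), x), 6))   # 2.0 with ideal devices
noisy = program(w, sigma=0.05, rng=random.Random(0))
print(two_read_dot(noisy, x))                  # near 2, shifted by variability
```

Compared with a differential pair (G+ and G- per weight), this layout halves the device count but needs an extra read cycle and a digital offset correction. That trade-off matches the "time-multiplexed" wording in the title, although the paper's actual level assignment and read sequencing may differ.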
