iMARS: An In-Memory-Computing Architecture for Recommendation Systems
Mengyuan Li, Ann Franchesca Laguna, Dayane Reis, Xunzhao Yin, Michael Niemier, and Xiaobo Sharon Hu

TL;DR
This paper introduces iMARS, an in-memory computing architecture designed to accelerate recommendation systems by reducing latency and energy consumption through specialized hardware for embedding table operations.
Contribution
The paper presents iMARS, a novel IMC architecture that accelerates the filtering and ranking stages of deep neural network-based recommendation systems. It reports end-to-end latency and energy improvements over a GPU baseline.
Findings
16.8x end-to-end latency improvement over a GPU on the MovieLens dataset
713x end-to-end energy improvement over a GPU on the MovieLens dataset
IMC-friendly embedding tables implemented inside a ferroelectric FET-based IMC fabric
Abstract
Recommendation systems (RecSys) suggest items to users by predicting their preferences from historical data. Typical RecSys handle large embedding tables and many embedding-table-related operations. The limited memory size and bandwidth of conventional computer architectures restrict RecSys performance. This work proposes iMARS, an in-memory-computing (IMC) architecture that accelerates the filtering and ranking stages of deep neural network-based RecSys. iMARS leverages IMC-friendly embedding tables implemented inside a ferroelectric FET-based IMC fabric. Circuit-level and system-level evaluations show that iMARS achieves a 16.8x (713x) end-to-end latency (energy) improvement compared to the GPU counterpart for the MovieLens dataset.
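The sketch below illustrates the embedding-table operations and the filtering/ranking split that the abstract describes. It is a plain NumPy model of a generic two-stage recommender, not the iMARS hardware. iMARS accelerates these stages with embedding tables held in a ferroelectric FET-based IMC fabric, and this software sketch does not model that fabric. All function names, the sum pooling, the dot-product filtering, and the single-layer ranking model are hypothetical choices made for brevity.

```python
import numpy as np


def embedding_bag_sum(table: np.ndarray, indices) -> np.ndarray:
    """Gather the embedding rows for each sample's sparse IDs and sum-pool them.

    This gather-and-reduce over large tables is the kind of memory-bound
    embedding-table operation that dominates RecSys workloads.
    """
    return np.stack([table[np.asarray(idx)].sum(axis=0) for idx in indices])


def filter_top_k(user_vec: np.ndarray, item_table: np.ndarray, k: int) -> np.ndarray:
    """Filtering stage (assumed form): score every item by dot product with the
    user vector and keep the indices of the k highest-scoring items."""
    scores = item_table @ user_vec
    # Negate for descending order; a stable sort breaks ties by item index.
    return np.argsort(-scores, kind="stable")[:k]


def rank(user_vec, item_table, candidates, w, b):
    """Ranking stage (hypothetical one-layer model): score only the filtered
    candidates and return them sorted by predicted click probability."""
    feats = item_table[candidates] * user_vec  # element-wise interaction features
    logits = feats @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid
    order = np.argsort(-probs, kind="stable")
    return candidates[order], probs[order]


# Demo with random (made-up) tables and weights.
rng = np.random.default_rng(0)
user_table = rng.standard_normal((100, 8))    # hypothetical user-feature table
item_table = rng.standard_normal((1000, 8))   # hypothetical item table
user_vec = embedding_bag_sum(user_table, [[3, 17, 42]])[0]
candidates = filter_top_k(user_vec, item_table, k=20)
ranked, probs = rank(user_vec, item_table, candidates, rng.standard_normal(8), 0.0)
print(ranked[:5], probs[:5])
```

The filtering stage touches the whole item table but does little arithmetic per item, whereas the ranking stage runs a heavier model on only the few surviving candidates. Both stages are dominated by reads from large embedding tables, which is the memory-bandwidth bottleneck that an IMC design aims to relieve.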
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Caching and Content Delivery
