Supporting Massive DLRM Inference Through Software Defined Memory
Ehsan K. Ardestani, Changkyu Kim, Seung Jae Lee, Luoshang Pan, Valmiki Rampersad, Jens Axboe, Banit Agrawal, Fuxun Yu, Ansha Yu, Trung Le, Hector Yuen, Shishir Juluri, Akshat Nanda, Manoj Wodekar, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov, Chris Peterson

TL;DR
This paper explores how Software Defined Memory can enable efficient inference for massive Deep Learning Recommendation Models by leveraging Storage Class Memory, reducing power consumption and cost.
Contribution
The paper evaluates the challenges of integrating Storage Class Memory into DLRM inference and proposes techniques to address them, highlighting the differences between SCM technologies and the resulting power savings.
Findings
Power savings of 5% to 29% achieved
Different SCM technologies impact performance and efficiency
Techniques enable scalable inference for large DLRMs
Abstract
Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model sizes soon to reach the terabyte range, leveraging Storage Class Memory (SCM) for inference enables lower power consumption and cost. This paper evaluates the major challenges in extending the memory hierarchy to SCM for DLRM, and presents different techniques to improve performance through a Software Defined Memory. We show how underlying technologies such as NAND Flash and 3DXP differ and how they relate to real-world scenarios, enabling power savings of 5% to 29%.
Taxonomy
Topics: Recommender Systems and Techniques · Caching and Content Delivery · Advanced Data Storage Technologies
