Supporting Massive DLRM Inference Through Software Defined Memory

Ehsan K. Ardestani; Changkyu Kim; Seung Jae Lee; Luoshang Pan; Valmiki Rampersad; Jens Axboe; Banit Agrawal; Fuxun Yu; Ansha Yu; Trung Le; Hector Yuen; Shishir Juluri; Akshat Nanda; Manoj Wodekar; Dheevatsa Mudigere; Krishnakumar Nair; Maxim Naumov; Chris Peterson; Mikhail Smelyanskiy; Vijay Rao

arXiv:2110.11489 · cs.AR · November 10, 2021

Open Access

TL;DR

This paper explores how Software Defined Memory can enable efficient inference for massive Deep Learning Recommendation Models (DLRM) by extending the memory hierarchy to Storage Class Memory (SCM), reducing power consumption and cost.

Contribution

The paper evaluates the major challenges of integrating Storage Class Memory into DLRM inference, proposes techniques to improve performance, and shows how the underlying SCM technologies differ and what power savings they enable.

Findings

01. Power savings of 5% to 29% achieved

02. Different SCM technologies impact performance and efficiency

03. Techniques enable scalable inference for large DLRMs

Abstract

Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model size soon to be in the terabyte range, leveraging Storage Class Memory (SCM) for inference enables lower power consumption and cost. This paper evaluates the major challenges in extending the memory hierarchy to SCM for DLRM, and presents different techniques to improve performance through a Software Defined Memory. We show how underlying technologies such as NAND Flash and 3DXP differ and how they relate to real-world scenarios, enabling power savings of 5% to 29%.
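To make the idea of extending the memory hierarchy more concrete, the following is a minimal sketch of a two-tier embedding lookup: a small LRU cache (standing in for DRAM) placed in front of a larger, slower row store (standing in for SCM). It is an illustration only and is not taken from the paper. The class `TieredEmbeddingTable`, its methods, the LRU policy, and the sum pooling are all assumptions chosen for simplicity, and the slow tier is simulated with an in-memory dictionary.

```python
from collections import OrderedDict


class TieredEmbeddingTable:
    """Hypothetical two-tier lookup: a small LRU cache (fast tier, e.g. DRAM)
    in front of a larger row store (slow tier, e.g. SCM). Illustrative only."""

    def __init__(self, slow_rows, cache_capacity):
        self.slow_rows = slow_rows            # row_id -> list of floats (simulated slow tier)
        self.cache_capacity = cache_capacity  # max rows held in the fast tier
        self.cache = OrderedDict()            # row_id -> row, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def lookup(self, row_id):
        """Return one embedding row, serving it from the fast tier when possible."""
        if row_id in self.cache:
            self.cache.move_to_end(row_id)    # mark as most recently used
            self.hits += 1
            return self.cache[row_id]
        self.misses += 1
        row = self.slow_rows[row_id]          # simulated slow-tier read
        self.cache[row_id] = row
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)    # evict the least recently used row
        return row

    def pooled_lookup(self, row_ids):
        """Sum-pool the rows selected by one sparse feature."""
        dim = len(next(iter(self.slow_rows.values())))
        out = [0.0] * dim
        for rid in row_ids:
            for i, value in enumerate(self.lookup(rid)):
                out[i] += value
        return out


# Toy table: 8 rows of dimension 2, with room for only 2 rows in the fast tier.
table = TieredEmbeddingTable(
    slow_rows={i: [float(i), float(i) * 2.0] for i in range(8)},
    cache_capacity=2,
)
pooled = table.pooled_lookup([1, 3, 1])
print(pooled, table.hits, table.misses)
```

In this toy setup the repeated access to row 1 is served from the fast tier, which is the kind of locality a software-managed hierarchy would try to exploit. A real deployment would also have to consider I/O granularity, batching, and the latency characteristics of the specific SCM device, none of which this sketch models.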

Taxonomy

Topics: Recommender Systems and Techniques · Caching and Content Delivery · Advanced Data Storage Technologies