Disaggregating Embedding Recommendation Systems with FlexEMR
Yibo Huang, Zhenning Yang, Jiarong Xing, Yi Dai, Yiming Qiu, Dingming Wu, Fan Lai, Ang Chen

TL;DR
FlexEMR is a disaggregation framework for embedding-based recommendation (EMR) systems. It separates embedding operations from neural network inference to improve resource utilization, and it addresses the resulting networking challenges with locality-aware embedding lookups and a multi-threaded RDMA engine.
Contribution
The paper presents FlexEMR, a novel disaggregation approach with techniques to minimize network data transfer and improve efficiency in large-scale embedding recommendation systems.
Findings
Locality-aware lookup techniques exploit the temporal and spatial locality of embedding lookups to reduce data movement over the network
An optimized multi-threaded RDMA engine handles concurrent lookup subrequests
Initial prototype shows promising resource utilization improvements
Abstract
Efficiently serving embedding-based recommendation (EMR) models remains a significant challenge due to their increasingly large memory requirements. Today's practice splits the model across many monolithic servers, where a mix of GPUs, CPUs, and DRAM is provisioned in fixed proportions. This approach leads to suboptimal resource utilization and increased costs. Disaggregating embedding operations from neural network inference is a promising solution but raises novel networking challenges. In this paper, we discuss the design of FlexEMR for optimized EMR disaggregation. FlexEMR proposes two sets of techniques to tackle the networking challenges: Leveraging the temporal and spatial locality of embedding lookups to reduce data movement over the network, and designing an optimized multi-threaded RDMA engine for concurrent lookup subrequests. We outline the design space for each technique…
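To make the temporal-locality idea concrete, here is a minimal sketch of a small LRU cache placed in front of a remote embedding store, so that repeated lookups of the same IDs avoid a network round trip. It is an illustration under stated assumptions, not FlexEMR's actual design: the class `LocalEmbeddingCache`, the callable `fetch_remote`, and the stand-in `fake_remote_fetch` are all hypothetical names, and the "network" is simulated by a local function.

```python
from collections import OrderedDict


class LocalEmbeddingCache:
    """Hypothetical LRU cache in front of a remote embedding store."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        # Assumed interface: fetch_remote(list_of_ids) -> {id: vector}
        self.fetch_remote = fetch_remote
        self.cache = OrderedDict()
        self.remote_rows_fetched = 0  # counts rows that crossed the "network"

    def lookup(self, ids):
        result = {}
        missing = []
        for i in ids:
            if i in self.cache:
                # Cache hit: mark as most recently used.
                self.cache.move_to_end(i)
                result[i] = self.cache[i]
            elif i not in missing:
                # Cache miss: request each missing ID only once.
                missing.append(i)
        if missing:
            # One batched remote request for all misses.
            fetched = self.fetch_remote(missing)
            self.remote_rows_fetched += len(missing)
            for i in missing:
                result[i] = fetched[i]
                self.cache[i] = fetched[i]
                if len(self.cache) > self.capacity:
                    # Evict the least recently used entry.
                    self.cache.popitem(last=False)
        return [result[i] for i in ids]


def fake_remote_fetch(ids):
    # Stand-in for a network fetch from a remote embedding server.
    return {i: [float(i), float(i) * 2.0] for i in ids}


cache = LocalEmbeddingCache(capacity=4, fetch_remote=fake_remote_fetch)
first = cache.lookup([1, 2, 3, 1])      # three distinct IDs, all misses
rows_after_first = cache.remote_rows_fetched
second = cache.lookup([2, 3, 5])        # only ID 5 is a miss
rows_after_second = cache.remote_rows_fetched
```

In this toy run the second request touches the remote store for a single row, because IDs 2 and 3 were fetched by the first request. A real deployment would also need to decide where the cache lives, how large it is, and how stale entries are handled, none of which this sketch addresses.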
Taxonomy
Topics: Recommender Systems and Techniques · Privacy-Preserving Technologies in Data
