TL;DR
RecPipe introduces a co-designed system and hardware accelerator that jointly optimize recommendation quality and inference performance for deep learning recommendation systems; its accelerator, RPAccel, improves latency by 3x and throughput by 6x at iso-quality.
Contribution
The paper presents RecPipe, a novel multi-stage pipeline decomposition and hardware accelerator design that jointly optimize recommendation quality and system efficiency.
Findings
RPAccel reduces latency by 3x and increases throughput by 6x at iso-quality.
RecPipe's multi-stage pipeline enables better parallelism and efficiency.
Hardware-aware scheduling improves ranking efficiency on commodity heterogeneous platforms (e.g., CPUs, GPUs).
Abstract
Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations requiring specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to…
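The multi-stage decomposition described in the abstract can be illustrated with a toy sketch. This is not RecPipe's implementation: the scorers, the per-item costs, and the names `run_pipeline`, `cheap_score`, and `heavy_score` are all hypothetical, chosen only to show how a cheap early stage can shrink the candidate set before a costlier stage ranks the survivors.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[int], float]
# (scorer, number of items to keep, hypothetical compute cost per scored item)
Stage = Tuple[Scorer, int, float]


def run_pipeline(candidates: Sequence[int], stages: List[Stage]) -> Tuple[List[int], float]:
    """Run candidates through each stage in order, keeping the top items per stage."""
    items = list(candidates)
    total_cost = 0.0
    for scorer, keep, cost_per_item in stages:
        # Every item entering a stage is scored once by that stage's model.
        total_cost += cost_per_item * len(items)
        items = sorted(items, key=scorer, reverse=True)[:keep]
    return items, total_cost


def cheap_score(item: int) -> float:
    # Hypothetical lightweight model: a coarse proxy for the true relevance.
    return float(item // 10)


def heavy_score(item: int) -> float:
    # Hypothetical heavyweight model: here the item id stands in for true relevance.
    return float(item)


candidates = list(range(1000))

# Two-stage funnel: cheap filter from 1000 to 50 items, then heavy ranking to the top 5.
two_stage = [(cheap_score, 50, 1.0), (heavy_score, 5, 20.0)]
# Baseline: the heavy model scores every candidate.
single_stage = [(heavy_score, 5, 20.0)]

top_two, cost_two = run_pipeline(candidates, two_stage)
top_one, cost_one = run_pipeline(candidates, single_stage)

print(top_two, cost_two)
print(top_one, cost_one)
```

In this toy setup the cheap scorer was constructed to agree with the heavy one at coarse granularity, so both pipelines return the same top five while the two-stage variant scores far fewer items with the expensive model. With real models the early stage can drop relevant items, which is why stage sizes have to be tuned against quality.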
