Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations
Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu

TL;DR
Centaur is a chiplet-based accelerator that handles both the sparse embedding layers and the dense MLP layers of personalized recommendation workloads, reporting a 1.7-17.2x performance speedup and 1.7-19.5x energy-efficiency improvements.
Contribution
The paper introduces a hybrid sparse-dense accelerator architecture tailored for recommendation workloads, addressing the memory-throughput bottleneck of embedding layers and the compute bottleneck of MLP layers.
Findings
Achieves 1.7-17.2x performance speedup
Realizes 1.7-19.5x energy-efficiency improvements
Effectively accelerates both embedding and MLP layers in recommendation models
Abstract
Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads, e-commerce, etc.) serviced from cloud datacenters. Sparse embedding layers are a crucial building block in designing recommendations, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization on personalized recommendations and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, which…
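To make the abstract's contrast between memory-intensive embedding layers and compute-intensive MLP layers concrete, here is a minimal pure-Python sketch. It is not Centaur's implementation: the function names, the sum-pooling choice, the 4-byte element size, and the example sizes are all assumptions made for illustration. It compares a rough arithmetic intensity (floating-point operations per byte moved) for the two layer types.

```python
def embedding_gather_reduce(table, indices):
    """Sparse lookup: gather the rows named by `indices` and sum-pool them.
    Sum pooling is an assumption; other reductions are possible."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for i in indices:
        row = table[i]
        for d in range(dim):
            pooled[d] += row[d]
    return pooled


def mlp_layer(x, weights, bias):
    """Dense fully connected layer with ReLU: y = max(0, W x + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def embedding_intensity(num_lookups, dim, bytes_per_elem=4):
    """FLOPs per byte for one pooled lookup (hypothetical cost model)."""
    flops = (num_lookups - 1) * dim                 # additions only
    bytes_read = num_lookups * dim * bytes_per_elem  # every gathered row is read once
    return flops / bytes_read


def mlp_intensity(batch, in_dim, out_dim, bytes_per_elem=4):
    """FLOPs per byte for one dense layer over a batch (hypothetical cost model)."""
    flops = 2 * batch * in_dim * out_dim             # multiply + add per weight use
    elems = in_dim * out_dim + batch * in_dim + batch * out_dim
    return flops / (elems * bytes_per_elem)


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    print("embedding FLOPs/byte:", embedding_intensity(num_lookups=80, dim=32))
    print("MLP FLOPs/byte:      ", mlp_intensity(batch=64, in_dim=512, out_dim=512))
```

Under this simplified model the gather-reduce performs less than one addition per element fetched, so its throughput is bounded by memory bandwidth, while the dense layer reuses each weight across the batch and is bounded by compute. This is the split that motivates pairing a sparse engine with a dense engine.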
Taxonomy
Topics: Advanced Graph Neural Networks · Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques
