PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

Yuanxing Zhang; Langshi Chen; Siran Yang; Man Yuan; Huimin Yi; Jie Zhang; Jiamang Wang; Jianbo Dong; Yunlong Xu; Yue Song; Yong Li; Di Zhang; Wei Lin; Lin Qu; Bo Zheng

arXiv:2204.04903 · cs.DC · April 19, 2022

TL;DR

This paper introduces PICASSO, a GPU-centric framework that significantly improves training throughput and hardware utilization for wide-and-deep recommender systems, addressing underutilization issues in traditional GPU training.

Contribution

PICASSO systematically analyzes training bottlenecks and proposes novel packing, interleaving, and caching optimizations to enhance GPU utilization for recommendation models.

Findings

1. Up to 6x throughput improvement over SOTA baselines.
2. Increases hardware utilization by an order of magnitude.
3. Reduces daily training walltime by 7 hours on average.

Abstract

The development of personalized recommendation has significantly improved the accuracy of information matching and the revenue of e-commerce platforms. Two trends have recently emerged: 1) recommender systems must be trained in a timely manner to cope with ever-growing new products and ever-changing user interests from online marketing and social networks; 2) SOTA recommendation models introduce DNN modules to improve prediction accuracy. Traditional CPU-based recommender systems cannot keep up with these two trends, and GPU-centric training has become a popular approach. However, we observe that GPU devices are underutilized when training recommender systems and do not attain the throughput improvement achieved in CV and NLP. This issue can be explained by two characteristics of these recommendation models: First, they contain up to a thousand input feature fields, introducing…

Code & Models

Repositories

alibaba/hybridbackend (tf, Official)

Taxonomy

Topics: Recommender Systems and Techniques · Caching and Content Delivery · Stochastic Gradient Optimization Techniques