Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Weijie Zhao; Deping Xie; Ronglai Jia; Yulei Qian; Ruiquan Ding; Mingming Sun; Ping Li

arXiv:2003.05622 · cs.DC · March 13, 2020 · 76 citations

TL;DR

This paper presents a hierarchical GPU-based parameter server for massive-scale deep learning in online advertising, enabling faster and more cost-efficient training of terabyte-scale models.

Contribution

It introduces a novel three-layer hierarchical storage architecture utilizing GPU memory, CPU memory, and SSD for scalable deep learning training of extremely large models.
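The three-layer idea can be illustrated with a toy key-value store. This is a minimal sketch, not the paper's implementation: the class name `TieredParameterStore`, the LRU eviction policy, the tiny capacities, and the plain SGD update are all assumptions chosen for illustration, and ordinary Python dictionaries stand in for GPU memory, CPU memory, and SSD.

```python
from collections import OrderedDict


class TieredParameterStore:
    """Toy three-tier parameter store (hypothetical, for illustration only).

    'hbm' and 'mem' are small LRU caches; 'ssd' holds whatever spills out.
    The LRU policy is an assumption, not a claim about the paper's design.
    """

    def __init__(self, hbm_capacity, mem_capacity, dim=4):
        self.hbm = OrderedDict()  # stands in for GPU high-bandwidth memory
        self.mem = OrderedDict()  # stands in for CPU main memory
        self.ssd = {}             # stands in for SSD-backed storage
        self.hbm_capacity = hbm_capacity
        self.mem_capacity = mem_capacity
        self.dim = dim

    def _insert(self, tier, capacity, key, value, demote):
        # Add to a cache tier; if it overflows, demote the least recently used entry.
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > capacity:
            old_key, old_value = tier.popitem(last=False)
            demote(old_key, old_value)

    def _to_ssd(self, key, value):
        self.ssd[key] = value

    def _to_mem(self, key, value):
        self._insert(self.mem, self.mem_capacity, key, value, self._to_ssd)

    def _to_hbm(self, key, value):
        self._insert(self.hbm, self.hbm_capacity, key, value, self._to_mem)

    def pull(self, key):
        """Return the parameter vector for key, promoting it to the fastest tier."""
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.mem:
            value = self.mem.pop(key)
        elif key in self.ssd:
            value = self.ssd.pop(key)
        else:
            value = [0.0] * self.dim  # first time this sparse feature is seen
        self._to_hbm(key, value)
        return value

    def push(self, key, grad, lr=0.1):
        """Apply a plain SGD step to the vector for key (assumed optimizer)."""
        value = self.pull(key)
        for i, g in enumerate(grad):
            value[i] -= lr * g

    def tier_of(self, key):
        """Report which tier currently holds key, or None if it is unknown."""
        for name in ("hbm", "mem", "ssd"):
            if key in getattr(self, name):
                return name
        return None


store = TieredParameterStore(hbm_capacity=2, mem_capacity=2)
for feature_id in [1, 2, 3, 4, 5]:
    store.pull(feature_id)
```

After the five pulls, the two most recent keys sit in the fastest tier, the next two in the middle tier, and the oldest has spilled to the slowest one. A real system would move parameters in batches rather than one key at a time.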

Findings

01 A 4-node hierarchical GPU parameter server trains a model more than 2x faster than a 150-node in-memory distributed parameter server in an MPI cluster.

02 The system achieves a 4-9x better price-performance ratio than an MPI-cluster solution.

03 The system targets models with more than 10^11 sparse features.

Abstract

Neural networks of ads systems usually take input from multiple sources, e.g., query-ad relevance, ad features and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory of a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB of parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU High-Bandwidth Memory, CPU main memory and SSD as 3-layer hierarchical storage. All the neural network…
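The figures in the abstract allow a quick back-of-the-envelope check, and the sparsity it describes suggests why only a small slice of the model is needed per batch. The sketch below assumes decimal terabytes and 4-byte floats; the helper `working_parameters` and the example feature ids are hypothetical and only illustrate the idea of collecting the parameters a batch actually touches.

```python
# Rough size per sparse feature implied by the abstract's numbers.
NUM_SPARSE_FEATURES = 10**11      # "more than 10^11 sparse features"
MODEL_BYTES = 10 * 10**12         # "around 10 TB", assuming decimal terabytes

bytes_per_feature = MODEL_BYTES / NUM_SPARSE_FEATURES
floats_per_feature = bytes_per_feature / 4  # assuming 4-byte floats


def working_parameters(batch):
    """Collect the distinct feature ids referenced by a batch.

    Each example is a list of the indices of its nonzero (one-hot or
    multi-hot) features, so the zero entries are never materialized.
    """
    keys = set()
    for example in batch:
        keys.update(example)
    return keys


# Two hypothetical examples, each with only a few active features.
batch = [[3, 7], [7, 42]]
needed = working_parameters(batch)
```

Under these assumptions each sparse feature accounts for roughly 100 bytes, or about 25 single-precision values, and a batch only needs the parameters for the feature ids it references rather than the full model.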

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Recommender Systems and Techniques · Stochastic Gradient Optimization Techniques

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD