Large-Scale Training System for 100-Million Classification at Alibaba

Liuyihan Song; Pan Pan; Kang Zhao; Hao Yang; Yiming Chen; Yingya Zhang; Yinghui Xu; Rong Jin

arXiv:2102.06025·cs.LG·February 12, 2021

TL;DR

This paper introduces a large-scale training system for classification over 100 million classes. It combines a hybrid parallel framework, a novel KNN softmax, and communication and convergence optimizations, which together are reported to raise training throughput by 3.9× and cut training iterations by 60%.

Contribution

The paper presents a novel large-scale training system with a new softmax variation and optimization techniques, enabling efficient training of extremely large classifiers.

Findings

01. 3.9× training throughput increase

02. 60% reduction in training iterations

03. Successful training of 100 million classes in five days

Abstract

In the last decades, extreme classification has become an essential topic for deep learning. It has achieved great success in many areas, especially in computer vision and natural language processing (NLP). However, it is very challenging to train a deep model with millions of classes due to the memory and computation explosion in the last output layer. In this paper, we propose a large-scale training system to address these challenges. First, we build a hybrid parallel training framework to make the training process feasible. Second, we propose a novel softmax variation named KNN softmax, which reduces both the GPU memory consumption and computation costs and improves the throughput of training. Then, to eliminate the communication overhead, we propose a new overlapping pipeline and a gradient sparsification method. Furthermore, we design a fast continuous convergence strategy to…
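The abstract does not spell out how KNN softmax selects classes, so the following is only a minimal sketch of the general idea of computing softmax over a small neighbourhood of classes instead of all of them. It assumes (this is an illustration, not the paper's confirmed method) that the active set is the true class plus its `k` most similar class-weight vectors, looked up in a precomputed table. The function names `build_knn_table`, `knn_softmax_loss`, and `full_softmax_loss` are hypothetical, and the brute-force table is only practical at toy scale.

```python
import numpy as np


def build_knn_table(weights, k):
    """Brute-force table of the k most similar classes for every class.

    Assumption: similarity is the inner product of class-weight vectors.
    A real system at 100M classes would need an approximate search instead.
    """
    sims = weights @ weights.T
    np.fill_diagonal(sims, -np.inf)  # a class is not its own neighbour
    # Indices of the k largest similarities in each row (unordered).
    return np.argpartition(-sims, k - 1, axis=1)[:, :k]


def knn_softmax_loss(feature, weights, label, neighbors):
    """Cross-entropy over the true class plus its k neighbours only."""
    active = np.concatenate(([label], neighbors[label]))
    logits = weights[active] @ feature  # only k + 1 dot products
    logits = logits - logits.max()      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0], active        # the true class sits at position 0


def full_softmax_loss(feature, weights, label):
    """Reference cross-entropy over every class."""
    logits = weights @ feature
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(1000, 16))  # 1000 toy classes, 16-dim features
    feature = rng.normal(size=16)
    table = build_knn_table(weights, k=8)
    sub_loss, active = knn_softmax_loss(feature, weights, 3, table)
    print(len(active), sub_loss <= full_softmax_loss(feature, weights, 3))
```

Because the restricted normaliser sums over a subset of the classes, the restricted loss can never exceed the full-softmax loss for the same sample. The saving comes from touching only `k + 1` rows of the weight matrix per sample rather than all of them.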

Taxonomy

Methods: Gradient Sparsification · Softmax