Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

Youngeun Kwon; Yunjae Lee; Minsoo Rhu

arXiv:2010.13100·cs.AR·October 27, 2020

TL;DR

This paper characterizes the training of personalized recommendation models, identifies sparse embedding layer training as a major bottleneck, and proposes Tensor Casting, an algorithm-architecture co-design for tensor gather-scatter that improves training throughput by up to 21x.

Contribution

It introduces Tensor Casting, a novel algorithm-architecture co-design for tensor gather-scatter, optimizing recommendation training on CPU-GPU systems.

Findings

01

Tensor Casting achieves up to 21x training throughput improvement.

02

Workload characterization highlights sparse embedding training as a key bottleneck.

03

Prototyping demonstrates effectiveness on real CPU-GPU systems.

Abstract

Personalized recommendations are one of the most widely deployed machine learning (ML) workloads serviced from cloud datacenters. As such, architectural solutions for high-performance recommendation inference have recently been the target of several prior studies. Unfortunately, little has been explored and understood regarding the training side of this emerging ML workload. In this paper, we first perform a detailed workload characterization study on training recommendations, root-causing sparse embedding layer training as one of the most significant performance bottlenecks. We then propose our algorithm-architecture co-design called Tensor Casting, which enables the development of a generic accelerator architecture for tensor gather-scatter that encompasses all the key primitives of training embedding layers. When prototyped on a real CPU-GPU system, Tensor Casting provides…
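
To make the gather-scatter primitives named in the abstract concrete, the following is a minimal plain-Python sketch of how an embedding layer is commonly trained: a gather-and-sum in the forward pass, a scatter-add of gradients in the backward pass, and a sparse SGD update. It is not the Tensor Casting algorithm itself; the function names, the sum pooling, the toy table, and the learning rate are all assumptions chosen for illustration.

```python
# Illustrative sketch only: generic embedding gather / scatter-add.
# NOT the paper's Tensor Casting method; every name here is hypothetical.

def gather_reduce(table, indices_per_sample):
    """Forward pass: gather each sample's rows and sum them (sum pooling assumed)."""
    dim = len(table[0])
    pooled = []
    for indices in indices_per_sample:
        acc = [0.0] * dim
        for idx in indices:
            for d in range(dim):
                acc[d] += table[idx][d]
        pooled.append(acc)
    return pooled


def scatter_add_grads(dim, indices_per_sample, pooled_grads):
    """Backward pass: every gathered row receives its sample's pooled gradient.

    Rows looked up by several samples accumulate all of their gradients,
    which is the irregular, sparse part of embedding training.
    """
    row_grads = {}
    for indices, grad in zip(indices_per_sample, pooled_grads):
        for idx in indices:
            acc = row_grads.setdefault(idx, [0.0] * dim)
            for d in range(dim):
                acc[d] += grad[d]
    return row_grads


def sgd_update(table, row_grads, lr):
    """Sparse update: only rows that were looked up are modified (in place)."""
    for idx, grad in row_grads.items():
        for d in range(len(grad)):
            table[idx][d] -= lr * grad[d]


# Toy example: 4 embedding rows of width 2, a batch of 2 samples.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
batch = [[0, 2], [2, 3]]           # row 2 is shared by both samples
pooled = gather_reduce(table, batch)

pooled_grads = [[1.0, 1.0], [0.5, 0.5]]   # made-up upstream gradients
row_grads = scatter_add_grads(2, batch, pooled_grads)
sgd_update(table, row_grads, lr=0.5)
```

In this toy batch, row 2 is referenced by both samples, so its gradient is the sum of both pooled gradients, while row 1 is never touched. Data-dependent, sparse accesses of this kind are what the characterization study points to as the training bottleneck.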
