EZLDA: Efficient and Scalable LDA on GPUs

Shilong Wang (1), Hang Liu (2), Anil Gaihre (2), Hengyong Yu (1) ((1) University of Massachusetts Lowell, (2) Stevens Institute of Technology)

arXiv:2007.08725 · cs.DC · July 20, 2020

TL;DR

EZLDA is a GPU-accelerated, scalable LDA implementation that introduces novel sampling, data formats, and workload balancing techniques to significantly improve performance and reduce memory usage.

Contribution

EZLDA presents a three-branch sampling method, a hybrid sparse format, and a hierarchical workload balancing scheme for efficient GPU-based LDA training.

Findings

01

EZLDA outperforms existing methods in speed.

02

It uses less memory than prior approaches.

03

Scales effectively across multiple GPUs.

Abstract

Latent Dirichlet Allocation (LDA) is a statistical approach for topic modeling with a wide range of applications. However, there have been very few attempts to accelerate LDA on GPUs, which offer exceptional computing and memory throughput. To this end, we introduce EZLDA, which achieves efficient and scalable LDA training on GPUs with the following three contributions. First, EZLDA introduces a three-branch sampling method that takes advantage of the convergence heterogeneity of various tokens to reduce redundant sampling work. Second, to enable a sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce a hybrid format for W along with the corresponding token partition to T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance problem on GPUs and scale EZLDA across multiple GPUs. Taken…
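For readers unfamiliar with the sampling step that the abstract aims to speed up, the following is a minimal CPU sketch of textbook collapsed Gibbs sampling for LDA. It is not EZLDA's three-branch method, hybrid format, or GPU kernel. It assumes that D and W in the abstract denote the document-topic and word-topic count matrices, as is conventional; the names `init_state`, `gibbs_sweep`, `doc_topic`, `word_topic`, and `topic_totals` are hypothetical and chosen only for illustration.

```python
import random


def init_state(tokens, num_docs, vocab_size, num_topics, rng):
    """Assign a random topic to every token and build the count tables.

    tokens is a list of (doc_id, word_id) pairs. doc_topic plays the role
    of D and word_topic the role of W (an assumption about the notation).
    """
    doc_topic = [[0] * num_topics for _ in range(num_docs)]
    word_topic = [[0] * num_topics for _ in range(vocab_size)]
    topic_totals = [0] * num_topics
    assignments = []
    for d, w in tokens:
        k = rng.randrange(num_topics)
        assignments.append(k)
        doc_topic[d][k] += 1
        word_topic[w][k] += 1
        topic_totals[k] += 1
    return assignments, doc_topic, word_topic, topic_totals


def gibbs_sweep(tokens, assignments, doc_topic, word_topic, topic_totals,
                alpha, beta, rng):
    """Resample the topic of every token once (dense, unoptimized baseline)."""
    num_topics = len(topic_totals)
    vocab_size = len(word_topic)
    for i, (d, w) in enumerate(tokens):
        # Remove the token's current assignment from all counts.
        old = assignments[i]
        doc_topic[d][old] -= 1
        word_topic[w][old] -= 1
        topic_totals[old] -= 1

        # Unnormalized conditional probability of each topic for this token.
        weights = [
            (doc_topic[d][k] + alpha)
            * (word_topic[w][k] + beta)
            / (topic_totals[k] + vocab_size * beta)
            for k in range(num_topics)
        ]
        new = rng.choices(range(num_topics), weights=weights)[0]

        # Record the new assignment and restore the counts.
        assignments[i] = new
        doc_topic[d][new] += 1
        word_topic[w][new] += 1
        topic_totals[new] += 1
```

Every token here is resampled against every topic on every sweep, and both count tables are stored densely. Those are the costs that the redundant-sampling reduction and the sparsity-aware formats described in the abstract are meant to address.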

Taxonomy

Methods: Latent Dirichlet Allocation