SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu

TL;DR
SaberLDA introduces a sparsity-aware GPU algorithm for large-scale topic modeling. It learns from billions of tokens with up to 10,000 topics on a single GPU, a topic count that earlier GPU-based LDA systems could not support.
Contribution
It presents a novel sparsity-aware algorithm, data layout, and GPU kernel design that enable scalable LDA on GPUs for large datasets and many topics.
Findings
Supports learning from billions of tokens with up to 10,000 topics
Achieves time complexity sublinear in the number of topics for LDA on GPUs
Reduces memory consumption and improves locality in GPU implementations
Abstract
Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear to the number of topics. In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient…
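To illustrate why sparsity can remove the linear dependence on the number of topics, here is a minimal Python sketch of one decomposition commonly used by sparsity-aware LDA samplers. It is an illustration under stated assumptions, not SaberLDA's GPU kernel or data layout. It assumes a token's topic is drawn with probability proportional to (A_dk + alpha) * B_wk, where A_dk is the count of topic k in the document and B_wk is a per-word topic weight. The names `ALPHA`, `build_prior_cdf`, `sample_topic`, and `dense_probs`, and all numeric values, are hypothetical. The weight splits into A_dk * B_wk, which is nonzero only for topics present in the document, and alpha * B_wk, which does not depend on the document and can be precomputed once per word.

```python
import bisect
import random
from itertools import accumulate

ALPHA = 0.1  # hypothetical symmetric document-topic prior


def build_prior_cdf(b_w, alpha=ALPHA):
    """Prefix sums of alpha * b_w[k]; built once per word, shared by all documents."""
    return list(accumulate(alpha * b for b in b_w))


def sample_topic(doc_counts, b_w, prior_cdf, u):
    """Pick topic k with probability proportional to (count_k + alpha) * b_w[k].

    doc_counts maps topic -> count and holds only the nonzero entries for one
    document, so the per-token work is O(K_d + log K) rather than O(K).
    u is a uniform random number in [0, 1).
    """
    topics = list(doc_counts)
    sparse_cdf = list(accumulate(doc_counts[k] * b_w[k] for k in topics))
    sparse_mass = sparse_cdf[-1] if sparse_cdf else 0.0
    target = u * (sparse_mass + prior_cdf[-1])
    if target < sparse_mass:
        # Sparse term: search only the topics present in the document.
        return topics[bisect.bisect_right(sparse_cdf, target)]
    # Prior term: binary search in the precomputed per-word table.
    idx = bisect.bisect_right(prior_cdf, target - sparse_mass)
    return min(idx, len(prior_cdf) - 1)  # guard against rounding at the top end


def dense_probs(doc_counts, b_w, alpha=ALPHA):
    """Reference O(K) computation of the same distribution, for comparison."""
    weights = [(doc_counts.get(k, 0) + alpha) * b for k, b in enumerate(b_w)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    b_w = [0.05, 0.20, 0.01, 0.30, 0.10, 0.04, 0.25, 0.05]  # made-up weights
    doc_counts = {1: 3, 6: 1}  # the document uses only topics 1 and 6
    cdf = build_prior_cdf(b_w)
    print(sample_topic(doc_counts, b_w, cdf, random.random()))
```

In this sketch the dense part is touched only through a binary search, so the cost per token grows with the number of topics actually present in the document rather than with the total number of topics.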
Taxonomy
Topics: Text and Document Classification Technologies · Data Management and Algorithms · Image Retrieval and Classification Techniques
Methods: Latent Dirichlet Allocation
