A Scalable Asynchronous Distributed Algorithm for Topic Modeling
Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S.V.N Vishwanathan, and Inderjit S. Dhillon

TL;DR
This paper introduces F+Nomad LDA, a scalable asynchronous distributed algorithm for topic modeling that efficiently handles millions of documents and thousands of topics using a modified Fenwick tree and an innovative asynchronous framework.
Contribution
The paper presents a novel algorithm combining a modified Fenwick tree with an asynchronous distributed framework for scalable topic modeling on massive datasets.
Findings
Outperforms state-of-the-art methods on large-scale datasets
Handles thousands of topics efficiently, with $O(\log T)$ time per sampling step
Effectively distributes computation across multiple machines
Abstract
Learning meaningful topic models from massive document collections, which contain millions of documents and billions of tokens, is challenging for two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper we present a novel algorithm, F+Nomad LDA, which simultaneously tackles both these problems. In order to handle a large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over $T$ items in $O(\log T)$ time. Moreover, when topic counts change, the data structure can be updated in $O(\log T)$ time. In order to distribute the computation across multiple processors we present a novel asynchronous framework inspired by the Nomad algorithm of…
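To make the logarithmic sampling and update costs concrete, here is a minimal sketch of a sum tree in the spirit of the modified Fenwick tree the abstract describes. It is an illustration under assumptions, not the authors' implementation: the class name `FPlusTree`, its methods, and the array layout (leaves holding the unnormalized topic weights, each internal node holding the sum of its two children) are hypothetical choices made for this example.

```python
class FPlusTree:
    """Sum tree over T leaves (hypothetical layout, for illustration only).

    Leaves hold unnormalized weights; every internal node holds the sum
    of its two children, so the root holds the total weight.
    """

    def __init__(self, weights):
        self.T = len(weights)
        # Round the leaf count up to a power of two; leaves live at
        # indices size .. 2*size - 1, and padding leaves keep weight 0.
        self.size = 1
        while self.size < self.T:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        for t, w in enumerate(weights):
            self.tree[self.size + t] = float(w)
        # Fill internal nodes bottom-up.
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        """Sum of all weights (stored at the root)."""
        return self.tree[1]

    def update(self, t, delta):
        """Add delta to leaf t and to each of its ancestors: O(log T)."""
        i = self.size + t
        while i >= 1:
            self.tree[i] += delta
            i //= 2

    def sample(self, u):
        """Map u in [0, total) to a leaf index by descending from the root: O(log T)."""
        i = 1
        while i < self.size:
            left = 2 * i
            if u < self.tree[left]:
                i = left                # target lies in the left subtree
            else:
                u -= self.tree[left]    # skip the left subtree's mass
                i = left + 1
        return i - self.size


tree = FPlusTree([3, 15, 4, 3])
topic = tree.sample(18.0)   # 18.0 falls in the third leaf's interval [18, 22)
tree.update(1, -15)         # a count change touches one leaf-to-root path
```

In use, `u` would be drawn uniformly from `[0, tree.total())`, so each leaf is selected with probability proportional to its weight. Both `sample` and `update` walk a single root-to-leaf path, which is where the $O(\log T)$ bounds come from.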
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Graph Theory and Algorithms
Methods: Latent Dirichlet Allocation
