Analyzing Political Text at Scale with Online Tensor LDA
Sara Kangaslahti, Danny Ebanks, Jean Kossaifi, Anqi Liu, R. Michael Alvarez, and Animashree Anandkumar

TL;DR
This paper introduces Tensor LDA, a scalable, efficient topic modeling method capable of analyzing billions of documents, enabling large-scale social science research on social media conversations.
Contribution
We develop Tensor LDA with theoretical guarantees, demonstrate its computational efficiency, and provide an open-source GPU implementation for large-scale text analysis.
Findings
Analyzed the evolution of the #MeToo movement on Twitter over two years.
Studied social media discussions on election fraud in the 2020 US presidential election.
Ran over 3-4x faster than prior parallelized LDA methods on large datasets.
Abstract
This paper proposes a topic modeling method that scales linearly to billions of documents. We make three core contributions: i) we present a topic modeling method, Tensor Latent Dirichlet Allocation (TLDA), that has identifiable and recoverable parameter guarantees and sample complexity guarantees for large data; ii) we show that this method is computationally and memory efficient (achieving speeds over 3-4x those of prior parallelized Latent Dirichlet Allocation (LDA) methods), and that it scales linearly to text datasets with over a billion documents; iii) we provide an open-source, GPU-based implementation of this method. This scaling enables previously prohibitive analyses, and we perform two new real-world, large-scale studies of interest to political scientists: we provide the first thorough analysis of the evolution of the #MeToo movement through the lens of over two years of…
Taxonomy
Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Topic Modeling
