Topic Discovery through Data Dependent and Random Projections
Weicong Ding, Mohammad H. Rohban, Prakash Ishwar, Venkatesh Saligrama

TL;DR
This paper introduces efficient algorithms for topic modeling that leverage the geometry of word-frequency patterns, especially under the separability condition, with proven statistical guarantees and scalable computation.
Contribution
It proposes data-dependent and random projection algorithms that identify novel words (words unique to a single topic) and their associated topics, with theoretical guarantees and linear scalability.
Findings
Algorithms successfully identify novel words in synthetic and real datasets.
Computational complexity scales linearly with data size.
Statistical guarantees for the data-dependent projections method hold under two mild assumptions on the prior density of the topic-document matrix.
Abstract
We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We also discuss the statistical guarantees of the data-dependent projections method, which rest on two mild assumptions on the prior density of the topic-document matrix. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random…
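The key insight in the abstract can be illustrated with a small noise-free simulation. The sketch below is not the authors' algorithm: it assumes an idealized setting in which the exact expected word-frequency matrix is available, and every name in it (`make_separable_topics`, `extreme_words`, the matrix sizes, the Dirichlet prior) is a hypothetical choice made for illustration. It builds a separable word-by-topic matrix, row-normalizes the cross-document frequency patterns, projects them along random directions, and records which words attain the maximum and minimum.

```python
import numpy as np


def make_separable_topics(rng, n_topics=3, novel_per_topic=2, n_other=20):
    """Build a column-stochastic word-by-topic matrix with known novel words.

    Illustrative assumption: the first n_topics * novel_per_topic rows are
    novel words, each with mass in exactly one topic column.
    """
    n_novel = n_topics * novel_per_topic
    beta = np.zeros((n_novel + n_other, n_topics))
    for k in range(n_topics):
        rows = slice(k * novel_per_topic, (k + 1) * novel_per_topic)
        beta[rows, k] = rng.uniform(0.5, 1.0, size=novel_per_topic)
    # Remaining words put mass on every topic, so they are not novel.
    beta[n_novel:] = rng.uniform(0.1, 1.0, size=(n_other, n_topics))
    beta /= beta.sum(axis=0, keepdims=True)  # each topic is a distribution
    return beta, set(range(n_novel))


def extreme_words(freq, n_directions, rng):
    """Return the words that attain the max or min projection along random directions."""
    # Row-normalize so each word's cross-document pattern sums to one.
    row_normalized = freq / freq.sum(axis=1, keepdims=True)
    found = set()
    for _ in range(n_directions):
        direction = rng.standard_normal(freq.shape[1])
        projection = row_normalized @ direction
        found.add(int(np.argmax(projection)))
        found.add(int(np.argmin(projection)))
    return found


rng = np.random.default_rng(0)
beta, novel = make_separable_topics(rng)
theta = rng.dirichlet(np.ones(3), size=50).T  # topics x documents (assumed prior)
freq = beta @ theta                           # ideal, noise-free word frequencies
found = extreme_words(freq, n_directions=30, rng=rng)
print(sorted(found), "all novel:", found <= novel)
```

In this idealized setting each non-novel word's normalized pattern is a strict convex combination of the novel words' patterns, so the extremes of any generic projection fall on novel words. With finite documents the frequencies are noisy, which is where the paper's statistical guarantees come in; the sketch does not model that.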
Taxonomy
Topics: Algorithms and Data Compression · Music and Audio Processing · Topic Modeling
