On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference
Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel

TL;DR
This paper introduces scalable methods for compressing and rectifying co-occurrence statistics to enable robust large-vocabulary topic inference, maintaining efficiency and accuracy in high-dimensional settings.
Contribution
It proposes novel algorithms that simultaneously compress and rectify co-occurrence data, facilitating scalable and effective latent variable inference for large vocabularies.
Findings
The proposed methods perform comparably to previous approaches on textual data.
The algorithms scale efficiently with vocabulary size.
The methods are effective in both textual and non-textual domains.
Abstract
Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it becomes rapidly more expensive to store and run inference algorithms on co-occurrence statistics. Rectifying co-occurrence, the key process to uphold model assumptions, becomes increasingly more vital in the presence of rare terms, but current techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of vocabulary and the dimension of latent space. We also present new algorithms…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
