On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference

Moontae Lee; Sungjun Cho; Kun Dong; David Mimno; David Bindel

arXiv:2111.06580·cs.CL·November 15, 2021

TL;DR

This paper introduces scalable methods for compressing and rectifying co-occurrence statistics to enable robust large-vocabulary topic inference, maintaining efficiency and accuracy in high-dimensional settings.

Contribution

The paper proposes novel algorithms that simultaneously compress and rectify co-occurrence data, facilitating scalable and effective latent variable inference for large vocabularies.

Findings

01. Methods perform comparably to previous approaches on textual data.

02. Algorithms scale efficiently with vocabulary size.

03. Effective in both textual and non-textual domains.

Abstract

Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it becomes rapidly more expensive to store and run inference algorithms on co-occurrence statistics. Rectifying co-occurrence, the key process to uphold model assumptions, becomes increasingly more vital in the presence of rare terms, but current techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of vocabulary and the dimension of latent space. We also present new algorithms…
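To make the idea of rectifying co-occurrence statistics concrete, here is a minimal Python sketch. It is not the compressed algorithm this paper proposes. It assumes (these details are not stated in the abstract above) that documents are lists of integer word ids and that the "model assumptions" amount to a co-occurrence matrix that is symmetric, low-rank positive semidefinite, non-negative, and sums to one. It approximates those constraints with a plain alternating-projection loop on the full dense matrix. The function names `cooccurrence` and `rectify`, the `rank` and `iters` parameters, and the toy data are all hypothetical.

```python
import numpy as np


def cooccurrence(docs, vocab_size):
    """Joint word co-occurrence matrix whose entries sum to 1.

    Assumption: each document is a list of integer word ids in [0, vocab_size).
    """
    C = np.zeros((vocab_size, vocab_size))
    for doc in docs:
        counts = np.bincount(np.asarray(doc, dtype=int), minlength=vocab_size).astype(float)
        n = counts.sum()
        if n < 2:
            continue  # a document needs at least two tokens to form a pair
        # Count ordered pairs of distinct token positions, normalized per document.
        C += (np.outer(counts, counts) - np.diag(counts)) / (n * (n - 1))
    return C / C.sum()


def rectify(C, rank, iters=50):
    """Illustrative alternating projections: low-rank PSD, sum-to-one, non-negative."""
    for _ in range(iters):
        C = (C + C.T) / 2                     # enforce symmetry before eigh
        w, V = np.linalg.eigh(C)              # eigenvalues in ascending order
        w_top = np.clip(w[-rank:], 0, None)   # keep the largest `rank`, drop negatives
        C = (V[:, -rank:] * w_top) @ V[:, -rank:].T
        C = C + (1.0 - C.sum()) / C.size      # shift entries so they sum to 1
        C = np.clip(C, 0, None)               # zero out negative entries
    return C


# Toy corpus (hypothetical): 4 documents over a 5-word vocabulary.
docs = [[0, 1, 1, 2], [2, 3, 3], [0, 2, 3, 4], [1, 4, 4, 0]]
C = cooccurrence(docs, vocab_size=5)
R = rectify(C, rank=2)
```

This dense version stores and eigendecomposes the full vocabulary-by-vocabulary matrix on every iteration, which is the storage and compute cost the abstract says becomes prohibitive as the vocabulary grows.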

Videos

On-the-fly Rectification for Robust Large-Vocabulary Topic Inference · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
