Improving the Inference of Topic Models via Infinite Latent State Replications

Daniel Rugeles; Zhen Hai; Juan Felipe Carmona; Manoranjan Dash; Gao Cong

arXiv:2301.12974 · cs.CL · January 31, 2023 · 1 citation

Open Access

TL;DR

This paper introduces Infinite Latent State Replication (ILR), a novel inference method for topic models that enhances robustness and accuracy over traditional collapsed Gibbs sampling by leveraging state augmentation.

Contribution

The paper proposes ILR, a new inference approach that applies state augmentation and lets the number of topic samples per document-word pair grow to infinity, producing a robust soft topic assignment for each pair.

Findings

1. ILR outperforms CGS on publicly available datasets.

2. ILR generates a robust soft topic assignment for each document-word pair, where CGS samples a single topic label.

3. Experimental results show improved inference for existing established topic models.

Abstract

In text mining, topic models are a type of probabilistic generative model for inferring latent semantic topics from a text corpus. One of the most popular inference approaches for topic models is perhaps collapsed Gibbs sampling (CGS), which typically samples a single topic label for each observed document-word pair. In this paper, we aim to improve the inference of CGS for topic models. We propose to leverage the state augmentation technique by taking the number of topic samples to infinity, and then develop a new inference approach, called infinite latent state replication (ILR), to generate a robust soft topic assignment for each given document-word pair. Experimental results on publicly available datasets show that ILR outperforms CGS for inference of existing established topic models.
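The contrast between a single sampled label and a soft assignment can be illustrated with a minimal sketch. It is not the paper's ILR update, which the abstract does not spell out. It assumes an LDA-style collapsed conditional with symmetric priors, and it assumes that averaging ever more replicated topic draws approaches that conditional distribution (a law-of-large-numbers argument). All function names, counts, and hyperparameter values below are hypothetical.

```python
import numpy as np

def topic_conditional(n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """Collapsed LDA conditional p(z = k | rest) for one document-word pair.

    n_dk: topic counts in the document, shape (K,)
    n_kw: counts of this word under each topic, shape (K,)
    n_k:  total word counts per topic, shape (K,)
    All counts are assumed to already exclude the pair being updated.
    """
    weights = (n_dk + alpha) * (n_kw + beta) / (n_k + vocab_size * beta)
    return weights / weights.sum()

def hard_assignment(probs, rng):
    """CGS-style update: draw one topic label, returned as a one-hot vector."""
    one_hot = np.zeros_like(probs)
    one_hot[rng.choice(len(probs), p=probs)] = 1.0
    return one_hot

def replicated_assignment(probs, num_replicas, rng):
    """Draw num_replicas topic labels and average them into fractional counts."""
    return rng.multinomial(num_replicas, probs) / num_replicas

# Hypothetical counts for a 3-topic model (illustration only).
rng = np.random.default_rng(0)
probs = topic_conditional(
    n_dk=np.array([3.0, 1.0, 0.0]),
    n_kw=np.array([2.0, 0.0, 1.0]),
    n_k=np.array([10.0, 8.0, 6.0]),
    alpha=0.1, beta=0.01, vocab_size=50,
)
hard = hard_assignment(probs, rng)                 # all mass on one topic
soft = replicated_assignment(probs, 100_000, rng)  # close to probs
```

With one replica the assignment is a one-hot vector, as in CGS. As `num_replicas` grows, the averaged assignment concentrates around `probs`, which is one way to read a soft assignment in the infinite-replication limit under the assumptions stated above.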

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques