Communication-Free Parallel Supervised Topic Models

Lee Gao; Ronghuo Zheng

arXiv:1708.03052 · cs.LG · August 11, 2017

TL;DR

This paper introduces a communication-free parallel MCMC algorithm for supervised Latent Dirichlet Allocation (sLDA) that overcomes the quasi-ergodicity problem, enabling faster training without sacrificing prediction accuracy.

Contribution

It proposes a novel parallel MCMC method for sLDA that switches the order of topic combination and label prediction, so that the multimodal distribution of topics no longer causes quasi-ergodicity.

Findings

1. The parallel algorithm achieves prediction accuracy similar to that of non-parallel sLDA.
2. It significantly reduces computation time.
3. It overcomes quasi-ergodicity in parallel MCMC for topic models.

Abstract

Embarrassingly (communication-free) parallel Markov chain Monte Carlo (MCMC) methods are commonly used to learn graphical models. However, such parallel MCMC methods cannot be applied directly to topic models because of the quasi-ergodicity problem caused by the multimodal distribution of topics. In this paper, we develop an embarrassingly parallel MCMC algorithm for supervised Latent Dirichlet Allocation (sLDA). The algorithm switches the order of combining the sampled topics and predicting the label variable: rather than combining topics across parallel samplers and then predicting, it predicts labels first and then combines the predictions. This overcomes the quasi-ergodicity problem because the high-dimensional topics, which follow a multimodal distribution, are projected onto one-dimensional document labels, which follow a unimodal distribution. Our experiments show that the out-of-sample prediction performance of the embarrassingly parallel algorithm is comparable to that of non-parallel sLDA, while the computation time is significantly reduced.
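Why the order of combination matters can be shown with a toy example. This is a minimal sketch, not the authors' implementation: the numbers are hypothetical, the per-shard prediction uses the standard Gaussian-response sLDA mean, the dot product of the regression weights and the document's topic proportions, and simple averaging as the combination rule is an assumption. Two shards learn the same 2-topic model, but shard B indexes its topics in the opposite order (label switching, one source of the multimodality). Averaging topic-level quantities mixes the two modes, while averaging the one-dimensional label predictions does not.

```python
# Toy illustration with hypothetical numbers: two shards fit the same
# 2-topic sLDA model, but shard B indexes its topics in the opposite
# order (label switching). Averaging as the combiner is an assumption.

def predict(z_bar, eta):
    """Gaussian-response sLDA prediction for one document: eta^T z_bar."""
    return sum(e * z for e, z in zip(eta, z_bar))

def average(vectors):
    """Element-wise mean of equal-length lists."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Shard A: regression weights and topic proportions for a new document.
eta_a, z_a = [2.0, -1.0], [0.75, 0.25]
# Shard B: the identical model, with the two topic indices swapped.
eta_b, z_b = [-1.0, 2.0], [0.25, 0.75]

# Combining in topic space: average topic-level quantities, then predict.
naive = predict(average([z_a, z_b]), average([eta_a, eta_b]))

# Combining in label space: predict on each shard, then average the labels.
combined = average([[predict(z_a, eta_a)], [predict(z_b, eta_b)]])[0]

print(naive)     # 0.5  (topic-level averaging mixes the two modes)
print(combined)  # 1.25 (matches each shard's own prediction)
```

Because each shard's prediction is invariant to how that shard happens to number its topics, the label-space combination recovers the single-shard answer, whereas the topic-space combination does not.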

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Bayesian Methods and Mixture Models