A correlated topic model of Science
David M. Blei, John D. Lafferty

TL;DR
This paper introduces the correlated topic model (CTM), an extension of LDA that uses the logistic normal distribution to capture correlations between topics, which improves the fit to data and makes the model more useful for exploring large document collections.
Contribution
The paper develops the CTM together with a fast variational inference algorithm. Unlike LDA, the CTM can model correlations between topics, and the inference algorithm makes this practical for large text datasets.
Findings
The CTM fits a collection of scientific articles better than LDA.
The model captures correlations among topics.
The CTM is useful as an exploratory tool for large document collections.
Abstract
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982) 139--177]. We derive a fast variational inference algorithm for approximate posterior inference in this model, which is…
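The contrast between the Dirichlet and the logistic normal can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes three topics and a hand-picked covariance matrix in which topics 0 and 1 are strongly positively related; the function name `sample_logistic_normal` and all parameter values are hypothetical choices made for illustration. It draws topic proportions by sampling a Gaussian vector and mapping it onto the simplex, then compares the resulting correlations with those of a symmetric Dirichlet.

```python
import numpy as np


def sample_logistic_normal(mu, sigma, n, rng):
    """Draw n topic-proportion vectors from a logistic normal distribution.

    eta ~ N(mu, sigma), then theta_k = exp(eta_k) / sum_j exp(eta_j).
    """
    eta = rng.multivariate_normal(mu, sigma, size=n)
    # Subtract the row-wise max before exponentiating for numerical stability;
    # this does not change the normalized result.
    eta = eta - eta.max(axis=1, keepdims=True)
    weights = np.exp(eta)
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n = 20000

# Illustrative (assumed) parameters: topics 0 and 1 tend to co-occur,
# topic 2 is unrelated to both.
mu = np.zeros(3)
sigma = np.array([
    [1.00, 0.95, 0.00],
    [0.95, 1.00, 0.00],
    [0.00, 0.00, 1.00],
])

theta_ctm = sample_logistic_normal(mu, sigma, n, rng)
theta_lda = rng.dirichlet(np.ones(3), size=n)  # symmetric Dirichlet, as in LDA

# Empirical correlation matrices of the topic proportions (columns = topics).
corr_ctm = np.corrcoef(theta_ctm, rowvar=False)
corr_lda = np.corrcoef(theta_lda, rowvar=False)

print("logistic normal corr(topic 0, topic 1):", corr_ctm[0, 1])
print("Dirichlet       corr(topic 0, topic 1):", corr_lda[0, 1])
```

Under these assumed settings, the logistic normal draws should show a positive correlation between the proportions of topics 0 and 1, whereas the components of a Dirichlet are always negatively correlated because they compete for a fixed total. That gap is the limitation of LDA the abstract describes.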
