# Topic Modeling in Embedding Spaces

**Authors:** Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei

arXiv: 1907.04907 · 2019-07-12

## TL;DR

The paper introduces the Embedded Topic Model (ETM), a generative document model that combines traditional topic models with word embeddings so that topics remain interpretable on large, heavy-tailed vocabularies.

## Contribution

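The paper develops the ETM, a generative model in which each word follows a categorical distribution whose natural parameter is the inner product between a word embedding and the embedding of its assigned topic. It also develops an efficient amortized variational inference algorithm to fit the model.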

## Key findings

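- ETM discovers interpretable topics even with large vocabularies that include rare words and stop words.
- ETM outperforms existing document models, such as LDA, in topic quality.
- ETM outperforms existing document models, such as LDA, in predictive performance.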

## Abstract

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In particular, it models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation (LDA), in terms of both topic quality and predictive performance.
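The word distribution described in the abstract can be illustrated with a minimal sketch. It assumes that a categorical distribution with a given natural parameter amounts to a softmax over the vocabulary of the inner products between each word embedding and one topic embedding. The function names, the toy vocabulary, and the two-dimensional embedding values below are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(word_embeddings, topic_embedding):
    """Distribution over the vocabulary for one topic (hypothetical helper).

    The logit of each word is the inner product between its embedding
    and the topic embedding; the softmax turns logits into probabilities.
    """
    logits = [
        sum(r * a for r, a in zip(rho_v, topic_embedding))
        for rho_v in word_embeddings
    ]
    return softmax(logits)


# Toy data, made up for illustration only.
vocab = ["gene", "dna", "court", "judge"]
word_embeddings = [
    [1.0, 0.1],  # gene
    [0.9, 0.0],  # dna
    [0.0, 1.0],  # court
    [0.1, 0.9],  # judge
]
topic_embeddings = {"biology": [2.0, 0.0], "law": [0.0, 2.0]}

for name, alpha_k in topic_embeddings.items():
    beta_k = topic_word_distribution(word_embeddings, alpha_k)
    top_word = vocab[max(range(len(vocab)), key=lambda v: beta_k[v])]
    print(name, top_word, [round(p, 3) for p in beta_k])
```

In this sketch, words whose embeddings point in the same direction as a topic embedding receive higher probability under that topic, which is one way to read the claim that topics and words share an embedding space.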

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.04907/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1907.04907/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1907.04907/full.md

---
Source: https://tomesphere.com/paper/1907.04907