Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Christopher E Moody

arXiv:1605.02019 · cs.CL · May 9, 2016 · 147 cites

TL;DR

lda2vec combines word embeddings with topic models: it learns dense word vectors jointly with sparse, interpretable document-level mixtures of topic vectors.

Contribution

This work introduces lda2vec, a model that jointly learns dense word vectors and document-level topic mixtures within a simple, differentiable framework, yielding interpretable document representations.

Findings

01 Produces sparse, interpretable document mixtures through a non-negative simplex constraint
02 Jointly learns word vectors and the linear relationships between them
03 Simple to incorporate into existing automatic differentiation frameworks

Abstract

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe lda2vec, a model that learns dense word vectors jointly with Dirichlet-distributed latent document-level mixtures of topic vectors. In contrast to continuous dense document representations, this formulation produces sparse, interpretable document mixtures through a non-negative simplex constraint. Our method is simple to incorporate into existing automatic differentiation frameworks and allows for unsupervised document representations geared for use by scientists while simultaneously learning word vectors and the linear relationships between them.
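The abstract's central construction, a document represented as a non-negative mixture over topic vectors that sums to one, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's implementation. The abstract does not spell out the following details, so treat them as assumptions: a softmax is used to enforce the simplex constraint, the document vector is added to a word vector to form a context vector, and an unnormalized Dirichlet log-density with concentration `alpha` serves as the sparsity-encouraging term. All function names, variable names, and sizes are hypothetical.

```python
import numpy as np


def softmax(x):
    """Map unconstrained weights to a point on the probability simplex."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def document_vectors(doc_weights, topic_vectors):
    """Return per-document topic proportions and the mixed document vectors.

    doc_weights:   (n_docs, n_topics) unconstrained parameters (assumed form)
    topic_vectors: (n_topics, dim) topic vectors in the word-vector space
    """
    proportions = softmax(doc_weights)            # non-negative, rows sum to 1
    return proportions, proportions @ topic_vectors


def dirichlet_log_prior(proportions, alpha):
    """Unnormalized Dirichlet log-density; alpha < 1 favors sparse mixtures."""
    return float(np.sum((alpha - 1.0) * np.log(proportions)))


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_docs, n_topics, dim = 4, 3, 5
doc_weights = rng.normal(size=(n_docs, n_topics))
topic_vectors = rng.normal(size=(n_topics, dim))
word_vector = rng.normal(size=dim)

proportions, doc_vecs = document_vectors(doc_weights, topic_vectors)

# Assumed additive combination of a word vector and its document's vector.
context_vector = word_vector + doc_vecs[0]

# With alpha < 1 the prior scores a peaked mixture above a uniform one.
sparse = np.array([[0.98, 0.01, 0.01]])
uniform = np.full((1, 3), 1.0 / 3.0)
sparse_score = dirichlet_log_prior(sparse, alpha=0.5)
uniform_score = dirichlet_log_prior(uniform, alpha=0.5)
```

Because every step here is differentiable, the same computation could be written in an automatic differentiation framework and trained by gradient descent, which is the property the abstract emphasizes. Interpretability comes from `proportions`: each row reads directly as a document's share of each topic.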

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods

Methods: lda2vec