Keyword Assisted Embedded Topic Model

Bahareh Harandizadeh; J. Hunter Priniski; Fred Morstatter

arXiv:2112.03101·cs.IR·December 7, 2021

TL;DR

The paper introduces KeyETM, an extension of the Embedded Topic Model that takes user-provided keywords as informative topic-level priors over the vocabulary, so that the learned topics reflect the user's knowledge of the domain.

Contribution

KeyETM extends the Embedded Topic Model by integrating user knowledge through informative priors, improving topic quality over existing models.

Findings

01

KeyETM outperforms other guided models in quantitative metrics.

02

Human evaluations show improved topic relevance with KeyETM.

03

The model effectively leverages user input to enhance semantic coherence.

Abstract

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA), describe how words in documents are generated via a set of latent distributions called topics. Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics. As LDA and its extensions are unsupervised models, they aren't defined to make efficient use of a user's prior knowledge of the domain. To this end, we propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors over the vocabulary. Using both quantitative metrics and human responses on a topic intrusion…
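The mechanism the abstract describes can be illustrated with a small, dependency-free sketch. The first function follows the standard ETM formulation, in which a topic's distribution over words is a softmax of inner products between word embeddings and a topic embedding. Everything about the keyword prior here is an assumption made for illustration: the `keyword_prior` and `prior_penalty` helpers, the `boost` parameter, the cross-entropy form of the penalty, and the toy vocabulary and embeddings are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(word_embeddings, topic_embedding):
    """ETM-style topic: softmax of the inner product between each
    word embedding and the topic embedding."""
    scores = [
        sum(w * t for w, t in zip(vec, topic_embedding))
        for vec in word_embeddings
    ]
    return softmax(scores)


def keyword_prior(vocab, keywords, boost=5.0):
    """Hypothetical informative prior (an assumption, not the paper's
    construction): uniform mass over the vocabulary, with extra weight
    `boost` on each user-provided keyword, then normalized."""
    weights = [1.0 + boost if word in keywords else 1.0 for word in vocab]
    total = sum(weights)
    return [w / total for w in weights]


def prior_penalty(prior, beta):
    """Hypothetical regularizer: cross-entropy between the keyword prior
    and a topic's word distribution. It is smaller when the topic puts
    more probability on the keywords."""
    return -sum(p * math.log(b) for p, b in zip(prior, beta))


# Toy vocabulary and made-up 2-dimensional word embeddings.
vocab = ["goal", "match", "election", "vote"]
rho = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Two made-up topic embeddings, one near each cluster of words.
sports_topic = [2.0, 0.0]
politics_topic = [0.0, 2.0]

prior = keyword_prior(vocab, {"goal", "match"})
beta_sports = topic_word_distribution(rho, sports_topic)
beta_politics = topic_word_distribution(rho, politics_topic)

print("sports topic:", [round(b, 3) for b in beta_sports])
print("penalty (sports):", round(prior_penalty(prior, beta_sports), 3))
print("penalty (politics):", round(prior_penalty(prior, beta_politics), 3))
```

In this toy setup the penalty is lower for the topic whose embedding lies near the keyword embeddings, which is the kind of pressure an informative prior is meant to apply during training. How KeyETM actually builds its priors and combines them with the ETM objective is specified in the paper and the linked repository, not here.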

Code & Models

Repositories

bahareharandizade/keyetm
PyTorch · Official

Taxonomy

Methods: Latent Dirichlet Allocation