Source-LDA: Enhancing probabilistic topic models using prior knowledge sources

Justin Wood; Patrick Tan; Wei Wang; Corey Arnold

arXiv:1606.00577·cs.CL·May 19, 2017

TL;DR

This paper presents Source-LDA, a novel method that incorporates prior knowledge sources into probabilistic topic models to improve topic labeling and generation accuracy.

Contribution

It introduces a new approach to integrate labeled knowledge sources into LDA, enhancing topic labeling and model performance.

Findings

1. Accurate label assignment to topics
2. Improved topic generation quality
3. Enhanced alignment with prior knowledge

Abstract

A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches cannot label these sets of words with a single n-gram. Such labels are useful for topic identification in summarization systems. This paper introduces a novel approach to labeling the group of n-grams that comprises an individual topic. The approach complements the existing topic distributions over words with a known distribution based on a predefined set of topics. This is done by integrating existing labeled knowledge sources, which represent known potential topics, into the probabilistic topic model. These knowledge sources are translated into a distribution and used to set the hyperparameters of the Dirichlet-generated distribution over words. In the inference these…
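The step in which a knowledge source is "translated into a distribution" and used to set Dirichlet hyperparameters can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy vocabulary and source texts, and the smoothing constant `epsilon` are all hypothetical. The sketch uses word counts from each labeled source, plus a small constant, as the hyperparameters of that topic's Dirichlet prior over words, then draws one topic-word distribution from it.

```python
import random
from collections import Counter


def source_hyperparameters(source_text, vocabulary, epsilon=0.01):
    """Turn one labeled knowledge-source document into Dirichlet hyperparameters.

    Each vocabulary word gets its count in the source plus a small smoothing
    constant (epsilon is an assumption of this sketch), so words that never
    appear in the source still receive nonzero prior mass.
    """
    counts = Counter(source_text.lower().split())
    return [counts[word] + epsilon for word in vocabulary]


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]


# Toy vocabulary and labeled sources (hypothetical data for illustration).
vocabulary = ["heart", "valve", "artery", "pitcher", "inning", "league"]
sources = {
    "Cardiology": "heart valve artery heart artery heart",
    "Baseball": "pitcher inning league pitcher inning pitcher",
}

rng = random.Random(0)
topic_word = {}
for label, text in sources.items():
    hyper = source_hyperparameters(text, vocabulary)
    # The topic keeps the label of the source whose counts shaped its prior.
    topic_word[label] = sample_dirichlet(hyper, rng)

for label, distribution in topic_word.items():
    ranked = sorted(zip(vocabulary, distribution), key=lambda pair: -pair[1])
    print(label, [word for word, _ in ranked[:3]])
```

Because each prior is built from a labeled source, a topic drawn from it tends to concentrate on that source's words and carries the source's label from the start, which is the intuition behind labeling topics through prior knowledge.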

Code & Models

Repositories

ucla-scai/Source-LDA (Official)
