The Polylingual Labeled Topic Model

Lisa Posch; Arnim Bleier; Philipp Schaer; Markus Strohmaier

arXiv:1507.06829·cs.CL·May 3, 2017

TL;DR

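The Polylingual Labeled Topic Model combines polylingual and label-constrained topic modeling: each language gets its own topic distributions, and the topics permitted for a document are restricted to a set of predefined labels. On a two-language social science dataset, it outperforms LDA and Labeled LDA in held-out perplexity and produces topics that human subjects find interpretable.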

Contribution

It introduces a model that combines the Polylingual Topic Model with Labeled LDA and shows that it outperforms LDA and Labeled LDA in held-out perplexity while producing semantically coherent topics that human subjects can interpret well.

Findings

01

Outperforms LDA and Labeled LDA in held-out perplexity

02

Produces semantically coherent, human-interpretable topics

03

Effective in a two-language social science dataset

Abstract

In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions for each language while restricting the permitted topics of a document to a set of predefined labels. We explore the properties of the model in a two-language setting on a dataset from the social science domain. Our experiments show that our model outperforms LDA and Labeled LDA in terms of their held-out perplexity and that it produces semantically coherent topics which are well interpretable by human subjects.
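The generative idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, as in the Polylingual Topic Model, that the documents of one tuple (one document per language) share a single topic-proportion vector, and, as in Labeled LDA, that this vector has support only on the tuple's labels. The function name `generate_tuple`, its parameters, and the symmetric Dirichlet prior `alpha` are hypothetical choices made for illustration.

```python
import numpy as np


def generate_tuple(label_ids, phi_by_lang, doc_lengths, alpha=0.1, rng=None):
    """Sample one document tuple (one document per language).

    label_ids   : indices of the topics (labels) permitted for this tuple
    phi_by_lang : dict lang -> array of shape (n_topics, vocab_size_lang),
                  one topic-word distribution per row (language-specific)
    doc_lengths : dict lang -> number of tokens to sample for that language
    alpha       : symmetric Dirichlet prior (an assumed, illustrative value)
    """
    rng = np.random.default_rng() if rng is None else rng
    label_ids = np.asarray(label_ids)

    # Topic proportions are drawn only over the permitted labels
    # (the Labeled LDA restriction) and shared by all languages in the tuple.
    theta = rng.dirichlet(np.full(len(label_ids), alpha))

    docs, topics = {}, {}
    for lang, phi in phi_by_lang.items():
        # Draw one topic per token from the shared, label-restricted theta.
        z = rng.choice(label_ids, size=doc_lengths[lang], p=theta)
        # Draw each word from the topic-word distribution of this language.
        w = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in z], dtype=int)
        docs[lang], topics[lang] = w, z
    return docs, topics


# Toy usage: 4 labels, an English vocabulary of 6 words, a German one of 5.
rng = np.random.default_rng(0)
phi_by_lang = {
    "en": rng.dirichlet(np.ones(6), size=4),
    "de": rng.dirichlet(np.ones(5), size=4),
}
docs, topics = generate_tuple([1, 3], phi_by_lang, {"en": 20, "de": 15}, rng=rng)
```

In this sketch, every sampled topic assignment is one of the tuple's labels (here 1 or 3), while the word identities come from language-specific vocabularies. Inference, which the paper addresses, is not shown.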

Taxonomy

Methods: Latent Dirichlet Allocation