The Polylingual Labeled Topic Model
Lisa Posch, Arnim Bleier, Philipp Schaer, Markus Strohmaier

TL;DR
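The Polylingual Labeled Topic Model combines the Polylingual Topic Model with Labeled LDA: it keeps separate topic distributions for each language while restricting each document's topics to a set of predefined labels. On a two-language social science dataset, it achieves better held-out perplexity than LDA and Labeled LDA and produces topics that human subjects find interpretable.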
Contribution
The paper introduces a model that combines polylingual and labeled topic modeling, and shows that it outperforms LDA and Labeled LDA in held-out perplexity while producing semantically coherent topics.
Findings
Outperforms LDA and Labeled LDA in held-out perplexity
Produces semantically coherent, human-interpretable topics
Evaluated in a two-language setting on a social science dataset
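Held-out perplexity, the metric behind the first finding, can be sketched as follows. This is the standard definition (the exponentiated negative average log-likelihood per held-out token), not necessarily the exact evaluation protocol used in the paper; the function name `perplexity` and its arguments are illustrative assumptions. Lower values mean the model predicts held-out text better.

```python
import math

def perplexity(log_likelihoods, token_counts):
    """Standard perplexity over a held-out set (illustrative helper).

    log_likelihoods: natural-log likelihood of each held-out document
                     under the model (assumed to be computed elsewhere).
    token_counts:    number of tokens in each held-out document.
    """
    total_tokens = sum(token_counts)
    if total_tokens == 0:
        raise ValueError("held-out set contains no tokens")
    # exp of the negative average log-likelihood per token
    return math.exp(-sum(log_likelihoods) / total_tokens)
```

As a sanity check, a model that assigns every word of a 10-word vocabulary probability 0.1 has a perplexity of 10 on any held-out text.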
Abstract
In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions for each language while restricting the permitted topics of a document to a set of predefined labels. We explore the properties of the model in a two-language setting on a dataset from the social science domain. Our experiments show that our model outperforms LDA and Labeled LDA in terms of their held-out perplexity and that it produces semantically coherent topics which are well interpretable by human subjects.
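The two ingredients described in the abstract can be illustrated with a minimal sketch of a generative process. It assumes, as in the Polylingual Topic Model, that the documents of one multilingual tuple share a single topic mixture, and, as in Labeled LDA, that this mixture has support only on the tuple's labels; each language has its own topic-word distributions. The function `generate_tuple`, its parameters, and the symmetric prior `alpha` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import numpy as np

def generate_tuple(labels, phi, doc_lengths, alpha=0.1, rng=None):
    """Sample one multilingual document tuple (illustrative sketch).

    labels:      topic ids permitted for this tuple (its predefined labels).
    phi:         dict mapping language -> array of shape (n_topics, vocab_size),
                 each row a topic-word distribution for that language.
    doc_lengths: dict mapping language -> number of tokens to generate.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    # Topic mixture restricted to the permitted labels (Labeled LDA idea),
    # shared by all languages of the tuple (assumption, as in the
    # Polylingual Topic Model).
    theta = rng.dirichlet(np.full(len(labels), alpha))
    docs = {}
    for lang, n_tokens in doc_lengths.items():
        vocab_size = phi[lang].shape[1]
        # Topic assignment per token, drawn only from the permitted labels.
        z = rng.choice(labels, size=n_tokens, p=theta)
        # Each word is drawn from the language-specific topic-word distribution.
        words = np.array([rng.choice(vocab_size, p=phi[lang][k]) for k in z])
        docs[lang] = (z, words)
    return docs
```

Because topic assignments are sampled only from `labels`, no token in any language can be attributed to a topic outside the document's label set, while the per-language `phi` arrays let the same topic use different vocabularies in each language.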
Taxonomy
Methods: Latent Dirichlet Allocation
