
TL;DR
This paper introduces Exclusive Topic Modeling (ETM), a novel unsupervised text classification method that effectively identifies field-specific keywords and produces well-structured, exclusive topics by using specialized penalties.
Contribution
The paper presents a new ETM approach combining weighted Lasso and Kullback-Leibler divergence penalties to improve topic separation and keyword relevance over traditional methods like LDA.
Findings
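In simulation studies, ETM detects field-specific keywords where LDA fails.
On the benchmark NIPS dataset, the topic coherence score improves by 22% on average with the weighted Lasso penalty.
On the same dataset, the topic coherence score improves by 10% on average with the pairwise Kullback-Leibler divergence penalty.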
Abstract
We propose Exclusive Topic Modeling (ETM) for unsupervised text classification, which is able to 1) identify field-specific keywords even when they appear infrequently and 2) deliver well-structured topics with exclusive words. In particular, a weighted Lasso penalty is imposed to automatically reduce the dominance of frequently appearing yet less relevant words, and a pairwise Kullback-Leibler divergence penalty is used to enforce topic separation. Simulation studies demonstrate that ETM detects the field-specific keywords, while LDA fails. When applied to the benchmark NIPS dataset, the topic coherence score improves on average by 22% and 10% for the model with the weighted Lasso penalty and the pairwise Kullback-Leibler divergence penalty, respectively.
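The two penalties named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `weights` vector (assumed here to be supplied by the caller, for example larger for words that are frequent across the corpus), and the way the pairwise divergences are summed are all assumptions made for illustration. How these terms enter the fitted objective (signs, tuning parameters) is not stated in this summary and is left out.

```python
import numpy as np


def weighted_lasso_penalty(beta, weights):
    """Weighted L1 term: sum over topics k and words v of weights[v] * |beta[k, v]|.

    beta    : (n_topics, n_words) topic-word matrix
    weights : (n_words,) per-word weights (hypothetical; chosen by the caller)
    """
    return float(np.sum(weights[None, :] * np.abs(beta)))


def pairwise_kl_penalty(beta, eps=1e-12):
    """Sum of KL(topic_i || topic_j) over all ordered pairs of distinct topics.

    Rows of beta are clipped and renormalized so each is a valid distribution.
    Larger values mean the topics are further apart.
    """
    p = np.clip(beta, eps, None)
    p = p / p.sum(axis=1, keepdims=True)
    log_p = np.log(p)
    # kl[i, j] = sum_v p[i, v] * (log p[i, v] - log p[j, v]); diagonal is zero
    kl = np.sum(p[:, None, :] * (log_p[:, None, :] - log_p[None, :, :]), axis=2)
    return float(kl.sum())


# Toy topic-word matrices over a 3-word vocabulary (made-up numbers)
overlapping = np.array([[0.5, 0.3, 0.2],
                        [0.5, 0.3, 0.2]])
separated = np.array([[0.98, 0.01, 0.01],
                      [0.01, 0.01, 0.98]])

# Hypothetical weights: the first word is treated as a frequent, less relevant word
weights = np.array([2.0, 1.0, 1.0])

lasso_value = weighted_lasso_penalty(overlapping, weights)
kl_overlapping = pairwise_kl_penalty(overlapping)
kl_separated = pairwise_kl_penalty(separated)
```

In this toy case the identical topics have zero pairwise divergence while the separated topics have a strictly positive one, which is the quantity a separation penalty of this kind would encourage to grow.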
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Advanced Text Analysis Techniques
Methods: Latent Dirichlet Allocation
