Topic Analysis for Text with Side Data
Biyi Fang, Kripa Rajshekhar, Diego Klabjan

TL;DR
This paper introduces a hybrid probabilistic model combining neural networks and hierarchical Bayesian methods to improve topic analysis in text, leveraging side data to enhance interpretability and predictive performance.
Contribution
It presents a novel four-level hierarchical Bayesian model that integrates neural networks with topic modeling, effectively utilizing side data for better topic grouping and prediction.
Findings
Outperforms standard LDA and DMR in topic grouping
Achieves lower model perplexity
Enhances classification and comment generation
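For readers unfamiliar with the perplexity metric named in the findings, the following is a minimal sketch of the standard definition: the exponential of the average negative log-likelihood per held-out token. The function name and the toy inputs are illustrative assumptions, not taken from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of held-out text.

    Lower values mean the model assigns higher probability to the data.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_neg_log_lik = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_lik)

# Toy inputs (hypothetical): a model that spreads mass over 4 words
# versus one that concentrates it on 2 words.
uniform = [math.log(1 / 4)] * 10
sharper = [math.log(1 / 2)] * 10
```

Under this definition, a model that assigns each held-out token probability 1/4 has perplexity 4, so "lower perplexity" means the model is less surprised by unseen text.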
Abstract
Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with side data to tackle these limitations. We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model, which is a four-level hierarchical Bayesian model. In the model, each document is modeled as a finite mixture over an underlying set of topics and each topic is modeled as an infinite mixture over an underlying set of topic probabilities. Furthermore, each topic probability is modeled as a finite mixture over side data. In the context of text, the neural network provides an overview distribution about side data for the corresponding text, which is the prior distribution in LDA to help perform topic…
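The generative story in the abstract can be illustrated with a toy sketch. This is one possible reading, not the paper's specification: it assumes the network's output is a categorical distribution over side-data classes, that each class has its own Dirichlet parameter vector, and that a document's topic proportions are drawn from the Dirichlet of a sampled class. The softmax stands in for the neural network, and every name and number below is hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into a probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_dirichlet(rng, alpha):
    """Draw from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(rng, probs):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(rng, side_scores, class_alphas, topic_word, n_words):
    # Assumption: softmax over fixed scores stands in for the neural network.
    side_probs = softmax(side_scores)
    # Assumption: the topic prior is a mixture over side-data classes.
    side_class = sample_categorical(rng, side_probs)
    theta = sample_dirichlet(rng, class_alphas[side_class])
    words = []
    for _ in range(n_words):
        z = sample_categorical(rng, theta)                     # topic for this word
        words.append(sample_categorical(rng, topic_word[z]))   # word id from topic z
    return side_probs, theta, words

rng = random.Random(0)
class_alphas = [[5.0, 0.5], [0.5, 5.0]]           # one Dirichlet per side-data class
topic_word = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]   # two topics over a 3-word vocabulary
side_probs, theta, words = generate_document(
    rng, [2.0, 0.0], class_alphas, topic_word, n_words=8
)
```

In this sketch the side data influences the text only through the prior on the topic proportions, which is how the abstract describes the network's role relative to LDA.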
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Text and Document Classification Technologies
Methods: Latent Dirichlet Allocation
