Topic Analysis for Text with Side Data

Biyi Fang; Kripa Rajshekhar; Diego Klabjan

arXiv:2203.00762·cs.LG·March 3, 2022

Open Access

TL;DR

This paper introduces a hybrid probabilistic model combining neural networks and hierarchical Bayesian methods to improve topic analysis in text, leveraging side data to enhance interpretability and predictive performance.

Contribution

The paper presents a hybrid generative model that pairs a neural network with a four-level hierarchical Bayesian topic model, using side data to improve topic grouping and prediction.

Findings

01 Outperforms standard LDA and DMR in topic grouping

02 Achieves lower model perplexity

03 Enhances classification and comment generation
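For readers unfamiliar with the perplexity metric named in the findings, the snippet below is a minimal sketch of the standard definition for topic models: the exponential of the negative average per-word log-likelihood. It assumes point estimates of the document-topic proportions (`theta`) and topic-word distributions (`beta`) are already available; those names, and the evaluation setup, are illustrative assumptions and not the paper's actual protocol.

```python
import numpy as np

def perplexity(theta, beta, docs):
    """Perplexity = exp(-(sum of word log-likelihoods) / (total word count)).

    theta: (n_docs, K) topic proportions per document (assumed given)
    beta:  (K, V) topic-word distributions, each row sums to 1 (assumed given)
    docs:  list of lists of word ids, one list per document
    """
    total_log_lik = 0.0
    total_words = 0
    for d, word_ids in enumerate(docs):
        # Marginal word distribution for document d: sum_k theta[d, k] * beta[k, :]
        word_probs = theta[d] @ beta
        total_log_lik += np.log(word_probs[word_ids]).sum()
        total_words += len(word_ids)
    return float(np.exp(-total_log_lik / total_words))

# Sanity check: a model that is uniform over 4 words has perplexity 4
# (up to floating-point error).
theta = np.array([[0.5, 0.5]])
beta = np.full((2, 4), 0.25)
docs = [[0, 1, 2, 3]]
ppl = perplexity(theta, beta, docs)
```

Lower values mean the model assigns higher probability to the observed words, which is the sense in which a lower perplexity counts as an improvement.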

Abstract

Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with side data to tackle these limitations. We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model, which is a four-level hierarchical Bayesian model. In the model, each document is modeled as a finite mixture over an underlying set of topics and each topic is modeled as an infinite mixture over an underlying set of topic probabilities. Furthermore, each topic probability is modeled as a finite mixture over side data. In the context of text, the neural network provides an overview distribution about side data for the corresponding text, which is the prior distribution in LDA to help perform topic…
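The generative story in the abstract can be made concrete with a small sampling sketch. This is one possible reading under stated assumptions, not the authors' implementation: a hypothetical one-layer softmax network maps a side-data feature vector to a distribution over side-data classes, each class carries its own Dirichlet parameter vector over topics, and the document's Dirichlet prior is the mixture of those vectors. All names (`W`, `b`, `side_alphas`, `beta`) and sizes are illustrative.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def generate_document(x, W, b, side_alphas, beta, n_words, rng):
    """Sample one document (illustrative sketch, not the paper's exact model).

    x:           (D,) side-data feature vector
    W, b:        (S, D) and (S,) weights of a hypothetical one-layer network
    side_alphas: (S, K) positive Dirichlet parameters, one row per side-data class
    beta:        (K, V) topic-word distributions, each row sums to 1
    """
    pi = softmax(W @ x + b)        # network output: distribution over side-data classes
    alpha = pi @ side_alphas       # (K,) document-specific Dirichlet prior (assumed mixture form)
    theta = rng.dirichlet(alpha)   # topic proportions for this document
    z = rng.choice(len(theta), size=n_words, p=theta)  # topic assignment per word
    words = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return pi, theta, z, words

rng = np.random.default_rng(0)
D, S, K, V = 5, 3, 4, 20           # feature dim, side classes, topics, vocabulary size
W = rng.normal(size=(S, D))
b = np.zeros(S)
side_alphas = rng.uniform(0.1, 2.0, size=(S, K))
beta = rng.dirichlet(np.ones(V), size=K)
x = rng.normal(size=D)
pi, theta, z, words = generate_document(x, W, b, side_alphas, beta, 50, rng)
```

In standard LDA the Dirichlet prior `alpha` is shared by all documents; in this sketch it depends on the side data through the network output, which is the role the abstract assigns to the neural network.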

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Text and Document Classification Technologies

Methods: Latent Dirichlet Allocation