A new LDA formulation with covariates
Gilson Shimizu, Rafael Izbicki, Denis Valle

TL;DR
This paper introduces an LDA formulation that incorporates covariates through an embedded negative binomial regression, so that the regression coefficients describe the abundance of cluster-specific elements in each sampling unit. Simulations and real-data examples illustrate the approach.
Contribution
The paper presents a new LDA formulation that integrates covariates directly into the model, allowing for straightforward interpretation and analysis of cluster abundance.
Findings
Successful parameter recovery in simulations
Effective prediction of abundance matrices using covariates
Versatile application across text, shopping, and ecological data
Abstract
The Latent Dirichlet Allocation (LDA) model is a popular method for creating mixed-membership clusters. Although originally developed for text analysis, LDA has been used for a wide range of other applications. We propose a new formulation of the LDA model that incorporates covariates. In this model, a negative binomial regression is embedded within LDA, enabling straightforward interpretation of the regression coefficients and analysis of the quantity of cluster-specific elements in each sampling unit (rather than modeling the proportion of each cluster, as in Structural Topic Models). We use slice sampling within a Gibbs sampling algorithm to estimate model parameters. We rely on simulations to show that our algorithm successfully retrieves the true parameter values and can make predictions for the abundance matrix…
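The abstract's two technical ingredients, a negative binomial regression on cluster-specific counts and slice sampling inside a Gibbs sweep, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the cluster-specific counts are observed (in the paper's setting they would be latent), fixes the dispersion `r`, uses a log link and a N(0, 10²) prior, and all names and sizes (`X`, `beta_true`, `slice_sample`, 300 units, 3 clusters) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 300 sampling units, intercept + 2 covariates, 3 clusters.
n_units, n_clusters = 300, 3
X = np.column_stack([np.ones(n_units), rng.normal(size=(n_units, 2))])
beta_true = np.array([[1.0, 0.5, 1.5],
                      [0.5, -0.5, 0.0],
                      [-0.3, 0.2, 0.4]])   # shape (covariates, clusters)
r = 5.0                                    # NB dispersion, assumed known here

# Negative binomial counts with mean mu = exp(X @ beta) (log link is an assumption).
mu = np.exp(X @ beta_true)
counts = rng.negative_binomial(r, r / (r + mu))   # shape (n_units, n_clusters)

def log_post(b, j, k, beta):
    """Log posterior (up to a constant) of coefficient beta[j, k] set to b."""
    eta = X @ beta[:, k] + X[:, j] * (b - beta[j, k])
    y = counts[:, k]
    log_denom = np.log(np.exp(eta) + r)
    loglik = np.sum(y * (eta - log_denom) - r * log_denom)
    return loglik - 0.5 * b ** 2 / 100.0          # N(0, 10^2) prior (assumed)

def slice_sample(logf, x0, width=0.5, max_steps=50):
    """One univariate slice-sampling update (stepping out, then shrinkage)."""
    log_y = logf(x0) - rng.exponential()          # log of a uniform slice height
    left = x0 - width * rng.uniform()
    right = left + width
    for _ in range(max_steps):                    # step out to the left
        if logf(left) <= log_y:
            break
        left -= width
    for _ in range(max_steps):                    # step out to the right
        if logf(right) <= log_y:
            break
        right += width
    while True:                                   # shrink until a point is accepted
        x1 = rng.uniform(left, right)
        if logf(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# Gibbs sweep: update each coefficient in turn with a slice-sampling step.
beta = np.zeros_like(beta_true)
draws = []
for it in range(300):
    for k in range(n_clusters):
        for j in range(X.shape[1]):
            beta[j, k] = slice_sample(lambda b: log_post(b, j, k, beta), beta[j, k])
    if it >= 100:                                 # discard the first 100 sweeps
        draws.append(beta.copy())

post_mean = np.mean(draws, axis=0)
print(np.round(post_mean, 2))
print(beta_true)
```

Under these assumptions each coefficient has a direct reading on the count scale: a one-unit increase in a covariate multiplies the expected number of cluster-specific elements by the exponential of its coefficient. That is the kind of interpretation the abstract contrasts with models of cluster proportions.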
Taxonomy
Topics: Computational and Text Analysis Methods · Data Analysis with R
Methods: Latent Dirichlet Allocation
