A new LDA formulation with covariates

Gilson Shimizu; Rafael Izbicki; Denis Valle

arXiv:2202.11527 · cs.IR · February 24, 2022

TL;DR

This paper introduces an LDA model that incorporates covariates through an embedded negative binomial regression, so that the regression coefficients can be interpreted directly and the abundance of each cluster in each sampling unit can be analyzed. Simulations and real-data examples demonstrate the approach.

Contribution

The paper presents a new LDA formulation that integrates covariates directly into the model, allowing for straightforward interpretation and analysis of cluster abundance.
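To make the interpretation claim concrete, here is one way the embedded regression could be written. This is an illustrative assumption, not the paper's stated parameterization: the symbols ($n_{lk}$ for the number of elements of cluster $k$ in sampling unit $l$, covariate vector $\mathbf{x}_l$, coefficients $\boldsymbol{\beta}_k$, dispersion $\gamma$, cluster distributions $\boldsymbol{\phi}_k$ with Dirichlet prior $\boldsymbol{\alpha}$, and element counts $y_{ls}$), the log link, and the Dirichlet prior are ours.

```latex
\begin{aligned}
n_{lk} &\sim \mathrm{NegBin}(\mu_{lk}, \gamma), \qquad \log \mu_{lk} = \mathbf{x}_l^\top \boldsymbol{\beta}_k,\\
\frac{\mu_{lk}(x_{lj}+1)}{\mu_{lk}(x_{lj})} &= e^{\beta_{kj}} \quad \text{(other covariates held fixed)},\\
\mathbb{E}[y_{ls}] &= \sum_{k} \mu_{lk}\,\phi_{ks}, \qquad \boldsymbol{\phi}_k \sim \mathrm{Dirichlet}(\boldsymbol{\alpha}).
\end{aligned}
```

Under a log link, $e^{\beta_{kj}}$ is the multiplicative change in the expected abundance of cluster $k$ per unit increase in covariate $j$, and the last line gives the predicted abundance matrix from covariates alone. A proportion-based model such as a Structural Topic Model instead constrains the cluster shares within each unit to sum to one.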

Findings

1. Successful parameter recovery in simulations
2. Effective prediction of abundance matrices using covariates
3. Versatile application across text, shopping, and ecological data

Abstract

The Latent Dirichlet Allocation (LDA) model is a popular method for creating mixed-membership clusters. Although it was originally developed for text analysis, LDA has been used for a wide range of other applications. We propose a new formulation of the LDA model that incorporates covariates. In this model, a negative binomial regression is embedded within LDA, enabling straightforward interpretation of the regression coefficients and analysis of the quantity of cluster-specific elements in each sampling unit (instead of focusing on modeling the proportion of each cluster, as in Structural Topic Models). We use slice sampling within a Gibbs sampling algorithm to estimate model parameters. We rely on simulations to show that our algorithm successfully retrieves the true parameter values and can make predictions for the abundance matrix…
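The sketch below illustrates "slice sampling within a Gibbs sampling algorithm" on the regression part only, under loud assumptions: it treats one cluster's counts per sampling unit as observed (in the full model these come from latent cluster assignments that the Gibbs sampler also updates), assumes a log link, a known dispersion `r`, and an independent N(0, 10²) prior on each coefficient. The function names (`nb_draw`, `nb_loglik`, `slice_sample`, `fit_beta`) are hypothetical and not taken from the authors' code; the slice sampler is Neal's (2003) stepping-out and shrinkage procedure, which may differ from the variant the paper uses.

```python
import math
import random


def nb_draw(mu, r, rng):
    """Negative binomial draw with mean mu and size r, via the gamma-Poisson mixture."""
    lam = rng.gammavariate(r, mu / r)  # Gamma(shape=r, scale=mu/r) has mean mu
    # Knuth's Poisson sampler; adequate for the small means used here
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def nb_loglik(beta, xs, counts, r):
    """NB log-likelihood with log link, dropping terms that do not depend on beta."""
    total = 0.0
    for x, n in zip(xs, counts):
        eta = sum(b * xj for b, xj in zip(beta, x))  # linear predictor log(mu)
        total += n * eta - (n + r) * math.log(r + math.exp(eta))
    return total


def slice_sample(x0, logp, rng, w=0.5, max_steps=50):
    """One univariate slice-sampling update (stepping out + shrinkage, Neal 2003)."""
    log_y = logp(x0) - rng.expovariate(1.0)  # log of a uniform height under the density
    left = x0 - w * rng.random()  # randomly positioned initial bracket around x0
    right = left + w
    j = rng.randrange(max_steps)
    k = max_steps - 1 - j
    while j > 0 and logp(left) > log_y:  # step out to the left
        left -= w
        j -= 1
    while k > 0 and logp(right) > log_y:  # step out to the right
        right += w
        k -= 1
    while True:  # shrink the bracket toward x0 until a point inside the slice is found
        x1 = rng.uniform(left, right)
        if logp(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def fit_beta(xs, counts, r, n_iter=250, burn_in=50, prior_sd=10.0, seed=1):
    """Slice-within-Gibbs: update each coefficient in turn given the others."""
    rng = random.Random(seed)
    beta = [0.0] * len(xs[0])
    draws = []
    for it in range(n_iter):
        for j in range(len(beta)):
            def logp(bj, j=j):
                trial = beta[:j] + [bj] + beta[j + 1:]
                prior = -0.5 * (bj / prior_sd) ** 2  # independent N(0, prior_sd^2) prior
                return nb_loglik(trial, xs, counts, r) + prior
            beta[j] = slice_sample(beta[j], logp, rng)
        if it >= burn_in:
            draws.append(list(beta))
    # posterior mean of each coefficient after burn-in
    return [sum(d[j] for d in draws) / len(draws) for j in range(len(beta))]


# Simulate one cluster's abundances from hypothetical true coefficients, then recover them
rng = random.Random(42)
true_beta, r = [0.5, 0.8], 2.0
xs = [[1.0, rng.uniform(-1.0, 1.0)] for _ in range(400)]  # intercept + one covariate
counts = [nb_draw(math.exp(true_beta[0] + true_beta[1] * x[1]), r, rng) for x in xs]
est = fit_beta(xs, counts, r)
print([round(b, 2) for b in est])
```

Because the coefficients are updated one at a time, each Gibbs step needs only a univariate slice sampler, which has no proposal distribution to tune beyond the initial bracket width `w`. This makes it a convenient choice for non-conjugate likelihoods such as the negative binomial.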

Taxonomy

Topics: Computational and Text Analysis Methods · Data Analysis with R

Methods: Latent Dirichlet Allocation