Graph Topic Modeling for Documents with Spatial or Covariate Dependencies
Yeo Jin Jung, Claire Donnat

TL;DR
This paper introduces a graph-regularized SVD approach to incorporate document metadata into topic modeling, improving efficiency and accuracy over traditional Bayesian methods.
Contribution
It extends probabilistic latent semantic indexing (pLSI) with graph-based regularization, providing theoretical error bounds and a fast inference algorithm for topic modeling with document metadata.
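For reference, the pLSI model that this contribution extends can be written as below. The notation is ours: $X$ is the $n \times p$ document-word frequency matrix, $W$ holds the topic mixtures and $A$ the topics. The graph term is an illustrative quadratic (Laplacian) smoothness penalty. That choice is our assumption, and the paper's exact regularizer may differ. $E$ is the set of document-similarity edges and $L = D - \mathrm{Adj}$ is the graph Laplacian.

```latex
\mathbb{E}[X] = W A, \qquad
W \in \mathbb{R}_{\ge 0}^{n \times K},\; W \mathbf{1}_K = \mathbf{1}_n, \qquad
A \in \mathbb{R}_{\ge 0}^{K \times p},\; A \mathbf{1}_p = \mathbf{1}_K

\sum_{(i,j) \in E} \lVert U_{i\cdot} - U_{j\cdot} \rVert_2^2 = \operatorname{tr}\!\left(U^\top L U\right)
```

Penalizing this quantity for the document-side factor $U$ encourages documents joined by an edge to have similar representations, and hence similar topic mixtures.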
Findings
Improved topic modeling accuracy on real datasets
Faster inference compared to Bayesian methods
Theoretical bounds on estimation error
Abstract
We address the challenge of incorporating document-level metadata into topic modeling to improve topic mixture estimation. To overcome the computational complexity and lack of theoretical guarantees in existing Bayesian methods, we extend probabilistic latent semantic indexing (pLSI), a frequentist framework for topic modeling, by incorporating document-level covariates or known similarities between documents through a graph formalism. Modeling documents as nodes and edges denoting similarities, we propose a new estimator based on a fast graph-regularized iterative singular value decomposition (SVD) that encourages similar documents to share similar topic mixture proportions. We characterize the estimation error of our proposed method by deriving high-probability bounds and develop a specialized cross-validation method to optimize our regularization parameters. We validate our model…
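The code below is a minimal sketch of the graph-regularized iterative SVD idea described in the abstract. It is not the authors' code. It swaps in a quadratic graph-Laplacian penalty for whatever regularizer the paper actually uses.

Each iteration alternates two steps:
- an ordinary power-iteration step for the word-side factor `V`;
- a graph-smoothed update for the document-side factor `U`, obtained by solving $(I + \lambda L)U = XV$. This minimizes $\lVert U - XV \rVert_F^2 + \lambda \operatorname{tr}(U^\top L U)$.

The function name `graph_regularized_svd` and the parameters `lam` and `L` are hypothetical. The sketch stops at the singular vectors and omits the further steps needed to recover topic mixtures and topics.

```python
import numpy as np


def graph_regularized_svd(X, L, k, lam=1.0, n_iter=50, seed=0):
    """Hypothetical sketch: rank-k SVD of X with a graph-smoothed left factor.

    X   : (n_docs, n_words) word-frequency matrix, one row per document
    L   : (n_docs, n_docs) graph Laplacian built from document similarities
    k   : number of singular vectors (topics) to keep
    lam : smoothing strength, lam >= 0 (lam = 0 gives plain subspace iteration)

    Assumption: a quadratic Laplacian penalty, not necessarily the paper's.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Smoothing operator (I + lam * L); symmetric positive definite since L is PSD.
    S = np.eye(n) + lam * L
    # Random orthonormal start for the word-side factor.
    V, _ = np.linalg.qr(rng.standard_normal((p, k)))
    for _ in range(n_iter):
        # Document-side update: argmin_U ||U - X V||^2 + lam * tr(U^T L U),
        # which pulls linked documents toward each other, then re-orthonormalize.
        U, _ = np.linalg.qr(np.linalg.solve(S, X @ V))
        # Word-side update: ordinary power-iteration step.
        V, _ = np.linalg.qr(X.T @ U)
    return U, V
```

With `lam=0` the smoothing operator is the identity, and the loop reduces to plain orthogonal (subspace) iteration for the top-`k` singular vectors. Larger `lam` smooths the rows of `U` along graph edges before each re-orthonormalization. The dense `np.linalg.solve` costs O(n³) per iteration, so a large corpus would likely call for a sparse solver instead.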
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Advanced Graph Neural Networks
