Tensor Topic Modeling Via HOSVD
Yating Liu, Claire Donnat

TL;DR
This paper introduces a tensor-based topic modeling method built on a modified higher-order singular value decomposition (HOSVD). The method captures multi-dimensional data structures and interactions, and it outperforms traditional models on synthetic and real datasets.
Contribution
It proposes a novel tensor decomposition approach for topic modeling that incorporates complex data structures and interactions, extending beyond traditional probabilistic models.
Findings
Successfully recovers lower-rank structures in synthetic data
Captures patterns in multi-dimensional datasets better than traditional models
Performs well on research abstracts and microbiome data
Abstract
By representing documents as mixtures of topics, topic modeling has allowed the successful analysis of datasets across a wide spectrum of applications ranging from ecology to genetics. An important body of recent work has demonstrated the computational and statistical efficiency of probabilistic Latent Semantic Indexing (pLSI), a type of topic modeling, in estimating both the topic matrix (corresponding to distributions over word frequencies) and the topic assignment matrix. However, these methods are not easily extended to incorporate additional temporal, spatial, or document-specific information, and they may therefore neglect useful information in the analysis of spatial or longitudinal datasets that can be represented as tensors. Consequently, in this paper, we propose using a modified higher-order singular value decomposition (HOSVD) to estimate topic models based on…
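To make the HOSVD idea concrete, here is a minimal NumPy sketch of the standard truncated HOSVD applied to a synthetic low-rank three-way tensor. It is not the modified HOSVD proposed in the paper, whose details are not given in the abstract. The mode labels (documents × time points × words), the tensor sizes, the ranks, and the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct` are all assumptions chosen for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # put the target axis last
    result = moved @ matrix.T               # contract over that axis
    return np.moveaxis(result, -1, mode)    # restore the original axis order


def hosvd(tensor, ranks):
    """Standard truncated HOSVD: one SVD per mode, then project to get the core."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])         # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical synthetic example: 6 documents x 5 time points x 4 words,
# generated from a (2, 3, 2) core so the multilinear rank is known in advance.
rng = np.random.default_rng(0)
true_core = rng.random((2, 3, 2))
true_factors = [rng.random((6, 2)), rng.random((5, 3)), rng.random((4, 2))]
X = reconstruct(true_core, true_factors)

core, factors = hosvd(X, ranks=(2, 3, 2))
X_hat = reconstruct(core, factors)
relative_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Because `X` is built with multilinear rank at most (2, 3, 2), truncating at those ranks reconstructs it up to floating-point error. With noisy count data, as in real topic modeling, the truncation acts as a low-rank approximation, and the factor matrices would need further processing before they could be read as topic or assignment matrices.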
Taxonomy
Topics: Computational Physics and Python Applications
