Semantic-Augmented Latent Topic Modeling with LLM-in-the-Loop
Mengze Hong, Chen Jason Zhang, Di Jiang

TL;DR
This paper investigates augmenting Latent Dirichlet Allocation (LDA) with Large Language Models (LLMs) in two phases: initialization and post-correction. LLM-guided initialization helps only in the early iterations and does not improve convergence, whereas LLM-enabled post-correction improves topic coherence by 5.86%. These results challenge assumptions about LLM superiority.
Contribution
The paper introduces an LLM-in-the-loop framework for LDA that integrates LLMs into the initialization and post-correction phases, and evaluates the impact of each phase on topic modeling performance.
Findings
LLM-guided initialization improves early LDA iterations
LLM-enabled post-correction enhances topic coherence by 5.86%
LLM-guided initialization has no effect on convergence and yields the worst performance compared to the baselines
Abstract
Latent Dirichlet Allocation (LDA) is a prominent generative probabilistic model used for uncovering abstract topics within document collections. In this paper, we explore the effectiveness of augmenting topic models with Large Language Models (LLMs) through integration into two key phases: Initialization and Post-Correction. Since LDA is highly dependent on the quality of its initialization, we conduct extensive experiments on LLM-guided topic clustering for initializing the Gibbs sampling algorithm. Interestingly, the experimental results reveal that while the proposed initialization strategy improves the early iterations of LDA, it has no effect on convergence and yields the worst performance compared to the baselines. The LLM-enabled post-correction, on the other hand, achieved a promising improvement of 5.86% in the coherence evaluation. These results highlight the…
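To make the initialization phase concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA whose starting topic assignments can be seeded from an external word-to-topic mapping. It is not the authors' implementation: the function names, the `word_hints` dictionary (standing in for the output of an LLM-guided clustering step), and the hyperparameter values are all assumptions made for illustration.

```python
import random
from collections import Counter

def init_assignments(docs, num_topics, word_hints=None, rng=None):
    """Give every token an initial topic.

    word_hints is a hypothetical {word: topic_id} mapping, e.g. obtained by
    asking an LLM to cluster the vocabulary. Tokens without a hint fall back
    to the usual uniform random initialization.
    """
    rng = rng or random.Random(0)
    word_hints = word_hints or {}
    return [[word_hints.get(w, rng.randrange(num_topics)) for w in doc]
            for doc in docs]

def gibbs_lda(docs, num_topics, z, iters=50, alpha=0.1, beta=0.01, rng=None):
    """Run collapsed Gibbs sampling, updating the assignments z in place."""
    rng = rng or random.Random(0)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables implied by the initial assignments.
    n_dk = [[0] * num_topics for _ in docs]      # document-topic counts
    n_kw = [Counter() for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                       # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_kw
```

Calling `init_assignments(docs, K)` without hints gives the random baseline, while passing `word_hints` gives a seeded start. Both feed the same sampler, so any difference in early-iteration or final topic quality can be attributed to the starting point alone.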
