Semantic-Augmented Latent Topic Modeling with LLM-in-the-Loop

Mengze Hong; Chen Jason Zhang; Di Jiang

arXiv:2507.08498·cs.CL·July 14, 2025

TL;DR

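This paper investigates augmenting Latent Dirichlet Allocation (LDA) with Large Language Models (LLMs) in two phases: initialization and post-correction. LLM-guided initialization helps only the early iterations, has no effect on convergence, and yields the worst performance among the compared baselines, whereas LLM-enabled post-correction improves topic coherence by 5.86%, challenging assumptions about LLM superiority.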

Contribution

The paper introduces an LLM-in-the-loop framework for LDA that integrates LLMs into the initialization and post-correction phases, and evaluates the impact of each phase on topic modeling performance.

Findings

01

LLM-guided initialization improves early LDA iterations

02

LLM-enabled post-correction enhances topic coherence by 5.86%

03

LLM-guided initialization has no effect on convergence and yields the worst performance among the baselines

Abstract

Latent Dirichlet Allocation (LDA) is a prominent generative probabilistic model used for uncovering abstract topics within document collections. In this paper, we explore the effectiveness of augmenting topic models with Large Language Models (LLMs) through integration into two key phases: Initialization and Post-Correction. Since LDA is highly dependent on the quality of its initialization, we conduct extensive experiments on LLM-guided topic clustering for initializing the Gibbs sampling algorithm. Interestingly, the experimental results reveal that while the proposed initialization strategy improves the early iterations of LDA, it has no effect on convergence and yields the worst performance compared to the baselines. The LLM-enabled post-correction, on the other hand, achieved a promising improvement of 5.86% in the coherence evaluation. These results highlight the…
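
To make the initialization phase concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in which the starting topic assignments can be seeded from an external word-to-topic mapping instead of being drawn at random. It is not the authors' implementation. The function names (`init_assignments`, `gibbs_lda`), the `word_to_topic` dictionary standing in for the output of an LLM-guided clustering step, and the hyperparameter values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def init_assignments(docs, num_topics, word_to_topic=None, seed=0):
    """Give every token an initial topic.

    word_to_topic is a hypothetical mapping, e.g. produced offline by
    asking an LLM to cluster the vocabulary. Tokens it does not cover
    (or all tokens when it is None) get a uniformly random topic.
    """
    rng = random.Random(seed)
    word_to_topic = word_to_topic or {}
    z = []
    for doc in docs:
        row = []
        for w in doc:
            if w in word_to_topic:
                row.append(word_to_topic[w])
            else:
                row.append(rng.randrange(num_topics))
        z.append(row)
    return z


def gibbs_lda(docs, num_topics, z, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Run collapsed Gibbs sampling starting from assignments z (updated in place)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables implied by the initial assignments.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return z, topic_word


if __name__ == "__main__":
    toy_docs = [["cat", "dog", "cat"], ["stock", "bond", "stock"], ["dog", "bond"]]
    # Hypothetical clustering result; in the paper's setting an LLM would supply it.
    seed_map = {"cat": 0, "dog": 0, "stock": 1, "bond": 1}
    z_seeded = init_assignments(toy_docs, num_topics=2, word_to_topic=seed_map)
    z_final, _ = gibbs_lda(toy_docs, 2, z_seeded, iters=20)
    print(z_final)
```

The only difference between a random and a seeded run is the `z` passed to `gibbs_lda`; the sampler itself is unchanged. This is consistent with the abstract's observation that the starting point matters most in the early iterations.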
