Neural Topic Modeling with Large Language Models in the Loop

Xiaohao Yang; He Zhao; Weijie Xu; Yuanyuan Qi; Jueqing Lu; Dinh Phung; Lan Du

arXiv:2411.08534 · cs.CL · June 3, 2025


TL;DR

This paper introduces LLM-ITL, a novel framework combining Large Language Models with Neural Topic Models to improve topic interpretability and coverage while maintaining efficiency.

Contribution

The paper proposes a flexible LLM-in-the-loop framework that enhances neural topic models with LLM-based refinement using an Optimal Transport alignment, improving interpretability.

Findings

1. Significant improvement in topic interpretability.
2. Maintains high quality of document representations.
3. Framework is adaptable to various neural topic models.

Abstract

Topic modeling is a fundamental task in natural language processing, allowing the discovery of latent thematic structures in text corpora. While Large Language Models (LLMs) have demonstrated promising capabilities in topic discovery, their direct application to topic modeling suffers from issues such as incomplete topic coverage, misalignment of topics, and inefficiency. To address these limitations, we propose LLM-ITL, a novel LLM-in-the-loop framework that integrates LLMs with Neural Topic Models (NTMs). In LLM-ITL, global topics and document representations are learned through the NTM. Meanwhile, an LLM refines these topics using an Optimal Transport (OT)-based alignment objective, where the refinement is dynamically adjusted based on the LLM's confidence in suggesting topical words for each set of input words. With the flexibility of being integrated into many existing NTMs, the…
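The alignment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a topic is a probability vector over its top words, that the LLM's suggestion is a second probability vector over suggested words, that the transport cost is cosine distance between word embeddings, and that the LLM's confidence is a scalar in [0, 1] that scales the loss. The function names (`sinkhorn`, `cosine_cost`, `refinement_loss`), the random toy embeddings, and the regularisation value are all hypothetical.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.5, n_iters=500):
    """Entropy-regularised OT between histograms a (n,) and b (m,).

    Returns the transport cost <plan, cost> and the transport plan.
    """
    K = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):          # alternate scaling to match both marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum()), plan

def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (n, d) and y (m, d)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def refinement_loss(topic_probs, topic_emb, llm_probs, llm_emb, confidence):
    """Confidence-weighted OT distance (assumed form, for illustration only)."""
    ot_cost, _ = sinkhorn(topic_probs, llm_probs, cosine_cost(topic_emb, llm_emb))
    return confidence * ot_cost

# Toy data: 5 topic words and 3 LLM-suggested words with random 16-d embeddings.
rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(5, 16))
llm_emb = rng.normal(size=(3, 16))
topic_probs = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
llm_probs = np.full(3, 1.0 / 3.0)

ot_cost, plan = sinkhorn(topic_probs, llm_probs, cosine_cost(topic_emb, llm_emb))
loss = refinement_loss(topic_probs, topic_emb, llm_probs, llm_emb, confidence=0.8)
print(f"OT cost: {ot_cost:.4f}, confidence-weighted loss: {loss:.4f}")
```

In a setup like this, a low confidence value shrinks the refinement term, so the topic model's own objective would dominate when the LLM is unsure about its suggested words.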

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 5

Strengths

- The paper proposes an approach to enhance topic quality through topic refinement, leveraging the capabilities of LLMs.
- While the approach represents progress in utilizing LLMs, it lacks substantial originality and novelty.
- The paper is clear and well-written; however, certain claims—such as improvements in document representation quality—are not clearly supported by the architecture or methodology.
- The experimental section is robust, utilizing four datasets and two evaluation metrics to

Weaknesses

- The paper takes an incremental approach, building on prior work to enhance topic models using external knowledge or language models. Specifically, it extends Neural Topic Models (NTMs) through a topic refinement approach leveraging large language models (LLMs).
- The paper offers limited novelty in advancing the field, as previous studies have already improved baseline NTMs for topic modeling, extraction, and document representation.
- The paper claims to improve "document representation quali

Reviewer 02 · Rating 3 · Confidence 5

Strengths

1. Incorporating LLMs in topic modeling is a relatively unexplored, challenging, and important direction.
2. The paper is easy to read and understand. The authors have done a good job of presenting their work.

Weaknesses

1. **Method:** Although this is a promising direction, I believe there are significant shortcomings with the method. From what I understand, LLMs can suggest out-of-vocabulary words, which means there is a risk of generating topics that do not represent the corpus at all and are simply highly coherent due to the LLM. I think this should be evaluated. One major issue is that the authors have used Wikipedia as a reference corpus to compute NPMI (which is generally acceptable), but in this case, NP

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. With the modeling power of LLMs, their applications to topic modeling are also emerging. This paper proposes to use LLMs to help topic modeling, which is timely and significant.
2. The overall writing of the paper is clear, with figures as visual illustration and a formal algorithm to present the learning process. Details of the experimental setup are also provided.
3. Experiments are comprehensive with different evaluation tasks. Ablation analysis is also conducted to show the effect of each mo

Weaknesses

Though this paper proposes an interesting approach, there are some unacceptable shortcomings.
1. One of the contributions mentioned in the Introduction section is scalability, which is evaluated by the number of parameters and running time in the Computational Costs section. However, when we talk about scalability, both empirical and theoretical analyses are important, but there is a lack of computational complexity analysis in the paper, which limits the contribution proposed in the paper.
2.

Reviewer 04 · Rating 3 · Confidence 5

Strengths

- The presentation is generally easy to follow and the content is concise.
- Unlike most existing LLM-based approaches that rely on document-level LLM analysis, the proposed LLM-ITL uses LLMs at the word level, which is novel to me.

Weaknesses

- In the area of neural topic modeling, an important and strong existing model is ECRTM (Wu et al., 2023), but this submission overlooks the above study. The other strong baseline is WeTe (Wang et al., 2022), and I suggest that the authors introduce ECRTM and WeTe as baselines. Besides, as mentioned in (Wu et al., 2023), the coherence value of $C_V$ has been empirically shown to outperform the traditional metrics such as NPMI. Thus, I suggest that the authors improve their descriptions on the se

Code & Models

Repositories

Xiaohao-Yang/LLM-ITL
pytorch

Videos

Neural Topic Modeling with Large Language Models in the Loop · underline

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods

Methods: Sparse Evolutionary Training