Prompting Large Language Models for Topic Modeling
Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

TL;DR
PromptTopic leverages large language models to improve topic modeling, especially for short texts, by extracting sentence-level topics and aggregating them, resulting in more coherent and meaningful themes without manual tuning.
Contribution
This paper introduces PromptTopic, a novel LLM-based method that enhances topic modeling for short texts and reduces manual parameter tuning.
Findings
Outperforms state-of-the-art baselines on diverse datasets
Produces more coherent and relevant topics
Effective across texts of varying lengths
Abstract
Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words. Moreover, these models often neglect sentence-level semantics, focusing primarily on token-level semantics. In this paper, we propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs) to address these challenges. It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined quantity, ultimately providing coherent topics for texts of varying lengths. This approach eliminates the need for manual parameter tuning and improves the quality of extracted topics. We benchmark PromptTopic against the…
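The two stages described in the abstract (per-document topic extraction, then aggregation into a predefined number of topics) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `ask_llm` callable, the `merge_map` argument, and the frequency-based condensing rule are all hypothetical, since the excerpt does not specify how PromptTopic prompts the model or merges topics.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def extract_topics(documents: List[str],
                   ask_llm: Callable[[str], str]) -> List[List[str]]:
    """Stage 1 (assumed form): ask a model for topic labels, one document at a time.

    `ask_llm` is a hypothetical callable that takes a prompt string and
    returns a comma-separated list of topic labels.
    """
    per_doc = []
    for doc in documents:
        # The prompt wording here is an illustrative assumption.
        reply = ask_llm(f"List the topics of this text, comma-separated:\n{doc}")
        labels = [t.strip().lower() for t in reply.split(",") if t.strip()]
        per_doc.append(labels)
    return per_doc


def condense_topics(per_doc: List[List[str]],
                    num_topics: int,
                    merge_map: Optional[Dict[str, str]] = None) -> List[str]:
    """Stage 2 (assumed form): reduce the extracted labels to at most `num_topics`.

    `merge_map` (hypothetical) maps a label to a broader label it should be
    folded into. Topics are then ranked by how many documents mention them.
    """
    merge_map = merge_map or {}
    counts: Counter = Counter()
    for labels in per_doc:
        # Count each canonical topic at most once per document.
        canonical = {merge_map.get(label, label) for label in labels}
        counts.update(canonical)
    # Sort by descending document count, then alphabetically for stable ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:num_topics]]


if __name__ == "__main__":
    # A canned stand-in for a real model call, used only for this demo.
    canned = {
        "cheap flights to rome": "Travel, Flights",
        "rome hotel deals": "travel, hotels",
        "new gpu benchmark": "Hardware",
    }
    fake_llm = lambda prompt: canned[prompt.split("\n", 1)[1]]
    topics_per_doc = extract_topics(list(canned), fake_llm)
    print(condense_topics(topics_per_doc, num_topics=2,
                          merge_map={"flights": "travel", "hotels": "travel"}))
```

In this sketch the target topic count is the only value the caller must choose, which mirrors the "predefined quantity" mentioned in the abstract. A real pipeline would replace `fake_llm` with an actual model call and would likely let the model itself propose which topics to merge.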
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Expert finding and Q&A systems
