Text Clustering as Classification with LLMs
Chen Huang, Guoxiu He

TL;DR
This paper introduces a novel LLM-based framework that reformulates text clustering as a classification task, significantly reducing computational costs while maintaining or improving clustering performance across various datasets.
Contribution
The paper presents a new approach that leverages in-context learning of LLMs to perform text clustering without fine-tuning or complex similarity metrics.
Findings
Achieves performance comparable to or better than state-of-the-art methods.
Reduces computational complexity and resource requirements.
Demonstrates effectiveness across diverse datasets.
Abstract
Text clustering serves as a fundamental technique for organizing and interpreting unstructured textual data, particularly in contexts where manual annotation is prohibitively costly. With the rapid advancement of Large Language Models (LLMs) and their demonstrated effectiveness across a broad spectrum of NLP tasks, an emerging body of research has begun to explore their potential in the domain of text clustering. However, existing LLM-based approaches still rely on fine-tuned embedding models and sophisticated similarity metrics, rendering them computationally intensive and necessitating domain-specific adaptation. To address these limitations, we propose a novel framework that reframes text clustering as a classification task by harnessing the in-context learning capabilities of LLMs. Our framework eliminates the need for fine-tuning embedding models or intricate clustering algorithms.…
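The idea of treating clustering as classification through in-context learning can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a candidate label set is already available (how it is obtained is not covered in the text above), and every name in it (build_prompt, classify, cluster_texts, toy_llm, UNASSIGNED) is hypothetical. The llm argument stands for any function that maps a prompt string to a completion string; toy_llm is a keyword-based stand-in so the example runs without a model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

UNASSIGNED = "unassigned"  # assumed fallback bucket for unusable answers


def build_prompt(text: str, labels: Sequence[str],
                 demos: Sequence[Tuple[str, str]]) -> str:
    """Build a few-shot prompt asking for exactly one label for `text`."""
    parts = ["Assign the text to exactly one of these labels: "
             + ", ".join(labels) + "."]
    # In-context demonstrations: (text, label) pairs shown before the query.
    for demo_text, demo_label in demos:
        parts.append(f"Text: {demo_text}\nLabel: {demo_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


def classify(text: str, labels: Sequence[str],
             demos: Sequence[Tuple[str, str]], llm: LLM) -> str:
    """Return the label chosen by the model, or UNASSIGNED if it is invalid."""
    answer = llm(build_prompt(text, labels, demos)).strip().lower()
    for label in labels:
        if label.lower() == answer:
            return label
    return UNASSIGNED


def cluster_texts(texts: Sequence[str], labels: Sequence[str],
                  demos: Sequence[Tuple[str, str]],
                  llm: LLM) -> Dict[str, List[str]]:
    """Group texts by predicted label; each group plays the role of a cluster."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        clusters[classify(text, labels, demos, llm)].append(text)
    return dict(clusters)


def toy_llm(prompt: str) -> str:
    """Keyword stand-in for a real model; it only inspects the final query."""
    query = prompt.rsplit("Text: ", 1)[1].lower()
    return "sports" if "match" in query else "finance"
```

In this sketch no embeddings or pairwise similarities are computed; each text costs one model call, so the number of calls grows linearly with the number of texts. Whether that matches the cost profile reported in the paper depends on details not given here.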
Taxonomy
Topics: Natural Language Processing Techniques
