Optimized Algorithms for Text Clustering with LLM-Generated Constraints
Chaoqi Jia, Weihong Wu, Longkun Guo, Zhigang Lu, Chao Chen, Kok-Leong Ong

TL;DR
This paper presents a novel constraint-generation method that uses large language models to improve text clustering accuracy while reducing resource consumption.
Contribution
The paper introduces an approach that generates constraint sets instead of pairwise constraints, improving both query efficiency and constraint accuracy in LLM-based text clustering.
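To illustrate why a set-style constraint can be cheaper to obtain than pairwise ones, the sketch below expands one constraint set into the pairwise constraints it implies. It is not the authors' algorithm: it assumes that a constraint set asserts the same relation (for example must-link) between every pair of its members, and the function name `implied_pairs` and the document ids are hypothetical.

```python
from itertools import combinations


def implied_pairs(constraint_set):
    """Return the pairwise constraints implied by one constraint set.

    Assumption (not from the paper): the set asserts the same relation
    between every two of its members, so k members imply k*(k-1)/2 pairs.
    """
    return list(combinations(sorted(constraint_set), 2))


# Hypothetical result of a single set-style query over document ids.
must_link_set = {0, 3, 5, 8}
pairs = implied_pairs(must_link_set)
print(len(pairs), "pairwise constraints from one set of", len(must_link_set))
```

Under that assumption, a single set of k items carries the information of k(k-1)/2 pairwise constraints, which gives some intuition for how set-based generation could reduce the number of LLM queries.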
Findings
Achieves clustering accuracy comparable to state-of-the-art methods.
Reduces LLM query costs by a factor of more than 20.
Uses confidence thresholds to handle potentially inaccurate constraints.
Abstract
Clustering is a fundamental tool that has garnered significant interest across a wide range of applications including text analysis. To improve clustering accuracy, many researchers have incorporated background knowledge, typically in the form of must-link and cannot-link constraints, to guide the clustering process. With the recent advent of large language models (LLMs), there is growing interest in improving clustering quality through LLM-based automatic constraint generation. In this paper, we propose a novel constraint-generation approach that reduces resource consumption by generating constraint sets rather than using traditional pairwise constraints. This approach improves both query efficiency and constraint accuracy compared to state-of-the-art methods. We further introduce a constrained clustering algorithm tailored to the characteristics of LLM-generated constraints. Our…
Taxonomy
Topics: Advanced Clustering Algorithms Research · Natural Language Processing Techniques · Information Retrieval and Search Behavior
