Optimized Algorithms for Text Clustering with LLM-Generated Constraints

Chaoqi Jia; Weihong Wu; Longkun Guo; Zhigang Lu; Chao Chen; Kok-Leong Ong

arXiv:2601.11118 · cs.LG · January 19, 2026

TL;DR

This paper presents a novel constraint-generation method that uses large language models to improve text clustering accuracy while reducing resource consumption.

Contribution

The paper introduces an approach that generates constraint sets rather than pairwise constraints, improving both query efficiency and constraint accuracy in LLM-based text clustering.
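A set-level constraint can replace many pairwise ones. The sketch below expands a constraint set into the pairwise must-link and cannot-link constraints it implies. It makes two hypothetical assumptions. First, a must-link set is a group of texts judged to share a cluster. Second, two groups judged to differ yield cannot-links between all cross pairs. The function names are illustrative, and this is not the paper's constraint format or prompt.

```python
from itertools import combinations

# Hypothetical encoding (not the paper's format): a must-link set is a group
# of text indices judged to share a cluster; two groups judged to differ
# imply a cannot-link between every cross pair.

def must_link_pairs(group):
    """All pairwise must-link constraints implied by one must-link set."""
    return list(combinations(sorted(group), 2))

def cannot_link_pairs(group_a, group_b):
    """All pairwise cannot-link constraints implied by two separated sets."""
    return [(a, b) for a in sorted(group_a) for b in sorted(group_b)]

if __name__ == "__main__":
    # One set of 4 texts stands for 4 * 3 / 2 = 6 pairwise must-links.
    print(must_link_pairs({7, 0, 5, 3}))
    # Sets of sizes 2 and 3 stand for 2 * 3 = 6 pairwise cannot-links.
    print(len(cannot_link_pairs({0, 3}, {1, 2, 4})))
```

A set of k texts implies k(k-1)/2 must-link pairs, so the pairwise judgments covered by one answer grow quadratically with set size. How this translates into the reported query savings depends on details of the method that are not given here.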

Findings

01

Achieves clustering accuracy comparable to that of state-of-the-art methods.

02

Reduces LLM query cost by more than a factor of 20.

03

Handles potentially inaccurate LLM-generated constraints by applying confidence thresholds.
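Confidence thresholds can be illustrated with a minimal sketch. It assumes that each LLM-generated constraint carries a confidence score in [0, 1] and that constraints below a threshold are set aside rather than enforced. The `Constraint` class, the score, and the 0.8 default are hypothetical, not values taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    i: int
    j: int
    kind: str          # "must" or "cannot"
    confidence: float  # hypothetical score attached to the LLM's answer

def split_by_confidence(constraints, threshold=0.8):
    """Return (kept, set_aside); constraints at or above the threshold are kept."""
    kept = [c for c in constraints if c.confidence >= threshold]
    set_aside = [c for c in constraints if c.confidence < threshold]
    return kept, set_aside

if __name__ == "__main__":
    cs = [
        Constraint(0, 1, "must", 0.95),
        Constraint(0, 2, "cannot", 0.55),
        Constraint(3, 4, "must", 0.80),
    ]
    kept, set_aside = split_by_confidence(cs)
    print([(c.i, c.j) for c in kept])       # [(0, 1), (3, 4)]
    print([(c.i, c.j) for c in set_aside])  # [(0, 2)]
```

This summary does not specify whether set-aside constraints are discarded outright or used as soft penalties. That choice is a design decision.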

Abstract

Clustering is a fundamental tool that has garnered significant interest across a wide range of applications, including text analysis. To improve clustering accuracy, many researchers have incorporated background knowledge, typically in the form of must-link and cannot-link constraints, to guide the clustering process. With the recent advent of large language models (LLMs), there is growing interest in improving clustering quality through LLM-based automatic constraint generation. In this paper, we propose a novel constraint-generation approach that reduces resource consumption by generating constraint sets rather than using traditional pairwise constraints. This approach improves both query efficiency and constraint accuracy compared to state-of-the-art methods. We further introduce a constrained clustering algorithm tailored to the characteristics of LLM-generated constraints. Our…
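The abstract describes must-link and cannot-link constraints as the usual way background knowledge guides clustering. This summary does not describe the paper's own constrained clustering algorithm. As a swapped-in illustration, here is a minimal sketch of COP-KMeans (Wagstaff et al.), a classic k-means variant that treats both kinds of constraints as hard. The function names, the random seeding, and the greedy point order are choices of this sketch, not the paper's method.

```python
import math
import random

def _partner(point, pair):
    """Return the other index in `pair` if `point` is in it, else None."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None

def violates(point, cluster, assignment, must_link, cannot_link):
    """True if putting `point` in `cluster` breaks a constraint with an already-assigned point."""
    for pair in must_link:
        other = _partner(point, pair)
        if other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = _partner(point, pair)
        if other in assignment and assignment[other] == cluster:
            return True
    return False

def cop_kmeans(points, k, must_link=(), cannot_link=(), max_iter=100, seed=0):
    """COP-KMeans sketch: returns {point index: cluster}, or None if infeasible."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {}
        for idx, p in enumerate(points):
            # Try clusters from the nearest center to the farthest.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(idx, c, new_assignment, must_link, cannot_link):
                    new_assignment[idx] = c
                    break
            else:
                return None  # no feasible cluster for this point
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [points[i] for i, a in new_assignment.items() if a == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
    return assignment

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(cop_kmeans(pts, 2, cannot_link=[(0, 1)]))
```

Because every constraint is enforced as hard, a single wrong constraint can make assignment fail, and the function then returns None. That sensitivity is the kind of issue a constrained clustering algorithm tailored to LLM-generated constraints would need to handle.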

Videos

Optimized Algorithms for Text Clustering with LLM-Generated Constraints · Underline

Taxonomy

Topics: Advanced Clustering Algorithms Research · Natural Language Processing Techniques · Information Retrieval and Search Behavior