ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation
Yiming Xu, Yuan Yuan, Vijay Viswanathan, Graham Neubig

TL;DR
ClusterFusion introduces a hybrid clustering framework that leverages large language models as the core clustering mechanism guided by lightweight embeddings, enabling effective domain-specific text clustering without extensive fine-tuning.
Contribution
The paper presents a hybrid approach in which the LLM serves as the main clustering engine and lightweight embeddings guide it, with the aim of improving domain adaptability and clustering performance.
Findings
Reports state-of-the-art results on three public benchmarks.
Reports substantial improvements in domain-specific clustering.
Introduces two new domain-specific datasets for future research.
Abstract
Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without costly fine-tuning. Large language models (LLMs) provide strong contextual reasoning, yet prior work mainly uses them as auxiliary modules to refine embeddings or adjust cluster boundaries. We propose ClusterFusion, a hybrid framework that instead treats the LLM as the clustering core, guided by lightweight embedding methods. The framework proceeds in three stages: embedding-guided subset partition, LLM-driven topic summarization, and LLM-based topic assignment. This design enables direct incorporation of domain knowledge and user preferences, fully leveraging the contextual adaptability of LLMs. Experiments on three public benchmarks and two new domain-specific datasets demonstrate that ClusterFusion…
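The three stages named in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the subset partition is computed or how the LLM is prompted, so the farthest-point partition below and every function name (`partition_by_embedding`, `cluster`, `toy_embed`, `toy_summarize`, `toy_assign`) are hypothetical. The LLM calls are replaced by keyword-matching stand-ins so the example runs without a model.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_by_embedding(
    texts: List[str], embed: Callable[[str], Vector], n_subsets: int
) -> List[List[str]]:
    """Stage 1 (assumed form): pick seeds by farthest-point sampling,
    then group every text with its nearest seed."""
    vecs = [embed(t) for t in texts]
    seeds = [0]  # start from the first text
    while len(seeds) < min(n_subsets, len(texts)):
        # Next seed: the text farthest from all seeds chosen so far.
        nxt = max(
            range(len(texts)),
            key=lambda i: min(_sq_dist(vecs[i], vecs[s]) for s in seeds),
        )
        seeds.append(nxt)
    subsets: List[List[str]] = [[] for _ in seeds]
    for i, text in enumerate(texts):
        nearest = min(
            range(len(seeds)), key=lambda k: _sq_dist(vecs[i], vecs[seeds[k]])
        )
        subsets[nearest].append(text)
    return subsets


def cluster(
    texts: List[str],
    embed: Callable[[str], Vector],
    summarize: Callable[[List[str]], List[str]],
    assign: Callable[[str, List[str]], str],
    n_subsets: int = 2,
) -> Dict[str, List[str]]:
    """Run the three stages; `summarize` and `assign` stand in for LLM calls."""
    topics: List[str] = []
    for subset in partition_by_embedding(texts, embed, n_subsets):
        for topic in summarize(subset):  # Stage 2: topics per subset
            if topic not in topics:
                topics.append(topic)
    clusters: Dict[str, List[str]] = {t: [] for t in topics}
    for text in texts:  # Stage 3: assign each text to one topic
        clusters.setdefault(assign(text, topics), []).append(text)
    return clusters


# Toy stand-ins (hypothetical): keyword matching instead of a real
# embedding model or LLM.
KEYWORD_TOPICS = {"refund": "billing", "login": "account access"}


def toy_embed(text: str) -> Vector:
    return [float(kw in text) for kw in KEYWORD_TOPICS]


def toy_summarize(subset: List[str]) -> List[str]:
    return [tp for kw, tp in KEYWORD_TOPICS.items() if any(kw in t for t in subset)]


def toy_assign(text: str, topics: List[str]) -> str:
    for kw, tp in KEYWORD_TOPICS.items():
        if kw in text and tp in topics:
            return tp
    return topics[0]
```

In a real setting, `summarize` and `assign` would wrap prompts to an LLM, and those prompts are where domain knowledge or user preferences could be stated directly. Keeping the two as plain callables makes that substitution a one-line change.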
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Sentiment Analysis and Opinion Mining
