Context-Aware Clustering using Large Language Models
Sindhu Tipirneni, Ravinarayana Adkathimar, Nurendra Choudhary, Gaurush Hiranandani, Rana Ali Amjad, Vassilis N. Ioannidis, Changhe Yuan, Chandan K. Reddy

TL;DR
This paper presents CACTUS, a novel context-aware clustering method using open-source LLMs with a new triplet loss and self-supervised tasks, enabling efficient supervised clustering of text entities with improved accuracy.
Contribution
The paper introduces CACTUS, a scalable approach leveraging open-source LLMs, a novel augmented triplet loss, and self-supervised techniques for supervised text clustering.
Findings
CACTUS outperforms existing clustering baselines on e-commerce datasets.
The method effectively transfers knowledge from closed-source to open-source LLMs.
CACTUS provides scalable, cost-effective clustering with improved accuracy.
Abstract
Despite the remarkable success of Large Language Models (LLMs) in text understanding and generation, their potential for text clustering tasks remains underexplored. We observed that powerful closed-source LLMs provide good quality clusterings of entity sets but are not scalable due to the massive compute power required and the associated costs. Thus, we propose CACTUS (Context-Aware ClusTering with aUgmented triplet losS), a systematic approach that leverages open-source LLMs for efficient and effective supervised clustering of entity subsets, particularly focusing on text-based entities. Existing text clustering methods fail to effectively capture the context provided by the entity subset. Moreover, though there are several language modeling based approaches for clustering, very few are designed for the task of supervised clustering. This paper introduces a novel approach towards…
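For readers unfamiliar with the loss family the abstract refers to, the following is a minimal sketch of the standard triplet margin loss. It is not the augmented triplet loss that CACTUS proposes, whose details are not given in this summary. The function names, the Euclidean distance, the margin of 1.0, and the toy 2-D vectors are all illustrative assumptions; in practice the vectors would be entity embeddings produced by a language model.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's variant).

    The loss is zero once the anchor is closer to the positive (same cluster)
    than to the negative (different cluster) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical values chosen for easy arithmetic).
anchor = [0.0, 0.0]
same_cluster = [0.0, 1.0]   # distance 1 from the anchor
other_cluster = [3.0, 4.0]  # distance 5 from the anchor

# Margin satisfied: 1 - 5 + 1 is negative, so the loss clamps to 0.
loss_satisfied = triplet_loss(anchor, same_cluster, other_cluster)

# Roles swapped: 5 - 1 + 1 = 5, so the triplet is penalized.
loss_violated = triplet_loss(anchor, other_cluster, same_cluster)
```

In a supervised clustering setting, triplets like these would typically be drawn from ground-truth clusterings, with the positive taken from the anchor's cluster and the negative from a different one.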
Taxonomy
Topics: Recommender Systems and Techniques
Methods: Triplet Loss
