Clustering by Attention: Leveraging Prior Fitted Transformers for Data Partitioning
Ahmed Shokry, Ayman Khalafallah

TL;DR
This paper introduces a clustering method that requires no parameter tuning: a pre-trained transformer uses a few pre-clustered samples to partition large datasets, and the authors report that it outperforms existing techniques.
Contribution
The paper proposes a new clustering approach based on meta-learning with a pre-trained transformer. It eliminates parameter tuning and improves accuracy while needing only a few pre-clustered samples.
Findings
Outperforms state-of-the-art clustering methods.
Works effectively with few pre-clustered samples.
Scales well to large datasets.
Abstract
Clustering is a core task in machine learning with wide-ranging applications in data mining and pattern recognition. However, its unsupervised nature makes it inherently challenging. Many existing clustering algorithms suffer from critical limitations: they often require careful parameter tuning, exhibit high computational complexity, lack interpretability, or yield suboptimal accuracy, especially when applied to large-scale datasets. In this paper, we introduce a novel clustering approach based on meta-learning. Our approach eliminates the need for parameter optimization while achieving accuracy that outperforms state-of-the-art clustering techniques. The proposed technique leverages a few pre-clustered samples to guide the clustering process for the entire dataset in a single forward pass. Specifically, we employ a pre-trained Prior-Data Fitted Transformer Network (PFN) to perform…
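The idea of letting a few pre-clustered samples guide the assignment of every other point in one pass can be illustrated with a toy sketch. This is not the authors' PFN: it is a stand-in with no learned weights, in which each unlabeled point attends to the pre-clustered samples through a softmax over negative squared distances and takes the attention-weighted cluster label. The function name attention_assign and the temperature parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def attention_assign(context_x, context_labels, query_x, temperature=1.0):
    """Toy stand-in (not the paper's PFN): assign clusters to query points
    by softmax attention over a few pre-clustered context points."""
    context_x = np.asarray(context_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    context_labels = np.asarray(context_labels)

    # Map arbitrary cluster ids to column indices of a one-hot matrix.
    classes, inverse = np.unique(context_labels, return_inverse=True)
    one_hot = np.eye(len(classes))[inverse.ravel()]  # (n_context, n_clusters)

    # Attention scores: negative squared Euclidean distance, scaled.
    diff = query_x[:, None, :] - context_x[None, :, :]
    scores = -np.sum(diff ** 2, axis=-1) / temperature

    # Numerically stable softmax over the context axis.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention-weighted vote over the context labels.
    probs = weights @ one_hot  # (n_query, n_clusters)
    return classes[probs.argmax(axis=1)], probs

# Four pre-clustered samples guide the assignment of two new points.
context_x = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
context_labels = [0, 0, 1, 1]
query_x = [[0.2, 0.1], [4.9, 5.2]]
labels, probs = attention_assign(context_x, context_labels, query_x)
```

All query points are processed together in one matrix operation, which loosely mirrors the single-forward-pass framing in the abstract. A trained transformer would replace the fixed distance-based scores with learned attention.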
