Attention-based clustering
Rodrigo Maulen-Soto (SU, LPSM), Pierre Marion (EPFL), Claire Boyer (UPS, IUF)

TL;DR
This paper provides a theoretical analysis of transformers' ability to perform unsupervised clustering and in-context quantization, demonstrating their capacity to extract and adapt to data structure from Gaussian mixture models.
Contribution
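The paper introduces a theoretical framework showing that unsupervised risk minimization aligns the head parameters of a simplified two-head attention layer with the true mixture centroids, and that an attention layer with key, query, and value matrices fixed to the identity performs in-context quantization without any trainable parameters.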
Findings
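The head parameters of a simplified two-head attention layer can align with the centroids of a Gaussian mixture.
Minimizing a population risk on unlabeled data drives the head parameters toward the true mixture centroids.
An attention layer with key, query, and value matrices fixed to the identity can perform in-context quantization.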
Abstract
Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids. This phenomenon highlights the ability of attention-based layers to capture underlying distributional structure. We further examine an attention layer with key, query, and value matrices fixed to the identity, and show that, even without any trainable parameters, it can perform in-context quantization, revealing the…
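The identity-attention setting described in the abstract can be illustrated with a small experiment. The sketch below is not the paper's construction: it assumes a two-component Gaussian mixture with centroids at (2, 0) and (-2, 0), noise level 0.3, and a softmax scaling `beta = 1.0`, all chosen for illustration, and the helper names (`identity_attention`, `sample_mixture`, `mean_dist_to_nearest`) are hypothetical. Each output token is a softmax-weighted average of the input tokens, with scores given by plain inner products because the key, query, and value matrices are the identity.

```python
import math
import random


def identity_attention(tokens, beta=1.0):
    """Softmax attention with key, query, and value matrices set to the identity.

    beta is an assumed inverse-temperature; the paper's scaling may differ.
    """
    outputs = []
    for query in tokens:
        # Scores are raw inner products <query, key>, since Q = K = I.
        scores = [beta * sum(a * b for a, b in zip(query, key)) for key in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Values are the tokens themselves, since V = I.
        outputs.append([
            sum(w * tok[d] for w, tok in zip(weights, tokens)) / total
            for d in range(len(query))
        ])
    return outputs


def sample_mixture(n, centroids, sigma, rng):
    """Draw n points from an equal-weight isotropic Gaussian mixture."""
    return [
        [c + rng.gauss(0.0, sigma) for c in rng.choice(centroids)]
        for _ in range(n)
    ]


def mean_dist_to_nearest(points, centroids):
    """Average Euclidean distance from each point to its closest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


rng = random.Random(0)
centroids = [(2.0, 0.0), (-2.0, 0.0)]  # illustrative values, not from the paper
tokens = sample_mixture(200, centroids, sigma=0.3, rng=rng)
outputs = identity_attention(tokens)

before = mean_dist_to_nearest(tokens, centroids)
after = mean_dist_to_nearest(outputs, centroids)
print(f"mean distance to nearest centroid: input {before:.3f}, output {after:.3f}")
```

In this toy setup, tokens from the same component have large positive inner products, so each output is dominated by an average over its own cluster and lands closer to a centroid than the raw token did. The outputs are slightly biased away from the exact centroids, and how the paper quantifies the quantization error is not covered by this sketch.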
Taxonomy
Topics: Advanced Clustering Algorithms Research
Methods: Softmax · Attention Is All You Need · ALIGN
