Attention-based clustering

Rodrigo Maulen-Soto (SU; LPSM); Pierre Marion (EPFL); Claire Boyer (UPS; IUF)

arXiv:2505.13112 · stat.ML · October 29, 2025

Open Access

TL;DR

This paper gives a theoretical analysis of how attention layers perform unsupervised clustering and in-context quantization when the input data is generated from a Gaussian mixture model.

Contribution

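It introduces a theoretical framework with two results: minimizing a population risk on unlabeled data aligns the heads of a simplified two-head attention layer with the true mixture centroids, and an attention layer with no trainable parameters can perform in-context quantization.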

Findings

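01  The heads of a simplified two-head attention layer can align with the centroids of a Gaussian mixture.

02  Minimizing a population risk on unlabeled data drives the head parameters to align with the true mixture centroids.

03  An attention layer with key, query, and value matrices fixed to the identity can perform in-context quantization.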

Abstract

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids. This phenomenon highlights the ability of attention-based layers to capture underlying distributional structure. We further examine an attention layer with key, query, and value matrices fixed to the identity, and show that, even without any trainable parameters, it can perform in-context quantization, revealing the…
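To make the parameter-free layer mentioned in the abstract concrete, here is a minimal NumPy sketch of softmax self-attention with the key, query, and value matrices set to the identity, applied to tokens drawn from a two-component Gaussian mixture. Several details are assumptions and do not come from the paper: the scores are raw inner products with no scaling factor, the mixture is symmetric with centroids +c and -c, and the helper names (identity_attention, sample_symmetric_mixture) and all numeric settings are hypothetical choices for illustration.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def identity_attention(tokens):
    """Softmax self-attention with Q = K = V = identity (no parameters).

    Assumption: scores are raw inner products, with no scaling factor.
    Returns the output tokens and the attention weight matrix.
    """
    scores = tokens @ tokens.T           # (n, n) pairwise inner products
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ tokens, weights     # outputs are convex combinations


def sample_symmetric_mixture(n_tokens, centroid, noise_std, rng):
    """Hypothetical data model: tokens are +/- centroid plus Gaussian noise."""
    signs = rng.choice([-1.0, 1.0], size=n_tokens)
    noise = noise_std * rng.standard_normal((n_tokens, centroid.shape[0]))
    return signs[:, None] * centroid + noise, signs


rng = np.random.default_rng(0)
centroid = np.array([3.0, 0.0])
tokens, signs = sample_symmetric_mixture(64, centroid, 0.3, rng)
outputs, weights = identity_attention(tokens)

# Attention mass that each token places on tokens from its own component.
same_component = signs[:, None] == signs[None, :]
mass_on_own_component = (weights * same_component).sum(axis=1)
print(mass_on_own_component.min())
```

Each output token is a weighted average of the input tokens, and in this well-separated setting almost all of the weight falls on tokens from the same mixture component, so the layer summarizes each token by its own cluster using only the context. The sketch illustrates the mechanism only; it does not reproduce the paper's quantization guarantees.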

Taxonomy

Topics: Advanced Clustering Algorithms Research

Methods: Softmax · Attention Is All You Need · ALIGN