Centroid Transformers: Learning to Abstract with Attention
Lemeng Wu, Xingchao Liu, Qiang Liu

TL;DR
Centroid Transformers introduce an attention mechanism that summarizes N input features into a smaller number M of outputs, reducing computational cost while remaining effective on tasks such as abstractive text summarization and vision.
Contribution
This paper proposes centroid attention, a generalization of self-attention that summarizes N inputs into fewer outputs (centroids). Deriving it from a clustering objective reveals a connection between attention and clustering, and compressing the inputs improves efficiency.
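The link between attention and clustering can be made concrete with a short derivation. This is a sketch under an illustrative soft-clustering energy chosen here as an assumption; it is not necessarily the exact objective the authors amortize. For inputs $x_1,\dots,x_N$, a centroid $u_j$ and a temperature $\tau > 0$, one gradient-descent step with step size $\eta = 1/2$ becomes a softmax-weighted average of the inputs:

```latex
\begin{aligned}
E(u_j) &= -\tau \log \sum_{i=1}^{N} \exp\!\left(-\frac{\lVert x_i - u_j \rVert^2}{\tau}\right) \\
\nabla_{u_j} E &= -2 \sum_{i=1}^{N} a_{ij}\,(x_i - u_j),
\qquad
a_{ij} = \frac{\exp\!\left(-\lVert x_i - u_j \rVert^2/\tau\right)}{\sum_{k=1}^{N} \exp\!\left(-\lVert x_k - u_j \rVert^2/\tau\right)} \\
u_j &\leftarrow u_j - \eta\, \nabla_{u_j} E
     = u_j + 2\eta \sum_{i=1}^{N} a_{ij}\,(x_i - u_j)
     \;\overset{\eta = 1/2}{=}\; \sum_{i=1}^{N} a_{ij}\, x_i
\end{aligned}
```

The last equality uses $\sum_i a_{ij} = 1$. Here $a_{ij}$ is a softmax over the N inputs with $u_j$ as the query. Because $-\lVert x_i - u_j\rVert^2 = 2\,x_i^\top u_j - \lVert x_i\rVert^2 - \lVert u_j\rVert^2$ and the last term is constant in $i$, the score is a dot product plus a per-input bias, so the update reads like attention from M centroid queries over N input keys and values.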
Findings
Effective in abstractive text summarization
Reduces computation in vision tasks
Outperforms standard transformers in experiments
Abstract
Self-attention, the key building block of transformers, is a powerful mechanism for extracting features from the inputs. In essence, self-attention infers the pairwise relations between the elements of the inputs and updates the inputs by propagating information between input pairs. As a result, it maps N inputs to N outputs and incurs quadratic memory and time complexity. We propose centroid attention, a generalization of self-attention that maps N inputs to M outputs (M < N), such that the key information in the inputs is summarized in the smaller number of outputs (called centroids). We design centroid attention by amortizing the gradient descent update rule of a clustering objective function on the inputs, which reveals an underlying connection between attention and clustering. By compressing the inputs to the centroids, we extract the key information useful…
Taxonomy
Topics: Data Visualization and Analytics · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
