Centroid Transformers: Learning to Abstract with Attention

Lemeng Wu; Xingchao Liu; Qiang Liu

arXiv:2102.08606 · cs.LG · March 9, 2021 · 20 cites

TL;DR

Centroid Transformers introduce a novel attention mechanism that summarizes input features into fewer outputs, reducing computational complexity while maintaining effectiveness across tasks like text summarization and vision.

Contribution

This paper proposes centroid attention, a generalization of self-attention that summarizes inputs into fewer outputs, revealing a connection to clustering and improving efficiency.

Findings

01. Effective in abstractive text summarization

02. Reduces computation in vision tasks

03. Outperforms standard transformers in experiments

Abstract

Self-attention, as the key block of transformers, is a powerful mechanism for extracting features from the inputs. In essence, self-attention infers the pairwise relations between the elements of the inputs and modifies the inputs by propagating information between input pairs. As a result, it maps N inputs to N outputs and incurs quadratic $O(N^{2})$ memory and time complexity. We propose centroid attention, a generalization of self-attention that maps N inputs to M outputs $(M \leq N)$, such that the key information in the inputs is summarized in the smaller number of outputs (called centroids). We design centroid attention by amortizing the gradient descent update rule of a clustering objective function on the inputs, which reveals an underlying connection between attention and clustering. By compressing the inputs to the centroids, we extract the key information useful…
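To make the link between clustering and attention concrete, the following is a minimal NumPy sketch of a soft k-means gradient step written in an attention-like form. It is an illustration under stated assumptions, not the paper's exact formulation: the function names `centroid_step` and `centroid_abstract`, the squared-distance similarity, the temperature `beta`, the step size `step`, and the evenly spaced initialization are all hypothetical choices. Each of the M centroids acts like a query over the N inputs, and the update moves it toward a softmax-weighted average of the inputs.

```python
import numpy as np

def centroid_step(X, U, step=0.1, beta=1.0):
    """One gradient step on a soft k-means objective (illustrative assumption).

    X: (N, d) inputs, U: (M, d) current centroids. Returns updated (M, d) centroids.
    """
    # Squared distances between every centroid and every input: shape (M, N).
    d2 = ((U[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    logits = -beta * d2
    # Softmax over centroids for each input (columns sum to 1), shifted for stability.
    logits = logits - logits.max(axis=0, keepdims=True)
    W = np.exp(logits)
    W = W / W.sum(axis=0, keepdims=True)
    # Gradient step: u_i <- u_i + 2 * step * sum_j W[i, j] * (x_j - u_i).
    return U + 2.0 * step * (W @ X - W.sum(axis=1, keepdims=True) * U)

def centroid_abstract(X, M, n_steps=3, step=0.1, beta=1.0):
    """Summarize N inputs into M <= N centroids by unrolling a few steps."""
    N = X.shape[0]
    if not 1 <= M <= N:
        raise ValueError("M must satisfy 1 <= M <= N")
    # Hypothetical initialization: pick M evenly spaced inputs as starting centroids.
    idx = np.linspace(0, N - 1, M).round().astype(int)
    U = X[idx].astype(float)
    for _ in range(n_steps):
        U = centroid_step(X, U, step=step, beta=beta)
    return U

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(16, 4))     # N = 16 inputs of dimension 4
    U = centroid_abstract(X, M=4)    # M = 4 centroids
    print(U.shape)
```

The distance matrix here has shape (M, N), so the cost of one step scales with N·M rather than N². In a learned layer, the fixed distance-based weights and the difference term would presumably be replaced by trainable similarity and value functions; this sketch keeps them fixed to show only the clustering view.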

Taxonomy

Topics: Data Visualization and Analytics · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis