Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention

Tong Yu; Ruslan Khalitov; Lei Cheng; Zhirong Yang

arXiv:2204.10670·cs.LG·April 25, 2022

TL;DR

Paramixer is a scalable, full-rank mixing block that is reported to work better than dot-product self-attention while reducing the computing cost from O(N^2) to O(N log N) in the sequence length N and avoiding the low-rank bottleneck.

Contribution

The paper proposes Paramixer, a mixing building block that factorizes the interaction matrix into several sparse matrices whose non-zero entries are parameterized by MLPs that take the data elements as input, instead of specifying attention coefficients by pairwise dot-products.

Findings

01. Paramixer is reported to perform better than dot-product self-attention on synthetic and real-world datasets.

02. It reduces the computing cost from O(N^2) for dot-product self-attention to as low as O(N log N) for sequence length N.

03. All factorizing matrices are full-rank, avoiding the low-rank bottleneck.

Abstract

Self-attention is a widely used building block in neural modeling to mix long-range data elements. Most self-attention neural networks employ pairwise dot-products to specify the attention coefficients. However, these methods require $O(N^{2})$ computing cost for sequence length $N$. Even though some approximation methods have been introduced to relieve the quadratic cost, the performance of the dot-product approach is still bottlenecked by the low-rank constraint in the attention matrix factorization. In this paper, we propose a novel scalable and effective mixing building block called Paramixer. Our method factorizes the interaction matrix into several sparse matrices, where we parameterize the non-zero entries by MLPs with the data elements as input. The overall computing cost of the new building block is as low as $O(N \log N)$. Moreover, all factorizing matrices in Paramixer are…
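
The factorization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the sparsity pattern, so the sketch assumes a hypothetical one in which factor k links position i to itself and to position (i + 2^k) mod N, and it assumes each MLP reads the original element x[i]. The names `make_mlp` and `sparse_factor_mix` are illustrative only, and the weights are random rather than trained.

```python
import math
import random


def make_mlp(d_in, d_hidden, d_out, rng):
    """Return a tiny two-layer MLP (tanh hidden layer, no biases) as a closure."""
    w1 = [[rng.gauss(0, 1 / math.sqrt(d_in)) for _ in range(d_in)]
          for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 1 / math.sqrt(d_hidden)) for _ in range(d_hidden)]
          for _ in range(d_out)]

    def mlp(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return mlp


def sparse_factor_mix(x, num_factors=None, d_hidden=8, seed=0):
    """Mix the rows of x (an N x d list of lists) with a product of sparse factors.

    Assumed pattern (hypothetical, not taken from the paper): factor k has two
    non-zero entries in row i, at columns i and (i + 2**k) % N.
    Returns the mixed sequence and the total number of non-zero entries used.
    """
    n, d = len(x), len(x[0])
    if num_factors is None:
        num_factors = max(1, math.ceil(math.log2(n)))
    rng = random.Random(seed)
    # One MLP per factor; each maps a data element to the two non-zero entries
    # of the corresponding row.
    mlps = [make_mlp(d, d_hidden, 2, rng) for _ in range(num_factors)]

    v = [row[:] for row in x]
    nnz = 0
    for k, mlp in enumerate(mlps):
        shift = 2 ** k
        out = []
        for i in range(n):
            w_self, w_link = mlp(x[i])  # entries depend on the input element
            j = (i + shift) % n
            # Sparse matrix-vector product restricted to the two non-zeros.
            out.append([w_self * a + w_link * b for a, b in zip(v[i], v[j])])
            nnz += 2
        v = out
    return v, nnz


if __name__ == "__main__":
    seq = [[float(i), 1.0] for i in range(16)]
    mixed, nnz = sparse_factor_mix(seq)
    print(len(mixed), len(mixed[0]), nnz, len(seq) ** 2)
```

Under these assumptions each factor holds 2N non-zero entries and there are ceil(log2 N) factors, so the sketch touches 2N·ceil(log2 N) entries in total (128 for N = 16, against 256 for a dense 16 × 16 matrix). Because every offset below N is a sum of distinct powers of two, the product of the factors can connect any pair of positions even though each factor is sparse.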

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Advanced Neural Network Applications