AttentionX: Exploiting Consensus Discrepancy In Attention from A Distributed Optimization Perspective
Guoqiang Zhang, Richard Heusdens

TL;DR
AttentionX introduces a novel way to enhance transformer attention by integrating consensus discrepancy inspired by distributed optimization, leading to improved performance on vision and language models.
Contribution
The paper extends standard Attention with a consensus discrepancy inspired by PDMM, a distributed optimization method, to improve transformer performance.
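For readers unfamiliar with PDMM, the following is a toy sketch of a PDMM-style iteration on a distributed averaging problem, not the paper's method. It makes several assumptions that are not in the source: each node i holds a scalar a[i] with local cost 0.5 * (x - a[i])^2, the network is a 4-node ring, the edge constraint is x_i = x_j, and the auxiliary update is averaged with theta = 0.5 (theta = 1 would give the non-averaged update). The function name `pdmm_average` and all parameter names are hypothetical.

```python
import numpy as np

def pdmm_average(a, edges, c=1.0, theta=0.5, iters=2000):
    """Toy PDMM-style consensus averaging (illustrative assumptions only)."""
    n = len(a)
    neighbours = {i: [] for i in range(n)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Edge constraint x_i - x_j = 0 is encoded with a sign per directed edge.
    def sign(i, j):
        return 1.0 if i < j else -1.0

    # One auxiliary variable z[(i, j)] per directed edge, held at node i.
    z = {(i, j): 0.0 for i in range(n) for j in neighbours[i]}
    x = np.zeros(n)

    for _ in range(iters):
        y = {}
        for i in range(n):
            # Information-gathering: combine what was received from neighbours.
            gathered = sum(sign(i, j) * z[(i, j)] for j in neighbours[i])
            # Local information-fusion: closed-form minimiser of the local
            # quadratic cost plus the PDMM penalty terms.
            x[i] = (a[i] - gathered) / (1.0 + c * len(neighbours[i]))
            for j in neighbours[i]:
                y[(i, j)] = z[(i, j)] + 2.0 * c * sign(i, j) * x[i]
        # Exchange: node i sends y[(i, j)] to node j, which averages it in.
        for (i, j), y_ij in y.items():
            z[(j, i)] = (1.0 - theta) * z[(j, i)] + theta * y_ij
    return x

a = np.array([1.0, 5.0, 3.0, 7.0])
ring = [(0, 1), (1, 2), (2, 3), (0, 3)]
x = pdmm_average(a, ring)
```

Under these assumptions every entry of `x` should approach the network average of `a`, which is 4.0 here, even though each node only ever exchanges values with its direct neighbours.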
Findings
Promising results on ViT and nanoGPT benchmarks.
Enhanced attention mechanism improves model convergence.
Demonstrates the effectiveness of distributed optimization concepts in transformers.
Abstract
In this paper, we extend the standard Attention in the transformer by exploiting the consensus discrepancy from a distributed optimization perspective; the resulting method is referred to as AttentionX. The primal-dual method of multipliers (PDMM) \cite{Zhang16PDMM} is designed to iteratively solve a broad class of distributed optimization problems over a peer-to-peer (P2P) network, where neighbouring nodes gradually reach consensus, as specified by predefined linear edge-constraints, during the optimization process. In particular, at each iteration of PDMM, each node in the network first performs information-gathering from its neighbours and then performs local information-fusion. From a high-level point of view, the $KQ$-softmax-based weighted summation of $V$-representations in Attention corresponds to information-gathering from neighbours, while the feature-processing via the feed-forward network (FFN) in…
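The analogy drawn in the abstract can be illustrated with a minimal NumPy sketch of standard single-head attention followed by a feed-forward network. This shows only the baseline that the paper extends, not AttentionX itself, since the abstract is truncated before the extension is described. The function names `attention_gather` and `ffn_fuse`, the random weights, and the dimensions are assumptions for illustration; residual connections and layer normalisation are omitted.

```python
import numpy as np

def softmax(scores):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def attention_gather(X, Wq, Wk, Wv):
    """Information-gathering: each token takes a softmax-weighted sum of
    the value representations of all tokens (its 'neighbours')."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V, weights

def ffn_fuse(H, W1, b1, W2, b2):
    """Local information-fusion: a two-layer ReLU network applied to each
    token independently of the others."""
    return np.maximum(H @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(0)
tokens, d_model, d_hidden = 4, 8, 16
X = rng.normal(size=(tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)

gathered, weights = attention_gather(X, Wq, Wk, Wv)
out = ffn_fuse(gathered, W1, b1, W2, b2)
```

Each row of `weights` sums to one, so every token's gathered vector is a convex combination of the value vectors, which is what makes the comparison with a node collecting information from its neighbours natural. The feed-forward step mixes no information across tokens, mirroring the purely local fusion step.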
Taxonomy
Topics: Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)
Methods: Softmax · Attention Is All You Need
