Value-aware Approximate Attention

Ankit Gupta; Jonathan Berant

arXiv:2103.09857·cs.LG·March 19, 2021

TL;DR

This paper introduces a value-aware approach to approximate attention in Transformers, emphasizing the importance of incorporating value vectors for improved accuracy and proposing kernel choices that enhance sparse approximation quality.

Contribution

It presents a novel value-aware objective for attention approximation, demonstrating its superiority over value-ignoring methods in language modeling tasks.

Findings

01 An optimal value-aware approximation outperforms an optimal approximation that ignores values in language modeling.

02 Kernel functions with less skewness improve sparse approximation quality.

03 Theoretical and empirical evidence supports the importance of value vectors in attention approximation.

Abstract

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. However, all approximations thus far have ignored the contribution of the value vectors to the quality of approximation. In this work, we argue that research efforts should be directed towards approximating the true output of the attention sub-layer, which includes the value vectors. We propose a value-aware objective, and show theoretically and empirically that an optimal approximation of a value-aware objective substantially outperforms an optimal approximation that ignores values, in the context of language modeling. Moreover, we show that the choice of kernel function for computing attention similarity can substantially affect the quality of sparse approximations, where kernel functions…
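To make the distinction in the abstract concrete, the following is a minimal sketch for a single query, not the authors' procedure. It compares two ways of picking a sparse set of k keys: a value-ignoring choice that keeps the k largest attention weights, and a value-aware choice that picks the k-subset whose output is closest to the exact attention output. The brute-force subset search, the renormalization of the kept weights, and all function names (`restrict`, `compare`, and so on) are assumptions made for illustration only.

```python
import itertools
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(weights, values):
    """Weighted sum of value vectors: the attention output for one query."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def restrict(weights, keep):
    """Zero out weights outside `keep` and renormalize the rest (assumption)."""
    total = sum(weights[i] for i in keep)
    return [weights[i] / total if i in keep else 0.0 for i in range(len(weights))]


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compare(scores, values, k):
    """Return (value-ignoring error, value-aware error) on the attention output."""
    weights = softmax(scores)
    exact = attend(weights, values)

    # Value-ignoring choice: keep the k largest attention weights.
    top_k = set(sorted(range(len(weights)), key=lambda i: -weights[i])[:k])
    ignoring_err = l2(exact, attend(restrict(weights, top_k), values))

    # Value-aware choice: brute-force search over all k-subsets for the one
    # whose output is closest to the exact output (feasible only for tiny n).
    aware_err = min(
        l2(exact, attend(restrict(weights, set(subset)), values))
        for subset in itertools.combinations(range(len(weights)), k)
    )
    return ignoring_err, aware_err


rng = random.Random(0)
n, dim, k = 8, 4, 2
scores = [rng.gauss(0, 1) for _ in range(n)]
values = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
ignoring_err, aware_err = compare(scores, values, k)
print(f"value-ignoring error: {ignoring_err:.4f}")
print(f"value-aware error:    {aware_err:.4f}")
```

Because the top-k subset is one of the candidates in the brute-force search, the value-aware error in this sketch can never exceed the value-ignoring error. How large the gap is depends on the scores and values, and the toy random inputs here say nothing about the language modeling results reported in the paper.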

Code & Models

Repositories

ag1988/value_aware_attn
Official
