Improving Attention Mechanism with Query-Value Interaction
Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

TL;DR
This paper introduces a novel attention mechanism that incorporates query-value interactions, leading to improved performance in various NLP tasks by learning query-aware attention values.
Contribution
It proposes a new query-value interaction function that enhances existing attention mechanisms by learning query-specific values, which was not addressed in prior methods.
Findings
Consistent performance improvements across four NLP datasets.
Enhanced attention models with query-value interactions outperform baseline models.
The approach is effective across different tasks and models.
Abstract
The attention mechanism has played a critical role in various state-of-the-art NLP models such as Transformer and BERT. It can be formulated as a ternary function that maps the input queries, keys and values into an output by using a summation of values weighted by the attention weights derived from the interactions between queries and keys. Similar to query-key interactions, there is also inherent relatedness between queries and values, and incorporating query-value interactions has the potential to enhance the output by learning customized values according to the characteristics of queries. However, the query-value interactions are ignored by existing attention methods, which may not be optimal. In this paper, we propose to improve the existing attention mechanism by incorporating query-value interactions. We propose a query-value interaction function which can learn query-aware…
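To make the "ternary function" description concrete, here is a minimal, dependency-free sketch of single-query attention, followed by one possible way values could be made query-aware. The abstract is truncated before it defines the paper's interaction function, so `query_aware_values` is a hypothetical stand-in (an element-wise sigmoid gate, which assumes the query and values share a dimension) and should not be read as the authors' method. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(query, keys, values):
    """Standard attention for one query: a sum of values weighted by
    softmax-normalized, scaled query-key dot products."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


def query_aware_values(query, values):
    """HYPOTHETICAL query-value interaction, not the paper's function:
    gate each value element-wise by sigmoid(query_i * value_i).
    Assumes len(query) == len(value) for every value."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [[v_i * sigmoid(q_i * v_i) for q_i, v_i in zip(query, v)]
            for v in values]


def attention_with_qv(query, keys, values):
    """Attention over query-customized values instead of raw values."""
    return attention(query, keys, query_aware_values(query, values))


if __name__ == "__main__":
    q = [1.0, 0.0]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    print(attention(q, K, V))
    print(attention_with_qv(q, K, V))
```

The only difference between the two variants is the set of values being summed: the attention weights still come from query-key interactions alone, while the values are first transformed as a function of the query.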
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Byte Pair Encoding · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay
