Improving Attention Mechanism with Query-Value Interaction

Chuhan Wu; Fangzhao Wu; Tao Qi; Yongfeng Huang

arXiv:2010.03766 · cs.CL · October 9, 2020 · 5 cites

TL;DR

This paper introduces a novel attention mechanism that incorporates query-value interactions, leading to improved performance in various NLP tasks by learning query-aware attention values.

Contribution

It proposes a query-value interaction function that enhances existing attention mechanisms by learning query-specific values, an interaction that prior attention methods ignore.

Findings

1. Consistent performance improvements across four NLP datasets.

2. Enhanced attention models with query-value interactions outperform baseline models.

3. The approach is effective across different tasks and models.

Abstract

Attention mechanisms have played critical roles in various state-of-the-art NLP models such as Transformer and BERT. Attention can be formulated as a ternary function that maps the input queries, keys, and values into an output by summing the values, weighted by attention weights derived from the interactions between queries and keys. Similar to query-key interactions, there is also inherent relatedness between queries and values, and incorporating query-value interactions has the potential to enhance the output by learning customized values according to the characteristics of queries. However, existing attention methods ignore query-value interactions, which may not be optimal. In this paper, we propose to improve the existing attention mechanism by incorporating query-value interactions. We propose a query-value interaction function which can learn query-aware…
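To make the abstract's formulation concrete, the following is a minimal NumPy sketch. The `attention` function is standard scaled dot-product attention, where the weights come only from query-key interactions. The `query_aware_attention` function is a hypothetical illustration of what "customized values according to the characteristics of queries" could look like: it uses a simple sigmoid gate and an assumed parameter matrix `W_gate`, and it is not the interaction function proposed in the paper, whose definition is not given in the text above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Standard scaled dot-product attention.

    The weights depend only on query-key interactions; every query
    sees the same value vectors V.
    """
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (n_q, n_k)
    return weights @ V, weights                       # (n_q, d_v)


def query_aware_attention(Q, K, V, W_gate):
    """HYPOTHETICAL query-value interaction (an assumption, not the paper's).

    Each value vector is rescaled by a sigmoid gate computed from the
    query, so different queries aggregate different versions of V.
    W_gate is an assumed parameter of shape (d, d_v).
    """
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (n_q, n_k)
    q_proj = Q @ W_gate                                # (n_q, d_v)
    # gate[i, j, :] depends on both query i and value j.
    logits = q_proj[:, None, :] * V[None, :, :]        # (n_q, n_k, d_v)
    gate = 1.0 / (1.0 + np.exp(-logits))
    V_q = gate * V[None, :, :]                         # query-aware values
    out = np.einsum("ij,ijd->id", weights, V_q)        # (n_q, d_v)
    return out, weights


# Small example: 2 queries, 5 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 4))
W_gate = rng.normal(size=(8, 4))

out_std, w_std = attention(Q, K, V)
out_q, w_q = query_aware_attention(Q, K, V, W_gate)
out_zero_gate, _ = query_aware_attention(Q, K, V, np.zeros((8, 4)))
```

In this sketch the attention weights are identical in both functions; only the values being summed differ. As a sanity check on the gating assumption, an all-zero `W_gate` makes every gate equal to 0.5, so `out_zero_gate` is exactly half of `out_std`.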

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Byte Pair Encoding · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay