Context-Aware Self-Attention Networks

Baosong Yang; Jian Li; Derek Wong; Lidia S. Chao; Xing Wang; Zhaopeng Tu

arXiv:1902.05766 · cs.CL · February 18, 2019 · 28 cites

Open Access

TL;DR

This paper enhances self-attention networks by integrating rich contextual information into query and key transformations, improving translation performance without external resources.

Contribution

It introduces a method to incorporate global and deep context into self-attention, maintaining simplicity while boosting effectiveness in translation tasks.

Findings

01

Improved translation quality on the WMT14 English-German and WMT17 Chinese-English datasets

02

Effective utilization of internal context representations

03

Maintains model simplicity and flexibility

Abstract

Self-attention models have shown flexibility in parallel computation and effectiveness in modeling both long- and short-term dependencies. However, they calculate the dependencies between representations without considering contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks by capturing the richness of context. To maintain the simplicity and flexibility of self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculate the relevance between elements. Specifically, we leverage internal representations that embed both global and deep contexts, thus avoiding reliance on external resources. Experimental results on WMT14 English-German and WMT17 Chinese-English…
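The idea of contextualizing the query and key transformations can be illustrated with a small sketch. The abstract does not spell out the formula, so everything below is an assumption made for illustration: the global context is taken to be the mean of the input vectors, each query and key is mixed with a projection of that context through a learned sigmoid gate, and all names (`context_aware_attention`, `make_params`, `u_q`, `vh_q`, and so on) are hypothetical. Values are left untouched, matching the abstract's focus on the query and key layers.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(d, seed=0):
    """Hypothetical helper: random weights standing in for learned parameters."""
    rng = random.Random(seed)
    shapes = {
        "w_q": (d, d), "w_k": (d, d), "w_v": (d, d),  # standard projections
        "u_q": (d, d), "u_k": (d, d),                  # context projections
        "vh_q": (d, 1), "vc_q": (d, 1),                # gate weights for queries
        "vh_k": (d, 1), "vc_k": (d, 1),                # gate weights for keys
    }
    return {name: rand_matrix(r, c, rng) for name, (r, c) in shapes.items()}


def context_aware_attention(h, params):
    """Self-attention whose queries and keys are gated with a global context.

    h is a list of n input vectors, each of dimension d.
    Returns (outputs, attention_weights).
    """
    n, d = len(h), len(h[0])
    # Assumption: global context = mean of the input vectors (one 1 x d row).
    c = [[sum(h[i][j] for i in range(n)) / n for j in range(d)]]

    q = matmul(h, params["w_q"])
    k = matmul(h, params["w_k"])
    v = matmul(h, params["w_v"])

    def contextualize(x, u, v_h, v_c):
        cu = matmul(c, u)[0]              # projected context, length d
        c_gate = matmul([cu], v_c)[0][0]  # context contribution to the gate
        x_gate = matmul(x, v_h)           # per-position contribution, n x 1
        mixed = []
        for i in range(n):
            lam = sigmoid(x_gate[i][0] + c_gate)  # gate in (0, 1)
            mixed.append([(1 - lam) * x[i][j] + lam * cu[j] for j in range(d)])
        return mixed

    q_hat = contextualize(q, params["u_q"], params["vh_q"], params["vc_q"])
    k_hat = contextualize(k, params["u_k"], params["vh_k"], params["vc_k"])

    # Scaled dot-product attention over the contextualized queries and keys.
    k_hat_t = [list(col) for col in zip(*k_hat)]
    scores = matmul(q_hat, k_hat_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights
```

Calling `context_aware_attention(rand_matrix(3, 4, random.Random(1)), make_params(4))` returns a 3 x 4 output and a 3 x 3 attention matrix whose rows each sum to 1. A deep-context variant would presumably build `c` from lower-layer representations instead of the mean; that is not shown here.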

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks