Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Gongbo Tang; Mathias Müller; Annette Rios; Rico Sennrich

arXiv:1808.08946·cs.CL·November 13, 2018·36 cites

TL;DR

This paper empirically evaluates neural machine translation architectures, revealing that self-attention models excel in semantic feature extraction but do not outperform RNNs in modeling long-range dependencies like subject-verb agreement.

Contribution

It provides a targeted empirical comparison of RNNs, CNNs, and self-attention networks, challenging assumptions about their capabilities in long-range dependency modeling.

Findings

01

Self-attention networks do not outperform RNNs in long-distance subject-verb agreement.

02

Self-attention networks outperform RNNs and CNNs in word sense disambiguation.

03

The word sense disambiguation results indicate that self-attention models are strong semantic feature extractors; CNNs do not show the same advantage.

Abstract

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show…
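The path-length argument in the abstract can be made concrete with a small sketch. The Python below counts how many steps or layers separate two tokens under simplifying assumptions that are not taken from the paper: a unidirectional RNN, stacked stride-1 convolutions without dilation, and a single self-attention layer. The function names are hypothetical and chosen only for illustration.

```python
import math


def rnn_path_length(distance: int) -> int:
    """Assumption: a unidirectional RNN moves information one token per step,
    so two tokens `distance` positions apart are `distance` steps apart."""
    return distance


def cnn_path_length(distance: int, kernel_width: int) -> int:
    """Assumption: stacked stride-1, non-dilated convolutions.
    Each layer widens the receptive field by (kernel_width - 1) positions,
    so we need enough layers for one receptive field to span both tokens."""
    if kernel_width < 2:
        raise ValueError("kernel_width must be at least 2 to connect distinct tokens")
    return math.ceil(distance / (kernel_width - 1))


def self_attention_path_length(distance: int) -> int:
    """Assumption: one self-attention layer lets every token attend to every
    other token directly, so any two distinct tokens are one layer apart."""
    return 0 if distance == 0 else 1


if __name__ == "__main__":
    # Illustrative distance between a subject and its verb, in tokens.
    distance = 10
    print("RNN steps:", rnn_path_length(distance))
    print("CNN layers (kernel width 3):", cnn_path_length(distance, 3))
    print("Self-attention layers:", self_attention_path_length(distance))
```

For a subject and verb 10 tokens apart and a kernel width of 3, this gives 10 steps for the RNN, 5 layers for the CNN, and 1 layer for self-attention. Shorter paths are the theoretical argument only; the findings listed above are that they did not translate into better long-distance subject-verb agreement.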

Code & Models

Repositories

awslabs/sockeye (mxnet, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification