Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN

Rahma Chaabouni; Roberto Dessì; Eugene Kharitonov

arXiv:2107.01366·cs.CL·September 17, 2021

Open Access

TL;DR

This paper investigates whether Transformer modifications that improve compositional generalization on SCAN also help on realistic machine translation tasks, particularly in low-resource and distribution-shifted settings.

Contribution

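The study presents several focused modifications to the Transformer that improve generalization on SCAN, selects one that remains on par with a vanilla Transformer on a standard machine translation task, and evaluates it in low-resource settings and on a newly introduced distribution-shifted English-French translation task.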

Findings

01. Transformers with SCAN-inspired modifications improve low-resource translation performance.

02. Improvements of the SCAN-capable model do not directly transfer to resource-rich machine translation.

03. In low-resource settings, the modified models improve BLEU score by up to 13.1%.

Abstract

Despite their practical success, modern seq2seq architectures are unable to generalize systematically on several SCAN tasks. Hence, it is not clear if SCAN-style compositional generalization is useful in realistic NLP tasks. In this work, we study the benefit that such compositionality brings about to several machine translation tasks. We present several focused modifications of Transformer that greatly improve generalization capabilities on SCAN and select one that remains on par with a vanilla Transformer on a standard machine translation (MT) task. Next, we study its performance in low-resource settings and on a newly introduced distribution-shifted English-French translation task. Overall, we find that improvements of a SCAN-capable model do not directly transfer to the resource-rich MT setup. In contrast, in the low-resource setup, general modifications lead to an improvement of up…
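For readers unfamiliar with SCAN, the benchmark maps short synthetic commands such as "jump around right" to sequences of action tokens. The following is a minimal sketch of that mapping for a subset of the SCAN grammar, written from the public description of the benchmark rather than from this paper. The function names `interpret` and `interpret_phrase` are hypothetical and chosen for illustration only.

```python
# Illustrative interpreter for a subset of SCAN commands.
# Not the authors' code; names and structure are assumptions.
PRIMITIVES = {"walk": "I_WALK", "look": "I_LOOK", "run": "I_RUN", "jump": "I_JUMP"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}


def interpret_phrase(words):
    """Interpret one phrase, e.g. ['jump', 'around', 'right', 'twice']."""
    repeat = 1
    if words[-1] in ("twice", "thrice"):
        repeat = 2 if words[-1] == "twice" else 3
        words = words[:-1]
    verb = words[0]
    # 'turn' contributes no action of its own, only the turn tokens.
    action = [] if verb == "turn" else [PRIMITIVES[verb]]
    if len(words) == 1:
        actions = action
    else:
        turn = TURNS[words[-1]]
        if words[1] == "opposite":
            actions = [turn, turn] + action   # turn twice, then act
        elif words[1] == "around":
            actions = ([turn] + action) * 4   # turn and act, four times
        else:
            actions = [turn] + action         # turn once, then act
    return actions * repeat


def interpret(command):
    """Interpret a full command with at most one 'and' or 'after'."""
    words = command.split()
    for conj in ("and", "after"):
        if conj in words:
            i = words.index(conj)
            first = interpret_phrase(words[:i])
            second = interpret_phrase(words[i + 1:])
            # 'x after y' executes y first, then x.
            return first + second if conj == "and" else second + first
    return interpret_phrase(words)


print(" ".join(interpret("jump around right")))
```

A model generalizes systematically on such a task if, after seeing "jump" only in isolation during training, it can still produce the correct action sequence for composed commands like the one printed above.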

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Genomics and Phylogenetic Studies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Tanh Activation · Sigmoid Activation · Adam · Layer Normalization · Byte Pair Encoding