Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov

TL;DR
This paper investigates whether improvements in compositional generalization, inspired by SCAN tasks, transfer to real-world NLP tasks like machine translation, especially under low-resource and domain-shifted conditions.
Contribution
The study introduces modifications to the Transformer that improve compositional generalization on SCAN, and evaluates them on standard, low-resource, and distribution-shifted (English-French) machine translation.
Findings
Transformers with SCAN-inspired modifications improve low-resource translation performance.
Improvements do not significantly transfer to resource-rich machine translation.
In low-resource settings, the modified models improve BLEU scores by up to 13.1%.
Abstract
Despite their practical success, modern seq2seq architectures are unable to generalize systematically on several SCAN tasks. Hence, it is not clear if SCAN-style compositional generalization is useful in realistic NLP tasks. In this work, we study the benefit that such compositionality brings to several machine translation tasks. We present several focused modifications of the Transformer that greatly improve generalization capabilities on SCAN and select one that remains on par with a vanilla Transformer on a standard machine translation (MT) task. Next, we study its performance in low-resource settings and on a newly introduced distribution-shifted English-French translation task. Overall, we find that improvements of a SCAN-capable model do not directly transfer to the resource-rich MT setup. In contrast, in the low-resource setup, general modifications lead to an improvement of up…
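The phrase in the title, "jump around right", is itself a SCAN command. SCAN (Lake & Baroni, 2018) pairs simple English-like commands with action sequences, and a seq2seq model must learn the mapping from examples, e.g. correctly translating "jump around right" even when "jump" appeared during training only in isolation. The sketch below is a minimal rule-based interpreter for the SCAN grammar as we understand it, showing what the target outputs look like. It is an illustration only, not code from the paper; the function names (`interpret`, `interpret_phrase`, `interpret_action`) are hypothetical.

```python
# Hypothetical rule-based interpreter for the SCAN command grammar
# (a sketch of the ground-truth mapping, not a learned seq2seq model).
PRIMITIVES = {"walk": "WALK", "look": "LOOK", "run": "RUN", "jump": "JUMP"}
TURNS = {"left": "LTURN", "right": "RTURN"}


def interpret_action(words):
    """Interpret 'x', 'x left', 'x opposite left', 'x around left', etc."""
    verb = words[0]
    # 'turn' contributes no primitive action of its own, only turns.
    act = [] if verb == "turn" else [PRIMITIVES[verb]]
    if len(words) == 1:
        if verb == "turn":
            raise ValueError("'turn' requires a direction")
        return act
    if len(words) == 2:  # e.g. 'jump left' -> LTURN JUMP
        return [TURNS[words[1]]] + act
    modifier, turn = words[1], TURNS[words[2]]
    if modifier == "opposite":  # turn twice, then act
        return [turn, turn] + act
    if modifier == "around":  # (turn, act) repeated four times
        return ([turn] + act) * 4
    raise ValueError(f"unknown modifier: {modifier}")


def interpret_phrase(words):
    """Handle the optional 'twice' / 'thrice' repetition suffix."""
    if words[-1] == "twice":
        return interpret_action(words[:-1]) * 2
    if words[-1] == "thrice":
        return interpret_action(words[:-1]) * 3
    return interpret_action(words)


def interpret(command):
    """Interpret a full command with at most one 'and' / 'after'."""
    words = command.split()
    for conj in ("and", "after"):
        if conj in words:
            i = words.index(conj)
            first = interpret_phrase(words[:i])
            second = interpret_phrase(words[i + 1:])
            # 'x and y' runs x then y; 'x after y' runs y then x.
            return first + second if conj == "and" else second + first
    return interpret_phrase(words)


if __name__ == "__main__":
    print(" ".join(interpret("jump around right")))
    # → RTURN JUMP RTURN JUMP RTURN JUMP RTURN JUMP
```

A model that has truly learned this compositional structure should handle unseen combinations such as "jump around right" as reliably as the rule-based version; the abstract reports that vanilla seq2seq architectures fail on several such tasks.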
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Genomics and Phylogenetic Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Tanh Activation · Sigmoid Activation · Adam · Layer Normalization · Byte Pair Encoding
