Making Transformers Solve Compositional Tasks

Santiago Ontañón; Joshua Ainslie; Vaclav Cvicek; Zachary Fisher

arXiv:2108.04378 · cs.AI · March 4, 2022

TL;DR

This paper investigates how Transformer design choices affect a model's ability to generalize compositionally on NLP tasks, and identifies configurations that generalize better than previously reported models on the COGS and PCFG benchmarks.

Contribution

The study systematically explores the Transformer design space, identifying configurations that improve compositional generalization and achieve state-of-the-art results on the COGS and PCFG benchmarks.

Findings

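01 Identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature.

02 Achieved state-of-the-art results on the COGS (semantic parsing) and PCFG (string edit operation composition) benchmarks.

03 Showed that the inductive biases introduced by several design decisions significantly affect compositional generalization.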

Abstract

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper we explore the design space of Transformer models showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identified Transformer configurations that generalize compositionally significantly better than previously reported in the literature in a diverse set of compositional tasks, and that achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS), and a string edit operation composition benchmark (PCFG).
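To make "compositional generalization" concrete, the following is a minimal toy sketch of a compositional train/test split over string edit operations. It is an illustration only: the operation names (`reverse`, `repeat`, `shift`), the helpers `apply_ops` and `compositional_split`, and the split rule are all hypothetical and are not taken from the PCFG benchmark or from the paper's code. The idea is that every operation is seen during training, but one specific combination appears only at test time.

```python
from itertools import product

# Toy unary string-edit operations (hypothetical; not the PCFG benchmark's operation set).
OPS = {
    "reverse": lambda xs: xs[::-1],
    "repeat": lambda xs: xs + xs,
    "shift": lambda xs: xs[1:] + xs[:1],
}

def apply_ops(op_names, tokens):
    """Apply operations right to left: ("repeat", "reverse") means repeat(reverse(tokens))."""
    result = list(tokens)
    for name in reversed(op_names):
        result = OPS[name](result)
    return result

def compositional_split(held_out):
    """Train on single ops and all ordered pairs except `held_out`; test on the held-out pairs."""
    singles = [(name,) for name in OPS]
    pairs = list(product(OPS, repeat=2))
    train = singles + [p for p in pairs if p not in held_out]
    test = [p for p in pairs if p in held_out]
    return train, test

# Every operation appears in training, but repeat(reverse(x)) is only seen at test time.
train, test = compositional_split(held_out={("repeat", "reverse")})
example = apply_ops(("repeat", "reverse"), ["a", "b", "c"])
```

A model that generalizes compositionally in this toy setting would produce the correct output for the held-out combination despite never having seen those two operations composed in that order.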

Code & Models

Repositories

google-research/google-research (TensorFlow, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Adam