Unveiling Transformers with LEGO: a synthetic reasoning task

Yi Zhang; Arturs Backurs; Sébastien Bubeck; Ronen Eldan; Suriya Gunasekar; Tal Wagner

arXiv:2206.04301 · cs.LG · February 21, 2023 · 5 cites

TL;DR

This paper introduces LEGO, a synthetic reasoning task for studying Transformer models, analyzing how data, architecture, and pretraining influence learning, and proposing new attention mechanisms to improve robustness and efficiency.

Contribution

The paper presents LEGO, a novel synthetic reasoning task, and investigates how Transformers learn it, revealing structured attention patterns and proposing a new LEGO attention module.
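To make "following a chain of reasoning" concrete, here is a minimal sketch of what a LEGO-style instance could look like. It assumes the group is the sign group {+1, -1} and that each clause sets one variable to plus or minus the previous one; the clause syntax and the helper names `make_lego_chain` and `resolve` are hypothetical and not taken from the authors' repository.

```python
import random


def make_lego_chain(num_vars, seed=0):
    """Build a chain of sign clauses and the value each variable resolves to.

    Assumption: variables take values in {+1, -1}; at most 26 variables.
    """
    rng = random.Random(seed)
    names = [chr(ord("a") + i) for i in range(num_vars)]
    rng.shuffle(names)  # variable names carry no information about chain order
    clauses, values = [], {}
    prev_name, prev_value = None, 1
    for name in names:
        sign = rng.choice([1, -1])
        symbol = "+" if sign > 0 else "-"
        if prev_name is None:
            # Root clause assigns a constant, e.g. "c = +1".
            clauses.append(f"{name} = {symbol}1")
            value = sign
        else:
            # Later clauses refer to the previous variable, e.g. "f = -c".
            clauses.append(f"{name} = {symbol}{prev_name}")
            value = sign * prev_value
        values[name] = value
        prev_name, prev_value = name, value
    return clauses, values


def resolve(clauses):
    """Resolve all variables, whatever order the clauses are presented in."""
    values = {}
    pending = list(clauses)
    while pending:
        remaining = []
        for clause in pending:
            left, right = clause.split(" = ")
            sign = 1 if right[0] == "+" else -1
            operand = right[1:]
            if operand == "1":
                values[left] = sign
            elif operand in values:
                values[left] = sign * values[operand]
            else:
                remaining.append(clause)  # operand not known yet, retry later
        if len(remaining) == len(pending):
            raise ValueError("clauses do not form a resolvable chain")
        pending = remaining
    return values
```

Calling `resolve` on the clauses in reversed or shuffled order returns the same values as the generator, which is the sense in which a solver has to follow the chain rather than read the clauses left to right.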

Findings

01. Transformers develop structured attention patterns, including a novel association pattern.

02. Pretraining on unrelated tasks can facilitate LEGO task learning through structured attention.

03. The LEGO attention module reduces computational cost and can improve performance.

Abstract

We propose a synthetic reasoning task, LEGO (Learning Equality and Group Operations), that encapsulates the problem of following a chain of reasoning, and we study how Transformer architectures learn this task. We pay special attention to data effects such as pretraining (on seemingly unrelated NLP tasks) and dataset composition (e.g., differing chain length at training and test time), as well as architectural variants such as weight-tied layers or adding convolutional components. We study how the trained models eventually succeed at the task; we manage to understand some of the attention heads as well as how the information flows in the network. In particular, we have identified a novel association pattern that globally attends only to identical tokens. Based on these observations we propose a hypothesis that here pretraining helps for LEGO tasks due to…
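The association pattern described in the abstract (attention that globally attends only to identical tokens) can be sketched as a fixed, content-based attention matrix. The following is a minimal illustration, not the paper's LEGO attention module: the function names are hypothetical, and the uniform weighting over matching positions is an assumption made for simplicity.

```python
def association_mask(tokens):
    """Return an n x n 0/1 matrix: entry (i, j) is 1 iff tokens[i] == tokens[j]."""
    n = len(tokens)
    return [[1 if tokens[i] == tokens[j] else 0 for j in range(n)] for i in range(n)]


def association_attention(tokens):
    """Turn the mask into attention weights, uniform over identical tokens.

    Assumption: equal weight on every matching position. Each row has at
    least one match (the position itself), so the row sum is never zero.
    """
    mask = association_mask(tokens)
    return [[m / sum(row) for m in row] for row in mask]


# Hypothetical tokenization of the clauses "a = + 1 ; b = - a".
tokens = ["a", "=", "+", "1", ";", "b", "=", "-", "a"]
weights = association_attention(tokens)
```

Here the first `a` puts half of its weight on itself and half on the later `a`, regardless of how far apart the two positions are, which is what makes the pattern global rather than local.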

Code & Models

Repositories

yizhangzzz/transformers-lego
PyTorch · Official

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Neural Networks and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dropout