Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Satwik Bhattamishra; Arkil Patel; Varun Kanade; Phil Blunsom

arXiv:2211.12316 · cs.LG · July 11, 2023

TL;DR

This paper investigates the inductive biases of Transformers when learning Boolean functions. It finds that they prefer low-sensitivity functions and generalize better than LSTMs on sparse Boolean tasks.

Contribution

The paper provides the first extensive empirical analysis of Transformers' bias towards low-sensitivity functions and of their ability to generalize on sparse Boolean functions.

Findings

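01

Random Transformers are relatively more biased towards low-sensitivity functions.

02

When trained on Boolean functions, both Transformers and LSTMs prioritize low-sensitivity functions, with Transformers ultimately converging to functions of lower sensitivity.

03

On sparse Boolean functions, Transformers generalize near perfectly even with noisy labels.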

Abstract

Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels whereas LSTMs…
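The abstract relies on the notion of sensitivity without defining it. Under the standard definition, the sensitivity of a Boolean function f at an input x is the number of single-bit flips of x that change f(x), and the average sensitivity is the mean of that count over all inputs. The sketch below is illustrative only and is not taken from the paper's repository; the helper names (`sensitivity_at`, `average_sensitivity`, `sparse_parity`) and the choice of relevant bits are assumptions made for the example.

```python
from itertools import product


def sensitivity_at(f, x):
    """Count the single-bit flips of x that change the output of f."""
    count = 0
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(flipped) != f(x):
            count += 1
    return count


def average_sensitivity(f, n):
    """Mean sensitivity of f over all 2**n inputs (feasible only for small n)."""
    inputs = list(product((0, 1), repeat=n))
    return sum(sensitivity_at(f, x) for x in inputs) / len(inputs)


def parity(x):
    """Full parity: depends on every input bit."""
    return sum(x) % 2


def sparse_parity(x, relevant=(0, 1)):
    """Parity over a small subset of bits (hypothetical choice: bits 0 and 1)."""
    return sum(x[i] for i in relevant) % 2


n = 6
print(average_sensitivity(parity, n))         # 6.0: every flip changes the output
print(average_sensitivity(sparse_parity, n))  # 2.0: only the 2 relevant bits matter
```

Full parity attains the maximum average sensitivity of n, while a sparse parity over k relevant bits has average sensitivity k regardless of n. This is the sense in which sparse Boolean functions have low sensitivity.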

Code & Models

Repositories

satwik77/transformer-simplicity
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection