Causal Transformers Perform Below Chance on Recursive Nested Constructions, Unlike Humans

Yair Lakretz; Théo Desbordes; Dieuwke Hupkes; Stanislas Dehaene

arXiv:2110.07240·cs.CL·October 15, 2021

Open Access

TL;DR

This study evaluates how well state-of-the-art Transformer language models handle recursive nested constructions. The models achieve near-perfect performance when the embedded dependency is short range but, unlike humans, fall below chance level when it is long range.

Contribution

The paper tests four Transformer LMs on two types of nested constructions and shows that their success on short-range embedded dependencies does not carry over to long-range ones, pointing to a key limitation in their recursive processing.

Findings

01

Transformers achieve near-perfect performance on short-range embedded dependencies, better than results previously reported for RNN-LMs and humans.

02

Performance drops below chance on long-range embedded dependencies.

03

Adding three words to the dependency causes a sharp performance decline.

Abstract

Recursive processing is considered a hallmark of human linguistic abilities. A recent study evaluated recursive processing in recurrent neural language models (RNN-LMs) and showed that such models perform below chance level on embedded dependencies within nested constructions -- a prototypical example of recursion in natural language. Here, we study if state-of-the-art Transformer LMs do any better. We test four different Transformer LMs on two different types of nested constructions, which differ in whether the embedded (inner) dependency is short or long range. We find that Transformers achieve near-perfect performance on short-range embedded dependencies, significantly better than previous results reported for RNN-LMs and humans. However, on long-range embedded dependencies, Transformers' performance sharply drops below chance level. Remarkably, the addition of only three words to…
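
The evaluation setup described in the abstract can be illustrated with a minimal sketch. It assumes the task is a two-way forced choice on the embedded verb (so chance level is 0.5), which the abstract does not spell out. The toy lexicon, the sentence templates, and the names `make_items`, `agreement_accuracy`, and `recency_scorer` are all hypothetical and are not taken from the paper. The recency heuristic only stands in for a real language model scorer; it is not a claim about how Transformers behave.

```python
from itertools import product
from typing import Callable, List, Tuple

# Toy lexicon (hypothetical; not the paper's stimuli).
NOUNS = {"sg": ["boy", "farmer"], "pl": ["boys", "farmers"]}
VERBS = {"sg": "watches", "pl": "watch"}

Item = Tuple[str, str, str]  # (prefix, correct verb, incorrect verb)


def make_items(long_range: bool) -> List[Item]:
    """Build prefixes that end right before the embedded (inner) verb.

    Short range: "The N1 that the N2 ___"
    Long range:  "The N1 that the N2 near the N3 ___" (three extra words)
    The inner verb must agree in number with N2 in both cases.
    """
    items = []
    for n1_num, n2_num, n3_num in product(("sg", "pl"), repeat=3):
        n1, n2, n3 = NOUNS[n1_num][0], NOUNS[n2_num][1], NOUNS[n3_num][0]
        prefix = f"The {n1} that the {n2}"
        if long_range:
            prefix += f" near the {n3}"
        elif n3_num == "pl":
            continue  # N3 is unused in short-range items; skip duplicates
        wrong_num = "pl" if n2_num == "sg" else "sg"
        items.append((prefix, VERBS[n2_num], VERBS[wrong_num]))
    return items


def agreement_accuracy(items: List[Item],
                       score: Callable[[str, str], float]) -> float:
    """Fraction of items where the correct verb outscores the incorrect one."""
    hits = sum(score(p, good) > score(p, bad) for p, good, bad in items)
    return hits / len(items)


def recency_scorer(prefix: str, verb: str) -> float:
    """Toy stand-in for a language model: agree with the most recent noun."""
    noun_is_plural = prefix.split()[-1].endswith("s")
    verb_is_plural = verb == VERBS["pl"]
    return 1.0 if noun_is_plural == verb_is_plural else 0.0


if __name__ == "__main__":
    print("short-range accuracy:", agreement_accuracy(make_items(False), recency_scorer))
    print("long-range accuracy:", agreement_accuracy(make_items(True), recency_scorer))
```

To evaluate an actual model, `recency_scorer` would be replaced by a function returning the model's log-probability of the verb given the prefix; the accuracy computation stays the same.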

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Neurobiology of Language and Bilingualism

Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Absolute Position Encodings · Softmax · Residual Connection · Adam · Label Smoothing · Byte Pair Encoding