Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention

Soo Hyun Ryu; Richard L. Lewis

arXiv:2104.12874 · cs.CL · April 28, 2021

TL;DR

This paper investigates how Transformer language models such as GPT-2 capture agreement phenomena and similarity-based interference effects in sentence comprehension, showing that the model's surprisal and attention patterns align with human reading behavior and with cue-based retrieval theories.

Contribution

The paper demonstrates that GPT-2's surprisal and attention patterns can model similarity-based interference effects in agreement processing, bridging computational modeling and psycholinguistic evidence.

Findings

01

Surprisal predicts facilitatory interference effects in ungrammatical sentences.

02

GPT-2's attention becomes more diffuse when a similar distractor is present.

03

The model's learned cues align with cue-based retrieval accounts of human parsing.
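Findings 02 and 03 rest on attention entropy as an index of how spread out attention is. The sketch below is a minimal illustration, not the authors' code: it assumes a single attention head's softmax-normalized weights from the verb (or reflexive) position over earlier tokens, and both the weight values and the function name `attention_entropy` are hypothetical.

```python
import math
from typing import Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in bits) of one attention distribution.

    Higher entropy means attention is spread over more tokens (more diffuse);
    lower entropy means it is concentrated on a few tokens.
    """
    total = sum(weights)
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError(f"attention weights must sum to 1, got {total}")
    # Zero-weight positions contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in weights if p > 0)


# Hypothetical weights from the verb over [subject, distractor, other tokens].
focused = [0.85, 0.05, 0.10]  # attention concentrated on the subject
diffuse = [0.45, 0.40, 0.15]  # attention split between subject and distractor

print(f"focused: {attention_entropy(focused):.3f} bits")
print(f"diffuse: {attention_entropy(diffuse):.3f} bits")
```

With a real Transformer, one would compute this separately for each layer and head; a higher entropy in the similar-distractor condition is what "more diffuse attention" refers to in this sketch.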

Abstract

We advance a novel explanation of similarity-based interference effects in subject-verb and reflexive pronoun agreement processing, grounded in surprisal values computed from a pretrained large-scale Transformer model, GPT-2. Specifically, we show that surprisal of the verb or reflexive pronoun predicts facilitatory interference effects in ungrammatical sentences, where a distractor noun that matches in number with the verb or pronoun leads to faster reading times, despite the distractor not participating in the agreement relation. We review the human empirical evidence for such effects, including recent meta-analyses and large-scale studies. We also show that attention patterns (indexed by entropy and other measures) in the Transformer show patterns of diffuse attention in the presence of similar distractors, consistent with cue-based retrieval models of parsing. But in contrast to…
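The surprisal account in the abstract can be made concrete. The surprisal of a word is -log2 P(word | preceding words), and lower surprisal is taken to predict faster reading. The sketch below assumes invented probabilities for the ungrammatical plural verb "were" after a standard agreement-attraction frame, "The key to the cabinet(s)"; the numbers are hypothetical, not GPT-2 output. With a real model, the probabilities would come from GPT-2's next-token distribution, and a word split into several subword tokens would receive the sum of its token surprisals (by the chain rule).

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(word | preceding context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError(f"probability must be in (0, 1], got {prob}")
    return -math.log2(prob)


# Hypothetical probabilities of the ungrammatical plural verb "were" after a
# singular subject; these numbers are invented, not model output.
p_were_after = {
    "The key to the cabinet": 0.002,   # distractor mismatches the verb
    "The key to the cabinets": 0.010,  # distractor matches the verb in number
}

s_mismatch = surprisal(p_were_after["The key to the cabinet"])
s_match = surprisal(p_were_after["The key to the cabinets"])

# A positive difference means the number-matching distractor lowers surprisal,
# i.e. the surprisal account predicts faster reading (facilitatory interference).
facilitation = s_mismatch - s_match
print(f"mismatch: {s_mismatch:.2f} bits, match: {s_match:.2f} bits")
print(f"predicted facilitation: {facilitation:.2f} bits")
```

Framing the effect as a surprisal difference between matched and mismatched distractor conditions keeps the comparison within a single verb, so only the distractor's number varies.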

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Residual Connection · Softmax · Attention Dropout