How Language Models Prioritize Contextual Grammatical Cues?

Hamidreza Amirzadeh; Afra Alishahi; Hosein Mohebbi

arXiv:2410.03447·cs.CL·October 7, 2024

TL;DR

This paper investigates how BERT and GPT-2 prioritize multiple gender cues in context. BERT favors the first cue, while GPT-2 relies more on the last, which points to different contextual-processing strategies in the two models.

Contribution

The study provides a comparative analysis of encoder and decoder Transformer models' strategies for handling multiple contextual cues in gender agreement tasks.

Findings

01 BERT prioritizes the first cue in context.

02 GPT-2 relies more on the final cue.

03 Encoder and decoder models use distinct strategies when prioritizing cues.

Abstract

Transformer-based language models have shown an excellent ability to effectively capture and utilize contextual information. Although various analysis techniques have been used to quantify and trace the contribution of single contextual cues to a target task such as subject-verb agreement or coreference resolution, scenarios in which multiple relevant cues are available in the context remain underexplored. In this paper, we investigate how language models handle gender agreement when multiple gender cue words are present, each capable of independently disambiguating a target gender pronoun. We analyze two widely used Transformer-based models: BERT, an encoder-based, and GPT-2, a decoder-based model. Our analysis employs two complementary approaches: context mixing analysis, which tracks information flow within the model, and a variant of activation patching, which measures the impact of…
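
To make the idea of activation patching concrete, here is a toy sketch in plain Python. It is not the authors' variant and does not touch BERT or GPT-2: the cue lexicon, the scalar "hidden states", and the position weights are all hypothetical stand-ins chosen for illustration. The sketch swaps the representation of one cue word for its value from a counterfactual context and reports how much the model's preference for the masculine pronoun drops.

```python
# Toy illustration only: every name and number here is an assumption,
# not taken from the paper or its repository.

MASCULINE = {"actor", "his", "he"}      # hypothetical cue lexicon
FEMININE = {"actress", "her", "she"}    # hypothetical cue lexicon


def encode(tokens):
    """Map each token to a scalar 'hidden state': +1 masculine, -1 feminine, 0 otherwise."""
    return [1.0 if t in MASCULINE else -1.0 if t in FEMININE else 0.0 for t in tokens]


def logit_diff(hidden, weights):
    """Toy readout: weighted sum of hidden states, read as logit(he) - logit(she)."""
    return sum(h * w for h, w in zip(hidden, weights))


def patch_effect(clean_tokens, corrupt_tokens, weights, position):
    """Replace one position's clean hidden state with the counterfactual one
    and return the resulting drop in the logit difference."""
    clean = encode(clean_tokens)
    corrupt = encode(corrupt_tokens)
    patched = list(clean)
    patched[position] = corrupt[position]
    return logit_diff(clean, weights) - logit_diff(patched, weights)


clean = ["actor", "told", "his", "friend", "that"]
corrupt = ["actress", "told", "her", "friend", "that"]
# Hypothetical weights for a model that leans on the later cue.
weights = [0.25, 0.0, 0.75, 0.0, 0.0]

effect_first = patch_effect(clean, corrupt, weights, 0)  # patch the first cue
effect_last = patch_effect(clean, corrupt, weights, 2)   # patch the last cue
print(effect_first, effect_last)
```

In this toy setting a larger effect for one position means the readout depends more on that cue. Real experiments would patch intermediate activations of the actual model rather than hand-written scalars.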

Code & Models

Repositories

hamid-amir/CueWords (PyTorch, official)

Videos

How Language Models Prioritize Contextual Grammatical Cues? (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Syntax, Semantics, Linguistic Variation · Speech and dialogue systems

Methods: Attention Is All You Need · WordPiece · Linear Layer · Residual Connection · Weight Decay · Cosine Annealing · Linear Warmup With Linear Decay · Dropout · Byte Pair Encoding