Exploring BERT's Sensitivity to Lexical Cues using Tests from Semantic Priming

Kanishka Misra; Allyson Ettinger; Julia Taylor Rayz

arXiv:2010.03010·cs.CL·April 23, 2021

TL;DR

This study investigates how BERT uses lexical cues in context. Like humans, BERT shows priming: it assigns a word higher probability when the context contains a related word than when it contains an unrelated one, and the effect shrinks as the context becomes more informative.

Contribution

The paper demonstrates that BERT shows lexical priming effects and analyzes how the informativeness of the context changes its word predictions, drawing parallels with human language processing.

Findings

01

BERT assigns a word higher probability when the context contains a related prime, most strongly when the context is less informative.

02

Priming effect decreases as context provides more information.

03

BERT's predictions are affected by lexical relatedness similarly to humans.

Abstract

Models trained to estimate word probabilities in context have become ubiquitous in natural language processing. How do these models use lexical cues in context to inform their word probabilities? To answer this question, we present a case study analyzing the pre-trained BERT model with tests informed by semantic priming. Using English lexical stimuli that show priming in humans, we find that BERT too shows "priming," predicting a word with greater probability when the context includes a related word versus an unrelated one. This effect decreases as the amount of information provided by the context increases. Follow-up analysis shows BERT to be increasingly distracted by related prime words as context becomes more informative, assigning lower probabilities to related words. Our findings highlight the importance of considering contextual constraint effects when studying word prediction in…
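The comparison described in the abstract can be made concrete with a small sketch. It assumes priming is scored as the drop in surprisal (negative log probability, in bits) of the target word when the context contains a related prime rather than an unrelated one. The function names and the example probabilities are illustrative only and are not taken from the paper's repository or results.

```python
import math


def surprisal(prob: float) -> float:
    """Return the surprisal, in bits, of an event with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(prob)


def facilitation(p_related: float, p_unrelated: float) -> float:
    """Hypothetical priming score (an assumption of this sketch).

    p_related:   model probability of the target word when the context
                 contains a related prime.
    p_unrelated: model probability of the same target word when the
                 context contains an unrelated prime.

    A positive value means the related prime made the target less
    surprising (priming); a negative value means the related prime
    lowered the target's probability.
    """
    return surprisal(p_unrelated) - surprisal(p_related)


# Made-up probabilities, chosen only to keep the arithmetic easy to follow.
score = facilitation(p_related=0.25, p_unrelated=0.125)  # 3.0 - 2.0 = 1.0 bit
```

Under this scoring, the abstract's first finding corresponds to positive scores on average, and the follow-up finding corresponds to scores that shrink, and can turn negative, as the context becomes more informative. In practice the two probabilities would come from a masked language model's output distribution at the target position.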

Code & Models

Repositories

kanishkamisra/emnlp-bert-priming
PyTorch · Official

Taxonomy

Methods

Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need