Context versus Prior Knowledge in Language Models
Kevin Du, Vésteinn Snæbjarnarson, Niklas Stoehr, Jennifer C. White, Aaron Schein, Ryan Cotterell

TL;DR
This paper investigates how language models balance prior knowledge against contextual information when answering questions. It introduces two metrics, a persuasion score for contexts and a susceptibility score for entities, and analyzes how these scores relate to the model's familiarity with an entity.
Contribution
It proposes two mutual information-based metrics to measure a model's reliance on context versus prior knowledge, validated through empirical testing.
Findings
Models depend more on prior knowledge for familiar entities.
Context influence varies depending on entity familiarity.
Metrics effectively quantify model dependency and susceptibility.
Abstract
To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context. We hypothesize that models perform this integration in a predictable way across different questions and contexts: models will rely more on prior knowledge for questions about entities (e.g., persons, places, etc.) that they are more familiar with due to higher exposure in the training corpus, and be more easily persuaded by some contexts than others. To formalize this problem, we propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity: first, the persuasion score of a given context represents how much a model depends on the context in its decision, and second, the susceptibility score of a given entity represents how much the model can be swayed away from its original answer…
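The abstract describes the two scores only as "mutual information-based" and this excerpt does not give their exact definitions, so the following is a minimal sketch under stated assumptions rather than the authors' formulation. It assumes that, for one question about one entity, we already have the model's answer distribution under each of several candidate contexts. It then computes a per-context score as the KL divergence between that context's answer distribution and the context-averaged distribution, and an entity-level score as the weighted average of those per-context values, which equals the mutual information between context and answer. The function names (`entropy`, `kl_divergence`, `context_scores`) and the toy probabilities are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def context_scores(answer_dists, context_weights=None):
    """Illustrative context-level and entity-level scores (an assumption,
    not the paper's definition).

    answer_dists[i] is the model's answer distribution given context i.
    context_weights[i] is the probability of context i (uniform if omitted,
    and assumed strictly positive).

    Returns (per_context, mutual_information), where per_context[i] is the
    KL divergence from context i's answer distribution to the averaged one,
    and mutual_information is the weighted mean of those values, i.e. I(C; A).
    """
    n_contexts = len(answer_dists)
    if context_weights is None:
        context_weights = [1.0 / n_contexts] * n_contexts
    n_answers = len(answer_dists[0])
    # Answer distribution averaged over contexts.
    marginal = [
        sum(w * d[a] for w, d in zip(context_weights, answer_dists))
        for a in range(n_answers)
    ]
    per_context = [kl_divergence(d, marginal) for d in answer_dists]
    mutual_information = sum(w * s for w, s in zip(context_weights, per_context))
    return per_context, mutual_information


# Toy example: two contexts that push the model toward opposite answers.
dists = [[0.9, 0.1], [0.2, 0.8]]
per_context, mi = context_scores(dists)
print(per_context, mi)
```

In this sketch, an entity whose answer distribution barely moves across contexts gets an entity-level score near zero, while one whose answers flip with the context gets a score near the entropy of the context distribution.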
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
