Language model acceptability judgements are not always robust to context

Koustuv Sinha; Jon Gauthier; Aaron Mueller; Kanishka Misra; Keren Fuentes; Roger Levy; Adina Williams

arXiv:2212.08979·cs.CL·December 20, 2022

TL;DR

This paper investigates how stable language models' syntactic acceptability judgements remain as properties of the input context vary. It finds that models are sensitive to specific syntactic features of the context, which affects how robust their judgements are.

Contribution

The study shows that language models' syntactic judgements are highly sensitive to syntactic structures in the preceding context, and it attributes this sensitivity to the models' implicit in-context learning.

Findings

01

Model judgements are generally robust in randomly sampled contexts but unstable when the context contains syntactic structures matching those of the test sentence.

02

Contexts with matching syntactic structures improve judgements; unacceptable contexts with matching but violated structures worsen them.

03

Sensitivity to syntactic features is linked to implicit in-context learning abilities.

Abstract

Targeted syntactic evaluations of language models ask whether models show stable preferences for syntactically acceptable content over minimal-pair unacceptable inputs. Most targeted syntactic evaluation datasets ask models to make these judgements with just a single context-free sentence as input. This does not match language models' training regime, in which input sentences are always highly contextualized by the surrounding corpus. This mismatch raises an important question: how robust are models' syntactic judgements in different contexts? In this paper, we investigate the stability of language models' performance on targeted syntactic evaluations as we vary properties of the input context: the length of the context, the types of syntactic phenomena it contains, and whether or not there are violations of grammaticality. We find that model judgements are generally robust when placed…
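The evaluation setup described in the abstract can be sketched roughly as follows: a context is prepended to both sentences of a minimal pair, and the model counts as correct when it scores the acceptable sentence higher. This is a minimal sketch, not the authors' implementation. The names `ScoreFn`, `prefers_acceptable`, `accuracy`, and `toy_score` are hypothetical, and `toy_score` is a stand-in with no linguistic meaning; in practice the scorer would be a language model's conditional log-probability of the sentence given the context.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical interface: score(context, target) returns a number that plays
# the role of log P(target | context). Higher means "more acceptable".
ScoreFn = Callable[[str, str], float]


def prefers_acceptable(score: ScoreFn, context: str,
                       acceptable: str, unacceptable: str) -> bool:
    """Return True if the acceptable sentence scores strictly higher
    than its minimal-pair counterpart under the same context."""
    return score(context, acceptable) > score(context, unacceptable)


def accuracy(score: ScoreFn, context: str,
             pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (acceptable, unacceptable) pairs judged correctly
    when every pair is preceded by the same context."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(prefers_acceptable(score, context, good, bad)
                  for good, bad in pairs)
    return correct / len(pairs)


def toy_score(context: str, target: str) -> float:
    """Stand-in scorer (an assumption for illustration only, NOT a language
    model): rewards target tokens that also occur in the context and
    penalizes target length."""
    context_tokens = set(context.split())
    target_tokens = target.split()
    overlap = sum(tok in context_tokens for tok in target_tokens)
    return overlap - len(target_tokens)
```

Calling `accuracy` on the same pairs with different contexts (for example, an acceptable versus an unacceptable context sentence) gives one accuracy figure per context condition, which is the kind of comparison the abstract describes. The scorer is passed in as a callable so that the toy function can be swapped for a real model without changing the evaluation logic.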

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Test